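This article categorizes and summarizes papers from the ICLR 2022 Conference, focusing mainly on reinforcement learning, game theory, multi-agent systems, and related areas.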
Oral Presentations
Reinforcement Learning
Keywords: Reinforcement Learning Theory, Invariant Representation, Rich Observation Reinforcement Learning, Exogenous Noise, Inverse Dynamics
Keywords: unsupervised skill learning, reward-free RL, mutual information, DIAYN
One-sentence Summary: We show that mutual information skill learning is optimal in one sense but not optimal in another sense.
Keywords: reinforcement learning, observation space, out-of-distribution generalization, visuomotor control, robotics, manipulation
One-sentence Summary: Appropriately designing the observation space of a vision-based manipulator and regularizing its representations leads to clear gains in learning stability and out-of-distribution generalization.
Keywords: Agent Design, Morphology Optimization, Reinforcement Learning
One-sentence Summary: We learn a transform-and-control policy to both design and control an agent.
Keywords: meta-learning, meta-gradients, meta-reinforcement learning
One-sentence Summary: We propose an algorithm for meta-learning with gradients that bootstraps the meta-learner from itself or another update rule.
Game Theory
Multiagent
Spotlight Presentations
Reinforcement Learning
Keywords: reinforcement learning theory, deployment efficiency, linear MDP
One-sentence Summary: We propose a formal theoretical formulation for deployment-efficient reinforcement learning, establish lower bounds for deployment complexity, and study near-optimal deployment-efficient algorithms in the linear MDP setting.
Keywords: Reinforcement Learning, Equivariance, Robotic Manipulation
One-sentence Summary: This paper proposes equivariant DQN and equivariant SAC that significantly improve the sample efficiency of RL in robotic manipulation.
Keywords: Reinforcement learning, representation learning
One-sentence Summary: We show that RL agents experience representation collapse in sparse reward environments and propose an auxiliary task that prevents this from happening and outperforms the state of the art on the Atari benchmark.
Keywords: Reinforcement Learning, Contrastive Learning, Representation Learning, Transformer, Deep Reinforcement Learning
One-sentence Summary: A new loss and an improved architecture to efficiently train attentional models in reinforcement learning.
Keywords: Reinforcement Learning, Programmatic Reinforcement Learning, Compositional Reinforcement Learning, Program Synthesis, Differentiable Architecture Search
One-sentence Summary: We present a differentiable program architecture search framework to synthesize interpretable, generalizable, and compositional programs for controlling reinforcement learning applications.
Keywords: Reinforcement Learning, Sparsity, Pruning, Lottery Ticket Hypothesis
One-sentence Summary: We investigate the mechanisms underlying the lottery ticket effect in Deep RL and show that the derived mask extracts minimal task representations.
Keywords: reinforcement learning, altruistic behavior in AI, multi-agent systems
One-sentence Summary: We propose and investigate unsupervised training of agents to behave altruistically towards others by actively maximizing others' choice.
Keywords: Reinforcement Learning, Sparse Rewards, Learning from Demonstrations
One-sentence Summary: Reinforcement learning in sparse reward environments using offline guidance.
Keywords: Model-Based Reinforcement Learning, Offline Reinforcement Learning, Uncertainty Quantification
Keywords: Deep reinforcement learning, uncertainty estimation, inverse-variance, heteroscedastic
One-sentence Summary: The sample efficiency and performance of model-free DRL is improved by estimating the predictive uncertainty of the targets using probabilistic ensembles and down-weighting the uncertain samples using batch inverse-variance weighting.
Keywords: Reinforcement Learning, Long-Term Credit Assignment, Reward Redistribution, Return Decomposition
One-sentence Summary: We propose randomized return decomposition, a novel reward redistribution algorithm, which establishes a surrogate optimization problem to scale up learning in long-horizon tasks.
Keywords: Pessimistic Bootstrapping, Bootstrapped Q-functions, Uncertainty Estimation, Offline Reinforcement Learning
One-sentence Summary: We propose pessimistic bootstrapping as a purely uncertainty-driven algorithm for offline Reinforcement Learning.
Keywords: model-based reinforcement learning, reinforcement learning, objective mismatch, value function, sensitivity
One-sentence Summary: We propose the Value-gradient weighted Model loss, a method for value-aware model learning in challenging settings, such as small model capacity and the presence of distracting state dimensions.
Keywords: Offline Reinforcement Learning, Offline Constrained Reinforcement Learning, Stationary Distribution Correction Estimation
One-sentence Summary: We present an offline constrained RL algorithm, which estimates the stationary distribution corrections of the optimal policy with respect to returns, while constraining the cost upper bound.
Keywords: Transfer RL, Graphical models, Efficient adaptation
One-sentence Summary: Efficient policy adaptation across domains by learning a parsimonious graphical representation that encodes changes in a compact way.
Keywords: Q-learning, offline RL, regularization
One-sentence Summary: We show that implicit regularization effects can lead to poor performance in value-based offline RL and propose an explicit regularizer to mitigate these effects.
One-sentence Summary: Temporally coordinated exploration in reinforcement learning using the Generative Planning Method.
Keywords: AlphaZero, MuZero, reinforcement learning
One-sentence Summary: We redesign AlphaZero to keep improving even when training with a small number of simulations.
Keywords: Reinforcement learning, Constrained Markov decision processes, Constrained policy optimization, Bayesian model-based RL
One-sentence Summary: Solving constrained Markov decision processes with Bayesian model-based reinforcement learning.
Keywords: RL, HRL, reinforcement learning, hierarchical reinforcement learning, affordances, hierarchical affordances
One-sentence Summary: We introduce a method that achieves superior performance in complex hierarchical tasks by utilizing a notion of subtask dependency grounded in the present state.
Keywords: exploration, mode-switching, reinforcement learning, Atari
One-sentence Summary: A fresh look at the question of *when* to switch into exploration mode, and for how long.
Keywords: reinforcement learning, language understanding, text-based games
One-sentence Summary: We propose a multi-stage approach to playing text games that improves the score on Zork1 from around 40 to 103.
Keywords: Reinforcement Learning, Robotics, Locomotion Control, Multi-Modal Transformer
One-sentence Summary: We introduce a novel end-to-end Reinforcement Learning approach called LocoTransformer, leveraging both visual inputs and proprioceptive states, for locomotion control in both simulation and with real robots.
Keywords: Hindsight Information Matching, Decision Transformer, State-Marginal Matching, Hindsight Experience Replay, Reinforcement Learning
One-sentence Summary: We generalize hindsight algorithms in RL, and propose Distributional Decision Transformer for information matching.
Keywords: systematicity, graph reasoning
Keywords: model-based reinforcement learning, deep reinforcement learning, tree based search, MCTS
Keywords: domain randomization, sim-to-real transfer, learning theory
One-sentence Summary: We propose theoretical frameworks for sim-to-real transfer and domain randomization, and provide bounds on the sub-optimality gap of the policy returned by domain randomization.
Keywords: reward-free exploration, model-based reinforcement learning, learning theory
One-sentence Summary: We propose near-optimal exploration algorithms for reward-free exploration with a plug-in solver.
Keywords: Reward Learning, Inverse Reinforcement Learning, Reinforcement Learning, Comparing Reward Functions
One-sentence Summary: We propose a method for quantifying the similarity of learned reward functions without performing policy learning and evaluation.
Keywords: Robotics, Reinforcement Learning, Hierarchical, Latent Variable Models, Skills, Transfer
One-sentence Summary: An approach to learn reusable and transferable skills from data via a hierarchical latent mixture policy, which can significantly improve sample efficiency and asymptotic performance on downstream RL tasks
Game Theory
Multiagent
Keywords: Multi-agent Reinforcement Learning, Predictive State Representation, Dynamic Interaction Graph
One-sentence Summary: We propose a new algorithm for MARL under a multi-agent predictive state representation model that incorporates a dynamic interaction graph; we provide theoretical guarantees for our model and run various experiments to support our algorithm.
Keywords: reinforcement learning, altruistic behavior in AI, multi-agent systems
One-sentence Summary: We propose and investigate unsupervised training of agents to behave altruistically towards others by actively maximizing others' choice.
Keywords: Multi-agent reinforcement learning, Sparse coordination graphs, Deep coordination graph
One-sentence Summary: We propose a novel method for learning sparse coordination graphs that can be theoretically justified and can significantly reduce communication overhead and improve learning performance of deep coordination graphs.
Keywords: trajectory prediction, motion forecasting, transformers, latent variable models
One-sentence Summary: New Transformer-based architecture for socially consistent motion forecasting. Achieves SotA performance on NuScenes at a fraction of the compute of competing methods.
Keywords: emergent communication, multi-agent reinforcement learning, representation learning
One-sentence Summary: This work argues the importance of scaling up the emergent communication framework and investigates the impact of three scaling up aspects, namely the dataset, task complexity, and population size.
Keywords: model-based reinforcement learning, deep reinforcement learning, tree based search, MCTS
Keywords: Contextual Bandits, Exploration Strategy, Neural Networks
Poster Presentations
Reinforcement Learning
Keywords: reinforcement learning theory, markov decision process theory
Keywords: Reinforcement learning Theory, Offline reinforcement learning, PAC Bounds
One-sentence Summary: We study model-based offline Reinforcement Learning with general function approximation without a full coverage assumption on the offline data distribution.
Keywords: bandits, lower bound, reinforcement learning theory
One-sentence Summary: We give a general framework that turns upper and lower bounds in non-conservative settings into bounds in conservative settings.
Keywords: reinforcement learning, autonomous, reset-free reinforcement learning, continual reinforcement learning
Keywords: reinforcement learning, curriculum learning, boosting, residual learning
One-sentence Summary: A novel approach for curriculum RL that increases the representativeness of the functional space as new, increasingly complex, tasks from the curriculum are presented to the agent.
Keywords: reinforcement learning, imitation learning, Markov Decision Process, continuous control
One-sentence Summary: For deterministic experts, you can do imitation learning by calling an RL solver once, with a stationary reward signal.
Keywords: Reinforcement learning, Generalization, Regularization
One-sentence Summary: We propose a simple yet effective layer that increases the generalization abilities of reinforcement learning agents.
Keywords: Reinforcement learning
One-sentence Summary: We propose a doubly (sample and computationally) efficient RL method (Dr.Q) in which a small ensemble of dropout Q-functions is used.
Keywords: Reinforcement Learning, Meta-Learning
One-sentence Summary: We present HFR, a relabeling method that can be applied to meta-reinforcement learning to boost sample efficiency and performance.
Keywords: Reinforcement Learning, Value Mapping, Reward Decomposition
One-sentence Summary: We present a general convergent class of RL algorithms based on combining arbitrary value mappings and reward decomposition.
Keywords: Model-based reinforcement learning, reinforcement learning, model learning
One-sentence Summary: We combine real-world data and a learned model for data-efficient reinforcement learning with reduced model-bias.
Keywords: Reinforcement Learning, Offline Learning, Episodic Memory Control
One-sentence Summary: We propose a new offline RL method which uses expectile value learning and memory-based planning.
Keywords: lifelong learning, continual learning, reinforcement learning, composition, modularity, compositionality
One-sentence Summary: We explore the problem of lifelong RL of functionally composable knowledge, and develop an algorithm that demonstrates zero-shot and forward transfer, avoidance of forgetting, and backward transfer in discrete 2-D and robotic manipulation domains.
Keywords: reinforcement learning, varying action space, relational reasoning
One-sentence Summary: Learning action interdependence for reinforcement learning under a varying action space.
Keywords: exploration, reinforcement learning
One-sentence Summary: We design a practical randomized exploration method to address the sample efficiency issue in online reinforcement learning.
Keywords: Reinforcement learning, Quality-Diversity, Evolutionary algorithms
One-sentence Summary: We propose EDO-CS, a new Evolutionary Diversity Optimization algorithm with Clustering-based Selection that can achieve a set of policies with both high quality and diversity efficiently.
Keywords: Reinforcement Learning, Provable Adversarial Robustness, Randomized Smoothing
One-sentence Summary: A provable adversarial robustness technique for reinforcement learning.
Keywords: variational Bayes, oracle guiding, reinforcement learning, decision making, probabilistic modeling, game, Mahjong
One-sentence Summary: We propose a variational Bayes framework leveraging oracle (hindsight) information available in training to improve deep reinforcement learning
Keywords: Ensemble Based Reinforcement Learning, Ensemble Diversity
One-sentence Summary: Maximizing diversity in neural networks improves the performance of ensemble-based reinforcement learning.
Keywords: certified robustness, poisoning attacks, reinforcement learning
One-sentence Summary: We propose the first framework for certifying robustness of offline reinforcement learning against poisoning attacks.
Keywords: Reinforcement Learning, Lifelong learning, Multi task learning, Transfer learning, Logical composition, Deep Reinforcement Learning
One-sentence Summary: A framework with theoretical guarantees for an agent to quickly generalize over a task space by autonomously determining whether a new task can be solved zero-shot using existing skills, or whether a task-specific skill should be learned few-shot.
Keywords: imitation learning, reinforcement learning, expert data, hidden confounding, causal inference, covariate shift
One-sentence Summary: We use expert data with unobserved confounders for both imitation and reinforcement learning. Such hidden confounding is prone to a shifted distribution, which may severely hurt performance unless accounted for.
Keywords: Synthetic Environments, Synthetic Data, Meta-Learning, Reinforcement Learning, Evolution Strategies, Reward Shaping
One-sentence Summary: We propose an evolution-based approach to meta-learn synthetic neural environments and reward neural networks for reinforcement learning.
Keywords: reinforcement learning theory, multi-agent RL, Markov games, general-sum games
One-sentence Summary: We present new algorithms for several learning goals in multi-player general-sum Markov games, with mild PAC sample complexity in terms of the number of players.
Keywords: meta-RL, meta-reinforcement learning, skill-based meta-reinforcement learning, meta-learning, skill-based RL
Keywords: Representation learning, model-based reinforcement learning
One-sentence Summary: We introduce Learning via Retracing, a novel self-supervised framework based on temporal cycle-consistency assumption of the transition dynamics, for improved learning of the representation (and the dynamics model) in RL tasks.
Keywords: Multitask Reinforcement Learning, Modular Reinforcement Learning, Transfer Learning, Transformer, Structural Embedding
One-sentence Summary: We present a modular Multi-task Reinforcement Learning method for inhomogeneous control tasks incorporating structural embedding of morphology.
Keywords: reinforcement learning, convergence of reinforcement learning algorithm, monte carlo exploring starts
One-sentence Summary: We prove that the Monte Carlo Exploring Starts algorithm converges for optimal policy feed-forward MDPs.
Keywords: Reinforcement Learning
One-sentence Summary: Continual meta-reinforcement learning accelerates task learning via repeated meta off-policy search.
Keywords: Distributional RL
Keywords: Deep Reinforcement Learning, Offline Reinforcement Learning, Batch Reinforcement Learning, Continuous Control
One-sentence Summary: Offline RL method with only dataset actions.
Keywords: Multi-Agent Reinforcement Learning, trust-region method, policy gradient method
One-sentence Summary: This paper introduces the first trust region method for multi-agent reinforcement learning that enjoys a theoretically justified monotonic improvement guarantee and demonstrates state-of-the-art performance on MuJoCo benchmarks.
Keywords: Image-based RL, Data augmentation in RL, Continuous Control
One-sentence Summary: We propose a model-free off-policy algorithm for image-based continuous control that significantly outperforms previous methods in both sample and time complexity.
Keywords: deep reinforcement learning, deep learning, representation learning
Keywords: multi-agent, reinforcement learning, intrinsic rewards, exploration
Keywords: Deep Reinforcement Learning, Goal-oriented Reinforcement Learning, Graph Structure, Exploration
One-sentence Summary: We propose G2RL, a new goal-oriented RL method that leverages the state-transition graph for effective exploration and efficient training.
Keywords: context-dependent Reinforcement Learning, model-based reinforcement learning, hierarchical Dirichlet process
Keywords: task-oriented dialogue, pre-trained language model, offline reinforcement learning
Keywords: reinforcement learning, acquisition function, information gain
One-sentence Summary: We draw a connection between Bayesian Optimal Experiment Design and RL to develop an acquisition function that guides data collection in model-based RL, leading to improved sample efficiency.
Keywords: offline RL
One-sentence Summary: Characterization of scenarios where offline reinforcement learning outperforms behavioral cloning
Keywords: offline reinforcement learning, model-based reinforcement learning, behavior policy, Meta-reinforcement learning
One-sentence Summary: This paper proposes a novel offline Meta-RL algorithm with regularization, which has provable performance improvement and outperforms the existing baselines empirically.
Keywords: representations, out-of-distribution, generalization, deep learning, reinforcement learning
One-sentence Summary: We study the role of pretrained representations for the out-of-distribution generalization of RL agents.
Keywords: reinforcement learning, deep reinforcement learning, offline reinforcement learning
One-sentence Summary: Experimentally evaluating when and why supervised learning solves offline RL
Keywords: Reinforcement Learning, Hierarchical Reinforcement Learning, Inverse Reinforcement Learning
One-sentence Summary: Training transition policies via distribution matching