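【ICLR2022】Reinforcement Learning / Game Theory / Multi-Agent: A Roundup of Top-Conference Papers

Official Lab Assistant

Disclaimer: This article is reposted from the Zhihu post "【ICLR2022】Reinforcement Learning / Game Theory / Multi-Agent: A Roundup of Top-Conference Papers" and is shared for study and discussion only.

Author: Lil Morty

This article categorizes and summarizes ICLR 2022 conference papers, focusing on reinforcement learning, game theory, and multi-agent systems.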

Oral Presentations

Reinforcement Learning

Provably Filtering Exogenous Distractors using Multistep Inverse Dynamics

Keywords: Reinforcement Learning Theory, Invariant Representation, Rich Observation Reinforcement Learning, Exogenous Noise, Inverse Dynamics

The Information Geometry of Unsupervised Reinforcement Learning

Keywords: unsupervised skill learning, reward-free RL, mutual information, DIAYN
One-sentence Summary: We show that mutual information skill learning is optimal in one sense but not optimal in another sense.
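
This paper analyzes mutual-information skill learning rather than proposing a new algorithm. For readers unfamiliar with the DIAYN keyword, here is a minimal sketch of the standard DIAYN-style intrinsic reward log q(z|s) - log p(z). It assumes a uniform prior over a discrete set of skills and a discriminator that already outputs log q(z|s). The function name is illustrative.

```python
import math

def diayn_reward(log_q_z_given_s, num_skills):
    """Intrinsic reward log q(z|s) - log p(z) with a uniform prior p(z) = 1 / num_skills.
    log_q_z_given_s is the discriminator's log-probability of the active skill z."""
    log_p_z = -math.log(num_skills)
    return log_q_z_given_s - log_p_z

# A confident discriminator (q = 1) earns log(4) ≈ 1.386 with 4 skills;
# a chance-level one (q = 1/4) earns about 0.
print(diayn_reward(math.log(1.0), num_skills=4))
print(diayn_reward(math.log(0.25), num_skills=4))
```

The reward is highest when the visited state makes the skill easy to identify, which is what pushes skills apart.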

Vision-Based Manipulators Need to Also See from Their Hands

Keywords: reinforcement learning, observation space, out-of-distribution generalization, visuomotor control, robotics, manipulation
One-sentence Summary: Appropriately designing the observation space of a vision-based manipulator and regularizing its representations leads to clear gains in learning stability and out-of-distribution generalization.

Transform2Act: Learning a Transform-and-Control Policy for Efficient Agent Design

Keywords: Agent Design, Morphology Optimization, Reinforcement Learning
One-sentence Summary: We learn a transform-and-control policy to both design and control an agent.

Bootstrapped Meta-Learning

Keywords: meta-learning, meta-gradients, meta-reinforcement learning
One-sentence Summary: We propose an algorithm for meta-learning with gradients that bootstraps the meta-learner from itself or another update rule.

Game Theory

Multiagent

Spotlight Presentations

Reinforcement Learning

Towards Deployment-Efficient Reinforcement Learning: Lower Bound and Optimality

Keywords: reinforcement learning theory, deployment efficiency, linear MDP
One-sentence Summary: We propose a formal theoretical formulation for deployment-efficient reinforcement learning; establish lower bounds for deployment complexity and study near-optimal deployment-efficient algorithms in linear MDP setting.

SO(2)-Equivariant Reinforcement Learning

Keywords: Reinforcement Learning, Equivariance, Robotic Manipulation
One-sentence Summary: This paper proposes equivariant DQN and equivariant SAC that significantly improve the sample efficiency of RL in robotic manipulation.

Understanding and Preventing Capacity Loss in Reinforcement Learning

Keywords: Reinforcement learning, representation learning
One-sentence Summary: We show that RL agents experience representation collapse in sparse reward environments and propose an auxiliary task that prevents this from happening and outperforms the state of the art on the Atari benchmark.

CoBERL: Contrastive BERT for Reinforcement Learning

Keywords: Reinforcement Learning, Contrastive Learning, Representation Learning, Transformer, Deep Reinforcement Learning
One-sentence Summary: A new loss and an improved architecture to efficiently train attentional models in reinforcement learning.

Programmatic Reinforcement Learning without Oracles

Keywords: Reinforcement Learning, Programmatic Reinforcement Learning, Compositional Reinforcement Learning, Program Synthesis, Differentiable Architecture Search
One-sentence Summary: We present a differentiable program architecture search framework to synthesize interpretable, generalizable, and compositional programs for controlling reinforcement learning applications.

On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning

Keywords: Reinforcement Learning, Sparsity, Pruning, Lottery Ticket Hypothesis
One-sentence Summary: We investigate the mechanisms underlying the lottery ticket effect in Deep RL and show that the derived mask extracts minimal task representations.
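
Lottery-ticket experiments derive their sparse mask by magnitude pruning. They then rewind the surviving weights to their initial values. Below is a one-shot sketch on flat weight lists, with made-up values. The usual procedure repeats train, prune, and rewind over several rounds, and that loop is not shown.

```python
def magnitude_mask(trained, keep_fraction):
    """0/1 mask keeping the keep_fraction of weights with the largest magnitude."""
    n_keep = round(len(trained) * keep_fraction)
    ranked = sorted(range(len(trained)), key=lambda i: abs(trained[i]), reverse=True)
    kept = set(ranked[:n_keep])
    return [1 if i in kept else 0 for i in range(len(trained))]

def rewind_ticket(initial, mask):
    """Reset surviving weights to their initial values and zero out pruned ones."""
    return [w if m else 0.0 for w, m in zip(initial, mask)]

trained = [0.9, -0.05, 0.4, -1.2, 0.01]
initial = [0.3, 0.2, -0.1, 0.5, -0.4]
mask = magnitude_mask(trained, keep_fraction=0.4)
print(mask)                          # [1, 0, 0, 1, 0]
print(rewind_ticket(initial, mask))  # [0.3, 0.0, 0.0, 0.5, 0.0]
```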

Learning Altruistic Behaviours in Reinforcement Learning without External Rewards

Keywords: reinforcement learning, altruistic behavior in AI, multi-agent systems
One-sentence Summary: We propose and investigate unsupervised training of agents to behave altruistically towards others by actively maximizing others' choice.

Reinforcement Learning with Sparse Rewards using Guidance from Offline Demonstration

Keywords: Reinforcement Learning, Sparse Rewards, Learning from Demonstrations
One-sentence Summary: Reinforcement learning in sparse reward environments using offline guidance.

Revisiting Design Choices in Offline Model Based Reinforcement Learning

Keywords: Model-Based Reinforcement Learning, Offline Reinforcement Learning, Uncertainty Quantification

Sample Efficient Deep Reinforcement Learning via Uncertainty Estimation

Keywords: Deep reinforcement learning, uncertainty estimation, inverse-variance, heteroscedastic
One-sentence Summary: The sample efficiency and performance of model-free DRL are improved by estimating the predictive uncertainty of the targets using probabilistic ensembles and down-weighting the uncertain samples using batch inverse-variance weighting.
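
Batch inverse-variance weighting, as named in the summary, scales each sample's loss by the inverse of its target variance and normalizes over the batch. Here is a minimal sketch under two assumptions. First, per-sample target variances are already available, for example from a probabilistic ensemble. Second, a hypothetical constant xi keeps the weights bounded when a variance is near zero.

```python
def biv_weighted_loss(targets, predictions, target_variances, xi=1.0):
    """Weighted mean squared error with weights 1 / (variance + xi), normalized over the batch,
    so samples with uncertain targets contribute less to the update."""
    weights = [1.0 / (var + xi) for var in target_variances]
    weighted_sq = sum(w * (y - q) ** 2 for w, y, q in zip(weights, targets, predictions))
    return weighted_sq / sum(weights)

# The second sample has a larger error but a high-variance target, so it is down-weighted.
print(biv_weighted_loss([1.0, 1.0], [0.0, -1.0], target_variances=[0.0, 3.0]))  # 1.6 (unweighted mean: 2.5)
```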

Learning Long-Term Reward Redistribution via Randomized Return Decomposition

Keywords: Reinforcement Learning, Long-Term Credit Assignment, Reward Redistribution, Return Decomposition
One-sentence Summary: We propose randomized return decomposition, a novel reward redistribution algorithm, which establishes a surrogate optimization problem to scale up learning in long-horizon tasks.
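
Randomized return decomposition learns a per-step proxy reward from episodic returns alone. Below is a minimal sketch of the surrogate loss as we understand it, assuming the proxy-reward predictions for one trajectory are already computed. A random subset of time steps is drawn. The episodic return is then regressed onto the proxy-reward sum over that subset, rescaled by T / |subset|. The function name and arguments are illustrative.

```python
import random

def rrd_loss(proxy_rewards, episodic_return, subset_size, rng):
    """Squared error between the episodic return and a rescaled Monte Carlo
    estimate of the summed proxy rewards, using a random subset of time steps."""
    horizon = len(proxy_rewards)
    steps = rng.sample(range(horizon), subset_size)
    estimate = horizon / subset_size * sum(proxy_rewards[t] for t in steps)
    return (episodic_return - estimate) ** 2

rng = random.Random(0)
# Proxy rewards that spread the return evenly make the loss ~0 for any subset.
print(rrd_loss([0.5] * 10, episodic_return=5.0, subset_size=3, rng=rng))
```

Sampling a subset keeps each update cheap on long trajectories, which is the scaling argument the summary refers to.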

Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning

Keywords: Pessimistic Bootstrapping, Bootstrapped Q-functions, Uncertainty Estimation, Offline Reinforcement Learning
One-sentence Summary: We propose pessimistic bootstrapping as a purely uncertainty-driven algorithm for offline Reinforcement Learning.
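
Here is a minimal sketch of uncertainty-driven pessimism as the summary describes it. It assumes an ensemble of bootstrapped Q-estimates for the next state-action pair. The TD target is lowered in proportion to the ensemble's disagreement, measured as the standard deviation. Here beta is a hypothetical penalty coefficient, and the paper's full algorithm includes components not shown.

```python
import statistics

def pessimistic_target(reward, next_q_ensemble, gamma=0.99, beta=1.0):
    """TD target r + gamma * (mean(Q') - beta * std(Q')) over an ensemble of
    bootstrapped next-state Q estimates; larger disagreement means a lower target."""
    mean_q = statistics.fmean(next_q_ensemble)
    uncertainty = statistics.pstdev(next_q_ensemble)
    return reward + gamma * (mean_q - beta * uncertainty)

print(pessimistic_target(1.0, [2.0, 2.0, 2.0], gamma=0.5))  # 2.0: no disagreement, no penalty
print(pessimistic_target(1.0, [1.0, 3.0], gamma=0.5))       # 1.5: std 1.0 lowers the target
```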

Value Gradient weighted Model-Based Reinforcement Learning

Keywords: model-based reinforcement learning, reinforcement learning, objective mismatch, value function, sensitivity
One-sentence Summary: We propose the Value-gradient weighted Model loss, a method for value-aware model learning in challenging settings, such as small model capacity and the presence of distracting state dimensions.

COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation

Keywords: Offline Reinforcement Learning, Offline Constrained Reinforcement Learning, Stationary Distribution Correction Estimation
One-sentence Summary: We present an offline constrained RL algorithm, which estimates the stationary distribution corrections of the optimal policy with respect to returns, while constraining the cost upper bound.

AdaRL: What, Where, and How to Adapt in Transfer Reinforcement Learning

Keywords: Transfer RL, Graphical models, Efficient adaptation
One-sentence Summary: Efficient policy adaptation across domains by learning a parsimonious graphical representation that encodes changes in a compact way.

DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization

Keywords: Q-learning, offline RL, regularization
One-sentence Summary: We show that implicit regularization effects can lead to poor performance in value-based offline RL and propose an explicit regularizer to mitigate these effects.

Generative Planning for Temporally Coordinated Exploration in Reinforcement Learning

One-sentence Summary: Temporally coordinated exploration in reinforcement learning using Generative Planning Method.

Policy improvement by planning with Gumbel

Keywords: AlphaZero, MuZero, reinforcement learning
One-sentence Summary: We redesign AlphaZero to keep improving even when training with a small number of simulations.
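
With only a few simulations, Gumbel-based planning samples the root actions to consider without replacement, using the Gumbel-Top-k trick. Below is a minimal sketch of that sampling step only, built on Python's random module. The sequential-halving search and the policy-improvement target are not shown, and the function name is illustrative.

```python
import math
import random

def gumbel_top_k(logits, k, rng):
    """Sample k distinct action indices without replacement: add independent
    Gumbel(0, 1) noise to each logit and keep the indices of the k largest values."""
    scored = []
    for action, logit in enumerate(logits):
        u = rng.random()
        while u == 0.0:  # log(0) is undefined, so redraw
            u = rng.random()
        gumbel = -math.log(-math.log(u))
        scored.append((logit + gumbel, action))
    scored.sort(reverse=True)
    return [action for _, action in scored[:k]]

rng = random.Random(0)
print(gumbel_top_k([0.5, 2.0, -1.0, 1.0], k=2, rng=rng))  # two distinct indices
```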

Constrained Policy Optimization via Bayesian World Models

Keywords: Reinforcement learning, Constrained Markov decision processes, Constrained policy optimization, Bayesian model-based RL
One-sentence Summary: Solving constrained Markov decision processes with Bayesian model-based reinforcement learning.

Possibility Before Utility: Learning And Using Hierarchical Affordances

Keywords: RL, HRL, reinforcement learning, hierarchical reinforcement learning, affordances, hierarchical affordances
One-sentence Summary: We introduce a method that achieves superior performance in complex hierarchical tasks by utilizing a notion of subtask dependency grounded in the present state.

When should agents explore?

Keywords: exploration, mode-switching, reinforcement learning, Atari
One-sentence Summary: A fresh look at the question of *when* to switch into exploration mode, and for how long.

Multi-Stage Episodic Control for Strategic Exploration in Text Games

Keywords: reinforcement learning, language understanding, text-based games
One-sentence Summary: We propose a multi-stage approach to playing text games that improves the score on Zork1 from around 40 to 103.

Learning Vision-Guided Quadrupedal Locomotion End-to-End with Cross-Modal Transformers

Keywords: Reinforcement Learning, Robotics, Locomotion Control, Multi-Modal Transformer
One-sentence Summary: We introduce a novel end-to-end Reinforcement Learning approach called LocoTransformer, leveraging both visual inputs and proprioceptive states, for locomotion control in both simulation and with real robots.

Distributional Decision Transformer for Hindsight Information Matching

Keywords: Hindsight Information Matching, Decision Transformer, State-Marginal Matching, Hindsight Experience Replay, Reinforcement Learning
One-sentence Summary: We generalize hindsight algorithms in RL, and propose Distributional Decision Transformer for information matching.

Keywords: systematicity, graph reasoning

Planning in Stochastic Environments with a Learned Model

Keywords: model-based reinforcement learning, deep reinforcement learning, tree based search, MCTS

Understanding Domain Randomization for Sim-to-real Transfer

Keywords: domain randomization, sim-to-real transfer, learning theory
One-sentence Summary: We propose theoretical frameworks for sim-to-real transfer and domain randomization, and provide bounds on the sub-optimality gap of the policy returned by domain randomization.
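
Domain randomization, the object of this analysis, trains a single policy across many simulator configurations drawn at random. Below is a minimal sketch of the per-episode sampling step. It assumes uniform ranges over a few hypothetical physical parameters. The names and bounds are illustrative and do not come from the paper.

```python
import random

# Hypothetical physical parameters and ranges, chosen only for illustration.
PARAM_RANGES = {
    "mass": (0.8, 1.2),
    "friction": (0.5, 1.5),
    "motor_gain": (0.9, 1.1),
}

def sample_sim_params(rng, ranges=PARAM_RANGES):
    """Draw one simulator configuration uniformly from the randomization ranges."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}

rng = random.Random(0)
for episode in range(3):
    params = sample_sim_params(rng)
    # A real training loop would reset the simulator with these parameters here
    # and then roll out the current policy for one episode.
    print(episode, params)
```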

Near-Optimal Reward-Free Exploration for Linear Mixture MDPs with Plug-in Solver

Keywords: reward-free exploration, model-based reinforcement learning, learning theory
One-sentence Summary: We propose near-optimal exploration algorithms for reward-free exploration with plug-in solver.

Dynamics-Aware Comparison of Learned Reward Functions

Keywords: Reward Learning, Inverse Reinforcement Learning, Reinforcement Learning, Comparing Reward Functions
One-sentence Summary: We propose a method for quantifying the similarity of learned reward functions without performing policy learning and evaluation.

Learning transferable motor skills with hierarchical latent mixture policies

Keywords: Robotics, Reinforcement Learning, Hierarchical, Latent Variable Models, Skills, Transfer
One-sentence Summary: An approach to learn reusable and transferable skills from data via a hierarchical latent mixture policy, which can significantly improve sample efficiency and asymptotic performance on downstream RL tasks.

Game Theory

Multiagent

Keywords: Multi-agent Reinforcement Learning, Predictive State Representation, Dynamic Interaction Graph
One-sentence Summary: We propose a new algorithm for MARL under a multi-agent predictive state representation model, where we incorporate a dynamic interaction graph; we provide the theoretical guarantees of our model and run various experiments to support our algorithm.

Learning Altruistic Behaviours in Reinforcement Learning without External Rewards

Keywords: reinforcement learning, altruistic behavior in AI, multi-agent systems
One-sentence Summary: We propose and investigate unsupervised training of agents to behave altruistically towards others by actively maximizing others' choice.

Context-Aware Sparse Deep Coordination Graphs

Keywords: Multi-agent reinforcement learning, Sparse coordination graphs, Deep coordination graph
One-sentence Summary: We propose a novel method for learning sparse coordination graphs that can be theoretically justified and can significantly reduce communication overhead and improve learning performance of deep coordination graphs.

Latent Variable Sequential Set Transformers for Joint Multi-Agent Motion Prediction

Keywords: trajectory prediction, motion forecasting, transformers, latent variable models
One-sentence Summary: New Transformer-based architecture for socially consistent motion forecasting. Achieves SotA performance on NuScenes at a fraction of the compute of competing methods.

Emergent Communication at Scale

Keywords: emergent communication, multi-agent reinforcement learning, representation learning
One-sentence Summary: This work argues for the importance of scaling up the emergent communication framework and investigates the impact of three aspects of scaling up, namely the dataset, task complexity, and population size.

Planning in Stochastic Environments with a Learned Model

Keywords: model-based reinforcement learning, deep reinforcement learning, tree based search, MCTS

EE-Net: Exploitation-Exploration Neural Networks in Contextual Bandits

Keywords: Contextual Bandits, Exploration Strategy, Neural Networks

Poster Presentations

Reinforcement Learning

Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism

Keywords: reinforcement learning theory, markov decision process theory

Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage

Keywords: Reinforcement learning Theory, Offline reinforcement learning, PAC Bounds
One-sentence Summary: We study model-based offline Reinforcement Learning with general function approximation without a full coverage assumption on the offline data distribution.

A Reduction-Based Framework for Conservative Bandits and Reinforcement Learning

Keywords: bandits, lower bound, reinforcement learning theory
One-sentence Summary: We give a general framework that turns upper and lower bounds in non-conservative settings into bounds in conservative settings.

Autonomous Reinforcement Learning: Formalism and Benchmarking

Keywords: reinforcement learning, autonomous, reset-free reinforcement learning, continual reinforcement learning

Boosted Curriculum Reinforcement Learning

Keywords: reinforcement learning, curriculum learning, boosting, residual learning
One-sentence Summary: A novel approach for curriculum RL that increases the representativeness of the functional space as new, increasingly complex, tasks from the curriculum are presented to the agent.

Imitation Learning by Reinforcement Learning

Keywords: reinforcement learning, imitation learning, Markov Decision Process, continuous control
One-sentence Summary: For deterministic experts, you can do imitation learning by calling an RL solver once, with a stationary reward signal.
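
One reading of the summary is that, with a deterministic expert, a fixed reward marking expert behaviour is enough for a standard RL solver. Below is a minimal sketch of such a reward. It assumes discrete, hashable states and actions. The indicator form (1 on expert state-action pairs, 0 elsewhere) is an illustrative assumption. Any off-the-shelf RL algorithm could then be run on it unchanged.

```python
def make_expert_reward(expert_pairs):
    """Return a stationary reward: 1.0 for a (state, action) pair seen in the expert
    data, 0.0 otherwise. Assumes hashable, discrete states and actions."""
    expert = frozenset(expert_pairs)

    def reward(state, action):
        return 1.0 if (state, action) in expert else 0.0

    return reward

# Hypothetical demonstrations from a deterministic expert.
demos = [("s0", "right"), ("s1", "right"), ("s2", "up")]
r = make_expert_reward(demos)
print(r("s1", "right"), r("s1", "left"))  # 1.0 0.0
```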

Local Feature Swapping for Generalization in Reinforcement Learning

Keywords: Reinforcement learning, Generalization, Regularization
One-sentence Summary: We propose a simple yet effective layer that increases the generalization abilities of reinforcement learning agents.

Dropout Q-Functions for Doubly Efficient Reinforcement Learning

Keywords: Reinforcement learning
One-sentence Summary: We propose a doubly (sample and computationally) efficient RL method (Dr.Q) in which a small ensemble of dropout Q-functions is used.

Hindsight Foresight Relabeling for Meta-Reinforcement Learning

Keywords: Reinforcement Learning, Meta-Learning
One-sentence Summary: We present HFR, a relabeling method that can be applied to meta-reinforcement learning to boost sample efficiency and performance.

Orchestrated Value Mapping for Reinforcement Learning

Keywords: Reinforcement Learning, Value Mapping, Reward Decomposition
One-sentence Summary: We present a general convergent class of RL algorithms based on combining arbitrary value mappings and reward decomposition.

On-Policy Model Errors in Reinforcement Learning

Keywords: Model-based reinforcement learning, reinforcement learning, model learning
One-sentence Summary: We combine real-world data and a learned model for data-efficient reinforcement learning with reduced model-bias.

Offline Reinforcement Learning with Value-based Episodic Memory

Keywords: Reinforcement Learning, Offline Learning, Episodic Memory Control
One-sentence Summary: We propose a new offline RL method which uses expectile value learning and memory-based planning.

Modular Lifelong Reinforcement Learning via Neural Composition

Keywords: lifelong learning, continual learning, reinforcement learning, composition, modularity, compositionality
One-sentence Summary: We explore the problem of lifelong RL of functionally composable knowledge, and develop an algorithm that demonstrates zero-shot and forward transfer, avoidance of forgetting, and backward transfer in discrete 2-D and robotic manipulation domains.

Know Your Action Set: Learning Action Relations for Reinforcement Learning

Keywords: reinforcement learning, varying action space, relational reasoning
One-sentence Summary: Learning action interdependence for reinforcement learning under a varying action space.

HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning

Keywords: exploration, reinforcement learning
One-sentence Summary: We design a practical randomized exploration method to address the sample efficiency issue in online reinforcement learning.

Evolutionary Diversity Optimization with Clustering-based Selection for Reinforcement Learning

Keywords: Reinforcement learning, Quality-Diversity, Evolutionary algorithms
One-sentence Summary: We propose EDO-CS, a new Evolutionary Diversity Optimization algorithm with Clustering-based Selection that can achieve a set of policies with both high quality and diversity efficiently.

Policy Smoothing for Provably Robust Reinforcement Learning

Keywords: Reinforcement Learning, Provable Adversarial Robustness, Randomized Smoothing
One-sentence Summary: A provable adversarial robustness technique for reinforcement learning.
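
Policy smoothing builds on randomized smoothing: the agent acts on observations perturbed with Gaussian noise, and this is what makes a robustness certificate possible. Below is a minimal sketch of the per-step operation only, assuming a discrete-action base policy. It uses a classification-style majority vote over noisy samples. That is one common form of randomized smoothing and not necessarily the paper's exact construction. The certificate itself is not reproduced.

```python
import random
from collections import Counter

def smoothed_action(policy, observation, sigma, num_samples, rng):
    """Query `policy` on Gaussian-perturbed copies of `observation` and return the
    most frequent action (discrete action space assumed)."""
    votes = Counter()
    for _ in range(num_samples):
        noisy = [x + rng.gauss(0.0, sigma) for x in observation]
        votes[policy(noisy)] += 1
    return votes.most_common(1)[0][0]

def base_policy(obs):
    """Hypothetical base policy: move right when the first feature is non-negative."""
    return "right" if obs[0] >= 0 else "left"

print(smoothed_action(base_policy, [3.0, -1.0], sigma=0.5, num_samples=100, rng=random.Random(0)))  # right
```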

Variational oracle guiding for reinforcement learning

Keywords: variational Bayes, oracle guiding, reinforcement learning, decision making, probabilistic modeling, game, Mahjong
One-sentence Summary: We propose a variational Bayes framework leveraging oracle (hindsight) information available in training to improve deep reinforcement learning.

Maximizing Ensemble Diversity in Deep Reinforcement Learning

Keywords: Ensemble Based Reinforcement Learning, Ensemble Diversity
One-sentence Summary: Maximizing the diversity of the neural networks in an ensemble improves the performance of ensemble-based reinforcement learning.

COPA: Certifying Robust Policies for Offline Reinforcement Learning against Poisoning Attacks

Keywords: certified robustness, poisoning attacks, reinforcement learning
One-sentence Summary: We propose the first framework for certifying robustness of offline reinforcement learning against poisoning attacks.

Generalisation in Lifelong Reinforcement Learning through Logical Composition

Keywords: Reinforcement Learning, Lifelong learning, Multi task learning, Transfer learning, Logical composition, Deep Reinforcement Learning
One-sentence Summary: A framework with theoretical guarantees for an agent to quickly generalize over a task space by autonomously determining whether a new task can be solved zero-shot using existing skills, or whether a task-specific skill should be learned few-shot.

On Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning

Keywords: imitation learning, reinforcement learning, expert data, hidden confounding, causal inference, covariate shift
One-sentence Summary: We use expert data with unobserved confounders for both imitation and reinforcement learning. Such hidden confounding is prone to a shifted distribution, which may severely hurt performance unless accounted for.

Learning Synthetic Environments and Reward Networks for Reinforcement Learning

Keywords: Synthetic Environments, Synthetic Data, Meta-Learning, Reinforcement Learning, Evolution Strategies, Reward Shaping
One-sentence Summary: We propose an evolution-based approach to meta-learn synthetic neural environments and reward neural networks for reinforcement learning.

When Can We Learn General-Sum Markov Games with a Large Number of Players Sample-Efficiently?

Keywords: reinforcement learning theory, multi-agent RL, Markov games, general-sum games
One-sentence Summary: We present new algorithms for several learning goals in multi-player general-sum Markov games, with mild PAC sample complexity in terms of the number of players.

Skill-based Meta-Reinforcement Learning

Keywords: meta-RL, meta-reinforcement learning, skill-based meta-reinforcement learning, meta-learning, skill-based RL

Learning State Representations via Retracing in Reinforcement Learning

Keywords: Representation learning, model-based reinforcement learning
One-sentence Summary: We introduce Learning via Retracing, a novel self-supervised framework based on temporal cycle-consistency assumption of the transition dynamics, for improved learning of the representation (and the dynamics model) in RL tasks.

Structure-Aware Transformer Policy for Inhomogeneous Multi-Task Reinforcement Learning

Keywords: Multitask Reinforcement Learning, Modular Reinforcement Learning, Transfer Learning, Transformer, Structural Embedding
One-sentence Summary: We present a modular Multi-task Reinforcement Learning method for inhomogeneous control tasks incorporating structural embedding of morphology.

On the Convergence of the Monte Carlo Exploring Starts Algorithm for Reinforcement Learning

Keywords: reinforcement learning, convergence of reinforcement learning algorithm, monte carlo exploring starts
One-sentence Summary: We prove that the Monte Carlo Exploring Starts algorithm converges for optimal policy feed-forward MDPs.
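
To make the algorithm concrete, here is a minimal tabular sketch of Monte Carlo Exploring Starts. It runs on a tiny hypothetical deterministic feed-forward MDP, where no state can be revisited within an episode. It uses undiscounted returns and running-mean Q updates. The MDP, its rewards, and the function name are all illustrative.

```python
import random

# Hypothetical deterministic feed-forward MDP: MDP[state][action] = (next_state, reward),
# where next_state None means the episode ends. No state can be revisited.
MDP = {
    "A": {"x": ("B", 0.0), "y": ("C", 1.0)},
    "B": {"x": (None, 10.0), "y": (None, 0.0)},
    "C": {"x": (None, 0.0), "y": (None, 2.0)},
}

def mces(mdp, episodes, rng, gamma=1.0):
    """Monte Carlo Exploring Starts: random start pair, greedy rollout, running-mean Q."""
    q = {(s, a): 0.0 for s in mdp for a in mdp[s]}
    visits = {pair: 0 for pair in q}
    policy = {s: rng.choice(sorted(mdp[s])) for s in mdp}
    start_pairs = sorted(q)
    for _ in range(episodes):
        # Exploring start: every state-action pair can begin an episode.
        state, action = rng.choice(start_pairs)
        trajectory = []
        while state is not None:
            next_state, reward = mdp[state][action]
            trajectory.append((state, action, reward))
            state = next_state
            if state is not None:
                action = policy[state]  # follow the current greedy policy
        # Walk the episode backwards, averaging returns into Q and acting greedily.
        g = 0.0
        for s, a, r in reversed(trajectory):
            g = r + gamma * g
            visits[(s, a)] += 1
            q[(s, a)] += (g - q[(s, a)]) / visits[(s, a)]
            policy[s] = max(mdp[s], key=lambda act: q[(s, act)])
    return policy, q

policy, q = mces(MDP, episodes=300, rng=random.Random(0))
print(policy)  # {'A': 'x', 'B': 'x', 'C': 'y'}
```

The feed-forward structure matters here. Because each state is visited at most once per episode, every Q estimate averages returns generated under policies that only improve over time.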

CoMPS: Continual Meta Policy Search

Keywords: Reinforcement Learning
One-sentence Summary: Continual meta-reinforcement learning accelerates task learning via repeated meta off-policy search.

Distributional Reinforcement Learning with Monotonic Splines

Keywords: Distributional RL

Offline Reinforcement Learning with In-sample Q-Learning

Keywords: Deep Reinforcement Learning, Offline Reinforcement Learning, Batch Reinforcement Learning, Continuous Control
One-sentence Summary: Offline RL method with only dataset actions.
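
In-sample Q-learning avoids querying actions outside the dataset by fitting its value function with expectile regression. Below is a minimal sketch of the asymmetric expectile loss. For intuition, it also computes an iteratively reweighted estimate of the tau-expectile of a few samples. The function names are illustrative.

```python
def expectile_loss(diff, tau):
    """Asymmetric squared loss |tau - 1(diff < 0)| * diff**2 with diff = target - prediction.
    For tau > 0.5, under-predictions (diff > 0) cost more than over-predictions."""
    weight = tau if diff >= 0 else 1.0 - tau
    return weight * diff ** 2

def expectile(samples, tau, iters=100):
    """tau-expectile of `samples` by iteratively reweighted averaging."""
    m = sum(samples) / len(samples)
    for _ in range(iters):
        weights = [tau if x >= m else 1.0 - tau for x in samples]
        new_m = sum(w * x for w, x in zip(weights, samples)) / sum(weights)
        if new_m == m:
            break
        m = new_m
    return m

print(expectile_loss(2.0, 0.9), expectile_loss(-2.0, 0.9))   # ≈ 3.6 and ≈ 0.4
print(expectile([0.0, 1.0], 0.5), expectile([0.0, 1.0], 0.9))  # 0.5 and ≈ 0.9
```

With tau = 0.5 the expectile is the mean. As tau approaches 1 it approaches the maximum, which is how a value fitted this way can approximate a maximum over dataset actions.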

Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning

Keywords: Multi-Agent Reinforcement Learning, trust-region method, policy gradient method
One-sentence Summary: This paper introduces the first trust region method for multi-agent reinforcement learning that enjoys a theoretically justified monotonic improvement guarantee and demonstrates state-of-the-art performance on MuJoCo benchmarks.

Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning

Keywords: Image-based RL, Data augmentation in RL, Continuous Control
One-sentence Summary: We propose a model-free off-policy algorithm for image-based continuous control that significantly outperforms previous methods in both sample and time complexity.
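
Data augmentation for image-based RL of this kind is commonly a small random shift of each frame. Below is a minimal sketch on a 2-D list of pixel values, assuming replicate padding and integer shifts. The paper's GPU implementation may differ in details, for example by applying the shift with bilinear interpolation.

```python
import random

def random_shift(image, pad, rng):
    """Shift a 2-D image by a random integer offset in [-pad, pad] along each axis,
    filling exposed borders by replicating edge pixels. Equivalent to replicate-padding
    by `pad` and randomly cropping back to the original size."""
    h, w = len(image), len(image[0])
    dy, dx = rng.randint(-pad, pad), rng.randint(-pad, pad)

    def pixel(r, c):
        # Clamp coordinates so out-of-range reads return the nearest edge pixel.
        return image[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    return [[pixel(r + dy, c + dx) for c in range(w)] for r in range(h)]

frame = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]  # toy 3x3 grayscale frame
print(random_shift(frame, pad=1, rng=random.Random(0)))
```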

Learning Generalizable Representations for Reinforcement Learning via Adaptive Meta-learner of Behavioral Similarities

Keywords: deep reinforcement learning, deep learning, representation learning

LIGS: Learnable Intrinsic-Reward Generation Selection for Multi-Agent Learning

Keywords: multi-agent, reinforcement learning, intrinsic rewards, exploration

Graph-Enhanced Exploration for Goal-oriented Reinforcement Learning

Keywords: Deep Reinforcement Learning, Goal-oriented Reinforcement Learning, Graph Structure, Exploration
One-sentence Summary: In this paper, we propose G2RL, a new goal-oriented RL method that leverages the state-transition graph for effective exploration and efficient training.

Reinforcement Learning in Presence of Discrete Markovian Context Evolution

Keywords: context-dependent Reinforcement Learning, model-based reinforcement learning, hierarchical Dirichlet process

Offline Reinforcement Learning for Large Scale Language Action Spaces

Keywords: task-oriented dialogue, pre-trained language model, offline reinforcement learning

Keywords: reinforcement learning, acquisition function, information gain
One-sentence Summary: We draw a connection between Bayesian Optimal Experiment Design and RL to develop an acquisition function to guide data collection in model based RL leading to improved sample efficiency.

Keywords: offline RL
One-sentence Summary: Characterization of scenarios where offline reinforcement learning outperforms behavioral cloning.

Model-Based Offline Meta-Reinforcement Learning with Regularization

Keywords: offline reinforcement learning, model-based reinforcement learning, behavior policy, Meta-reinforcement learning
One-sentence Summary: This paper proposes a novel offline Meta-RL algorithm with regularization, which has provable performance improvement and outperforms the existing baselines empirically.

The Role of Pretrained Representations for the OOD Generalization of RL Agents

Keywords: representations, out-of-distribution, generalization, deep learning, reinforcement learning
One-sentence Summary: We study the role of pretrained representations for the out-of-distribution generalization of RL agents.

The Essential Elements of Offline RL via Supervised Learning

Keywords: reinforcement learning, deep reinforcement learning, offline reinforcement learning
One-sentence Summary: Experimentally evaluating when and why supervised learning solves offline RL.

Training Transition Policies via Distribution Matching for Complex Tasks

Keywords: Reinforcement Learning, Hierarchical Reinforcement Learning, Inverse Reinforcement Learning
One-sentence Summary: Training transition policies via distribution matching.
