Least-Squares Policy Iteration, Lagoudakis et al, 2003.JMLR.Algorithm: LSPI.
Tree-Based Batch Mode Reinforcement Learning, Ernst et al, 2005.JMLR.Algorithm: FQI.
Neural Fitted Q Iteration – First Experiences with a Data Efficient Neural Reinforcement Learning Method, Riedmiller, 2005.ECML.Algorithm: NFQ.
Off-Policy Actor-Critic, Degris et al, 2012.CoRR.Algorithm: Off-Policy Actor-Critic.
Guided Policy Search, Levine et al, 2013.ICML.Algorithm: GPS.
Safe Policy Improvement by Minimizing Robust Baseline Regret, Petrik et al, 2016.NIPS.Algorithm: RMDP, Approximate Robust Baseline Regret Minimization.
Doubly Robust Off-Policy Value Evaluation for Reinforcement Learning, Jiang et al, 2016.ICML.Algorithm: DR.
Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation, Liu et al, 2018.NIPS.Algorithm: Stationary State Density Ratio Estimation.
Safe Policy Improvement with Baseline Bootstrapping, Laroche et al, 2018.ICML.Algorithm: SPIBB.
Constrained Policy Improvement For Safe and Efficient Reinforcement Learning, Sarafian et al, 2018.IJCAI.Algorithm: RBI.
Off-Policy Deep Reinforcement Learning without Exploration, Fujimoto et al, 2019.ICML.Algorithm: BCQ, VAE-BC.
Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction, Kumar et al, 2019.NIPS.Algorithm: BEAR-QL.
DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections, Nachum et al, 2019.NIPS.Algorithm: DualDICE.
AlgaeDICE: Policy Gradient from Arbitrary Experience, Nachum et al, 2019.arxiv.Algorithm: ALGAE.
Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning, Peng et al, 2019.arxiv.Algorithm: AWR.
Off-Policy Policy Gradient Algorithms by Constraining the State Distribution Shift, Islam et al, 2019.arxiv.Algorithm: StateKL.
Behavior Regularized Offline Reinforcement Learning, Wu et al, 2019.CoRR.Algorithm: BRAC (value-penalty and policy-regularization variants).
Off-Policy Policy Gradient with State Distribution Correction, Liu et al, 2019.CoRR.Algorithm: OPPOSD.
From Importance Sampling to Doubly Robust Policy Gradient, Huang et al, 2020.ICML.Algorithm: DR-PG.
Keep Doing What Worked: Behavior Modelling Priors for Offline Reinforcement Learning, Siegel et al, 2020.ICLR.Algorithm: Behavior Extraction Priors.
GenDICE: Generalized Offline Estimation of Stationary Values, Zhang et al, 2020.ICLR.Algorithm: GenDICE.
GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values, Zhang et al, 2020.ICML.Algorithm: GradientDICE.
Batch Stationary Distribution Estimation, Wen et al, 2020.ICML.Algorithm: variational power method.
BRPO: Batch Residual Policy Optimization, Sohn et al, 2020.IJCAI.Algorithm: BRPO.
On Reward-Free Reinforcement Learning with Linear Function Approximation, Wang et al, 2020.NIPS.Algorithm: Exploration & Planning Phase Reward-Free RL.
AWAC: Accelerating Online Reinforcement Learning with Offline Datasets, Nair et al, 2020.arxiv.Algorithm: AWAC.
Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies, Kallus et al, 2020.NIPS.Algorithm: deterministic DR.
Efficient Evaluation of Natural Stochastic Policies in Offline Reinforcement Learning, Kallus et al, 2020.arxiv.Algorithm: Efficient Off-Policy Evaluation for Natural Stochastic Policies.
Conservative Q-Learning for Offline Reinforcement Learning, Kumar et al, 2020.NIPS.Algorithm: CQL (a minimal sketch of its conservative penalty appears after this list).
Provably Good Batch Reinforcement Learning Without Great Exploration, Liu et al, 2020.NIPS.Algorithm: PQI.
Critic Regularized Regression, Wang et al, 2020.NIPS.Algorithm: CRR.
EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL, Ghasemipour et al, 2020.ICML.Algorithm: EMaQ.
Batch Reinforcement Learning Through Continuation Method, Guo et al, 2021.ICLR.Algorithm: Soft Policy Iteration through Continuation Method.
Offline Reinforcement Learning with Fisher Divergence Critic Regularization, Kostrikov et al, 2021.ICML.Algorithm: Fisher-BRC.
Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble, Lee et al, 2021.arxiv.Algorithm: Balanced Replay, Pessimistic Q-Ensemble.
You Only Evaluate Once: a Simple Baseline Algorithm for Offline RL, Wonjoon Goo and Scott Niekum, 2021.CoRL.Algorithm: YOEO.
Causal Reinforcement Learning using Observational and Interventional Data, Gasse et al, 2021.arxiv.Algorithm: augmented POMDP.
Dealing with Unknown: Pessimistic Offline Reinforcement Learning, Li et al, 2021.CoRL.Algorithm: PessORL.
Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble, An et al, 2021.NIPS.Algorithm: EDAC.
Offline Reinforcement Learning with Implicit Q-Learning, Kostrikov et al, 2021.arxiv.Algorithm: IQL.
Value Penalized Q-Learning for Recommender Systems, Gao et al, 2021.arxiv.Algorithm: VPQ.
Offline Reinforcement Learning with Pseudometric Learning, Dadashi et al, 2021.ICML.Algorithm: PLOFF.
OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation, Lee et al, 2021.ICML.Algorithm: OptiDICE.
Offline RL Without Off-Policy Evaluation, Brandfonbrener et al, 2021.NIPS.Algorithm: One-step algorithm.
Offline Reinforcement Learning with Soft Behavior Regularization, Xu et al, 2021.arxiv.Algorithm: SBAC.
MOReL: Model-Based Offline Reinforcement Learning, Kidambi et al, 2020.NIPS.Algorithm: MOReL.
MOPO: Model-based Offline Policy Optimization, Yu et al, 2020.NIPS.Algorithm: MOPO.
Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization, Matsushima et al, 2020.ICLR.Algorithm: BREMEN.
Overcoming Model Bias for Robust Offline Deep Reinforcement Learning, Swazinna et al, 2020.arxiv.Algorithm: MOOSE.
Model-Based Offline Planning, Argenson et al, 2020.arxiv.Algorithm: MBOP.
DeepAveragers: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs, Shrestha et al, 2020.ICLR.Algorithm: DAC-MDP.
Causality and Batch Reinforcement Learning: Complementary Approaches to Planning in Unknown Domains, Bannon et al, 2020.arxiv.Algorithm: Counterfactual Policy Evaluation.
Counterfactual Data Augmentation using Locally Factored Dynamics, Pitis et al, 2020.NIPS.Algorithm: CoDA.
Offline Reinforcement Learning from Images with Latent Space Models, Rafailov et al, 2020.arxiv.Algorithm: LOMPO.
Model-Based Visual Planning with Self-Supervised Functional Distances, Tian et al, 2020.ICLR.Algorithm: MBOLD.
Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment, Ball et al, 2021.ICML.Algorithm: AugWM.
Vector Quantized Models for Planning, Ozair et al, 2021.ICML.Algorithm: VQVAE.
PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators, Agarwal et al, 2021.NIPS.Algorithm: PerSim.
COMBO: Conservative Offline Model-Based Policy Optimization, Yu et al, 2021.NIPS.Algorithm: COMBO.
Offline Model-based Adaptable Policy Learning, Chen et al, 2021.NIPS.Algorithm: MAPLE.
Online and Offline Reinforcement Learning by Planning with a Learned Model, Schrittwieser et al, 2021.NIPS.Algorithm: MuZero Unplugged.
Representation Matters: Offline Pretraining for Sequential Decision Making, Yang et al, 2021.ICML.Algorithm: representation learning via contrastive self-prediction.
Decision Transformer: Reinforcement Learning via Sequence Modeling, Chen et al, 2021.arxiv.Algorithm: DT.
Offline Reinforcement Learning as One Big Sequence Modeling Problem, Janner et al, 2021.NIPS.Algorithm: Trajectory Transformer.
StARformer: Transformer with State-Action-Reward Representations, Shang et al, 2021.arxiv.Algorithm: StARformer.
Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL, Cang et al, 2021.arxiv.Algorithm: MABE.
Offline Reinforcement Learning with Reverse Model-based Imagination, Wang et al, 2021.NIPS.Algorithm: ROMI.
Koopman Q-learning: Offline Reinforcement Learning via Symmetries of Dynamics, Weissenbacher et al, 2021.arxiv.Algorithm: KFC.
Generalized Decision Transformer for Offline Hindsight Information Matching, Furuta et al, 2021.arxiv.Algorithm: DT-X, CDT, BDT.
UMBRELLA: Uncertainty-Aware Model-Based Offline Reinforcement Learning Leveraging Planning, Diehl et al, 2021.arxiv.Algorithm: UMBRELLA.
Hyperparameter Selection for Offline Reinforcement Learning, Paine et al, 2020.arxiv.
Batch Exploration with Examples for Scalable Robotic Reinforcement Learning, Chen et al, 2020.arxiv.
Recovery RL: Safe Reinforcement Learning with Learned Recovery Zones, Thananjeyan et al, 2020.arxiv.
Batch Value-function Approximation with Only Realizability, Xie et al, 2020.arxiv.
Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient, Hao et al, 2020.arxiv.
What are the Statistical Limits of Offline RL with Linear Function Approximation?, Wang et al, 2020.RL Theory Seminar 2021.
A Variant of the Wang-Foster-Kakade Lower Bound for the Discounted Setting, Amortila et al, 2020.arxiv.
Sample-Efficient Reinforcement Learning via Counterfactual-Based Data Augmentation, Lu et al, 2020.arxiv.
Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL, Zanette et al, 2020.RL Theory Seminar 2021.
A Workflow for Offline Model-Free Robotic Reinforcement Learning, Kumar et al, 2021.CoRL.
S4RL: Surprisingly Simple Self-Supervision for Offline Reinforcement Learning in Robotics, Sinha et al, 2021.CoRL.
Instabilities of Offline RL with Pre-Trained Neural Representation, Wang et al, 2021.ICML.
Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning, Duan et al, 2021.ICML.
Offline Contextual Bandits with Overparameterized Models, Brandfonbrener et al, 2021.ICML.
Is Pessimism Provably Efficient for Offline RL?, Jin et al, 2021.RL Theory Seminar 2021.
Near-Optimal Offline Reinforcement Learning via Double Variance Reduction, Yin et al, 2021.NIPS.
Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism, Rashidinejad et al, 2021.RL Theory Seminar 2021.
Nearly Horizon-Free Offline Reinforcement Learning, Ren et al, 2021.NIPS.
Bellman-consistent Pessimism for Offline Reinforcement Learning, Xie et al, 2021.RL Theory Seminar 2021.
Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning, Xie et al, 2021.NIPS.
The Difficulty of Passive Learning in Deep Reinforcement Learning, Ostrovski et al, 2021.NIPS.
Multi-Task Batch Reinforcement Learning with Metric Learning, Li et al, 2020.NIPS.Algorithm: MBML.
Offline Meta Learning of Exploration, Dorfman et al, 2020.arxiv.Algorithm: BORel.
Offline Meta-Reinforcement Learning with Advantage Weighting, Mitchell et al, 2020.ICML.Algorithm: MACAW.
Goal-Conditioned Batch Reinforcement Learning for Rotation Invariant Locomotion, Mavalankar et al, 2020.arxiv.Algorithm: Enforcing equivalence.
Exploration by Maximizing Renyi Entropy for Reward-Free RL Framework, Zhang et al, 2020.AAAI.Algorithm: MaxRenyi.
Reset-Free Lifelong Learning with Skill-Space Planning, Lu et al, 2021.ICLR.Algorithm: LiSP.
Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization, Li et al, 2021.ICLR.Algorithm: FOCAL.
Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills, Chebotar et al, 2021.ICML.Algorithm: Actionable Model.
Conservative Data Sharing for Multi-Task Offline Reinforcement Learning, Yu et al, 2021.NIPS.Algorithm: CDS.
Offline Meta Reinforcement Learning — Identifiability Challenges and Effective Data Collection Strategies, Dorfman et al, 2021.NIPS.Algorithm: BORel.
Improving Zero-shot Generalization in Offline Reinforcement Learning using Generalized Similarity Functions, Mazoure et al, 2021.arxiv.Algorithm: GSF.
Offline Meta-Reinforcement Learning with Online Self-Supervision, Pong et al, 2021.arxiv.Algorithm: Semi-Supervised Meta Actor-Critic.
Lifelong Robotic Reinforcement Learning by Retaining Experiences, Xie et al, 2021.arxiv.Algorithm: Lifelong RL by Retaining Experiences.
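Many of the model-free methods listed above (BCQ, BEAR, BRAC, CQL, Fisher-BRC, IQL, among others) share a common recipe: fit a Q-function to the fixed dataset while penalizing or constraining value estimates on actions the behavior policy never took. As a concrete illustration, here is a minimal sketch of the conservative penalty from CQL (Kumar et al, 2020.NIPS) for a discrete-action Q-network; the PyTorch names (`q_net`, `target_net`) and the `batch` layout are assumptions made for this example, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def cql_loss(q_net, target_net, batch, gamma=0.99, alpha=1.0):
    """Illustrative CQL-style loss for discrete actions on an offline batch."""
    s, a, r, s_next, done = batch  # tensors drawn from the fixed offline dataset

    # Standard TD loss on the logged transitions.
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * target_net(s_next).max(dim=1).values
    td_loss = F.mse_loss(q_sa, target)

    # Conservative penalty: push Q-values down across all actions (log-sum-exp
    # over the action dimension) and up on the actions actually in the data, so
    # out-of-distribution actions cannot appear artificially attractive.
    conservative_penalty = (torch.logsumexp(q_net(s), dim=1) - q_sa).mean()

    return td_loss + alpha * conservative_penalty
```

The continuous-action variant in the paper replaces the log-sum-exp with Q-values of actions sampled from the current policy; the same pessimistic flavor also appears in the model-based entries listed above (e.g. MOPO's reward penalty and COMBO's conservative value estimate).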
A list of competitions of interest to the community, sorted by starting date.
Real-world Reinforcement Learning Challenge—Learning to make fair and incentive coupon decisions for sales promotion from data, organized by Polixir, Dec. 25, 2021 – Feb. 27, 2022 (Ongoing)
MineRL BASALT Challenge NeurIPS 2021 Competition—Learning from Human Feedback in Minecraft, organized by C.H.A.I. – UC Berkeley, July 7, 2021 – Dec 14, 2021
MineRL Diamond Challenge NeurIPS 2021 Competition—Training Sample-Efficient Agents in Minecraft, organized by MineRL Labs – Carnegie Mellon University, Jun. 9, 2021 – Dec. 2021
Tactile Games Playtest Agent—Level Difficulty Prediction of Lily’s Garden Levels, at the 3rd IEEE Conference on Games in 2021, organized by Tactile Games
Real Robot Challenge, organized by Empirical Inference Max Planck Institute for Intelligent Systems, May 28, 2021 – Sep. 16, 2021
Real Robot Challenge, organized by Empirical Inference Max Planck Institute for Intelligent Systems, Aug. 10, 2020 – Dec. 14, 2020
MineRL NeurIPS 2020 Competition—Sample-efficient reinforcement learning in Minecraft, organized by MineRL Labs – Carnegie Mellon University, Jul. 1, 2020 – Dec. 5, 2020
MineRL NeurIPS 2019 Competition—Sample-efficient reinforcement learning in Minecraft, organized by MineRL Labs – Carnegie Mellon University, May 10, 2019 – Dec 14, 2019