# awesome-offline-rl
This is a collection of research and review papers on **offline reinforcement learning (offline RL)**. Feel free to star and fork.
Maintainers:
- Haruka Kiyohara (Tokyo Institute of Technology / Hanjuku-kaso Co., Ltd.)
- Yuta Saito (Hanjuku-kaso Co., Ltd. / Cornell University)
We are looking for more contributors and maintainers! Please feel free to open a pull request.
Format:
- [title](paper link) [links]
- author1, author2, and author3. arXiv/conference/journal, year.
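For example, an entry following this format looks like the following (using the Levine et al. survey already listed below; the arXiv link is the paper's abstract page):

```markdown
- [Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems](https://arxiv.org/abs/2005.01643)
  - Sergey Levine, Aviral Kumar, George Tucker, and Justin Fu. arXiv, 2020.
```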
For any questions, feel free to contact: saito@hanjuku-kaso.com
## Contents
- Papers
- Open Source Software/Implementations
- Blog/Podcast
- Related Workshops
- Tutorials/Talks/Lectures
## Papers
### Review Papers
- Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
- Sergey Levine, Aviral Kumar, George Tucker, and Justin Fu. arXiv, 2020.
### Position Papers
- Accelerating Offline Reinforcement Learning Application in Real-Time Bidding and Recommendation: Potential Use of Simulation
- Haruka Kiyohara, Kosuke Kawakami, and Yuta Saito. arXiv, 2021.
- A Survey of Generalisation in Deep Reinforcement Learning
- Robert Kirk, Amy Zhang, Edward Grefenstette, and Tim Rocktäschel. arXiv, 2021.
### Offline RL: Theory/Methods
- How to Leverage Unlabeled Data in Offline Reinforcement Learning
- Tianhe Yu, Aviral Kumar, Yevgen Chebotar, Karol Hausman, Chelsea Finn, and Sergey Levine. arXiv, 2022.
- Can Wikipedia Help Offline Reinforcement Learning?
- Machel Reid, Yutaro Yamada, and Shixiang Shane Gu. arXiv, 2022.
- MOORe: Model-based Offline-to-Online Reinforcement Learning
- Yihuan Mao, Chao Wang, Bin Wang, and Chongjie Zhang. arXiv, 2022.
- Operator Deep Q-Learning: Zero-Shot Reward Transferring in Reinforcement Learning
- Ziyang Tang, Yihao Feng, and Qiang Liu. arXiv, 2022.
- Importance of Empirical Sample Complexity Analysis for Offline Reinforcement Learning
- Samin Yeasar Arnob, Riashat Islam, and Doina Precup. arXiv, 2022.
- Single-Shot Pruning for Offline Reinforcement Learning
- Samin Yeasar Arnob, Riyasat Ohib, Sergey Plis, and Doina Precup. arXiv, 2022.
- Model Selection in Batch Policy Optimization
- Jonathan N. Lee, George Tucker, Ofir Nachum, and Bo Dai. arXiv, 2021.
- RvS: What is Essential for Offline RL via Supervised Learning?
- Scott Emmons, Benjamin Eysenbach, Ilya Kostrikov, and Sergey Levine. arXiv, 2021.
- Learning Contraction Policies from Offline Data
- Navid Rezazadeh, Maxwell Kolarich, Solmaz S. Kia, and Negar Mehr. arXiv, 2021.
- DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization
- Aviral Kumar, Rishabh Agarwal, Tengyu Ma, Aaron Courville, George Tucker, and Sergey Levine. arXiv, 2021.
- CoMPS: Continual Meta Policy Search
- Glen Berseth, Zhiwei Zhang, Grace Zhang, Chelsea Finn, and Sergey Levine. arXiv, 2021.
- MESA: Offline Meta-RL for Safe Adaptation and Fault Tolerance
- Michael Luo, Ashwin Balakrishna, Brijen Thananjeyan, Suraj Nair, Julian Ibarz, Jie Tan, Chelsea Finn, Ion Stoica, and Ken Goldberg. arXiv, 2021.
- Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Conquers All StarCraftII Tasks
- Linghui Meng, Muning Wen, Yaodong Yang, Chenyang Le, Xiyun Li, Weinan Zhang, Ying Wen, Haifeng Zhang, Jun Wang, and Bo Xu. arXiv, 2021.
- Generalizing Off-Policy Learning under Sample Selection Bias
- Tobias Hatt, Daniel Tschernutter, and Stefan Feuerriegel. arXiv, 2021.
- Offline Neural Contextual Bandits: Pessimism, Optimization and Generalization
- Thanh Nguyen-Tang, Sunil Gupta, A. Tuan Nguyen, and Svetha Venkatesh. arXiv, 2021.
- Improving Zero-shot Generalization in Offline Reinforcement Learning using Generalized Similarity Functions
- Bogdan Mazoure, Ilya Kostrikov, Ofir Nachum, and Jonathan Tompson. arXiv, 2021.
- Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification
- Ling Pan, Longbo Huang, Tengyu Ma, and Huazhe Xu. arXiv, 2021.
- Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms
- Yanwei Jia and Xun Yu Zhou. arXiv, 2021.
- Offline Reinforcement Learning: Fundamental Barriers for Value Function Approximation
- Dylan J. Foster, Akshay Krishnamurthy, David Simchi-Levi, and Yunzong Xu. arXiv, 2021.
- UMBRELLA: Uncertainty-Aware Model-Based Offline Reinforcement Learning Leveraging Planning
- Christopher Diehl, Timo Sievernich, Martin Krüger, Frank Hoffmann, and Torsten Bertram. arXiv, 2021.
- Generalized Decision Transformer for Offline Hindsight Information Matching [website]
- Hiroki Furuta, Yutaka Matsuo, and Shixiang Shane Gu. arXiv, 2021.
- Exploiting Action Impact Regularity and Partially Known Models for Offline Reinforcement Learning
- Vincent Liu, James Wright, and Martha White. arXiv, 2021.
- Batch Reinforcement Learning from Crowds
- Guoxi Zhang and Hisashi Kashima. arXiv, 2021.
- Koopman Q-learning: Offline Reinforcement Learning via Symmetries of Dynamics
- Matthias Weissenbacher, Samarth Sinha, Animesh Garg, and Yoshinobu Kawahara. arXiv, 2021.
- SCORE: Spurious COrrelation REduction for Offline Reinforcement Learning
- Zhihong Deng, Zuyue Fu, Lingxiao Wang, Zhuoran Yang, Chenjia Bai, Zhaoran Wang, and Jing Jiang. arXiv, 2021.
- Safely Bridging Offline and Online Reinforcement Learning
- Wanqiao Xu, Kan Xu, Hamsa Bastani, and Osbert Bastani. arXiv, 2021.
- Efficient Robotic Manipulation Through Offline-to-Online Reinforcement Learning and Goal-Aware State Information
- Jin Li, Xianyuan Zhan, Zixu Xiao, and Guyue Zhou. arXiv, 2021.
- Offline Reinforcement Learning with Value-based Episodic Memory
- Xiaoteng Ma, Yiqin Yang, Hao Hu, Qihan Liu, Jun Yang, Chongjie Zhang, Qianchuan Zhao, and Bin Liang. arXiv, 2021.
- Value Penalized Q-Learning for Recommender Systems
- Chengqian Gao, Ke Xu, and Peilin Zhao. arXiv, 2021.
- Offline Reinforcement Learning with Soft Behavior Regularization
- Haoran Xu, Xianyuan Zhan, Jianxiong Li, and Honglei Yin. arXiv, 2021.
- Planning from Pixels in Environments with Combinatorially Hard Search Spaces
- Marco Bagatella, Mirek Olšák, Michal Rolínek, and Georg Martius. arXiv, 2021.
- StARformer: Transformer with State-Action-Reward Representations
- Jinghuan Shang and Michael S. Ryoo. arXiv, 2021.
- Offline Reinforcement Learning with Implicit Q-Learning
- Ilya Kostrikov, Ashvin Nair, and Sergey Levine. arXiv, 2021.
- Representation Learning for Online and Offline RL in Low-rank MDPs
- Masatoshi Uehara, Xuezhou Zhang, and Wen Sun. arXiv, 2021.
- Revisiting Design Choices in Model-Based Offline Reinforcement Learning
- Cong Lu, Philip J. Ball, Jack Parker-Holder, Michael A. Osborne, and Stephen J. Roberts. arXiv, 2021.
- Showing Your Offline Reinforcement Learning Work: Online Evaluation Budget Matters
- Vladislav Kurenkov and Sergey Kolesnikov. arXiv, 2021.
- Offline RL With Resource Constrained Online Deployment [code]
- Jayanth Reddy Regatti, Aniket Anand Deshmukh, Frank Cheng, Young Hun Jung, Abhishek Gupta, and Urun Dogan. arXiv, 2021.
- Lifelong Robotic Reinforcement Learning by Retaining Experiences [website]
- Annie Xie and Chelsea Finn. arXiv, 2021.
- Dual Behavior Regularized Reinforcement Learning
- Chapman Siu, Jason Traish, and Richard Yi Da Xu. arXiv, 2021.
- DCUR: Data Curriculum for Teaching via Samples with Reinforcement Learning [website] [code]
- Daniel Seita, Abhinav Gopal, Zhao Mandi, and John Canny. arXiv, 2021.
- DROMO: Distributionally Robust Offline Model-based Policy Optimization
- Ruizhen Liu, Dazhi Zhong, and Zhicong Chen. arXiv, 2021.
- Implicit Behavioral Cloning
- Pete Florence, Corey Lynch, Andy Zeng, Oscar Ramirez, Ayzaan Wahid, Laura Downs, Adrian Wong, Johnny Lee, Igor Mordatch, and Jonathan Tompson. arXiv, 2021.
- Reducing Conservativeness Oriented Offline Reinforcement Learning
- Hongchang Zhang, Jianzhun Shao, Yuhang Jiang, Shuncheng He, and Xiangyang Ji. arXiv, 2021.
- Policy Gradients Incorporating the Future
- David Venuto, Elaine Lau, Doina Precup, and Ofir Nachum. arXiv, 2021.
- Offline Decentralized Multi-Agent Reinforcement Learning
- Jiechuan Jiang and Zongqing Lu. arXiv, 2021.
- OPAL: Offline Preference-Based Apprenticeship Learning [website]
- Daniel Shin and Daniel S. Brown. arXiv, 2021.
- Constraints Penalized Q-Learning for Safe Offline Reinforcement Learning
- Haoran Xu, Xianyuan Zhan, and Xiangyu Zhu. arXiv, 2021.
- Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage [video]
- Masatoshi Uehara and Wen Sun. arXiv, 2021.
- Offline Meta-Reinforcement Learning with Online Self-Supervision
- Vitchyr H. Pong, Ashvin Nair, Laura Smith, Catherine Huang, and Sergey Levine. arXiv, 2021.
- Where is the Grass Greener? Revisiting Generalized Policy Iteration for Offline Reinforcement Learning
- Lionel Blondé and Alexandros Kalousis. arXiv, 2021.
- The Least Restriction for Offline Reinforcement Learning
- Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble
- Seunghyun Lee, Younggyo Seo, Kimin Lee, Pieter Abbeel, and Jinwoo Shin. arXiv, 2021.
- Causal Reinforcement Learning using Observational and Interventional Data
- Maxime Gasse, Damien Grasset, Guillaume Gaudron, and Pierre-Yves Oudeyer. arXiv, 2021.
- On the Sample Complexity of Batch Reinforcement Learning with Policy-Induced Data
- Chenjun Xiao, Ilbin Lee, Bo Dai, Dale Schuurmans, and Csaba Szepesvari. arXiv, 2021.
- Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL [website]
- Catherine Cang, Aravind Rajeswaran, Pieter Abbeel, and Michael Laskin. arXiv, 2021.
- On Multi-objective Policy Optimization as a Tool for Reinforcement Learning
- Abbas Abdolmaleki, Sandy H. Huang, Giulia Vezzani, Bobak Shahriari, Jost Tobias Springenberg, Shruti Mishra, Dhruva TB, Arunkumar Byravan, Konstantinos Bousmalis, Andras Gyorgy, Csaba Szepesvari, Raia Hadsell, Nicolas Heess, and Martin Riedmiller. arXiv, 2021.
- Offline Reinforcement Learning as Anti-Exploration
- Shideh Rezaeifar, Robert Dadashi, Nino Vieillard, Léonard Hussenot, Olivier Bachem, Olivier Pietquin, and Matthieu Geist. arXiv, 2021.
- Corruption-Robust Offline Reinforcement Learning
- Xuezhou Zhang, Yiding Chen, Jerry Zhu, and Wen Sun. arXiv, 2021.
- Offline Inverse Reinforcement Learning
- Firas Jarboui and Vianney Perchet. arXiv, 2021.
- Heuristic-Guided Reinforcement Learning
- Ching-An Cheng, Andrey Kolobov, and Adith Swaminathan. arXiv, 2021.
- Reinforcement Learning as One Big Sequence Modeling Problem
- Michael Janner, Qiyang Li, and Sergey Levine. arXiv, 2021.
- Decision Transformer: Reinforcement Learning via Sequence Modeling
- Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, and Igor Mordatch. arXiv, 2021.
- Model-Based Offline Planning with Trajectory Pruning
- Xianyuan Zhan, Xiangyu Zhu, and Haoran Xu. arXiv, 2021.
- InferNet for Delayed Reinforcement Tasks: Addressing the Temporal Credit Assignment Problem
- Markel Sanz Ausin, Hamoon Azizsoltani, Song Ju, Yeo Jin Kim, and Min Chi. arXiv, 2021.
- Infinite-Horizon Offline Reinforcement Learning with Linear Function Approximation: Curse of Dimensionality and Algorithm [video]
- Lin Chen, Bruno Scherrer, and Peter L. Bartlett. arXiv, 2021.
- MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale [website]
- Dmitry Kalashnikov, Jacob Varley, Yevgen Chebotar, Benjamin Swanson, Rico Jonschkowski, Chelsea Finn, Sergey Levine, and Karol Hausman. arXiv, 2021.
- Distributional Offline Continuous-Time Reinforcement Learning with Neural Physics-Informed PDEs (SciPhy RL for DOCTR-L)
- Igor Halperin. arXiv, 2021.
- Regularized Behavior Value Estimation
- Caglar Gulcehre, Sergio Gómez Colmenarejo, Ziyu Wang, Jakub Sygnowski, Thomas Paine, Konrad Zolna, Yutian Chen, Matthew Hoffman, Razvan Pascanu, and Nando de Freitas. arXiv, 2021.
- Causal-aware Safe Policy Improvement for Task-oriented dialogue
- Govardana Sachithanandam Ramachandran, Kazuma Hashimoto, and Caiming Xiong. arXiv, 2021.
- Improved Context-Based Offline Meta-RL with Attention and Contrastive Learning
- Lanqing Li, Yuanhao Huang, and Dijun Luo. arXiv, 2021.
- Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning
- Luofeng Liao, Zuyue Fu, Zhuoran Yang, Mladen Kolar, and Zhaoran Wang. arXiv, 2021.
- GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning
- Guy Tennenholtz, Nir Baram, and Shie Mannor. arXiv, 2021.
- MUSBO: Model-based Uncertainty Regularized and Sample Efficient Batch Optimization for Deployment Constrained Reinforcement Learning
- DiJia Su, Jason D. Lee, John M. Mulvey, and H. Vincent Poor. arXiv, 2021.
- Continuous Doubly Constrained Batch Reinforcement Learning
- Rasool Fakoor, Jonas Mueller, Pratik Chaudhari, and Alexander J. Smola. arXiv, 2021.
- Q-Value Weighted Regression: Reinforcement Learning with Limited Data
- Piotr Kozakowski, Łukasz Kaiser, Henryk Michalewski, Afroz Mohiuddin, and Katarzyna Kańska. arXiv, 2021.
- Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency
- Masatoshi Uehara, Masaaki Imaizumi, Nan Jiang, Nathan Kallus, Wen Sun, and Tengyang Xie. arXiv, 2021.
- Fast Rates for the Regret of Offline Reinforcement Learning [video]
- Yichun Hu, Nathan Kallus, and Masatoshi Uehara. arXiv, 2021.
- Identifying Decision Points for Safe and Interpretable Reinforcement Learning in Hypotension Treatment
- Kristine Zhang, Yuanheng Wang, Jianzhun Du, Brian Chu, Leo Anthony Celi, Ryan Kindle, and Finale Doshi-Velez. arXiv, 2021.
- Weighted Model Estimation for Offline Model-based Reinforcement Learning
- Toru Hishinuma and Kei Senda. NeurIPS, 2021.
- A Minimalist Approach to Offline Reinforcement Learning
- Scott Fujimoto and Shixiang Shane Gu. NeurIPS, 2021.
- Conservative Offline Distributional Reinforcement Learning
- Yecheng Jason Ma, Dinesh Jayaraman, and Osbert Bastani. NeurIPS, 2021.