# CORL (Clean Offline Reinforcement Learning)
⚠️ This is an active and supported fork of the original CORL library. The original repository is frozen and will not be updated further.
🧵 CORL is an Offline Reinforcement Learning library that provides high-quality and easy-to-follow single-file implementations of SOTA ORL algorithms. Each implementation is backed by a research-friendly codebase, allowing you to run or tune thousands of experiments. It is heavily inspired by cleanrl for online RL; check them out too!
* 📜 Single-file implementation
* 📈 Benchmarked implementation (11+ offline algorithms, 5+ offline-to-online algorithms, 30+ datasets with detailed logs)
* 🖼 Weights and Biases integration
You can read more about CORL design and main results in our technical paper.
⭐ If you're interested in discrete control, make sure to check out our new library, Katakomba. It provides both discrete control algorithms augmented with recurrence and an offline RL benchmark for the NetHack Learning Environment.
⚠️ NOTE: CORL (similarly to CleanRL) is not a modular library and therefore it is not meant to be imported. At the cost of duplicate code, we make all implementation details of an ORL algorithm variant easy to understand. You should consider using CORL if you want to 1) understand and control all implementation details of an algorithm or 2) rapidly prototype advanced features that other modular ORL libraries do not support.
## Getting started
Please refer to the documentation for more details. TLDR:
```bash
git clone https://github.com/corl-team/CORL.git && cd CORL
pip install -r requirements/requirements_dev.txt
```
Alternatively, you can use Docker:
```bash
docker build -t <image_name> .
docker run --gpus=all -it --rm --name <container_name> <image_name>
```
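Once installed, each algorithm is launched directly as a standalone script (see the file names in the table below). Schematically, such a file bundles its config, networks, and training loop in one place; the sketch below is illustrative only: the names, defaults, and logic are placeholders, not CORL's actual code.

```python
# Illustrative sketch of what a single-file implementation can look like
# (names, defaults, and the training logic here are placeholders).
from dataclasses import dataclass

import torch
import torch.nn as nn


@dataclass
class TrainConfig:
    env: str = "halfcheetah-medium-v2"  # D4RL dataset / environment name
    batch_size: int = 256
    learning_rate: float = 3e-4
    max_timesteps: int = 1_000_000


class Actor(nn.Module):
    """Minimal deterministic policy network."""

    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


def train(config: TrainConfig) -> None:
    # In a real single-file implementation this is where the offline dataset is
    # loaded, the networks are built, gradient updates are run, and metrics are
    # logged to Weights and Biases.
    pass


if __name__ == "__main__":
    train(TrainConfig())
```

Because everything needed to reproduce a run sits in one file, modifying an algorithm means editing a single script instead of tracing calls through a framework.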
## Algorithms Implemented
| Algorithm | Variants Implemented | Wandb Report |
|-----------|----------------------|--------------|
| **Offline and Offline-to-Online** | | |
| ✅ Conservative Q-Learning for Offline Reinforcement Learning <br>(CQL) | `offline/cql.py` <br> `finetune/cql.py` | Offline <br> Offline-to-online |
| ✅ Accelerating Online Reinforcement Learning with Offline Datasets <br>(AWAC) | `offline/awac.py` <br> `finetune/awac.py` | Offline <br> Offline-to-online |
| ✅ Offline Reinforcement Learning with Implicit Q-Learning <br>(IQL) | `offline/iql.py` <br> `finetune/iql.py` | Offline <br> Offline-to-online |
| **Offline-to-Online only** | | |
| ✅ Supported Policy Optimization for Offline Reinforcement Learning <br>(SPOT) | `finetune/spot.py` | Offline-to-online |
| ✅ Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning <br>(Cal-QL) | `finetune/cal_ql.py` | Offline-to-online |
| **Offline only** | | |
| ✅ Behavioral Cloning <br>(BC) | `offline/any_percent_bc.py` | Offline |
| ✅ Behavioral Cloning-10% <br>(BC-10%) | `offline/any_percent_bc.py` | Offline |
| ✅ A Minimalist Approach to Offline Reinforcement Learning <br>(TD3+BC) | `offline/td3_bc.py` | Offline |
| ✅ Decision Transformer: Reinforcement Learning via Sequence Modeling <br>(DT) | `offline/dt.py` | Offline |
| ✅ Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble <br>(SAC-N) | `offline/sac_n.py` | Offline |
| ✅ Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble <br>(EDAC) | `offline/edac.py` | Offline |
| ✅ Revisiting the Minimalist Approach to Offline Reinforcement Learning <br>(ReBRAC) | `offline/rebrac.py` | Offline |
| ✅ Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size <br>(LB-SAC) | `offline/lb_sac.py` | Offline Gym-MuJoCo |
## D4RL Benchmarks
You can check the links above for learning curves and details. Here, we report the reproduced final and best scores. Note that the two can differ by a significant margin, and papers are not always explicit about which of these reporting methodologies they chose. If you want to re-collect our results in a more structured/nuanced manner, see results.
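For reading the score tables: D4RL results are conventionally reported as normalized scores, where 0 roughly corresponds to a random policy and 100 to an expert policy, and "final" vs "best" refer to the last evaluation of training versus the maximum over all evaluations. Below is a minimal sketch, assuming the standard `gym` and `d4rl` packages are available; the returns are made-up numbers used purely for illustration.

```python
import gym
import d4rl  # noqa: F401 -- importing d4rl registers the offline environments

env = gym.make("halfcheetah-medium-v2")

# Raw episode returns from periodic evaluations during training (made-up numbers).
raw_returns = [3500.0, 5200.0, 6100.0, 5900.0]

# D4RL convention: normalize so that 0 ~ random policy and 100 ~ expert policy.
scores = [100.0 * env.get_normalized_score(r) for r in raw_returns]

final_score = scores[-1]   # "final" score: the last evaluation of training
best_score = max(scores)   # "best" score: the maximum over all evaluations
print(f"final: {final_score:.1f}, best: {best_score:.1f}")
```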