Reinforcement Learning Roadmap

AndyYue1893

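Artificial intelligence is one of the most exciting technologies of the 21st century. Artificial intelligence means intelligence like a human's, and human intelligence comprises perception, decision-making, and cognition (from intuition to reasoning, planning, consciousness, and so on). Perception answers the "what", and here deep learning has already surpassed human-level performance on some tasks; decision-making answers the "how", where reinforcement learning has achieved some success in areas such as games and robotics; cognition answers the "why", where third-generation AI such as knowledge graphs, causal inference, and continual learning is under active research.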

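Reinforcement learning solves sequential decision-making problems by learning from feedback, which is why I believe it is bound to be the ultimate key to artificial general intelligence. AI 1.0 was the symbolic school and AI 2.0 the connectionist school; whether AI 3.0 combines the two or takes a different path altogether, it cannot do without the behaviorist school, because that is how natural intelligence learns. I am especially fond of reinforcement learning and deeply drawn to its framework: an agent grows by interacting with its environment. Isn't that the very law by which life evolves?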

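As an independent AI researcher, I have learned along the way from Zhihu, Bilibili, GitHub, WeChat official accounts, and all kinds of blogs, and I am very grateful to everyone who shares in the internet age. Here I have collected and organized what I have learned about reinforcement learning, both for my own study and in the hope of helping anyone who comes across this Zhihu post. Of course, reinforcement learning still faces many problems; I hope we can solve them together and make reinforcement learning better!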

https://zhuanlan.zhihu.com/p/104224859

1. Videos (from getting started to giving up)

1.5 Stanford_Emma Brunskill_CS234: Reinforcement Learning | Winter 2019

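2. Books

2.1 The reinforcement learning bible by Rich Sutton: Chinese edition, English e-book, code ★★★★★

Essential foundational reading; it helps you grasp the essence of reinforcement learning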
https://item.jd.com/12696004.html
http://incompleteideas.net/book/the-book-2nd.html
https://github.com/AndyYue1893/reinforcement-learning-an-introduction

2.2 Deep Learning with Python and PyTorch (Python深度学习：基于PyTorch) ★★★★★

Concise, clear approach with classic, essential content; a foundation for deep reinforcement learning research
https://item.jd.com/12590209.html

2.3 Hands-On Reinforcement Learning with Python (Python强化学习实战) by Sudharsan Ravichandiran, with code ★★★★

Quick to pick up, with clear code
https://item.jd.com/12506442.html
https://github.com/AndyYue1893/Hands-On-Reinforcement-Learning-With-Python

2.4 强化学习精要 (Essentials of Reinforcement Learning) by Feng Chao (冯超) ★★★★

From fundamentals to the frontier, with code
https://item.jd.com/12344157.html

2.5 Reinforcement Learning With Open AI TensorFlow and Keras Using Python_OpenAI

Practice-oriented (extraction code: av5p)
https://pan.baidu.com/share/init?surl=nQpNbhkI-3WucSD0Mk7Qcg

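3. Tutorials

3.1 Morvan Python (莫烦Python)

Easy to understand; a quick way to get started

3.2 OpenAI Spinning Up: English version, Chinese version, and an introduction by QbitAI (量子位)

Online learning platform covering principles, algorithms, papers, and code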
https://spinningup.openai.com/en/latest/
https://spinningup.readthedocs.io/zh_CN/latest/index.html
https://zhuanlan.zhihu.com/p/49087870

3.3 Stable Baselines3

PyTorch implementations
https://stable-baselines3.readthedocs.io/en/master/
https://github.com/DLR-RM/stable-baselines3

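4. Code

Besides https://github.com/AndyYue1893/spinningup and https://github.com/DLR-RM/stable-baselines3, the following individual implementations are recommended for reference. Code is sometimes more important than the paper!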

4.1 sweetice

https://github.com/AndyYue1893/Deep-reinforcement-learning-with-pytorch

4.2 Zhang Chuheng (张楚珩)

https://github.com/zhangchuheng123/Reinforcement-Implementation

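5. Algorithms

What exactly are the differences between the two major RL schools behind DeepMind and OpenAI?
https://www.zhihu.com/question/316626294/answer/627373838

Three classic algorithms, traced back to their sources

5.1 DQN (continuous states, discrete actions)

Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529. (Nature version)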
https://storage.googleapis.com/deepmind-data/assets/papers/DeepMindNature14236Paper.pdf
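
To make the "continuous states, discrete actions" setting concrete, here is a minimal sketch of the two pieces a DQN-style agent combines: a one-step Q-learning target and epsilon-greedy selection over a finite action set. The function names `td_target` and `epsilon_greedy`, the default `gamma`, and the use of plain Python lists in place of a neural network's outputs are assumptions made for illustration; this is not the code from the paper.

```python
import random


def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a').

    next_q_values holds one estimate per discrete action for the next state
    (in DQN these would come from a separate target network).
    """
    if done:
        # No bootstrapping past a terminal state.
        return reward
    return reward + gamma * max(next_q_values)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Because the action set is finite, both the `max` in the target and the greedy choice are simple enumerations over actions; this is the step that no longer works directly once actions become continuous, which is the case section 5.2 addresses.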

5.2 DDPG (continuous states, continuous actions)

Silver, David, et al. "Deterministic policy gradient algorithms." ICML. 2014.

5.3 A3C & A2C

Mnih, Volodymyr, et al. "Asynchronous methods for deep reinforcement learning." International conference on machine learning. 2016.
https://www.researchgate.net/publication/301847678_Asynchronous_Methods_for_Deep_Reinforcement_Learning
https://openai.com/blog/baselines-acktr-a2c/

6. Environments

6.1 OpenAI Gym

http://gym.openai.com

6.2 MuJoCo (Emo Todorov)

http://www.mujoco.org

6.3 General-purpose grid world environment class

https://zhuanlan.zhihu.com/p/28109312
https://cs.stanford.edu/people/karpathy/reinforcejs/index.html
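
For readers who have not seen one, a grid world environment class can be sketched in a few lines. The version below assumes a Gym-style `reset`/`step` interface; the class name `GridWorld`, the 4x4 size, the start and goal cells, and the reward of -1 per step are all illustrative assumptions and are not taken from the linked articles.

```python
class GridWorld:
    """A tiny deterministic grid world with a Gym-like reset/step interface."""

    # Action index -> (row delta, column delta): up, down, left, right.
    ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}

    def __init__(self, n_rows=4, n_cols=4, goal=(3, 3), max_steps=50):
        self.n_rows = n_rows
        self.n_cols = n_cols
        self.goal = goal
        self.max_steps = max_steps
        self.pos = (0, 0)
        self.steps = 0

    def reset(self):
        """Put the agent back in the top-left cell and return the state."""
        self.pos = (0, 0)
        self.steps = 0
        return self.pos

    def step(self, action):
        """Apply one move; moves into a wall leave the agent in place."""
        d_row, d_col = self.ACTIONS[action]
        row = min(max(self.pos[0] + d_row, 0), self.n_rows - 1)
        col = min(max(self.pos[1] + d_col, 0), self.n_cols - 1)
        self.pos = (row, col)
        self.steps += 1
        reached = self.pos == self.goal
        reward = 0.0 if reached else -1.0
        done = reached or self.steps >= self.max_steps
        return self.pos, reward, done, {}
```

A typical loop calls `state = env.reset()` once and then `state, reward, done, info = env.step(action)` until `done` is true, which is the same interaction pattern tabular algorithms such as Q-learning are usually taught with.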

7. Frameworks/Platforms

What is currently the best library for training large-scale reinforcement learning algorithms?
https://www.zhihu.com/question/377263715/answer/1120555103

7.1 OpenAI Baselines & Stable Baselines

Highly integrated; a classic must-read
https://github.com/openai/baselines
https://github.com/hill-a/stable-baselines

7.2 Baidu PARL

Highly extensible, good reproducibility, user-friendly
https://github.com/paddlepaddle/parl

7.3 DeepMind OpenSpiel (Debian and Ubuntu only)

28 board and card games and 24 algorithms
https://github.com/deepmind/open_spiel

7.4 Tsinghua tianshou

A fast, modularized framework with a Pythonic API
Reproduces the results reported in papers
https://github.com/thu-ml/tianshou

8. Papers

8.1 Papers recommended by Spinning Up ★★★★★

https://zhuanlan.zhihu.com/p/50343077

8.2 NeuronDance ★★★★★

https://blog.csdn.net/gsww404

8.3 Zhang Chuheng (张楚珩), Tsinghua ★★★★

https://zhuanlan.zhihu.com/p/46600521

8.4 paperswithcode, with code ★★★★

https://www.paperswithcode.com/area/playing-games
https://github.com/AndyYue1893/pwc

9. PPT

9.1 Reinforcement learning_Nando de Freitas_DeepMind_2019

https://pan.baidu.com/s/1KF10W9GifZCDf9T4FY2H9Q

9.2 Policy Optimization_Pieter Abbeel_OpenAI/UC Berkeley/Gradescope

https://pan.baidu.com/s/1KF10W9GifZCDf9T4FY2H9Q

10. Conferences & Journals

10.1 Conferences

AAAI, NIPS, ICML, ICLR, IJCAI, AAMAS, IROS, etc.

10.2 Journals

AI, JMLR, JAIR, Machine Learning, JAAMAS, etc.

10.3 Rankings of computer science and AI conferences (journals)

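11. WeChat Official Accounts

1 深度强化学习实验室 (Deep Reinforcement Learning Laboratory)
2 机器之心 (Synced)
3 AI科技评论 (AI Technology Review)
4 新智元 (AI Era)
5 学术头条 (Academic Headlines)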

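12. Zhihu

12.1 Experts

田渊栋 (Yuandong Tian), Flood Sung, 许铁-巡洋舰科技 (same name on the WeChat official account), 周博磊 (Bolei Zhou), 俞扬 (Yang Yu), 张楚珩 (Zhang Chuheng), 天津包子馅儿, JQWang2048, and the experts they mutually follow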

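12.2 Columns

David Silver's reinforcement learning course: Chinese explanations and practice (by 叶强 (Ye Qiang); fairly classic)
强化学习知识大讲堂 (Reinforcement Learning Knowledge Lecture Hall; by 天津包子馅儿, author of 《深入浅出强化学习：原理入门》)
智能单元 (Intelligent Unit; by 杜克, Floodsung, and wxam; focuses on artificial general intelligence. Flood Sung's "Deep Learning Papers Reading Roadmap" is excellent, as is his "最前沿：深度强化学习的强者之路")
深度强化学习落地方法论 (A Methodology for Deploying Deep Reinforcement Learning; by an expert from Xi'an Jiaotong University with rich hands-on experience)
深度强化学习 (Deep Reinforcement Learning; Zhihu: JQWang2048, GitHub: NeuronDance, CSDN: J. Q. Wang)
神经网络与强化学习 (Neural Networks and Reinforcement Learning; reading notes on "Reinforcement Learning: An Introduction")
强化学习基础David Silver笔记 (Reinforcement Learning Basics: David Silver Notes; by 陈雄辉 (Chen Xionghui), Nanjing University, DiDi AI Labs)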

13. Blogs

The top bloggers have remarkable insight!

13.1 lilianweng (OpenAI)

https://lilianweng.github.io/lil-log/

13.2 J. Q. Wang

https://blog.csdn.net/gsww404

13.3 草帽BOY

https://blog.csdn.net/u013236946/category_6965927.html

13.4 Andrej Karpathy (student of Fei-Fei Li; head of the AI and Autopilot Vision teams at Tesla)

http://karpathy.github.io

13.5 fromeast (Jianshu)

https://www.jianshu.com/u/910953b56cd1

13.6 大卜口 (David Ha, research scientist at Google Brain)

blog.otoro.net

14. Official Websites

14.1 OpenAI

14.2 DeepMind

14.3 Berkeley Artificial Intelligence Research

as well as the homepages of Rich Sutton, Andrew Ng, David Silver, Pieter Abbeel, John Schulman, Sergey Levine, Chelsea Finn, Andrej Karpathy, and others

RLer

A Guide Resource for Deep Reinforcement Learning

1. About this work:

This deep reinforcement learning resource collection was initiated by the Deep Reinforcement Learning Laboratory (DeepRL-Lab) and was created jointly by more than ** PhDs and experts in the field. The goal is to help every learner make rapid progress and acquire the relevant knowledge.

2. How to contribute?

This project welcomes contributions from every reinforcement learning practitioner. You can submit material based on the knowledge you have accumulated in a particular direction, and you will be added to the list of contributors.

3. How to communicate?

Follow the WeChat official account (Deep-RL) and add the WeChat assistant (NeuronDance).

1. Books
2. Courses
3. Survey-and-Frontier
4. Environment-and-Framework
5. Baselines-and-Benchmarks
6. Algorithms
7. Applications
8. Advanced-Topics
9. Related-Courses
10. Multi-Agents
11. Paper-Resources
12. Contributors

#1. Books

Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto (2017), Chinese-Edition, Code
Algorithms for Reinforcement Learning by Csaba Szepesvari (updated 2019)
Deep Reinforcement Learning Hands-On by Maxim Lapan (2018), Code
Reinforcement Learning: State-of-the-Art by Marco Wiering, Martijn van Otterlo
Deep Reinforcement Learning in Action by Alexander Zai and Brandon Brown (in progress)
Grokking Deep Reinforcement Learning by Miguel Morales (in progress)
Multi-Agent Machine Learning: A Reinforcement Approach [Baidu Cloud link] by Howard M. Schwartz (2017)
强化学习在阿里的技术演进与业务创新 (Reinforcement Learning at Alibaba: Technical Evolution and Business Innovation) by Alibaba Group
Hands-On Reinforcement Learning with Python (Baidu Cloud link)
Reinforcement Learning And Optimal Control by Dimitri P. Bertsekas, 2019

#2. Courses

UCL Course on RL (★★★) by David Silver, Video-en, Video-zh
OpenAI's Spinning Up in Deep RL by OpenAI(2018)
Udacity-Deep Reinforcement learning, 2019-10-31
Stanford CS-234: Reinforcement Learning (2019), Videos
DeepMind Advanced Deep Learning & Reinforcement Learning (2018),Videos
GeorgiaTech CS-8803 Deep Reinforcement Learning (2018?)
UC Berkeley CS294-112 Deep Reinforcement Learning (2018 Fall),Video-zh
Deep RL Bootcamp by Berkeley CA (2017)
Thomas Simonini's Deep Reinforcement Learning Course
CS-6101 Deep Reinforcement Learning , NUS SoC, 2018/2019, Semester II
Course on Reinforcement Learning by Alessandro Lazaric, 2018
Learn Deep Reinforcement Learning in 60 days

#3. Survey-and-Frontier

Deep Reinforcement Learning by Yuxi Li
Algorithms for Reinforcement Learning by Csaba Szepesvari (Morgan & Claypool, 2009)
Modern Deep Reinforcement Learning Algorithms by Sergey Ivanov(54-Page)
Deep Reinforcement Learning: An Overview (2018)
A Brief Survey of Deep Reinforcement Learning (2017)
Deep Reinforcement Learning Doesn't Work Yet (★) by Alex Irpan (2018), ChineseVersion
Deep Reinforcement Learning that Matters (★) by Peter Henderson, Riashat Islam
A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress
Applications of Deep Reinforcement Learning in Communications and Networking: A Survey
An Introduction to Deep Reinforcement Learning
Challenges of Real-World Reinforcement Learning
Topics in Reinforcement Learning
Reinforcement Learning: A Survey, 1996
A Tutorial Survey of Reinforcement Learning, Sadhana, 1994
Reinforcement Learning in Robotics, A Survey, 2013
A Survey of Deep Network Solutions for Learning Control in Robotics: From Reinforcement to Imitation, 2018
Universal Reinforcement Learning Algorithms: Survey and Experiments, 2017
Bayesian Reinforcement Learning: A Survey, 2016
Benchmarking Reinforcement Learning Algorithms on Real-World Robots

#4. Environment-and-Framework

OpenAI Gym (GitHub) (docs)
rllab (GitHub) (readthedocs)
Ray (Doc)
Dopamine: https://github.com/google/dopamine (uses some tensorflow)
trfl: https://github.com/deepmind/trfl (uses tensorflow)
ChainerRL (GitHub) (API: Python)
Surreal (GitHub) (API: Python) (support: Stanford Vision and Learning Lab). Paper
PyMARL GitHub (support: http://whirl.cs.ox.ac.uk/)
TF-Agents: https://github.com/tensorflow/agents (uses tensorflow)
TensorForce (GitHub) (uses tensorflow)
RL-Glue (Google Code Archive) (API: C/C++, Java, Matlab, Python, Lisp) (support: Alberta)
MAgent https://github.com/geek-ai/MAgent (uses tensorflow)
RLlib http://ray.readthedocs.io/en/latest/rllib.html (API: Python)
BURLAP: http://burlap.cs.brown.edu/ (API: Java)
rlpyt: A Research Code Base for Deep Reinforcement Learning in PyTorch
robotics-rl-srl - S-RL Toolbox: Reinforcement Learning (RL) and State Representation Learning (SRL) for Robotics
pysc2: StarCraft II Learning Environment
Arcade-Learning-Environment
OpenAI universe - A software platform for measuring and training an AI's general intelligence across the world's supply of games, websites and other applications
DeepMind Lab - A customisable 3D platform for agent-based AI research
Project Malmo - A platform for Artificial Intelligence experimentation and research built on top of Minecraft by Microsoft
Retro Learning Environment - An AI platform for reinforcement learning based on video game emulators. Currently supports SNES and Sega Genesis. Compatible with OpenAI gym.
torch-twrl - A package that enables reinforcement learning in Torch by Twitter
UETorch - A Torch plugin for Unreal Engine 4 by Facebook
TorchCraft - Connecting Torch to StarCraft
rllab - A framework for developing and evaluating reinforcement learning algorithms, fully compatible with OpenAI Gym
TensorForce - Practical deep reinforcement learning on TensorFlow with Gitter support and OpenAI Gym/Universe/DeepMind Lab integration.
OpenAI lab - An experimentation system for Reinforcement Learning using OpenAI Gym, Tensorflow, and Keras.
keras-rl - State-of-the art deep reinforcement learning algorithms in Keras designed for compatibility with OpenAI.
BURLAP - Brown-UMBC Reinforcement Learning and Planning, a library written in Java
MAgent - A Platform for Many-agent Reinforcement Learning.
Ray RLlib - Ray RLlib is a reinforcement learning library that aims to provide both performance and composability.
SLM Lab - A research framework for Deep Reinforcement Learning using Unity, OpenAI Gym, PyTorch, Tensorflow.
Unity ML Agents - Create reinforcement learning environments using the Unity Editor
Intel Coach - Coach is a python reinforcement learning research framework containing implementation of many state-of-the-art algorithms.
ELF - An End-To-End, Lightweight and Flexible Platform for Game Research
Unity ML-Agents Toolkit
rlkit
https://gym.openai.com/envs/#classic_control
https://github.com/erlerobot/gym-gazebo
https://github.com/robotology/gym-ignition
https://github.com/dartsim/gym-dart
https://github.com/Roboy/gym-roboy
https://github.com/openai/retro
https://github.com/openai/gym-soccer
https://github.com/duckietown/gym-duckietown
https://github.com/Unity-Technologies/ml-agents (Unity, multiagent)
https://github.com/koulanurag/ma-gym (multiagent)
https://github.com/ucuapps/modelicagym
https://github.com/mwydmuch/ViZDoom
https://github.com/benelot/pybullet-gym
https://github.com/Healthcare-Robotics/assistive-gym
https://github.com/Microsoft/malmo
https://github.com/nadavbh12/Retro-Learning-Environment
https://github.com/twitter/torch-twrl
https://github.com/arex18/rocket-lander
https://github.com/ppaquette/gym-doom
https://github.com/thedimlebowski/Trading-Gym
https://github.com/Phylliade/awesome-openai-gym-environments
https://github.com/deepmind/pysc2 (by DeepMind) (Blizzard StarCraft II Learning Environment (SC2LE) component)

#5. Baselines-and-Benchmarks

#6. Algorithms

1. DQN serial

Playing Atari with Deep Reinforcement Learning [arxiv] [code]
Deep Reinforcement Learning with Double Q-learning [arxiv] [code]
Dueling Network Architectures for Deep Reinforcement Learning [arxiv] [code]
Prioritized Experience Replay [arxiv] [code]
Noisy Networks for Exploration [arxiv] [code]
A Distributional Perspective on Reinforcement Learning [arxiv] [code]
Rainbow: Combining Improvements in Deep Reinforcement Learning [arxiv] [code]

2. Others

Algorithm Coding

Deep-Reinforcement-Learning-Algorithms-with-PyTorch

#7. Applications

7.1 Basic

Reinforcement Learning Applications
IntelliLight: A Reinforcement Learning Approach for Intelligent Traffic Light Control by Hua Wei, Guanjie Zheng (2018)
Deep Reinforcement Learning by Yuxi Li, 2018
Deep Reinforcement Learning in Robotics
7.2 Robotics
- Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion (Kohl, ICRA 2004) [Paper]
- Robot Motor Skill Coordination with EM-based Reinforcement Learning (Kormushev, IROS 2010) [Paper] [Video]
- Generalized Model Learning for Reinforcement Learning on a Humanoid Robot (Hester, ICRA 2010) [Paper] [Video]
- Autonomous Skill Acquisition on a Mobile Manipulator (Konidaris, AAAI 2011) [Paper] [Video]
- PILCO: A Model-Based and Data-Efficient Approach to Policy Search (Deisenroth, ICML 2011) [Paper]
- Incremental Semantically Grounded Learning from Demonstration (Niekum, RSS 2013) [Paper]
- Efficient Reinforcement Learning for Robots using Informative Simulated Priors (Cutler, ICRA 2015) [Paper] [Video]
- Robots that can adapt like animals (Cully, Nature 2015) [Paper] [Video] [Code]
- Black-Box Data-efficient Policy Search for Robotics (Chatzilygeroudis, IROS 2017) [Paper] [Video] [Code]

#8. Advanced-Topics

8.1. Model-free RL

Playing Atari with Deep Reinforcement Learning. NIPS Deep Learning Workshop 2013. paper
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller

Human-level control through deep reinforcement learning. Nature 2015. paper
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg & Demis Hassabis

Deep Reinforcement Learning with Double Q-learning. AAAI 2016. paper
Hado van Hasselt, Arthur Guez, David Silver

Dueling Network Architectures for Deep Reinforcement Learning. ICML 2016. paper
Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas

Deep Recurrent Q-Learning for Partially Observable MDPs. AAAI 2015. paper
Matthew Hausknecht, Peter Stone

Prioritized Experience Replay. ICLR 2016. paper
Tom Schaul, John Quan, Ioannis Antonoglou, David Silver

Asynchronous Methods for Deep Reinforcement Learning. ICML 2016. paper
Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu

A Distributional Perspective on Reinforcement Learning. ICML 2017. paper
Marc G. Bellemare, Will Dabney, Rémi Munos

Noisy Networks for Exploration. ICLR 2018. paper
Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Ian Osband, Alex Graves, Vlad Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, Shane Legg

Rainbow: Combining Improvements in Deep Reinforcement Learning. AAAI 2018. paper
Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver

8.2. Model-based RL

Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion. NIPS 2018. paper
Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, Honglak Lee

Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning. ICML 2018. paper
Vladimir Feinberg, Alvin Wan, Ion Stoica, Michael I. Jordan, Joseph E. Gonzalez, Sergey Levine

Value Prediction Network. NIPS 2017. paper
Junhyuk Oh, Satinder Singh, Honglak Lee

Imagination-Augmented Agents for Deep Reinforcement Learning. NIPS 2017. paper
Théophane Weber, Sébastien Racanière, David P. Reichert, Lars Buesing, Arthur Guez, Danilo Jimenez Rezende, Adria Puigdomènech Badia, Oriol Vinyals, Nicolas Heess, Yujia Li, Razvan Pascanu, Peter Battaglia, Demis Hassabis, David Silver, Daan Wierstra

Continuous Deep Q-Learning with Model-based Acceleration. ICML 2016. paper
Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, Sergey Levine

Uncertainty-driven Imagination for Continuous Deep Reinforcement Learning. CoRL 2017. paper
Gabriel Kalweit, Joschka Boedecker

Model-Ensemble Trust-Region Policy Optimization. ICLR 2018. paper
Thanard Kurutach, Ignasi Clavera, Yan Duan, Aviv Tamar, Pieter Abbeel

Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models. NIPS 2018. paper
Kurtland Chua, Roberto Calandra, Rowan McAllister, Sergey Levine

Dyna, an integrated architecture for learning, planning, and reacting. ACM 1991. paper
Sutton, Richard S

Learning Continuous Control Policies by Stochastic Value Gradients. NIPS 2015. paper
Nicolas Heess, Greg Wayne, David Silver, Timothy Lillicrap, Yuval Tassa, Tom Erez

Learning and Policy Search in Stochastic Dynamical Systems with Bayesian Neural Networks. ICLR 2017. paper
Stefan Depeweg, José Miguel Hernández-Lobato, Finale Doshi-Velez, Steffen Udluft

8.3 Function Approximation methods (Least-Square Temporal Difference, Least-Square Policy Iteration)

Linear Least-Squares Algorithms for Temporal Difference Learning, Machine Learning, 1996. [Paper]
Model-Free Least Squares Policy Iteration, NIPS, 2001. [Paper] [Code]

8.4 Policy Search/Policy Gradient

Policy Gradient Methods for Reinforcement Learning with Function Approximation, NIPS, 1999. [Paper]
Natural Actor-Critic, ECML, 2005. [Paper]
Policy Search for Motor Primitives in Robotics, NIPS, 2009. [Paper]
Relative Entropy Policy Search, AAAI, 2010. [Paper]
Path Integral Policy Improvement with Covariance Matrix Adaptation, ICML, 2012. [Paper]
Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion, ICRA, 2004. [Paper]
PILCO: A Model-Based and Data-Efficient Approach to Policy Search, ICML, 2011. [Paper]
Learning Dynamic Arm Motions for Postural Recovery, Humanoids, 2011. [Paper]
Black-Box Data-efficient Policy Search for Robotics, IROS, 2017. [Paper]

8.5 Hierarchical RL

Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, Artificial Intelligence, 1999. [Paper]
Building Portable Options: Skill Transfer in Reinforcement Learning, IJCAI, 2007. [Paper]

8.6 Inverse RL

updating..........

8.7 Meta RL

updating..........

8.8. Rewards

8.9. Policy Gradient

Policy Gradient

8.10. Distributed Reinforcement Learning

Asynchronous Methods for Deep Reinforcement Learning, ICML 2016. paper
GA3C: GPU-based A3C for Deep Reinforcement Learning by Iuri Frosio, Stephen Tyree, NIPS 2016
Distributed Prioritized Experience Replay by Dan Horgan, John Quan, David Budden, ICLR 2018
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures by Lasse Espeholt, Hubert Soyer, Remi Munos, ICML 2018
Distributed Distributional Deterministic Policy Gradients by Gabriel Barth-Maron, Matthew W. Hoffman, ICLR 2018
Emergence of Locomotion Behaviours in Rich Environments by Nicolas Heess, Dhruva TB, Srinivasan Sriram, 2017
GPU-Accelerated Robotic Simulation for Distributed Reinforcement Learning by Jacky Liang, Viktor Makoviychuk, 2018
Recurrent Experience Replay in Distributed Reinforcement Learning by Steven Kapturowski, Georg Ostrovski, ICLR 2019

#9. Related-Courses

9.1. Game Theory

Game Theory Course, Yale University
Game Theory - The Full Course, Stanford University
Algorithmic Game Theory (CS364A, Fall 2013) , Stanford University

9.2. other

......

#10. Multi-Agents

10.1 Tutorial and Books

Deep Multi-Agent Reinforcement Learning by Jakob N Foerster, 2018. PhD Thesis.
Multi-Agent Machine Learning: A Reinforcement Approach by H. M. Schwartz, 2014.
Multiagent Reinforcement Learning by Daan Bloembergen, Daniel Hennes, Michael Kaisers, Peter Vrancx. ECML, 2013.
Multiagent systems: Algorithmic, game-theoretic, and logical foundations by Shoham Y, Leyton-Brown K. Cambridge University Press, 2008.

10.2 Review Papers

A Survey on Transfer Learning for Multiagent Reinforcement Learning Systems by Silva, Felipe Leno da; Costa, Anna Helena Reali. JAIR, 2019.
Autonomously Reusing Knowledge in Multiagent Reinforcement Learning by Silva, Felipe Leno da; Taylor, Matthew E.; Costa, Anna Helena Reali. IJCAI, 2018.
Deep Reinforcement Learning Variants of Multi-Agent Learning Algorithms by Castaneda A O. 2016.
Evolutionary Dynamics of Multi-Agent Learning: A Survey by Bloembergen, Daan, et al. JAIR, 2015.
Game theory and multi-agent reinforcement learning by Nowé A, Vrancx P, De Hauwere Y M. Reinforcement Learning. Springer Berlin Heidelberg, 2012.
Multi-agent reinforcement learning: An overview by Buşoniu L, Babuška R, De Schutter B. Innovations in multi-agent systems and applications-1. Springer Berlin Heidelberg, 2010
A comprehensive survey of multi-agent reinforcement learning by Busoniu L, Babuska R, De Schutter B. IEEE Transactions on Systems Man and Cybernetics Part C Applications and Reviews, 2008
If multi-agent learning is the answer, what is the question? by Shoham Y, Powers R, Grenager T. Artificial Intelligence, 2007.
From single-agent to multi-agent reinforcement learning: Foundational concepts and methods by Neto G. Learning theory course, 2005.
Evolutionary game theory and multi-agent reinforcement learning by Tuyls K, Nowé A. The Knowledge Engineering Review, 2005.
An Overview of Cooperative and Competitive Multiagent Learning by Pieter Jan ’t Hoen, Karl Tuyls, Liviu Panait, Sean Luke, J. A. La Poutré. AAMAS's workshop LAMAS, 2005.
Cooperative multi-agent learning: the state of the art by Liviu Panait and Sean Luke, 2005.

10.3 Framework papers

Mean Field Multi-Agent Reinforcement Learning by Yaodong Yang, Rui Luo, Minne Li, Ming Zhou, Weinan Zhang, and Jun Wang. ICML 2018.
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments by Lowe R, Wu Y, Tamar A, et al. arXiv, 2017.
Deep Decentralized Multi-task Multi-Agent RL under Partial Observability by Omidshafiei S, Pazis J, Amato C, et al. arXiv, 2017.
Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games by Peng P, Yuan Q, Wen Y, et al. arXiv, 2017.
Robust Adversarial Reinforcement Learning by Lerrel Pinto, James Davidson, Rahul Sukthankar, Abhinav Gupta. arXiv, 2017.
Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning by Foerster J, Nardelli N, Farquhar G, et al. arXiv, 2017.
Multiagent reinforcement learning with sparse interactions by negotiation and knowledge transfer by Zhou L, Yang P, Chen C, et al. IEEE transactions on cybernetics, 2016.
Decentralised multi-agent reinforcement learning for dynamic and uncertain environments by Marinescu A, Dusparic I, Taylor A, et al. arXiv, 2014.
CLEANing the reward: counterfactual actions to remove exploratory action noise in multiagent learning by HolmesParker C, Taylor M E, Agogino A, et al. AAMAS, 2014.
Bayesian reinforcement learning for multiagent systems with state uncertainty by Amato C, Oliehoek F A. MSDM Workshop, 2013.
Multiagent learning: Basics, challenges, and prospects by Tuyls, Karl, and Gerhard Weiss. AI Magazine, 2012.
Classes of multiagent q-learning dynamics with epsilon-greedy exploration by Wunder M, Littman M L, Babes M. ICML, 2010.
Conditional random fields for multi-agent reinforcement learning by Zhang X, Aberdeen D, Vishwanathan S V N. ICML, 2007.
Multi-agent reinforcement learning using strategies and voting by Partalas, Ioannis, Ioannis Feneris, and Ioannis Vlahavas. ICTAI, 2007.
A reinforcement learning scheme for a partially-observable multi-agent game by Ishii S, Fujita H, Mitsutake M, et al. Machine Learning, 2005.
Asymmetric multiagent reinforcement learning by Könönen V. Web Intelligence and Agent Systems, 2004.
Adaptive policy gradient in multiagent learning by Banerjee B, Peng J. AAMAS, 2003.
Reinforcement learning to play an optimal Nash equilibrium in team Markov games by Wang X, Sandholm T. NIPS, 2002.
Multiagent learning using a variable learning rate by Michael Bowling and Manuela Veloso, 2002.
Value-function reinforcement learning in Markov game by Littman M L. Cognitive Systems Research, 2001.
Hierarchical multi-agent reinforcement learning by Makar, Rajbala, Sridhar Mahadevan, and Mohammad Ghavamzadeh. The fifth international conference on Autonomous agents, 2001.
An analysis of stochastic game theory for multiagent reinforcement learning by Michael Bowling and Manuela Veloso, 2000.

10.4 Joint action learning

AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents by Conitzer V, Sandholm T. Machine Learning, 2007.
Extending Q-Learning to General Adaptive Multi-Agent Systems by Tesauro, Gerald. NIPS, 2003.
Multiagent reinforcement learning: theoretical framework and an algorithm. by Hu, Junling, and Michael P. Wellman. ICML, 1998.
The dynamics of reinforcement learning in cooperative multiagent systems by Claus C, Boutilier C. AAAI, 1998.
Markov games as a framework for multi-agent reinforcement learning by Littman, Michael L. ICML, 1994.

10.5 Cooperation and competition

Emergent complexity through multi-agent competition by Trapit Bansal, Jakub Pachocki, Szymon Sidor, Ilya Sutskever, Igor Mordatch, 2018.
Learning with opponent learning awareness by Jakob Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, Igor Mordatch, 2018.
Multi-agent Reinforcement Learning in Sequential Social Dilemmas by Leibo J Z, Zambaldi V, Lanctot M, et al. arXiv, 2017. [Post]
Reinforcement Learning in Partially Observable Multiagent Settings: Monte Carlo Exploring Policies with PAC Bounds by Roi Ceren, Prashant Doshi, and Bikramjit Banerjee, pp. 530-538, AAMAS 2016.
Opponent Modeling in Deep Reinforcement Learning by He H, Boyd-Graber J, Kwok K, et al. ICML, 2016.
Multiagent cooperation and competition with deep reinforcement learning by Tampuu A, Matiisen T, Kodelja D, et al. arXiv, 2015.
Emotional multiagent reinforcement learning in social dilemmas by Yu C, Zhang M, Ren F. International Conference on Principles and Practice of Multi-Agent Systems, 2013.
Multi-agent reinforcement learning in common interest and fixed sum stochastic games: An experimental study by Bab, Avraham, and Ronen I. Brafman. Journal of Machine Learning Research, 2008.
Combining policy search with planning in multi-agent cooperation by Ma J, Cameron S. Robot Soccer World Cup, 2008.
Collaborative multiagent reinforcement learning by payoff propagation by Kok J R, Vlassis N. JMLR, 2006.
Learning to cooperate in multi-agent social dilemmas by de Cote E M, Lazaric A, Restelli M. AAMAS, 2006.
Learning to compete, compromise, and cooperate in repeated general-sum games by Crandall J W, Goodrich M A. ICML, 2005.
Sparse cooperative Q-learning by Kok J R, Vlassis N. ICML, 2004.

10.6 Coordination

Coordinated Multi-Agent Imitation Learning by Le H M, Yue Y, Carr P. arXiv, 2017.
Reinforcement social learning of coordination in networked cooperative multiagent systems by Hao J, Huang D, Cai Y, et al. AAAI Workshop, 2014.
Coordinating multi-agent reinforcement learning with limited communication by Zhang, Chongjie, and Victor Lesser. AAMAS, 2013.
Coordination guided reinforcement learning by Lau Q P, Lee M L, Hsu W. AAMAS, 2012.
Coordination in multiagent reinforcement learning: a Bayesian approach by Chalkiadakis G, Boutilier C. AAMAS, 2003.
Coordinated reinforcement learning by Guestrin C, Lagoudakis M, Parr R. ICML, 2002.
Reinforcement learning of coordination in cooperative multi-agent systems by Kapetanakis S, Kudenko D. AAAI/IAAI, 2002.

10.7 Security

Markov Security Games: Learning in Spatial Security Problems by Klima R, Tuyls K, Oliehoek F. The Learning, Inference and Control of Multi-Agent Systems at NIPS, 2016.
Cooperative Capture by Multi-Agent using Reinforcement Learning, Application for Security Patrol Systems by Yasuyuki S, Hirofumi O, Tadashi M, et al. Control Conference (ASCC), 2015
Improving learning and adaptation in security games by exploiting information asymmetry by He X, Dai H, Ning P. INFOCOM, 2015.

10.8 Self-Play

A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning by Marc Lanctot, Vinicius Zambaldi, Audrunas Gruslys, Angeliki Lazaridou, Karl Tuyls, Julien Perolat, David Silver, Thore Graepel. NIPS 2017.
Deep reinforcement learning from self-play in imperfect-information games by Heinrich, Johannes, and David Silver. arXiv, 2016.
Fictitious Self-Play in Extensive-Form Games by Heinrich, Johannes, Marc Lanctot, and David Silver. ICML, 2015.

10.9 Learning To Communicate

Emergent Communication through Negotiation by Kris Cao, Angeliki Lazaridou, Marc Lanctot, Joel Z Leibo, Karl Tuyls, Stephen Clark, 2018.
Emergence of Linguistic Communication From Referential Games with Symbolic and Pixel Input by Angeliki Lazaridou, Karl Moritz Hermann, Karl Tuyls, Stephen Clark
Emergence of Language with Multi-agent Games: Learning to Communicate with Sequences of Symbols by Serhii Havrylov, Ivan Titov. ICLR Workshop, 2017.
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning by Abhishek Das, Satwik Kottur, et al. arXiv, 2017.
Emergence of Grounded Compositional Language in Multi-Agent Populations by Igor Mordatch, Pieter Abbeel. arXiv, 2017. [Post]
Cooperation and communication in multiagent deep reinforcement learning by Hausknecht M J. 2017.
Multi-agent cooperation and the emergence of (natural) language by Lazaridou A, Peysakhovich A, Baroni M. arXiv, 2016.
Learning to communicate to solve riddles with deep distributed recurrent q-networks by Foerster J N, Assael Y M, de Freitas N, et al. arXiv, 2016.
Learning to communicate with deep multi-agent reinforcement learning by Foerster J, Assael Y M, de Freitas N, et al. NIPS, 2016.
Learning multiagent communication with backpropagation by Sukhbaatar S, Fergus R. NIPS, 2016.
Efficient distributed reinforcement learning through agreement by Varshavskaya P, Kaelbling L P, Rus D. Distributed Autonomous Robotic Systems, 2009.

10.10 Transfer Learning

Simultaneously Learning and Advising in Multiagent Reinforcement Learning by Silva, Felipe Leno da; Glatt, Ruben; and Costa, Anna Helena Reali. AAMAS, 2017.
Accelerating Multiagent Reinforcement Learning through Transfer Learning by Silva, Felipe Leno da; and Costa, Anna Helena Reali. AAAI, 2017.
Accelerating multi-agent reinforcement learning with dynamic co-learning by Garant D, da Silva B C, Lesser V, et al. Technical report, 2015
Transfer learning in multi-agent systems through parallel transfer by Taylor, Adam, et al. ICML, 2013.
Transfer learning in multi-agent reinforcement learning domains by Boutsioukis, Georgios, Ioannis Partalas, and Ioannis Vlahavas. European Workshop on Reinforcement Learning, 2011.
Transfer Learning for Multi-agent Coordination by Vrancx, Peter, Yann-Michaël De Hauwere, and Ann Nowé. ICAART, 2011.

10.11 Imitation and Inverse Reinforcement Learning

Multi-Agent Adversarial Inverse Reinforcement Learning by Lantao Yu, Jiaming Song, Stefano Ermon. ICML 2019.
Multi-Agent Generative Adversarial Imitation Learning by Jiaming Song, Hongyu Ren, Dorsa Sadigh, Stefano Ermon. NeurIPS 2018.
Cooperative inverse reinforcement learning by Hadfield-Menell D, Russell S J, Abbeel P, et al. NIPS, 2016.
Comparison of Multi-agent and Single-agent Inverse Learning on a Simulated Soccer Example by Lin X, Beling P A, Cogill R. arXiv, 2014.
Multi-agent inverse reinforcement learning for zero-sum games by Lin X, Beling P A, Cogill R. arXiv, 2014.
Multi-robot inverse reinforcement learning under occlusion with interactions by Bogert K, Doshi P. AAMAS, 2014.
Multi-agent inverse reinforcement learning by Natarajan S, Kunapuli G, Judah K, et al. ICMLA, 2010.

10.12 Meta Learning

Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments by Al-Shedivat M, et al. 2018.

10.13 Application

MAgent: A Many-Agent Reinforcement Learning Platform for Artificial Collective Intelligence by Zheng L et al. NIPS 2017 & AAAI 2018 Demo. (Github Page)
Collaborative Deep Reinforcement Learning for Joint Object Search by Kong X, Xin B, Wang Y, et al. arXiv, 2017.
Multi-Agent Stochastic Simulation of Occupants for Building Simulation by Chapman J, Siebers P, Darren R. Building Simulation, 2017.
Extending No-MASS: Multi-Agent Stochastic Simulation for Demand Response of residential appliances by Sancho-Tomás A, Chapman J, Sumner M, Darren R. Building Simulation, 2017.
Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving by Shalev-Shwartz S, Shammah S, Shashua A. arXiv, 2016.
Applying multi-agent reinforcement learning to watershed management by Mason, Karl, et al. Proceedings of the Adaptive and Learning Agents workshop at AAMAS, 2016.
Crowd Simulation Via Multi-Agent Reinforcement Learning by Torrey L. AAAI, 2010.
Traffic light control by multiagent reinforcement learning systems by Bakker, Bram, et al. Interactive Collaborative Information Systems, 2010.
Multiagent reinforcement learning for urban traffic control using coordination graphs by Kuyer, Lior, et al. Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2008.
A multi-agent Q-learning framework for optimizing stock trading systems by Lee J W, Jangmin O. DEXA, 2002.
Multi-agent reinforcement learning for traffic light control by Wiering, Marco. ICML. 2000.

11. Paper-Resources
2020-01
Updating with AAAI-2020 conference papers (details in /ConferencePaper/AAAI/2020).

2018

Accelerated Methods for Deep Reinforcement Learning. arxiv
A Deep Reinforcement Learning Chatbot (Short Version). arxiv
AlphaX: eXploring Neural Architectures with Deep Neural Networks and Monte Carlo Tree Search. arxiv :star:
A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress. arxiv
Composable Deep Reinforcement Learning for Robotic Manipulation. arxiv
Cooperative Multi-Agent Reinforcement Learning for Low-Level Wireless Communication. arxiv
Deep Reinforcement Fuzzing. arxiv
Deep Reinforcement Learning of Cell Movement in the Early Stage of C. elegans Embryogenesis. arxiv
Deep Reinforcement Learning For Sequence to Sequence Models. arxiv code
Deep Reinforcement Learning for Vision-Based Robotic Grasping: A Simulated Comparative Evaluation of Off-Policy Methods. arxiv
Deep Reinforcement Learning in Portfolio Management. arxiv code
Deep Reinforcement Learning using Capsules in Advanced Game Environments. arxiv
Deep Reinforcement Learning with Model Learning and Monte Carlo Tree Search in Minecraft. arxiv
Distributed Deep Reinforcement Learning: Learn how to play Atari games in 21 minutes. arxiv code
Diversity is All You Need: Learning Skills without a Reward Function. arxiv
Faster Deep Q-learning using Neural Episodic Control. arxiv
Feedback-Based Tree Search for Reinforcement Learning. arxiv
Feudal Reinforcement Learning for Dialogue Management in Large Domains. arxiv
Forward-Backward Reinforcement Learning. arxiv
Hierarchical Reinforcement Learning: Approximating Optimal Discounted TSP Using Local Policies. arxiv
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures. arxiv
Kickstarting Deep Reinforcement Learning. arxiv
Learning a Prior over Intent via Meta-Inverse Reinforcement Learning. arxiv
Meta Reinforcement Learning with Latent Variable Gaussian Processes. arxiv
Multi-Agent Reinforcement Learning: A Report on Challenges and Approaches. arxiv
Pretraining Deep Actor-Critic Reinforcement Learning Algorithms With Expert Demonstrations. arxiv
Psychlab: A Psychology Laboratory for Deep Reinforcement Learning Agents. arxiv
Recommendations with Negative Feedback via Pairwise Deep Reinforcement Learning. arxiv
Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review. arxiv
Reinforcement Learning from Imperfect Demonstrations. arxiv
Reinforcement Learning to Rank in E-Commerce Search Engine: Formalization, Analysis, and Application. arxiv
RUDDER: Return Decomposition for Delayed Rewards. arxiv code
Semi-parametric Topological Memory for Navigation. arxiv tensorflow
Shared Autonomy via Deep Reinforcement Learning. arxiv
Setting up a Reinforcement Learning Task with a Real-World Robot. arxiv
Simple random search provides a competitive approach to reinforcement learning. arxiv code
Unsupervised Meta-Learning for Reinforcement Learning. arxiv
Using reinforcement learning to learn how to play text-based games. arxiv

2017

A Deep Reinforcement Learning Chatbot. arxiv
A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem. arxiv code
A Deep Reinforced Model for Abstractive Summarization. arxiv
A Distributional Perspective on Reinforcement Learning. arxiv
A Laplacian Framework for Option Discovery in Reinforcement Learning. arxiv :star:
Boosting the Actor with Dual Critic. arxiv
Bridging the Gap Between Value and Policy Based Reinforcement Learning. arxiv
Car Racing using Reinforcement Learning. pdf
Cold-Start Reinforcement Learning with Softmax Policy Gradients. arxiv
Curiosity-driven Exploration by Self-supervised Prediction. arxiv tensorflow
Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning. arxiv code
DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning. arxiv code
Deep Reinforcement Learning: An Overview. arxiv
Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity-Representativeness Reward. arxiv code
Deep reinforcement learning from human preferences. arxiv
Deep Reinforcement Learning that Matters. arxiv code
Device Placement Optimization with Reinforcement Learning. arxiv
Distributional Reinforcement Learning with Quantile Regression. arxiv
End-to-End Optimization of Task-Oriented Dialogue Model with Deep Reinforcement Learning. arxiv
Evolution Strategies as a Scalable Alternative to Reinforcement Learning. arxiv
Feature Control as Intrinsic Motivation for Hierarchical Reinforcement Learning. arxiv
Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations. arxiv
Learning how to Active Learn: A Deep Reinforcement Learning Approach. arxiv tensorflow
Learning Multimodal Transition Dynamics for Model-Based Reinforcement Learning. arxiv tensorflow
MAgent: A Many-Agent Reinforcement Learning Platform for Artificial Collective Intelligence. arxiv code :star:
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arxiv
Micro-Objective Learning : Accelerating Deep Reinforcement Learning through the Discovery of Continuous Subgoals. arxiv
Neural Architecture Search with Reinforcement Learning. arxiv tensorflow
Neural Map: Structured Memory for Deep Reinforcement Learning. arxiv
Observational Learning by Reinforcement Learning. arxiv
Overcoming Exploration in Reinforcement Learning with Demonstrations. arxiv
Practical Network Blocks Design with Q-Learning. arxiv
Rainbow: Combining Improvements in Deep Reinforcement Learning. arxiv
Reinforcement Learning for Architecture Search by Network Transformation. arxiv code
Reinforcement Learning via Recurrent Convolutional Neural Networks. arxiv code
Reinforcement Learning with a Corrupted Reward Channel. arxiv :star:
Reinforcement Learning with Deep Energy-Based Policies. arxiv code
Reinforcement Learning with External Knowledge and Two-Stage Q-functions for Predicting Popular Reddit Threads. arxiv
Robust Deep Reinforcement Learning with Adversarial Attacks. arxiv
Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning. arxiv
Shallow Updates for Deep Reinforcement Learning. arxiv code
Stochastic Neural Networks for Hierarchical Reinforcement Learning. pdf code
Tackling Error Propagation through Reinforcement Learning: A Case of Greedy Dependency Parsing. arxiv code
Task-Oriented Query Reformulation with Reinforcement Learning. arxiv code
Teaching a Machine to Read Maps with Deep Reinforcement Learning. arxiv code
TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning. arxiv code
Value Prediction Network. arxiv
Variational Deep Q Network. arxiv
Virtual-to-real Deep Reinforcement Learning: Continuous Control of Mobile Robots for Mapless Navigation. arxiv
Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning. arxiv

2016

Asynchronous Methods for Deep Reinforcement Learning. [arxiv] :star:
Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning, E. Parisotto, et al., ICLR. [arxiv]
A New Softmax Operator for Reinforcement Learning. [url]
Benchmarking Deep Reinforcement Learning for Continuous Control, Y. Duan et al., ICML. [arxiv]
Better Computer Go Player with Neural Network and Long-term Prediction, Y. Tian et al., ICLR. [arxiv]
Deep Reinforcement Learning in Parameterized Action Space, M. Hausknecht et al., ICLR. [arxiv]
Curiosity-driven Exploration in Deep Reinforcement Learning via Bayesian Neural Networks, R. Houthooft et al., arXiv. [url]
Control of Memory, Active Perception, and Action in Minecraft, J. Oh et al., ICML. [arxiv]
Continuous Deep Q-Learning with Model-based Acceleration, S. Gu et al., ICML. [arxiv]
Continuous control with deep reinforcement learning. [arxiv] :star:
Deep Successor Reinforcement Learning. [arxiv]
Dynamic Frame skip Deep Q Network, A. S. Lakshminarayanan et al., IJCAI Deep RL Workshop. [arxiv]
Deep Exploration via Bootstrapped DQN. [arxiv] :star:
Deep Reinforcement Learning for Dialogue Generation. [arxiv] tensorflow
Deep Reinforcement Learning in Parameterized Action Space. [arxiv] :star:
Deep Reinforcement Learning with Successor Features for Navigation across Similar Environments. [url]
Designing Neural Network Architectures using Reinforcement Learning. arxiv code
Dialogue manager domain adaptation using Gaussian process reinforcement learning. [arxiv]
End-to-End Reinforcement Learning of Dialogue Agents for Information Access. [arxiv]
Generating Text with Deep Reinforcement Learning. [arxiv]
Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization, C. Finn et al., arXiv. [arxiv]
Hierarchical Reinforcement Learning using Spatio-Temporal Abstractions and Deep Neural Networks, R. Krishnamurthy et al., arXiv. [arxiv]
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation, T. D. Kulkarni et al., arXiv. [arxiv]
Hierarchical Object Detection with Deep Reinforcement Learning. [arxiv]
High-Dimensional Continuous Control Using Generalized Advantage Estimation, J. Schulman et al., ICLR. [arxiv]
Increasing the Action Gap: New Operators for Reinforcement Learning, M. G. Bellemare et al., AAAI. [arxiv]
Interactive Spoken Content Retrieval by Deep Reinforcement Learning. [arxiv]
Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection, S. Levine et al., arXiv. [url]
Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks, J. N. Foerster et al., arXiv. [url]
Learning to compose words into sentences with reinforcement learning. [url]
Loss is its own Reward: Self-Supervision for Reinforcement Learning. [arxiv]
Model-Free Episodic Control. [arxiv]
Mastering the game of Go with deep neural networks and tree search. [nature] :star:
MazeBase: A Sandbox for Learning from Games. [arxiv]
Neural Architecture Search with Reinforcement Learning. [pdf]
Neural Combinatorial Optimization with Reinforcement Learning. [arxiv]
Non-Deterministic Policy Improvement Stabilizes Approximated Reinforcement Learning. [url]
Online Sequence-to-Sequence Active Learning for Open-Domain Dialogue Generation. arXiv. [arxiv]
Policy Distillation, A. A. Rusu et al., ICLR. [arxiv]
Prioritized Experience Replay. [arxiv] :star:
Reinforcement Learning Using Quantum Boltzmann Machines. [arxiv]
Safe and Efficient Off-Policy Reinforcement Learning, R. Munos et al. [arxiv]
Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving. [arxiv]
Sample-efficient Deep Reinforcement Learning for Dialog Control. [url]
Self-Correcting Models for Model-Based Reinforcement Learning. [url]
Unifying Count-Based Exploration and Intrinsic Motivation. [arxiv]
Value Iteration Networks. [arxiv]

2015

ADAAPT: A Deep Architecture for Adaptive Policy Transfer from Multiple Sources. arxiv
Action-Conditional Video Prediction using Deep Networks in Atari Games. arxiv :star:
Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning. arxiv :star:
[DDPG] Continuous control with deep reinforcement learning. arxiv :star:
[NAF] Continuous Deep Q-Learning with Model-based Acceleration. arxiv :star:
Dueling Network Architectures for Deep Reinforcement Learning. arxiv :star:
Deep Reinforcement Learning with an Action Space Defined by Natural Language. arxiv
Deep Reinforcement Learning with Double Q-learning. arxiv :star:
Deep Recurrent Q-Learning for Partially Observable MDPs. arxiv :star:
DeepMPC: Learning Deep Latent Features for Model Predictive Control. pdf
Deterministic Policy Gradient Algorithms. pdf :star:
End-to-End Training of Deep Visuomotor Policies. arxiv :star:
Giraffe: Using Deep Reinforcement Learning to Play Chess. arxiv
Generating Text with Deep Reinforcement Learning. arxiv
How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies. arxiv
Human-level control through deep reinforcement learning. nature :star:
Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models. arxiv :star:
Learning Simple Algorithms from Examples. arxiv
Language Understanding for Text-based Games Using Deep Reinforcement Learning. pdf :star:
Learning Continuous Control Policies by Stochastic Value Gradients. pdf :star:
Multiagent Cooperation and Competition with Deep Reinforcement Learning. arxiv
Maximum Entropy Deep Inverse Reinforcement Learning. arxiv
Massively Parallel Methods for Deep Reinforcement Learning. pdf :star:
On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models. arxiv
Playing Atari with Deep Reinforcement Learning. arxiv
Recurrent Reinforcement Learning: A Hybrid Approach. arxiv
Strategic Dialogue Management via Deep Reinforcement Learning. arxiv
Towards Vision-Based Deep Reinforcement Learning for Robotic Motion Control. arxiv
Trust Region Policy Optimization. pdf :star:
Universal Value Function Approximators. pdf
Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning. arxiv

2014

Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning. [url]

2013

Evolving large-scale neural networks for vision-based reinforcement learning. [idsia] :star:
Playing Atari with Deep Reinforcement Learning. [toronto] :star:

12. Contributors

Special thanks to the following people for their generous contributions to this work.

Cite

This comprehensive summary of deep reinforcement learning materials is based on the sources above, and we would like to express our heartfelt thanks to their authors.

ruoqi

RLer Great resource! But the current layout is rather messy; I suggest reformatting it. The links are also broken.

Tzy2020

ruoqi Check the lab's GitHub; the links are there.

https://github.com/NeuronDance/DeepRL/tree/master/A-Guide-Resource-For-DeepRL

RLer

The algorithm chart published by the lab is quite useful. Recommended: "65 Deep Reinforcement Learning Algorithms in One Chart".

Jevon

Thank you!!!

Janayt

Awesome!!

miku336

Very comprehensive, thanks for sharing.
