References
[1] Deep Reinforcement Learning Doesn’t Work Yet, Alex Irpan, 2018
[2] Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control, Islam et al., 2017
[3] Deep Reinforcement Learning that Matters, Henderson et al., 2017
[4] Lessons Learned Reproducing a Deep Reinforcement Learning Paper, Matthew Rahtz, 2018
[5] UCL Course on RL
[6] Berkeley Deep RL Course
[7] Deep RL Bootcamp
[8] Nuts and Bolts of Deep RL, John Schulman
[9] Stanford Deep Learning Tutorial: Multi-Layer Neural Network
[10] The Unreasonable Effectiveness of Recurrent Neural Networks, Andrej Karpathy, 2015
[11] LSTM: A Search Space Odyssey, Greff et al., 2015
[12] Understanding LSTM Networks, Chris Olah, 2015
[13] Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, Chung et al., 2014 (GRU paper)
[14] Conv Nets: A Modular Perspective, Chris Olah, 2014
[15] Stanford CS231n, Convolutional Neural Networks for Visual Recognition
[16] Deep Residual Learning for Image Recognition, He et al., 2015 (ResNets)
[17] Neural Machine Translation by Jointly Learning to Align and Translate, Bahdanau et al., 2014 (attention mechanisms)
[18] Attention Is All You Need, Vaswani et al., 2017
[19] A Simple Weight Decay Can Improve Generalization, Krogh and Hertz, 1992
[20] Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Srivastava et al., 2014
[21] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, Ioffe and Szegedy, 2015
[22] Layer Normalization, Ba et al., 2016
[23] Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks, Salimans and Kingma, 2016
[24] Stanford Deep Learning Tutorial: Stochastic Gradient Descent
[25] Adam: A Method for Stochastic Optimization, Kingma and Ba, 2014
[26] An overview of gradient descent optimization algorithms, Sebastian Ruder, 2016
[27] Auto-Encoding Variational Bayes, Kingma and Welling, 2013 (Reparameterization trick)
[28] TensorFlow
[29] PyTorch
[30] Spinning Up