Paper PDF: https://jmlr.org/papers/volume19/17-131/17-131.pdf
Experience replay is a technique that allows off-policy reinforcement-learning methods to reuse past experiences. The stability and speed of convergence of reinforcement learning, as well as the eventual performance of the learned policy, are strongly dependent on the experiences being replayed. Which experiences are replayed depends on two important choices. The first is which and how many experiences to retain in the experience replay buffer. The second choice is how to sample the experiences that are to be replayed from that buffer. We propose new methods for the combined problem of experience retention and experience sampling. We refer to the combination as experience selection. We focus our investigation specifically on the control of physical systems, such as robots, where exploration is costly. To determine which experiences to keep and which to replay, we investigate different proxies for their immediate and long-term utility. These proxies include age, temporal difference error and the strength of the applied exploration noise. Since no currently available method works in all situations, we propose guidelines for using prior knowledge about the characteristics of the control problem at hand to choose the appropriate experience replay strategy.
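As a rough illustration of the two choices the abstract separates — experience retention (which transitions to keep when the buffer is full) and experience sampling (which retained transitions to replay) — here is a minimal Python sketch. The class name, the FIFO/proxy overwrite rules, and the use of absolute TD error as the utility proxy are illustrative assumptions for this sketch, not the paper's exact algorithms.

```python
import random
import numpy as np

class ExperienceBuffer:
    """Toy replay buffer separating retention (what to keep) from sampling (what to replay).

    NOTE: the overwrite and sampling rules below are simplified illustrations of the
    ideas in the abstract, not the authors' specific experience-selection methods.
    """

    def __init__(self, capacity, retention="fifo", sampling="uniform"):
        self.capacity = capacity
        self.retention = retention   # "fifo" keeps the newest; "proxy" keeps high-utility experiences
        self.sampling = sampling     # "uniform" or "proportional" to the stored proxy value
        self.experiences = []        # list of transition tuples (s, a, r, s_next, done)
        self.proxy = []              # utility proxy per transition, e.g. |TD error| at storage time

    def add(self, transition, proxy_value):
        """Retention step: decide which experience (if any) to overwrite."""
        if len(self.experiences) < self.capacity:
            self.experiences.append(transition)
            self.proxy.append(proxy_value)
        elif self.retention == "fifo":
            # Age as the retention proxy: overwrite the oldest experience.
            self.experiences.pop(0)
            self.proxy.pop(0)
            self.experiences.append(transition)
            self.proxy.append(proxy_value)
        else:
            # Overwrite the experience with the lowest proxy value (e.g. smallest |TD error|).
            idx = int(np.argmin(self.proxy))
            self.experiences[idx] = transition
            self.proxy[idx] = proxy_value

    def sample(self, batch_size):
        """Sampling step: decide which retained experiences to replay."""
        n = min(batch_size, len(self.experiences))
        if self.sampling == "uniform":
            return random.sample(self.experiences, n)
        # Sample proportionally to the proxy value (a simple prioritized scheme).
        p = np.asarray(self.proxy, dtype=np.float64) + 1e-8  # avoid a zero-sum distribution
        p = p / p.sum()
        idx = np.random.choice(len(self.experiences), size=n, replace=False, p=p)
        return [self.experiences[i] for i in idx]
```

In this sketch the retention rule and the sampling rule can be chosen independently (e.g. `ExperienceBuffer(10000, retention="proxy", sampling="uniform")`), which mirrors the paper's framing of experience selection as the combination of the two decisions.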