Reinforcement learning (RL) allows an agent to solve sequential decision-making problems by interacting with an environment in a trial-and-error fashion. When these environments are very complex, purely random exploration of possible solutions often fails, or is highly sample-inefficient, requiring an unreasonable amount of interaction with the environment. Hierarchical reinforcement learning (HRL) utilizes forms of temporal and state abstraction to tackle these challenges, while simultaneously paving the road for behavior reuse and increased interpretability of RL systems. In this survey we first introduce a selection of problem-specific approaches, which provided insight into how to utilize often handcrafted abstractions in specific task settings. We then introduce the Options framework, which provides a more generic approach in which abstractions can be discovered and learned semi-automatically. Afterwards we introduce the goal-conditional approach, which allows sub-behaviors to be embedded in a continuous space. Finally, in order to further advance the development of HRL agents capable of simultaneously learning abstractions and how to use them, solely from interaction with complex high-dimensional environments, we identify a set of promising research directions.