[ICML 2019] Nine Challenges in Applying Reinforcement Learning to the Real World: A Summary
RLer
Training off-line from the fixed logs of an external behavior policy.
Learning on the real system from limited samples.
High-dimensional continuous state and action spaces.
Safety constraints that should never or at least rarely be violated.
Tasks that may be partially observable, alternatively viewed as non-stationary or stochastic.
Reward functions that are unspecified, multi-objective, or risk-sensitive.
System operators who desire explainable policies and actions.
Inference that must happen in real-time at the control frequency of the system.
Large and/or unknown delays in the system actuators, sensors, or rewards.
arXiv link:
https://arxiv.org/pdf/1904.12901.pdf
NanNan
RLer
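Hao Jianye (郝建业), director of the Decision Making and Reasoning Lab at Huawei Noah's Ark Lab, gave a talk titled "Challenges and Real-World Deployment of Deep Reinforcement Learning".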