好处是选优势函数具有几乎最小的方差

(来自GAE原文,选形式4或者6)
GAE原文说各种策略梯度的估计方差比较得看这篇:
Greensmith, Evan, Bartlett, Peter L, and Baxter, Jonathan. Variance reduction techniques for gradient estimates in reinforcement learning. The Journal of Machine Learning Research, 5:1471–1530, 2004.