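The code was originally written for Pendulum. I'm now running it on my own environment, where the agent has to output 3 continuous actions, each in the range [-1, 1].
I printed the network parameters and, after running a little over a hundred epochs, found that NaN values appear in the parameters after successive updates. Could someone more experienced explain what might be causing this?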
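Printing the parameters after each update is a good way to find the first step that goes NaN. A minimal sketch of that check, assuming a PyTorch `nn.Module` (the helper name `find_nonfinite` is hypothetical), scans both the weights and their gradients so you can tell whether the NaN first appears in a gradient or only after `optimizer.step()`. PyTorch's `torch.autograd.set_detect_anomaly(True)` can additionally report which backward operation first produced a NaN, at the cost of slower training.

```python
import torch
import torch.nn as nn


def find_nonfinite(model: nn.Module) -> list:
    """Return names of parameters (or their gradients) that contain NaN or Inf."""
    bad = []
    for name, p in model.named_parameters():
        if not torch.isfinite(p).all():
            bad.append(f"{name} (weights)")
        if p.grad is not None and not torch.isfinite(p.grad).all():
            bad.append(f"{name} (grad)")
    return bad


if __name__ == "__main__":
    # Hypothetical usage: call find_nonfinite(actor) right after loss.backward()
    # and again after optimizer.step() inside the training loop.
    net = nn.Linear(4, 3)
    print(find_nonfinite(net))  # a freshly initialised layer has no NaN/Inf
    with torch.no_grad():
        net.weight[0, 0] = float("nan")
    print(find_nonfinite(net))
```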
ValueError: Expected parameter loc (Tensor of shape (1000, 3)) of distribution Normal(loc: torch.Size([1000, 3]), scale: torch.Size([1000, 3])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan],
...,
[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan]], device='cuda:0', grad_fn=<TanhBackward0>)
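The traceback shows that the mean (`loc`) passed to `Normal` is already NaN as it comes out of the final `tanh`, which means the weights feeding it became NaN during an earlier update. One commonly used set of guards is sketched below, assuming a Gaussian policy trained with a policy-gradient style loss. The original training code is not shown, so `GaussianActor`, `update`, the log-std bounds and `max_grad_norm` are all hypothetical choices: a tanh-bounded mean, a clamped log standard deviation so `std` can never reach 0 or overflow, advantage normalisation with an epsilon, and gradient-norm clipping via `nn.utils.clip_grad_norm_`.

```python
import torch
import torch.nn as nn
from torch.distributions import Normal


class GaussianActor(nn.Module):
    """Hypothetical actor for 3 continuous actions in [-1, 1]."""

    def __init__(self, state_dim: int, action_dim: int = 3, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.mu_head = nn.Linear(hidden, action_dim)
        # State-independent log std, clamped in forward() (assumed design choice).
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, state: torch.Tensor) -> Normal:
        h = self.body(state)
        mu = torch.tanh(self.mu_head(h))           # mean bounded to [-1, 1]
        log_std = self.log_std.clamp(-20.0, 2.0)  # assumed bounds; tune as needed
        std = log_std.exp().expand_as(mu)         # always > 0
        return Normal(mu, std)


def update(actor, optimizer, states, actions, advantages, max_grad_norm=0.5):
    """One policy-gradient step with gradient-norm clipping as a NaN guard."""
    dist = actor(states)
    log_prob = dist.log_prob(actions).sum(dim=-1)
    # Normalise advantages; the epsilon avoids division by zero.
    adv = (advantages - advantages.mean()) / (advantages.std() + 1e-8)
    loss = -(log_prob * adv).mean()
    optimizer.zero_grad()
    loss.backward()
    nn.utils.clip_grad_norm_(actor.parameters(), max_grad_norm)
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    actor = GaussianActor(state_dim=8)
    opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
    s = torch.randn(1000, 8)
    a = torch.rand(1000, 3) * 2 - 1   # actions in [-1, 1]
    adv = torch.randn(1000)
    print(update(actor, opt, s, a, adv))
```

If NaN still appears with these guards in place, lowering the learning rate or normalising the environment's observations and rewards are other common things to check.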