Distributional RL generated a lot of excitement in the RL community. However, I haven't seen many recent papers continuing to develop the method. Is there a reason for this, or am I simply wrong? What is currently being worked on in this direction?
Also, how would you improve the efficiency of a distributional RL algorithm? For example, could adding some noise to the deep NN in a DSAC algorithm promote more exploration?
Are you aware of GT Sophy? The team behind the project settled on a distributional version of SAC that they developed themselves, IIRC. They replaced the Q function with multiple quantile regressors of future reward. The mean of the future reward (i.e., the standard Q function) can then be estimated from the quantiles.
Edit: The paper is freely available at https://www.researchgate.net/publication/358484368_Outracing_champion_Gran_Turismo_drivers_with_deep_reinforcement_learning.
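To make the idea concrete, here is a minimal numpy sketch of the quantile-regression Huber loss used in QR-DQN-style distributional critics (this is my own illustration, not the GT Sophy implementation; the function name and shapes are assumptions). The critic predicts N quantiles of the return distribution instead of a single Q value, and the standard Q value is just the mean of those quantiles:

```python
import numpy as np

def quantile_huber_loss(quantile_preds, target_samples, kappa=1.0):
    """Quantile-regression Huber loss (QR-DQN style) -- illustrative sketch.

    quantile_preds: (N,) predicted quantiles of the return distribution
    target_samples: (M,) target return samples (e.g. from a target critic)
    """
    n = len(quantile_preds)
    # Midpoint quantile fractions tau_i = (2i + 1) / (2N)
    taus = (np.arange(n) + 0.5) / n
    # Pairwise TD errors u[i, j] = target_j - pred_i
    u = target_samples[None, :] - quantile_preds[:, None]
    # Huber penalty, quadratic near zero, linear in the tails
    huber = np.where(np.abs(u) <= kappa,
                     0.5 * u ** 2,
                     kappa * (np.abs(u) - 0.5 * kappa))
    # Asymmetric quantile weighting |tau - 1{u < 0}|
    weight = np.abs(taus[:, None] - (u < 0).astype(float))
    return (weight * huber / kappa).mean()

# Recovering the ordinary Q value from the quantile estimates:
quantile_preds = np.array([-1.0, 0.5, 2.0, 3.5])
q_value = quantile_preds.mean()  # the "standard" Q function
```

The extra information in the quantiles (e.g. spread of returns) is what distributional methods exploit; for acting greedily you can still collapse them to the mean as above.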