Abstract: Markov decision processes (MDPs) are a minimalist framework designed to capture the most important aspects of decision making under uncertainty, a problem of major practical interest that is thoroughly studied in reinforcement learning. The unfortunate price of the minimalist approach is that MDPs lack structure, and as such, planning and learning in MDPs with combinatorial-sized state and action spaces is strongly intractable: Bellman's curse of dimensionality is here to stay in the worst case. However, Bellman and his co-workers apparently realized as early as the 1960s that for many problems of practical interest, the optimal value function of an MDP is well approximated by using just a few basis functions of the kind standardly used in numerical calculations. Since knowing the optimal value function is essentially equivalent to knowing how to act optimally, one hopes for algorithms that can efficiently compute the few approximating coefficients. If this is possible, we can think of the algorithm as computing the value function in a compressed space. However, until recently little was known about whether and when such compressed computations are possible. In this talk, I will discuss a few recent results (some positive, some negative) concerning these compressed computations and conclude with some open problems.
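One way to picture the compression alluded to above (a sketch only; the notation is illustrative and not taken from the talk): with a small set of basis functions and a coefficient vector of matching dimension, the optimal value function is approximated as

% Illustrative sketch, not the speaker's notation: V^* is the optimal value
% function over states s, \phi_1,\dots,\phi_d are a few fixed basis functions,
% and \theta \in \mathbb{R}^d are the coefficients a "compressed" algorithm
% would need to compute instead of one value per state.
V^*(s) \;\approx\; \sum_{i=1}^{d} \theta_i \, \phi_i(s), \qquad d \ll |\mathcal{S}|,

so that the object to be computed has size d rather than the size of the state space.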
Bio: Csaba Szepesvari received his PhD in 1999 from Jozsef Attila University in Szeged, Hungary. He is currently a professor in the Department of Computing Science at the University of Alberta and a principal investigator at the Alberta Ingenuity Centre for Machine Learning. His interests include learning theory, online and interactive learning, and more specifically, reinforcement learning. He has authored two books and approximately 200 peer-reviewed journal and conference papers. Szepesvari serves as an action editor of the Journal of Machine Learning Research and of Machine Learning, as well as on various program committees.