Karun Rao and Shimon Whiteson. V-MAX: Tempered Optimism for Better PAC Reinforcement Learning. In AAMAS 2012: Proceedings of the Eleventh International Joint Conference on Autonomous Agents and Multi-Agent Systems, pp. 375–382, June 2012.
Recent advances in reinforcement learning have yielded several PAC-MDP algorithms that, using the principle of optimism in the face of uncertainty, are guaranteed to act near-optimally with high probability on all but a polynomial number of samples. Unfortunately, many of these algorithms, such as R-MAX, perform poorly in practice because their initial exploration in each state, before the associated model parameters have been learned with confidence, is random. Others, such as Model-Based Interval Estimation (MBIE), have weaker sample complexity bounds and require careful parameter tuning. This paper proposes a new PAC-MDP algorithm called V-MAX designed to address these problems. By restricting its optimism to future visits, V-MAX can exploit its experience early in learning and thus obtain more cumulative reward than R-MAX. Furthermore, doing so does not compromise the quality of exploration, as we prove bounds on the sample complexity of V-MAX that are identical to those of R-MAX. Finally, we present empirical results in two domains demonstrating that V-MAX can substantially outperform R-MAX and match or outperform MBIE while being easier to tune, as its performance is invariant to conservative choices of its primary parameter.
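For readers unfamiliar with the baseline, the following is a minimal sketch of R-MAX-style optimism, not of V-MAX and not taken from the paper. It assumes a small tabular MDP; the function name `rmax_q_values`, the array layout, and the visit threshold `m` are illustrative assumptions. State-action pairs visited fewer than `m` times are pinned to the maximum possible value, so early experience in those pairs is ignored until they become "known".

```python
import numpy as np

def rmax_q_values(counts, reward_sums, transition_counts, m, r_max, gamma, iters=200):
    """Optimistic Q-values in the style of R-MAX (illustrative sketch only).

    counts[s, a]               -- number of visits to (s, a)
    reward_sums[s, a]          -- total reward observed at (s, a)
    transition_counts[s, a, t] -- number of observed transitions (s, a) -> t
    m                          -- hypothetical visit threshold for "known" pairs
    """
    n_states, n_actions = counts.shape
    known = counts >= m
    v_max = r_max / (1.0 - gamma)  # largest possible discounted return

    # Empirical model; the guard only avoids division by zero for unvisited pairs.
    safe_counts = np.maximum(counts, 1)
    r_hat = reward_sums / safe_counts
    p_hat = transition_counts / safe_counts[:, :, None]

    q = np.full((n_states, n_actions), v_max)
    for _ in range(iters):
        v = q.max(axis=1)
        backed_up = r_hat + gamma * (p_hat @ v)
        # Unknown pairs stay at v_max no matter what has been observed so far.
        q = np.where(known, backed_up, v_max)
    return q
```

In this sketch, data gathered from a pair before it reaches `m` visits has no effect on the agent's values, which corresponds to the uninformed early exploration the abstract attributes to R-MAX.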
@InProceedings{rao:aamas12,
author = "Karun Rao and Shimon Whiteson",
title = "{V-MAX}: Tempered Optimism for Better {PAC} Reinforcement Learning",
booktitle = "AAMAS 2012: Proceedings of the Eleventh International
Joint Conference on Autonomous Agents and
Multi-Agent Systems",
pages = "375--382",
month = "June",
year = 2012,
}