Shimon Whiteson's Publications

Sorted by DateClassified by Publication Type

V-MAX: Tempered Optimism for Better PAC Reinforcement Learning

Karun Rao and Shimon Whiteson. V-MAX: Tempered Optimism for Better PAC Reinforcement Learning. In AAMAS 2012: Proceedings of the Eleventh International Joint Conference on Autonomous Agents and Multi-Agent Systems, pp. 375–382, June 2012.

Download

[PDF]414.1kB  

Abstract

Recent advances in reinforcement learning have yielded several PAC-MDP algorithms that, using the principle of optimism in the face of uncertainty, are guaranteed to act near-optimally with high probability on all but a polynomial number of samples. Unfortunately, many of these algorithms, such as R-MAX, perform poorly in practice because their initial exploration in each state, before the associated model parameters have been learned with confidence, is random. Others, such as Model-Based Interval Estimation (MBIE) have weaker sample complexity bounds and require careful parameter tuning. This paper proposes a new PAC-MDP algorithm called V-MAX designed to address these problems. By restricting its optimism to future visits, V-MAX can exploit its experience early in learning and thus obtain more cumulative reward than R-MAX. Furthermore, doing so does not compromise the quality of exploration, as we prove bounds on the sample complexity of V-MAX that are identical to those of R-MAX. Finally, we present empirical results in two domains demonstrating that V-MAX can substantially outperform R-MAX and match or outperform MBIE while being easier to tune, as its performance is invariant to conservative choices of its primary parameter.

BibTeX Entry

@InProceedings{rao:aamas12,
  author        = "Karun Rao and Shimon Whiteson",
  title         = "{V-MAX}: Tempered Optimism for Better {PAC} Reinforcement Learning",
  booktitle     = "AAMAS 2012: Proceedings of the Eleventh International
                  Joint Conference on Autonomous Agents and
                  Multi-Agent Systems",
   pages = "375-382",                     
  month = "June",    
  year          = 2012,
}

Generated by bib2html.pl (written by Patrick Riley ) on Thu May 16, 2013 09:59:45