Harm van Seijen, Hado van Hasselt, Shimon Whiteson, and Marco Wiering. A Theoretical and Empirical Analysis of Expected Sarsa. In ADPRL 2009: Proceedings of the IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, pp. 177–184, March 2009.
This paper presents a theoretical and empirical analysis of Expected Sarsa, a variation on Sarsa, the classic on-policy temporal-difference method for model-free reinforcement learning. Expected Sarsa exploits knowledge about stochasticity in the behavior policy to perform updates with lower variance. Doing so allows for higher learning rates and thus faster learning. In deterministic environments, Expected Sarsa's updates have zero variance, enabling a learning rate of 1. We prove that Expected Sarsa converges under the same conditions as Sarsa and formulate specific hypotheses about when Expected Sarsa will outperform Sarsa and Q-learning. Experiments in multiple domains confirm these hypotheses and demonstrate that Expected Sarsa has significant advantages over these more commonly used methods.
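The core difference between the two methods lies in the bootstrap target. Sarsa uses the value of the next action it actually sampled, Q(s', a'). Expected Sarsa replaces this with the expectation over the behavior policy, Σ_a π(a|s') Q(s', a).

Below is a minimal sketch of both targets. It assumes a tabular setting and an ε-greedy behavior policy. The function names, the toy Q-values, and the parameter values are hypothetical illustrations, not the paper's code or experiments. Terminal-state handling is omitted.

```python
from typing import List


def epsilon_greedy_probs(q_values: List[float], epsilon: float) -> List[float]:
    """Action probabilities of an epsilon-greedy policy (ties go to the lowest index)."""
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n          # exploration mass spread uniformly
    probs[greedy] += 1.0 - epsilon     # remaining mass on the greedy action
    return probs


def sarsa_target(reward: float, gamma: float, q_next: List[float], next_action: int) -> float:
    # Sarsa bootstraps from the single action actually sampled in s'.
    return reward + gamma * q_next[next_action]


def expected_sarsa_target(reward: float, gamma: float, q_next: List[float], epsilon: float) -> float:
    # Expected Sarsa bootstraps from the expectation of Q(s', .) under the behavior policy,
    # so the target does not depend on which next action is sampled.
    probs = epsilon_greedy_probs(q_next, epsilon)
    return reward + gamma * sum(p * q for p, q in zip(probs, q_next))


def sarsa_target_variance(reward: float, gamma: float, q_next: List[float], epsilon: float) -> float:
    # Variance of the Sarsa target caused only by sampling a' (transition held fixed).
    probs = epsilon_greedy_probs(q_next, epsilon)
    targets = [sarsa_target(reward, gamma, q_next, a) for a in range(len(q_next))]
    mean = sum(p * t for p, t in zip(probs, targets))
    return sum(p * (t - mean) ** 2 for p, t in zip(probs, targets))


def td_update(q_sa: float, target: float, alpha: float) -> float:
    # Standard temporal-difference step toward the chosen target.
    return q_sa + alpha * (target - q_sa)


if __name__ == "__main__":
    # Hypothetical deterministic transition: fixed reward and fixed next state s'.
    reward, gamma, epsilon = 1.0, 0.9, 0.1
    q_next = [1.0, 2.0, 0.0]
    es = expected_sarsa_target(reward, gamma, q_next, epsilon)
    var = sarsa_target_variance(reward, gamma, q_next, epsilon)
    print(f"Expected Sarsa target: {es:.4f} (same for every sampled a')")
    print(f"Variance of Sarsa target over a': {var:.4f}")
    print(f"Q(s, a) after one step with alpha = 1: {td_update(0.0, es, 1.0):.4f}")
```

In this toy case, the weighted mean of the Sarsa targets equals the Expected Sarsa target. The Sarsa target still fluctuates with the sampled action, while the Expected Sarsa target does not. This is why, with a deterministic transition, a step with α = 1 can move Q(s, a) directly onto the Expected Sarsa target.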
@InProceedings{vanseijen:adprl09,
author = "Harm van Seijen and Hado van Hasselt and Shimon Whiteson and Marco Wiering",
title = "A Theoretical and Empirical Analysis of Expected Sarsa",
booktitle = "ADPRL 2009: Proceedings of the IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning",
pages = "177--184",
month = mar,
year = 2009,
}
Generated by bib2html.pl (written by Patrick Riley) on Wed Mar 07, 2012 15:56:00