Fast and Efficient Reinforcement Learning
with Truncated Temporal Differences

Paweł Cichosz
Institute of Electronics Fundamentals
Warsaw University of Technology
Nowowiejska 15/19, 00-665 Warsaw, Poland
cichosz@ipe.pw.edu.pl

Jan J. Mulawka
Institute of Electronics Fundamentals
Warsaw University of Technology
Nowowiejska 15/19, 00-665 Warsaw, Poland
jml@ipe.pw.edu.pl

Abstract:

The problem of temporal credit assignment in reinforcement learning is typically solved using algorithms based on the methods of temporal differences TD($\lambda$). Of those, Q-learning is currently best understood and most widely used. Using TD-based algorithms with $\lambda > 0$ often allows one to speed up the propagation of credit significantly, but it involves certain implementational problems. The traditional implementation of TD($\lambda > 0$) based on eligibility traces suffers from lack of generality and computational inefficiency. The TTD (Truncated Temporal Differences) procedure is a simple TD($\lambda$) approximation technique that appears to overcome these drawbacks of eligibility traces. The paper outlines this technique, discusses its computational efficiency advantages, and presents experimental studies with the combination of TTD and Q-learning in deterministic and stochastic environments. These experiments show that TTD makes it possible to obtain a significant learning speedup without reducing reliability at essentially the same computational cost as usual TD(0) learning. We conclude that the TTD procedure is probably the most promising way of using TD methods for reinforcement learning, especially for tasks with large state spaces and a hard temporal credit assignment problem.
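
To make the idea of truncating TD($\lambda$) concrete, the following is a minimal sketch of a truncated $\lambda$-return computed backwards over a fixed window of recent steps. It is an illustration under stated assumptions, not the procedure as specified in the paper: the function name `truncated_lambda_return`, its argument layout, and the symbols `gamma` (discount factor) and `lam` ($\lambda$) are chosen here for the example. When combined with Q-learning, each entry of `next_values` would plausibly be the maximum Q-value of the corresponding successor state.

```python
from typing import Sequence


def truncated_lambda_return(
    rewards: Sequence[float],
    next_values: Sequence[float],
    gamma: float,
    lam: float,
) -> float:
    """Truncated lambda-return over a window of m = len(rewards) steps.

    Hypothetical helper for illustration only.
    rewards[k]     : reward received at step t + k
    next_values[k] : current value estimate of the state reached at step t + k + 1
    """
    if len(rewards) != len(next_values) or not rewards:
        raise ValueError("need equally long, non-empty reward and value windows")

    # The last step in the window bootstraps fully from the value estimate.
    z = rewards[-1] + gamma * next_values[-1]

    # Walk backwards through the window, mixing the longer return (weight lam)
    # with the one-step bootstrap value (weight 1 - lam) at every step.
    for k in range(len(rewards) - 2, -1, -1):
        z = rewards[k] + gamma * (lam * z + (1.0 - lam) * next_values[k])
    return z
```

With `lam = 0` the result reduces to the ordinary one-step TD(0) target, and with `lam = 1` it becomes the discounted sum of the rewards in the window plus a discounted bootstrap from the final value estimate. The cost per update is one pass over the window, independent of the size of the state space.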





Pawel Cichosz
Fri Oct 10 11:41:28 CEST 1997