
Truncated Temporal Differences with Function Approximation: Successful Examples Using CMAC

Paweł Cichosz
Institute of Electronics Fundamentals
Warsaw University of Technology
Nowowiejska 15/19, 00-665 Warsaw, Poland
cichosz@ipe.pw.edu.pl
http://www.ipe.pw.edu.pl/~cichosz

Abstract:

Combining reinforcement learning algorithms with function approximators in order to generalize over the state space has recently attracted particular interest and is widely believed to be one of the crucial issues in scaling reinforcement learning to practically interesting domains. This paper examines the combination of the TTD procedure, a computationally efficient approximate implementation of TD(λ) methods, with CMAC, a function approximator especially suitable for reinforcement learning due to its computational efficiency and on-line learning capability. Most previous studies have investigated the combination of CMAC either with TD(0)-based algorithms, which usually learn much more slowly than TD(λ) with λ > 0, or with the traditional implementation of TD(λ) based on eligibility traces, which incurs high computational costs. This study, by combining CMAC with TTD, attempts to reconcile fast learning with computational efficiency and generalization capabilities. The experimental results presented show that the Q-learning algorithm implemented using the TTD procedure and CMAC performs successfully in two tasks with continuous state spaces.
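
As a rough illustration of what "approximate implementation of TD(λ)" can mean, the sketch below computes an m-step truncated λ-return by a backward recursion over a short buffer of recent experience. It is a minimal sketch under stated assumptions, not the procedure as specified in the paper: the function name, the argument layout, and the exact form of the recursion are assumptions made for this example, and the discount factor `gamma` and buffer length m do not appear in the abstract.

```python
def truncated_lambda_return(rewards, next_values, gamma, lam):
    """Return an m-step truncated lambda-return (illustrative sketch).

    Assumed layout (hypothetical, chosen for this example):
      rewards[k]     = reward received at step t + k
      next_values[k] = current value estimate of the state at step t + k + 1
    for k = 0 .. m - 1, where m = len(rewards).
    """
    if not rewards or len(rewards) != len(next_values):
        raise ValueError("rewards and next_values must be non-empty and equal in length")

    # The last step in the buffer bootstraps fully from the value estimate.
    z = rewards[-1] + gamma * next_values[-1]

    # Walk backwards, mixing the longer return (weight lam) with the
    # one-step bootstrap value (weight 1 - lam) at every earlier step.
    for r, u in zip(reversed(rewards[:-1]), reversed(next_values[:-1])):
        z = r + gamma * (lam * z + (1.0 - lam) * u)
    return z
```

With `lam = 0` the recursion collapses to the one-step TD(0) target, and with `lam = 1` it becomes the m-step discounted return bootstrapped from the last value estimate. The cost per time step depends only on the buffer length m, not on the size of the state space, which is the kind of saving over eligibility traces that the abstract alludes to.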





Pawel Cichosz
Fri Oct 10 11:22:41 CEST 1997