Bayesian reinforcement learning in Markovian and non-Markovian tasks

A. Ez-Zizi, Simon Farrell, D. Leslie

Research output: Chapter in Book/Conference paper › Conference paper › peer-review

2 Citations (Scopus)

Abstract

© 2015 IEEE. We present a Bayesian reinforcement learning model with a working memory module which can solve some non-Markovian decision processes. The model is tested, and compared against SARSA(λ), on a standard working-memory task from the psychology literature. Our method uses the Kalman temporal difference framework, and its extension to stochastic state transitions, to give posterior distributions over state-action values. This framework provides a natural mechanism for using reward information to update more than the current state-action pair, and thus obviates the need for eligibility traces. Furthermore, the existence of full posterior distributions allows the use of Thompson sampling for action selection, which in turn removes the need to choose an appropriately parameterised action-selection method.
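
The combination described in the abstract, Kalman temporal difference updates yielding a Gaussian posterior over state-action values, with Thompson sampling for action selection, can be sketched compactly. The code below is a minimal illustrative sketch and not the authors' implementation: it assumes tabular one-hot state-action features, uses only the basic deterministic-transition KTD update (not the stochastic-transition extension mentioned above), and all names (KTDAgent, select_action, update) and parameter values are hypothetical.

```python
import numpy as np

class KTDAgent:
    """Sketch of a Kalman temporal difference agent with Thompson sampling.

    Illustrative only: tabular one-hot features and the basic
    deterministic-transition KTD update; parameter names are hypothetical.
    """

    def __init__(self, n_states, n_actions, gamma=0.95,
                 prior_var=1.0, obs_noise=1.0, process_noise=1e-3):
        self.n_states = n_states
        self.n_actions = n_actions
        self.gamma = gamma
        self.dim = n_states * n_actions
        # Gaussian posterior over the Q-value weight vector.
        self.mu = np.zeros(self.dim)
        self.cov = prior_var * np.eye(self.dim)
        self.obs_noise = obs_noise          # variance of the TD observation noise
        self.process_noise = process_noise  # random-walk drift on the weights

    def _phi(self, state, action):
        """One-hot feature vector for a (state, action) pair."""
        phi = np.zeros(self.dim)
        phi[state * self.n_actions + action] = 1.0
        return phi

    def select_action(self, state):
        """Thompson sampling: draw one Q-function from the posterior, act greedily."""
        theta = np.random.multivariate_normal(self.mu, self.cov)
        q_sampled = [theta @ self._phi(state, a) for a in range(self.n_actions)]
        return int(np.argmax(q_sampled))

    def update(self, state, action, reward, next_state, next_action, done):
        """Kalman update with the TD observation model
        r ≈ (phi(s,a) - gamma * phi(s',a'))ᵀ theta + noise."""
        # Prediction step: let the posterior drift slightly.
        self.cov = self.cov + self.process_noise * np.eye(self.dim)
        h = self._phi(state, action)
        if not done:
            h = h - self.gamma * self._phi(next_state, next_action)
        # Correction step: standard scalar-observation Kalman equations.
        residual = reward - h @ self.mu
        s = h @ self.cov @ h + self.obs_noise
        gain = self.cov @ h / s
        self.mu = self.mu + gain * residual
        self.cov = self.cov - np.outer(gain, h @ self.cov)
```

Because the covariance matrix couples the weights, a single Kalman correction shifts the posterior over every correlated state-action pair, which is the mechanism the abstract describes as replacing eligibility traces; sampling a full Q-function at each step for action selection likewise removes the need for a separately parameterised exploration rule such as epsilon-greedy or softmax.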
Original language: English
Title of host publication: 2015 IEEE Symposium Series on Computational Intelligence
Editors: Jacek Zurada, Marco Dorigo
Publisher: IEEE, Institute of Electrical and Electronics Engineers
Pages: 579-586
Number of pages: 8
ISBN (Print): 9781479975600
DOIs
Publication status: Published - 2015
Event: IEEE Symposium Series on Computational Intelligence, SSCI 2015 - Cape Town, South Africa
Duration: 8 Dec 2015 - 10 Dec 2015

Conference

Conference: IEEE Symposium Series on Computational Intelligence, SSCI 2015
Country/Territory: South Africa
City: Cape Town
Period: 8/12/15 - 10/12/15
