AKF-SR: Adaptive Kalman filtering-based successor representation

¹University of Toronto, ²Concordia University
Neurocomputing 2022

Block diagram of the proposed AKF-SR framework.


Abstract

Recent studies suggest that Successor Representation (SR)-based models adapt to changes in goal locations or in the reward function faster than model-free algorithms, at a lower computational cost than model-based algorithms. However, it is not known how such a representation might help agents manage uncertainty in their decision making. Existing SR learning methods, which are based on standard temporal difference (TD) learning, do not capture uncertainty about the estimated SR.
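For reference, the standard tabular TD update for the SR mentioned above is M(s, ·) ← M(s, ·) + α[1(s) + γ M(s′, ·) − M(s, ·)], where 1(s) is the one-hot indicator of state s. The sketch below is a minimal illustration of that update, not code from the paper: the three-state ring with deterministic transitions, the step size and the function names are assumptions. It yields point estimates of the SR only, with no measure of their uncertainty.

```python
def td_sr_update(M, s, s_next, alpha, gamma):
    """One tabular TD(0) update of row s of the SR matrix M (illustrative)."""
    for j in range(len(M)):
        indicator = 1.0 if j == s else 0.0
        td_error = indicator + gamma * M[s_next][j] - M[s][j]
        M[s][j] += alpha * td_error


def learn_sr_ring(n_states=3, gamma=0.5, alpha=0.1, sweeps=2000):
    """Learn the SR on a hypothetical ring where state s always moves to s + 1."""
    M = [[0.0] * n_states for _ in range(n_states)]
    s = 0
    for _ in range(sweeps * n_states):
        s_next = (s + 1) % n_states
        td_sr_update(M, s, s_next, alpha, gamma)
        s = s_next
    return M


M = learn_sr_ring()
```

On this ring the exact SR is M(s, j) = γ^((j − s) mod 3) / (1 − γ³), so with γ = 0.5 the first row converges to [8/7, 4/7, 2/7].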

To address this issue, this paper proposes the Adaptive Kalman Filtering-based Successor Representation (AKF-SR) framework. First, the Kalman temporal difference (KTD) approach, which combines the Kalman filter with the temporal difference method, is used within AKF-SR to cast SR learning as a filtering problem, so that an estimate of the uncertainty of the SR becomes available.
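As a rough illustration of casting SR learning as filtering, the sketch below runs a linear Kalman filter on a tabular SR. It assumes the measurement model 1[j = s] = (φ(s) − γφ(s′))ᵀ m_j + noise for each column m_j of the SR, with one-hot features φ and a random-walk evolution model. This is a simplified linear variant in the spirit of KTD, not the AKF-SR implementation; the ring environment, the noise levels and all names are hypothetical. Because every column shares the same measurement vector and noise variance, a single covariance P serves all columns, and its diagonal gives a per-state uncertainty on the SR.

```python
def ktd_sr_ring(n_states=3, gamma=0.5, r_noise=1.0, q_noise=1e-5,
                prior_var=10.0, steps=3000):
    """Linear Kalman-filter estimate of the SR on a hypothetical deterministic ring."""
    n = n_states
    M = [[0.0] * n for _ in range(n)]  # M[i][j]: SR estimate, row = state i
    P = [[prior_var if i == k else 0.0 for k in range(n)] for i in range(n)]
    s = 0
    for _ in range(steps):
        s_next = (s + 1) % n
        # Predict: random-walk evolution only inflates the covariance.
        for i in range(n):
            P[i][i] += q_noise
        # Measurement vector h = phi(s) - gamma * phi(s') with one-hot phi.
        h = [0.0] * n
        h[s] += 1.0
        h[s_next] -= gamma
        Ph = [sum(P[i][k] * h[k] for k in range(n)) for i in range(n)]
        S = sum(h[i] * Ph[i] for i in range(n)) + r_noise  # innovation variance
        K = [Ph[i] / S for i in range(n)]                   # Kalman gain
        # Correct each SR column with its own innovation 1[j == s] - h^T m_j.
        for j in range(n):
            pred = sum(h[i] * M[i][j] for i in range(n))
            innovation = (1.0 if j == s else 0.0) - pred
            for i in range(n):
                M[i][j] += K[i] * innovation
        # Covariance update: P <- P - K (P h)^T.
        P = [[P[i][k] - K[i] * Ph[k] for k in range(n)] for i in range(n)]
        s = s_next
    return M, P


M, P = ktd_sr_ring()
sr_variances = [P[i][i] for i in range(len(P))]  # uncertainty per SR row
```

Compared with the plain TD update, the filter returns the covariance P alongside the estimate, which is the kind of uncertainty information the AKF-SR framework exploits.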

An adaptive Kalman filtering approach is then applied within AKF-SR to tune the measurement noise covariance and the measurement mapping function of the Kalman filter.
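The exact adaptive scheme used in AKF-SR is not reproduced here. As a stand-in, the sketch below shows a generic innovation-based covariance-matching rule for a scalar filter: since E[ν²] = P⁻ + R for innovation ν and predicted variance P⁻, the measurement noise is estimated as R̂ ≈ mean(ν²) − P⁻ over a sliding window. The synthetic signal, window size, warm-up length and names are assumptions for illustration only, and adaptation of the measurement mapping function is not covered.

```python
import random
from collections import deque


def adaptive_kf(measurements, q=1e-4, r_init=1.0, window=500,
                min_samples=20, r_floor=1e-3):
    """Scalar Kalman filter (h = 1) with covariance-matching estimation of R."""
    x, p, r = 0.0, 100.0, r_init
    recent = deque(maxlen=window)  # sliding window of squared innovations
    for z in measurements:
        p_prior = p + q            # predict: random-walk state model
        nu = z - x                 # innovation
        recent.append(nu * nu)
        if len(recent) >= min_samples:
            # Covariance matching: E[nu^2] = p_prior + R  =>  R ~ mean(nu^2) - p_prior
            r = max(sum(recent) / len(recent) - p_prior, r_floor)
        k = p_prior / (p_prior + r)  # Kalman gain
        x += k * nu
        p = (1.0 - k) * p_prior
    return x, r


random.seed(0)
true_x, true_r = 3.0, 4.0
zs = [true_x + random.gauss(0.0, true_r ** 0.5) for _ in range(2000)]
x_hat, r_hat = adaptive_kf(zs)
```

Starting from a deliberately wrong guess (r_init = 1.0), the estimate r_hat should settle near the true noise variance of 4.0.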

Moreover, an active learning method is proposed that exploits the estimated uncertainty of the SR to form a behaviour policy that visits less certain values more often, improving the agent's overall performance in terms of received rewards while it interacts with its environment.
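The active learning idea can be sketched as a behaviour policy that sometimes deliberately picks the least certain action. The rule below is a hypothetical illustration, not the policy defined in the paper: an ε-greedy variant whose exploratory step chooses the action with the largest estimated variance instead of a uniformly random one. In an AKF-SR-style agent the variances would come from the filter's covariance; here they are made-up numbers.

```python
import random


def uncertainty_guided_action(means, variances, epsilon, rng=random):
    """Hypothetical behaviour policy: greedy w.p. 1 - epsilon, else most uncertain action."""
    if rng.random() < epsilon:
        # Explore where the value estimate is least certain.
        return max(range(len(variances)), key=lambda a: variances[a])
    # Exploit the current mean value estimates.
    return max(range(len(means)), key=lambda a: means[a])


means = [1.0, 0.5, 0.2]       # illustrative mean action values
variances = [0.01, 0.3, 0.9]  # illustrative uncertainties
rng = random.Random(0)
counts = [0, 0, 0]
for _ in range(1000):
    counts[uncertainty_guided_action(means, variances, 0.2, rng)] += 1
```

With these numbers the greedy action 0 is chosen about 80% of the time and the most uncertain action 2 about 20% of the time. Action 1, which is neither best nor most uncertain, is never selected. Uniform ε-greedy would spend exploration evenly instead of directing it toward uncertain values.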

Experimental results on three reinforcement learning environments illustrate the efficacy of AKF-SR over state-of-the-art frameworks in terms of cumulative reward and speed of convergence after changes in the reward function.


Comparison with other RL methods

The proposed AKF-SR framework is compared with the following state-of-the-art RL algorithms: Deep Q-Network (DQN) [37], the Substochastic Successor Representation (SSR) framework [36], and Universal Successor Representations (USR) [2].

Results are reported for three environments: Inverted Pendulum, Autonomous Car, and Lunar Lander.

Moreover, Fig. 4 demonstrates the key advantage of AKF-SR: rapid adaptation to changes in the reward function.

Fig. 4 – Value function after a change in the reward value at the upright position of the Inverted Pendulum.
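The adaptability shown in Fig. 4 rests on the SR factorization V(s) = Σ_j M(s, j) r(j): when the reward changes, the learned SR M can be reused and only the reward vector needs updating. The toy sketch below illustrates this on a hypothetical three-state ring (not the Inverted Pendulum experiment). It computes the exact SR by the truncated series Σ_k γ^k P^k, moves the reward from one state to another, and recomputes the values without relearning M. All names are illustrative.

```python
def ring_sr(n_states, gamma, horizon=200):
    """Exact SR of a deterministic ring (s -> s + 1) via a truncated series."""
    M = [[0.0] * n_states for _ in range(n_states)]
    for s in range(n_states):
        for k in range(horizon):
            # After k steps from s the agent is in state (s + k) mod n.
            M[s][(s + k) % n_states] += gamma ** k
    return M


def values(M, r):
    """State values V = M r for a state-based reward vector r."""
    return [sum(m * rj for m, rj in zip(row, r)) for row in M]


M = ring_sr(3, 0.5)
v_old = values(M, [0.0, 0.0, 1.0])  # reward located at state 2
v_new = values(M, [1.0, 0.0, 0.0])  # reward moved to state 0; M is reused as-is
```

Here v_old is approximately [2/7, 4/7, 8/7] and v_new is approximately [8/7, 2/7, 4/7]. The new values follow from a single matrix-vector product, which is why SR-based agents can respond to reward changes faster than model-free learners that must re-estimate values from experience.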


BibTeX


        @article{malekzadeh2022akf,
        title={AKF-SR: Adaptive Kalman filtering-based successor representation},
        author={Malekzadeh, Parvin and Salimibeni, Mohammad and Hou, Ming and Mohammadi, Arash and Plataniotis, Konstantinos N},
        journal={Neurocomputing},
        volume={467},
        pages={476--490},
        year={2022},
        publisher={Elsevier}
        }