Nabil BELGASMI
Banque de Tunisie, Tunisia
Title: Multiobjective deep reinforcement learning approach for ATM cash replenishment planning
Abstract
The current reinforcement learning framework is based on single-objective performance optimization: maximizing expected returns computed from scalar rewards, which come either from a univariate environment response to the agent's actions or from a weighted aggregation of a multivariate response. In many real-world situations, however, trade-offs must be made among multiple conflicting objectives that have different orders of magnitude, measurement units, and business-specific contexts related to the problem being solved (e.g. costs, lead time, quality of service, profits). Aggregating such sub-rewards into a scalar reward assumes perfect knowledge of the decision maker's preferences and of how she perceives the importance of each objective. In this study, we consider the problem of learning the best ATM cash replenishment policies in an uncertain multiobjective context, given an arbitrary history of cash withdrawals that may be nonstationary and may contain outliers. We propose a model-free multiobjective deep reinforcement learning approach that allows us to compete against the human decision maker and to find, for each ATM, a policy that outperforms the current human policy. The idea is to disaggregate the performance of a replenishment policy into a vector of objective functions; the performance of the human policy then becomes a multi-dimensional reference point (Rh). The task of the deep reinforcement learning algorithm is to find a policy that generates a set of performance points which Pareto-dominate the human reference point (Rh).
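The core comparison described above (checking whether a learned policy's performance vector Pareto-dominates the human reference point Rh) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the objective names and numbers are hypothetical, and all objectives are assumed to be minimized.

```python
import numpy as np

def pareto_dominates(a, b):
    """True if vector a Pareto-dominates vector b (all objectives minimized):
    a is no worse than b on every objective and strictly better on at least one."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return bool(np.all(a <= b) and np.any(a < b))

def dominating_policies(performances, reference):
    """Return indices of candidate performance vectors that Pareto-dominate
    the reference point (here, the human policy's performance Rh)."""
    return [i for i, p in enumerate(performances) if pareto_dominates(p, reference)]

# Hypothetical two-objective vectors: (replenishment cost, stock-out rate)
Rh = [120.0, 0.05]               # human policy reference point
candidates = [
    [100.0, 0.04],               # better on both objectives: dominates Rh
    [130.0, 0.03],               # trades cost for service: incomparable with Rh
    [110.0, 0.05],               # lower cost, equal service: dominates Rh
]
print(dominating_policies(candidates, Rh))  # -> [0, 2]
```

A policy whose performance vector lands in the dominating set improves on the human policy in every objective without a preference-weighting step, which is exactly what scalar reward aggregation cannot guarantee.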