Affiliations: Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci, 32, Milan, Italy
Correspondence: Alberto Maria Metelli, Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci, 32, Milan, Italy. E-mail: albertomaria.metelli@polimi.it.
Abstract: In this paper, we provide a unified presentation of the Configurable Markov Decision Process (Conf-MDP) framework. A Conf-MDP is an extension of the traditional Markov Decision Process (MDP) that models the possibility of configuring some environmental parameters. This configuration activity can be carried out by the learning agent itself or by an external configurator. We introduce a general definition of Conf-MDP, and we then specialize it to the cooperative setting, in which the configuration fully serves the agent’s goals, and to the non-cooperative setting, in which the agent and the configurator may have different interests. For both settings, we propose suitable solution concepts. Furthermore, we illustrate how to extend the traditional value functions and Bellman operators of MDPs to this new framework.
Keywords: Reinforcement learning, Markov Decision Process, Configurable Markov Decision Process
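To make the notion of a Conf-MDP concrete, the following is a minimal Python sketch, under our own assumptions, of a finite MDP whose transition kernel depends on a configurable parameter; the class, attribute, and parameter names (e.g., ConfMDP, transition_fn, omega) are hypothetical and do not correspond to the paper's notation. It only illustrates the idea stated in the abstract: unlike in a standard MDP, the environment dynamics become an object of choice, so the value of a policy is evaluated jointly with a configuration.

```python
import numpy as np

class ConfMDP:
    """Sketch of a finite Conf-MDP: the transition kernel P(s'|s,a) depends on a
    configurable parameter `omega`, chosen by the agent or by an external
    configurator (hypothetical API, for illustration only)."""

    def __init__(self, n_states, n_actions, reward, gamma, transition_fn):
        self.n_states = n_states            # |S|
        self.n_actions = n_actions          # |A|
        self.reward = reward                # array of shape (|S|, |A|)
        self.gamma = gamma                  # discount factor
        self.transition_fn = transition_fn  # omega -> P of shape (|S|, |A|, |S|)

    def policy_evaluation(self, policy, omega):
        """Value of a (policy, configuration) pair: solve the Bellman
        expectation equation in the chain induced by both choices."""
        P = self.transition_fn(omega)                      # (|S|, |A|, |S|)
        P_pi = np.einsum('sa,sat->st', policy, P)          # policy-averaged transitions
        r_pi = np.einsum('sa,sa->s', policy, self.reward)  # policy-averaged rewards
        # V = (I - gamma * P_pi)^{-1} r_pi
        return np.linalg.solve(np.eye(self.n_states) - self.gamma * P_pi, r_pi)

# Toy example: a 2-state, 2-action environment where omega in [0, 1]
# controls how "slippery" the dynamics are.
def transition_fn(omega):
    P = np.zeros((2, 2, 2))
    P[:, 0, 0] = 1 - omega   # action 0 tries to reach state 0
    P[:, 0, 1] = omega
    P[:, 1, 1] = 1 - omega   # action 1 tries to reach state 1
    P[:, 1, 0] = omega
    return P

reward = np.array([[0.0, 1.0],
                   [1.0, 0.0]])
cmdp = ConfMDP(2, 2, reward, gamma=0.9, transition_fn=transition_fn)
policy = np.full((2, 2), 0.5)  # uniform random policy

# In the cooperative setting the configuration is chosen to benefit the agent;
# here we simply compare the agent's value under two candidate configurations.
for omega in (0.1, 0.4):
    print(omega, cmdp.policy_evaluation(policy, omega))
```

In this toy instance, a lower omega makes the dynamics less slippery and yields a higher value for the same policy, which is the kind of comparison a cooperative configurator would exploit.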