Minmax fuzzy deterministic policy gradient for zero-sum differential game: Take pursuit-evasion problem as example

Liao, Wei; Wei, Xiaohui; Lai, Jizhou

doi:10.3233/JIFS-210032

Article type: Research Article

Authors: Liao, Wei^{a; b} | Wei, Xiaohui^{a; b; *} | Lai, Jizhou^c

Affiliations: [a] Key Laboratory of Fundamental Science for National Defense-Advanced Design Technology of Flight Vehicle, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu, China | [b] State Key Laboratory of Mechanics and Control of Mechanical Structures, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu, China | [c] College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu, China

Correspondence: [*] Corresponding author. Xiaohui Wei. Tel.: +86 18168069513. E-mail: wei_xiaohui@nuaa.edu.cn.

Abstract: A novel actor-critic algorithm is introduced and applied to zero-sum differential games. The proposed structure consists of two actors and a critic: each actor represents the control policy of one player, and the critic approximates the state-action utility function. Instead of neural networks, fuzzy inference systems are used as the approximators for the actors and the critic, so that their specific practical meaning can be represented by linguistic fuzzy rules. Since the goals of the players in the game are completely opposite, the two actors are updated simultaneously in opposite directions during training: one actor is updated in the direction that minimizes the Q value, while the other is updated in the direction that maximizes it. A pursuit-evasion problem with two pursuers and one evader is taken as an example to illustrate the validity of our method. In this problem, the two pursuers share the same actor, and the symmetry of the problem is used to improve the replay buffer. Finally, confrontations between policies trained for different numbers of episodes are conducted.
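The opposite-direction actor update described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It rests on several assumptions made only for illustration: a scalar state `s` in [-1, 1], a hand-picked analytic critic `Q(s, u, v) = s*(u - v) + u**2 - v**2 + u*v` that stands in for the learned fuzzy critic, and two zero-order Takagi-Sugeno fuzzy actors built on triangular membership functions. Each actor's output is a weighted sum of its rule consequents, so `da/dtheta` equals the normalized firing strengths. The minimizing actor descends `dQ/du * da/dtheta` and the maximizing actor ascends `dQ/dv * da/dtheta`, both at the same time. The names `CENTERS`, `WIDTH`, `firing_strengths` and `train`, along with the step size and iteration count, are hypothetical.

```python
import numpy as np

# Hypothetical toy problem (not from the paper): scalar state s in [-1, 1],
# minimizing control u, maximizing control v, and a known critic
#   Q(s, u, v) = s * (u - v) + u**2 - v**2 + u * v,
# which is convex in u and concave in v, with saddle point u* = -s/5, v* = -3s/5.

CENTERS = np.linspace(-1.0, 1.0, 5)  # peaks of the triangular membership functions
WIDTH = CENTERS[1] - CENTERS[0]      # each triangle falls to zero at its neighbours


def firing_strengths(s):
    """Normalized rule firing strengths of a zero-order Takagi-Sugeno fuzzy system.

    Row k holds the normalized membership degrees of state s[k] in each rule.
    """
    mu = np.maximum(0.0, 1.0 - np.abs(s[:, None] - CENTERS[None, :]) / WIDTH)
    return mu / mu.sum(axis=1, keepdims=True)


def train(steps=500, lr=1.0):
    s = np.linspace(-1.0, 1.0, 41)   # stand-in for states sampled from a replay buffer
    phi = firing_strengths(s)        # da/dtheta for both fuzzy actors
    theta_u = np.zeros(len(CENTERS))  # rule consequents of the minimizing actor
    theta_v = np.zeros(len(CENTERS))  # rule consequents of the maximizing actor
    for _ in range(steps):
        u = phi @ theta_u             # minimizing player's deterministic action
        v = phi @ theta_v             # maximizing player's deterministic action
        dq_du = s + 2.0 * u + v       # partial derivative of Q with respect to u
        dq_dv = -s - 2.0 * v + u      # partial derivative of Q with respect to v
        # Deterministic policy gradient via the chain rule: dQ/dtheta = dQ/da * da/dtheta.
        grad_u = phi.T @ dq_du / len(s)
        grad_v = phi.T @ dq_dv / len(s)
        theta_u -= lr * grad_u        # descend: this actor minimizes Q
        theta_v += lr * grad_v        # ascend: this actor maximizes Q
    return theta_u, theta_v


theta_u, theta_v = train()
print(np.round(theta_u, 3))
print(np.round(theta_v, 3))
```

Triangular membership functions are used here because, with this spacing, they form a partition of unity and reproduce linear policies exactly. After training, `theta_u` should approach `-CENTERS / 5` and `theta_v` should approach `-3 * CENTERS / 5`. These are the consequents that encode the saddle-point policies of this toy game. In the method the abstract describes, the analytic derivatives would come from the learned fuzzy critic.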

Keywords: Fuzzy inference system, differential game, reinforcement learning, pursuit-evasion problem, deterministic policy gradient

Journal: Journal of Intelligent & Fuzzy Systems, vol. 41, no. 1, pp. 1069-1082, 2021

Published: 11 August 2021
