Article type: Research Article
Authors: Liao, Wei [a, b] | Wei, Xiaohui [a, b, *] | Lai, Jizhou [c]
Affiliations: [a] Key Laboratory of Fundamental Science for National Defense-Advanced Design Technology of Flight Vehicle, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu, China | [b] State Key Laboratory of Mechanics and Control of Mechanical Structures, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu, China | [c] College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu, China
Correspondence: [*] Corresponding author. Xiaohui Wei. Tel.: +86 18168069513. E-mail: wei_xiaohui@nuaa.edu.cn.
Abstract: A novel actor-critic algorithm is introduced and applied to a zero-sum differential game. The proposed structure consists of two actors and one critic. Each actor represents the control policy of one player, and the critic approximates the state-action utility function (the Q value). Instead of neural networks, fuzzy inference systems are used as the approximators for the actors and the critic, so that the learned policies can be read as linguistic fuzzy rules with a concrete practical meaning. Since the goals of the players in the game are completely opposite, the actors for the different players are updated simultaneously in opposite directions during training: one actor is updated in the direction that minimizes the Q value, while the other is updated in the direction that maximizes it. A pursuit-evasion problem with two pursuers and one evader is taken as an example to illustrate the validity of the method. In this problem, the two pursuers share the same actor, and the symmetry of the problem is used to improve the replay buffer. Finally, confrontations between policies obtained after different numbers of training episodes are conducted.
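The opposite-direction update described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: each "actor" is reduced to a single scalar parameter instead of a fuzzy inference system, the critic is a fixed analytic function instead of a learned approximator, and the names `q_value`, `q_gradients`, and `train` are hypothetical. Which player minimizes and which maximizes is also an assumption made here for illustration.

```python
def q_value(u, v):
    """Toy critic (assumption): a quadratic with a saddle point at u = 1, v = -2."""
    return (u - 1.0) ** 2 - (v + 2.0) ** 2 + 0.5 * (u - 1.0) * (v + 2.0)


def q_gradients(u, v):
    """Analytic partial derivatives of q_value with respect to u and v."""
    dq_du = 2.0 * (u - 1.0) + 0.5 * (v + 2.0)
    dq_dv = -2.0 * (v + 2.0) + 0.5 * (u - 1.0)
    return dq_du, dq_dv


def train(u=0.0, v=0.0, lr=0.05, steps=500):
    """Update both actor parameters simultaneously, in opposite directions."""
    for _ in range(steps):
        # Both gradients are evaluated at the same (u, v) before either update.
        dq_du, dq_dv = q_gradients(u, v)
        u -= lr * dq_du  # minimizing actor: step down the Q gradient
        v += lr * dq_dv  # maximizing actor: step up the Q gradient
    return u, v


if __name__ == "__main__":
    u_star, v_star = train()
    print(u_star, v_star, q_value(u_star, v_star))
```

For this toy function the simultaneous descent and ascent steps settle at the saddle point, where neither player can improve its outcome by changing its own parameter alone. With learned critics and nonlinear actors such convergence is not guaranteed in general.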
Keywords: Fuzzy inference system, differential game, reinforcement learning, pursuit-evasion problem, deterministic policy gradient
DOI: 10.3233/JIFS-210032
Journal: Journal of Intelligent & Fuzzy Systems, vol. 41, no. 1, pp. 1069-1082, 2021