
Descent wins five gold medals at the Computer Olympiad

Descent (Cohen-Solal, 2020; Cohen-Solal and Cazenave, 2021) is a zero-knowledge deep reinforcement learning algorithm that has learned to play many games. It won five gold medals at the 2020 Computer Olympiad.

Unlike AlphaZero-like algorithms (Silver et al., 2018), the Descent framework uses a variant of Unbounded Minimax (Korf and Chickering, 1996), instead of Monte Carlo Tree Search, to build the partial game tree used both to select the best action to play and to collect data for learning. During training, at each move, the best sequences of moves are iteratively extended until they reach terminal states. During evaluations, the best sequences of moves are iteratively extended until each reaches a leaf state, and the safest action is then chosen. Moreover, Descent does not use a policy network, only a value network, so actions do not need to be encoded. Unlike the AlphaZero paradigm, Descent learns from all the data generated during the searches used to determine the best actions to play. As a result, much more data is generated per game, so training proceeds more quickly and does not require (massive) parallelization to give good results, contrary to AlphaZero. Descent can also use end-of-game heuristic evaluations, such as the game score or the game length (in order to win quickly and lose slowly), to improve its level of play faster.
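To make the search described above more concrete, the following is a minimal, illustrative sketch of one Descent-style iteration using only a value network; it is not the authors' implementation. The game interface (is_terminal, outcome, actions, play, to_move), the value_net callable, the convention that the first player maximizes, and the length_shaped_outcome shaping function are all names and simplifications introduced for this example.

    def descent_iteration(game, state, value_net, tree, targets):
        """One Descent-style iteration: follow the best child at every node
        (expanding unexplored nodes with the value network) until a terminal
        state, then back up minimax values. States are assumed hashable."""
        if game.is_terminal(state):
            value = game.outcome(state)            # exact result, e.g. +1 / 0 / -1
        else:
            if state not in tree:                  # first visit: evaluate all children
                tree[state] = {a: value_net(game.play(state, a))
                               for a in game.actions(state)}
            children = tree[state]
            # first player maximizes, second player minimizes (assumed convention)
            pick = max if game.to_move(state) == 0 else min
            best_action = pick(children, key=children.get)
            # recursively extend the best sequence and refine this child's value
            children[best_action] = descent_iteration(
                game, game.play(state, best_action), value_net, tree, targets)
            value = pick(children.values())        # backed-up minimax value
        # unlike AlphaZero, every visited state becomes a training target
        targets[state] = value
        return value

    def length_shaped_outcome(result, num_moves, max_moves):
        """One possible end-of-game shaping (an assumption, following the
        'win quickly, lose slowly' idea): quick wins and slow losses receive
        better values than slow wins and quick losses."""
        return result * (1.0 + (max_moves - num_moves) / max_moves)

In practice, this iteration would be repeated from the root for the allotted search time; the move is then selected from the values of the root's children (during evaluation, the safest action, as described above), and the accumulated targets are used to train the value network.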

Our programs based on these algorithms won five gold medals, in the following games: Othello 10 × 10, Breakthrough, Surakarta, Amazons, and Clobber. They also won a silver medal at Othello 8 × 8.

The other competitors for each game were:

  • Breakthrough: DaSoJai (authors: Wei-Lin Wu and Shun-Shii Lin) and Polygames (Facebook NDHU). Polygames (Cazenave et al., 2020) is a reimplementation of AlphaZero.

  • Amazons: 8QP (Johan de Koning) and SherlockGo (Liang Shuang, Liang Tailin, Wang Jilong, Li Xiaorui, and Zhou Ke).

  • Othello 10 × 10: Polygames (Facebook NDHU) and Persona (Surag Nair, Nai-Yuan Chang, Shun-Shii Lin).

  • Othello 8 × 8: Polygames (Facebook NDHU) and Maverick (Yen-Chi Chen and Shun-Shii Lin).

  • Clobber: Pan.exe (Johan de Koning), Klopper (Johannes Schwagereit), and Calpurnia (Christian Jans). Calpurnia uses an AlphaZero-like approach and an endgame solver based on Combinatorial Game Theory.

  • Surakarta: CZF_Surakarta (Liang-Fu Liu), FuChou (Jia-Fong Yeh, Yen-Chi Chen, Shun-Shii Lin), and VSSurakarta (Zhang Yunpeng, Li Wei, Zhang Yuxuan, Zhang Pei, and Zhou Ke). CZF_Surakarta is trained based on AlphaZero.

Some details of the matches played by the programs based on the Descent framework are given in Table 1.

The Descent framework could struggle with connection and alignment games: it notably lost to Polygames at Hex and Havannah. However, Polygames was trained with much more computing power (100 GPUs versus 1 GPU) and uses very deep networks, so this may explain the difference.

Table 1

Details of matches played during the first phase and the second phase (playoffs) of the 2020 Computer Olympiad on Ludii: Ludii game ID, Ludii version, game name, first player program, second player program, and winner (the programs based on the Descent framework are Doombot, R2D2, Hercule, Réplicateur, Athénan, Thésée, and Ultron)

ID | Version | Game | First player | Second player | Winner
303 | 1.1.5 | Havannah 8 | Polygames | Doombot-8 | Polygames
310 | 1.1.5 | Havannah 8 | Doombot-8 | Polygames | Polygames
333 | 1.1.5 | Breakthrough | DaSoJai | R2D2 | R2D2
336 | 1.1.5 | Breakthrough | R2D2 | DaSoJai | R2D2
348 | 1.1.5 | Clobber | Hercule | Klopper | Hercule
350 | 1.1.5 | Clobber | Klopper | Hercule | Hercule
360 | 1.1.5 | Clobber | Pan.exe | Hercule | Pan.exe
361 | 1.1.5 | Clobber | Hercule | Pan.exe | Pan.exe
391 | 1.1.5 | Othello 8 | Maverick | Réplicateur #8 | Maverick
392 | 1.1.5 | Othello 8 | Réplicateur #8 | Maverick | Maverick
414 | 1.1.5 | Breakthrough | Polygames | R2D2 | R2D2
417 | 1.1.5 | Breakthrough | R2D2 | Polygames | R2D2
430 | 1.1.7 | Othello 8 | Polygames | Réplicateur #8 | Réplicateur #8
435 | 1.1.7 | Othello 8 | Réplicateur #8 | Polygames | Réplicateur #8
436 | 1.1.7 | Othello 10 | Polygames | Réplicateur #10 | Réplicateur #10
439 | 1.1.7 | Othello 10 | Réplicateur #10 | Polygames | Réplicateur #10
443 | 1.1.7 | Surakarta | FuChou | Athénan | Athénan
445 | 1.1.7 | Surakarta | Athénan | FuChou | Athénan
447 | 1.1.7 | Othello 10 | Persona | Réplicateur #10 | Réplicateur #10
448 | 1.1.7 | Othello 10 | Réplicateur #10 | Persona | Réplicateur #10
452 | 1.1.5 | Amazons | 8QP | Thésée | Thésée
453 | 1.1.5 | Amazons | Thésée | 8QP | Thésée
462 | 1.1.7 | Surakarta | Athénan | VSSurakarta | Athénan
500 | 1.1.7 | Surakarta | VSSurakarta | Athénan | Athénan
475 | 1.1.7 | Surakarta | CZF_Surakarta | Athénan | CZF_Surakarta
477 | 1.1.7 | Surakarta | Athénan | CZF_Surakarta | Athénan
480 | 1.1.7 | Surakarta | FuChou | Athénan | draw
487 | 1.1.7 | Clobber | Calpurnia | Hercule | Hercule
488 | 1.1.7 | Clobber | Hercule | Calpurnia | Hercule
507 | 1.1.7 | Amazons | SherlockGo | Thésée | Thésée
508 | 1.1.7 | Amazons | Thésée | SherlockGo | Thésée
517 | 1.1.8 | Hex 13 | Ultron-13 | Polygames | Polygames
529 | 1.1.8 | Hex 13 | Polygames | Ultron-13 | Polygames
532 | 1.1.8 | Hex 19 | Ultron-19 | Polygames | Ultron-19
538 | 1.1.8 | Hex 19 | Polygames | Ultron-19 | Polygames
541 | 1.1.9 | Clobber | Klopper | Hercule | Hercule
542 | 1.1.9 | Clobber | Hercule | Klopper | Hercule
548 | 1.1.9 | Clobber | Pan.exe | Hercule | Hercule
549 | 1.1.9 | Clobber | Hercule | Pan.exe | Pan.exe
554 | 1.1.9 | Clobber | Pan.exe | Hercule | Hercule
560 | 1.1.9 | Hex 19 | Polygames | Ultron-19 | Polygames
562 | 1.1.9 | Hex 19 | Ultron-19 | Polygames | Polygames

Descent thus constitutes an alternative to AlphaZero, in particular under resource constraints.

References

1. Cazenave, T., Chen, Y.-C., Chen, G.-W., Chen, S.-Y., Chiu, X.-D., Dehos, J., Elsa, M., Gong, Q., Hu, H., Khalidov, V., Cheng-Ling, L., Lin, H.-I., Lin, Y.-J., Martinet, X., Mella, V., Rapin, J., Roziere, B., Synnaeve, G., Teytaud, F., Teytaud, O., Ye, S.-C., Ye, Y.-J., Yen, S.-J. & Zagoruyko, S. (2020). Polygames: Improved zero learning. ICGA Journal, 42(4), 244–256. doi:10.3233/ICG-200157.

2. Cohen-Solal, Q. (2020). Learning to play two-player perfect-information games without knowledge. arXiv preprint arXiv:2008.01188.

3. Cohen-Solal, Q. & Cazenave, T. (2021). Minimax strikes back. In Reinforcement Learning in Games at AAAI.

4. Korf, R.E. & Chickering, D.M. (1996). Best-first minimax search. Artificial Intelligence, 84(1–2), 299–337. doi:10.1016/0004-3702(95)00096-8.

5. Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., et al. (2018). A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362(6419), 1140–1144. doi:10.1126/science.aar6404.