What Shogi Programs Still Cannot Do: A New Test Set for Shogi
Reijer Grimbergen and Taro Muraoka
Department of Informatics, Yamagata University
GPW 2004, 2004/11/13

Outline
• The importance of testing
• Test sets for chess
• Test sets for shogi
• A new test set for shogi
• Problem area analysis
• Some new results
• Differences between humans and computers
• Conclusions and future work

The importance of testing
Game programming
• A program should play strongly
• More common is the reverse approach: minimize the number of bad moves
• Testing can help determine problem areas
Incremental testing
• Save positions that the program did not handle well
• Drawbacks
  • Test set is program-specific
  • Positions selected subjectively

The importance of testing
The requirements of a test set
• Testing a wide variety of potential problem areas
• Not specific to one program
Test design in games
• Mainly done for chess
• Current test sets for shogi have shortcomings
• Shogi research is at a point where focusing the effort could be a great help
• Proposing a new test set for shogi

Test sets for chess
The Bratko-Kopec test set
• 12 tactical positions and 12 strategic positions
• Designed to compare human and computer performance in chess
• Thus far, no program can solve all positions
Reinfeld’s Win at Chess
• 300 tactical positions
• Used as a first test for new programs
LCT II
• 35 positions
• Good balance between strategic, tactical and endgame positions
• An Elo rating can be calculated from the solved positions
The Lindner test set
• A set of positions that are considered hard for computers to solve

Test sets for shogi
The Matsubara-Iida test set
• 48 positions taken from professional games
• Selected by an expert player
• Aims at judging the strength of shogi programs
• First given to human players to establish a connection with playing strength
Problems with the Matsubara-Iida test set
• Program strength can be judged more accurately by playing on the internet
• No Elo calculation like in LCT II
• Subjective selection leaves doubts about test balance
• What is difficult for computers is not necessarily difficult for humans and vice versa, so the connection with playing strength is unreliable

Test sets for shogi
Other test sets for shogi
• Yamashita’s test set (10 positions)
• Tanase’s test set (19 positions)
Problems with these test sets
• Too small
• Program-specific
• Unclear if there is only one solution

A new test set for shogi
What do we want from a test set?
1. As general as possible
2. Points to as many problem areas as possible
• Find positions that cannot be solved by the best programs
• Finding weaknesses instead of measuring strength

A new test set for shogi
Positions selected from Shukan Shogi
• Every week six next-move problems
• Middle game positions and endgame positions
• Different tactical themes: winning material, attack, defense and mating
Our goal: create a test set of 100 positions
The programs we used
• AI Shogi 2003
• Todai Shogi 5
• Gekisashi 2
Conditions
• 30 seconds on a 2 GHz Pentium 4
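The selection procedure on this slide amounts to a filter: a candidate position is kept only if none of the three programs plays the published solution within the time limit. A minimal sketch in Python follows. The `best_move` callable is a hypothetical wrapper around each engine (the slides do not describe how the programs were driven), and applying the 30 seconds per position is an assumption.

```python
from typing import Callable, Iterable, List, Tuple

# Program names and time limit as given on the slide.
PROGRAMS = ["AI Shogi 2003", "Todai Shogi 5", "Gekisashi 2"]
TIME_LIMIT_SECONDS = 30  # assumed to apply per position

# A candidate is (position, published solution move); both are opaque strings here.
Candidate = Tuple[str, str]


def select_unsolved(
    candidates: Iterable[Candidate],
    best_move: Callable[[str, str, int], str],  # hypothetical engine wrapper
    target_size: int = 100,
) -> List[Candidate]:
    """Keep candidates that no program solves, stopping at target_size."""
    selected: List[Candidate] = []
    for position, solution in candidates:
        solved_by_any = any(
            best_move(program, position, TIME_LIMIT_SECONDS) == solution
            for program in PROGRAMS
        )
        if not solved_by_any:
            selected.append((position, solution))
        if len(selected) == target_size:
            break
    return selected
```

With a filter like this, a position drops out as soon as one program finds the solution, which is consistent with many candidates being needed to reach 100 positions.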

A new test set for shogi
This was not easy!
• More than 1500 positions needed to be checked to find our test set
Additional feature
• The percentage of respondents who solved the problem is given
• Differences between what is difficult for humans and difficult for computers

Problem area analysis
Why are the positions difficult?
• Using the analysis tools in Todai Shogi, Gekisashi and AI Shogi to find problem areas
Our first analysis indicates seven problem areas
• Horizon effect due to consecutive checks
• Not calling the tsume shogi solver deep in the search tree
• Inaccurate evaluation function
• Incorrect forward pruning
• Mate with unpromoted pieces
• Insufficient hardware speed
• Problems with time allocation

Problem area analysis
Horizon effect and tsume shogi
Problem 750-3
Solved: 16%
Solution
• 2四銀、1四玉（同歩、2三金、同玉、3二角成）、3五金
Program replies
• Todai: 1五歩 (losing position)
• Gekisashi: 3二角成 (White has the advantage)
• AI Shogi: 3五金

Problem area analysis
Horizon effect and tsume shogi
The problem
• Horizon checks after 2四銀、1四玉、3五金
• The same position without horizon checks can be solved by all programs

Problem area analysis
Horizon effect and tsume shogi
Another problem: tsume shogi deep in the search tree
• Gekisashi with more time: 2四銀、1四玉、3五金、7九銀、同玉、2五桂、1五歩、同馬、同銀 (-1192)
• White has a mate in 9 after 同玉 and Black has a mate in 3 after 2五桂!

Problem area analysis
Evaluation and forward pruning
Problem 755-3
Solved: 51%
Solution
• 2二金、同金、2三角成、3三金、同馬
Program replies
• Todai: 2一角成、4一玉、6一金 (winning position)
• Gekisashi: 6八銀、5六成銀、3七桂、6六銀、2五桂、5四歩、2一角成、4一玉 (Black is winning)
• AI Shogi: 6八銀、5八成銀、2一角成、4一玉

Problem area analysis
Evaluation and forward pruning
The problem: an incorrect evaluation
• After 2一角成、4一玉 the white king can escape, but this cannot be assessed
• Evaluating the chances of escaping an attack is difficult?
Another problem: forward pruning
• Consecutive sacrifices 2二金 and 2三角成
• Multiple sacrifices not searched deep enough?

Problem area analysis
Unpromoted pieces
Problem 935-2
Solved: 95%
Solution
• 1三歩不成、2六銀直、(1四歩 is an illegal move) 1四玉
Program replies
• Todai: 5二と (losing position)
• Gekisashi: 8四桂 (White is winning)
• AI Shogi: resigns (!)

Problem area analysis
Unpromoted pieces
The problem here seems a special case of forward pruning
• Promoting a major piece or a pawn is almost always better than not promoting
• Non-promotions of these pieces are pruned to improve search efficiency
Not a high priority problem, but could have consequences for thinking in opponent time
• When there is no difference between promoting and not promoting a piece, not promoting makes the opponent's thinking in your time useless
• My advice: play the non-promotion to win some time!
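The pruning rule described on this slide can be sketched as a filter on generated moves. This is an illustration under assumed names (`Move`, `prune_non_promotions`), not the move generator of any of the tested programs. When a pawn, rook or bishop can promote, only the promoting move survives, so a solution that starts with a pawn non-promotion (such as 1三歩不成 in Problem 935-2) is never searched.

```python
from dataclasses import dataclass
from typing import List

# Pieces for which the slide says non-promotion is pruned:
# the major pieces (rook, bishop) and the pawn.
ALWAYS_PROMOTE = {"pawn", "rook", "bishop"}


@dataclass(frozen=True)
class Move:
    piece: str          # e.g. "pawn", "silver"
    src: str            # source square (illustrative labels)
    dst: str            # destination square
    promote: bool       # True if this move promotes the piece
    can_promote: bool   # True if promotion is legal for this move


def prune_non_promotions(moves: List[Move]) -> List[Move]:
    """Drop non-promoting moves of pawn, rook and bishop when promotion is possible."""
    return [
        m for m in moves
        if not (m.piece in ALWAYS_PROMOTE and m.can_promote and not m.promote)
    ]
```

A search that wants to find such mates would have to keep these moves at least in some nodes, at the cost of a larger tree; the slides leave open how to make that trade-off.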

Problem area analysis
Other problem areas
Insufficient hardware speed
• Some positions could be solved by giving the program more time
• Improved hardware speed will automatically solve these positions
Time allocation
• In some positions, the programs would play very quickly
• These positions were deleted from our test set
• However, it might be a different problem area: when to cut off the search?

Problem area analysis
Overview

Problem area                     Positions
Insufficient hardware speed      31
Inaccurate evaluation function   20
Incorrect forward pruning        19
Horizon effect                   18
Tsume shogi                      11
Mate using unpromoted pieces      6
Reason unclear                    7
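A quick tally of the overview table shows that the counts add up to 112, more than the 100 positions in the test set. The slides do not explain the difference. One reading, which is an assumption here, is that some positions were assigned to more than one problem area. The short Python check below only reproduces the arithmetic.

```python
# Counts copied from the overview table on this slide.
AREA_COUNTS = {
    "Insufficient hardware speed": 31,
    "Inaccurate evaluation function": 20,
    "Incorrect forward pruning": 19,
    "Horizon effect": 18,
    "Tsume shogi": 11,
    "Mate using unpromoted pieces": 6,
    "Reason unclear": 7,
}

TEST_SET_SIZE = 100  # size of the proposed test set


def total_assignments(counts: dict) -> int:
    """Sum of all per-area counts (a position may be counted more than once)."""
    return sum(counts.values())


def surplus(counts: dict, set_size: int) -> int:
    """How many area assignments exceed the number of positions."""
    return total_assignments(counts) - set_size
```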

Some new results
New program versions have been released
• Todai Shogi 6 and 7, Gekisashi 3 and AI Shogi 2004
Results of Todai 6 on the test set
• Solved 6 positions
• The problem areas of these positions differed
  • Inaccurate evaluation function (2 positions)
  • Insufficient hardware speed (2 positions)
  • Horizon effect (1 position)
  • Reason unclear (1 position)

Differences between humans and computers
How difficult are the positions for human players?
• Almost half of the positions (46) can be solved by more than 50% of the human respondents
• There are 14 positions that cannot be solved by computers, but can be solved by more than 80% of the humans

Human percentage   Positions
0 – 10%             0
11 – 20%           12
21 – 30%           18
31 – 40%           10
41 – 50%           13
51 – 60%           16
61 – 70%            7
71 – 80%            9
81 – 90%            9
91 – 100%           5
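The two figures quoted on this slide (46 and 14) follow directly from the histogram, as the small Python check below shows. The bin boundaries are taken from the table, and the helper name `positions_above` is illustrative. Note that the bins as transcribed total 99 rather than 100; the slides do not account for the remaining position.

```python
# (lower bound %, upper bound %) -> number of positions, copied from the table.
HUMAN_SOLVE_HISTOGRAM = [
    ((0, 10), 0),
    ((11, 20), 12),
    ((21, 30), 18),
    ((31, 40), 10),
    ((41, 50), 13),
    ((51, 60), 16),
    ((61, 70), 7),
    ((71, 80), 9),
    ((81, 90), 9),
    ((91, 100), 5),
]


def positions_above(threshold: int) -> int:
    """Count positions in bins that lie entirely above `threshold` percent."""
    return sum(
        count
        for (low, _high), count in HUMAN_SOLVE_HISTOGRAM
        if low > threshold
    )


def total_positions() -> int:
    """Total number of positions covered by the histogram."""
    return sum(count for _bin, count in HUMAN_SOLVE_HISTOGRAM)
```

Here `positions_above(50)` gives the 46 positions solved by more than half of the respondents, and `positions_above(80)` gives the 14 positions solved by more than 80%.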

Conclusions and future work
• We have proposed a set of 100 positions that is general and points to specific problem areas in computer shogi
• As more positions get solved, we intend to replace them with new positions
• Further investigation of the unsolved positions for which the problem area could not be determined
• Making further comparisons between what is difficult for humans and difficult for computers

Finally
• Download the test set here: gamelab.yz.yamagata-u.ac.jp/RESEARCH/shogitestset.zip
• Let me know about your results