AI in the Game of Diplomacy: ARENA
John Newbury, 28 April 2013
ARENA (Analyses of Results of Experiments Now Available) contains the latest analyses and discussions of the SAGA database of games run by DTDT (the forerunner of MARS): the relative abilities of the various contestants in the various environments, the relative difficulties of those environments for the various competitors, and ideas and plans for future missions.
All the following analyses apply to the latest snapshot of SAGA, from which you can perform further analyses. As always, the results of each clone of a given bot add independently to any accumulated statistics, indicated as "plays". Version numbers are included only when more than one version has been released. See Bots for details of the players; only Stragotiator is yet to be included.
See Tournaments for detailed analyses of discrete tournaments between the bots that were available in 2006. Tournaments #3 and #4 contain graphs that show the progress of a "slow knockout", in which the probability, and hence frequency, of selection of each bot depends on some measure of its demonstrated ability. Tournament #9 was the latest and most complete tournament, including press by BlabBot (the only released bot that was then capable of it!); the cross tabs in that tournament give a quick summary of the mean scores of games between each pair of powers (only two kinds of bot played in any of those tournaments).
The following table shows the mean score ("fitness") of bots obtained from all 1905 no-press (level 0) games in SAGA (7*1905 = 13335 plays in total), irrespective of their Tournament name, up to the end of 2011-10-08. Here, the "score" is based on the DAIDE Standard method: each player pays one point to enter each game, and the "pot" is divided equally between all eventual winners (the soloist, or the survivors in a draw); the sum of scores in each game, and hence the overall mean, is therefore zero. All the bots used their default tuning parameters, except that Albert had a "-t" (tournament) argument. Real-time default deadlines were set (5, 2 and 3 minutes for movement, retreat and adjustment phases, respectively), although games were restarted – with the same set of powers, but fresh random power assignments – if any turn lasted more than 30 seconds, to avoid perpetual hanging due to faulty bots. In practice only Albert used significant time, and it was never (informally) observed to approach the 30-second timeout.
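As a concrete illustration of this scoring scheme (not code from DTDT or SAGA; the function and field names are invented for the example), the following Python sketch computes the per-player scores for one game:

```python
# Minimal sketch of the DAIDE Standard scoring described above.
# Names here (standard_scores, "winners") are illustrative, not the SAGA schema.

def standard_scores(num_players, winners):
    """Each player pays 1 point into the pot, which is split equally among
    the eventual winners (the soloist, or all survivors of a draw).
    Scores therefore sum to zero in every game."""
    share = num_players / len(winners)   # equal share of the pot
    return {p: (share - 1) if p in winners else -1
            for p in range(num_players)}

# Example: a 7-player game ending in a 3-way draw between players 0, 3 and 5.
scores = standard_scores(7, winners={0, 3, 5})
assert abs(sum(scores.values())) < 1e-9  # zero-sum check
print(scores)                            # winners get about +1.33 each, losers -1
```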
Note that this simple analysis ignores the mean fitness of each bot's opponents and the mean difficulty of the power played. No systematic bias due to power played is likely, since assignment of powers was random. The scores are, however, systematically compressed to some degree by the slow knockout method, which gradually makes the better bots play more games, and hence tends to pit the better bots against other good bots (including their own clones). (This effect could be eliminated in a future analysis, while continuing to collect more samples of the better, and more interesting, bots. Even now, though, the effect is unlikely to alter the ordering, because a bot's selection becomes biased only once it has demonstrated its fitness.)
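The exact selection rule used by DTDT is not described here, but a minimal sketch of a fitness-weighted ("slow knockout") selection, consistent with the description above, might look like this (the rank-based weighting is an assumption for illustration only):

```python
import random

def pick_players(mean_scores, num_powers=7):
    """Hypothetical slow-knockout selection: bots with higher demonstrated
    mean scores are chosen more often; sampling with replacement allows
    several clones of one bot to appear in the same game."""
    bots = sorted(mean_scores, key=lambda b: mean_scores[b])   # worst first
    weights = [rank + 1 for rank in range(len(bots))]          # worst=1 ... best=n
    return random.choices(bots, weights=weights, k=num_powers)

fitness = {"Albert": 1.57, "KissMyBot": 1.01, "DumbBot": -0.56, "RandBot": -0.99}
print(pick_players(fitness))  # e.g. ['Albert', 'KissMyBot', 'Albert', ...]
```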
None of the 2006 Tournament games are included, because the level was set to allow press (level 8000), even though in many games none of the players were capable of press. (In due course, the database will record the maximum press levels that each bot is capable of, thereby allowing inclusion of further applicable games.)
Name | Plays | Mean Score | SD of Mean Score |
---|---|---|---|
Albert 5.9 | 2118 | 1.568 | 0.073 |
KissMyBot 6 | 1672 | 1.013 | 0.077 |
Project20M | 971 | 0.082 | 0.079 |
Man'chi AngryBot | 828 | -0.079 | 0.080 |
Minerva | 821 | -0.268 | 0.072 |
DiploBot | 698 | -0.388 | 0.074 |
BlabBot 2.1 | 570 | -0.502 | 0.073 |
Diplominator | 553 | -0.514 | 0.075 |
HaAI | 577 | -0.538 | 0.070 |
DumbBot | 537 | -0.559 | 0.072 |
Brutus 0.8 | 522 | -0.588 | 0.071 |
Man'chi AttackBot | 447 | -0.699 | 0.061 |
Man'chi RevengeBot | 402 | -0.778 | 0.048 |
Man'chi DefenceBot | 393 | -0.822 | 0.045 |
Man'chi ChargeBot | 368 | -0.853 | 0.050 |
Man'chi ParanoidBot | 364 | -0.929 | 0.015 |
HoldBot | 383 | -0.954 | 0.012 |
Man'chi RandBot | 351 | -0.956 | 0.028 |
RubyBot | 366 | -0.972 | 0.010 |
RandBot | 394 | -0.994 | 0.004 |
The wide difference in numbers of plays is due to the slow knockout of the weaker bots. Albert is top, as expected. Only Albert, KissMyBot and Project20M achieved positive scores (the total score in any game being 0); only they would have won more than their "entry stakes"; the others would have lost. RubyBot is, disappointingly, comparable to RandBot and HoldBot.
BlabBot should be equivalent to DumbBot, since no press was allowed. The differences between the results for these two bots, and between RandBot and Man'chi RandBot, are within the noise due to the limited numbers of plays, as can be seen from the standard deviations (SD) of the mean scores.
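Treating the quoted SDs of the means as independent standard errors (an assumption made only for this back-of-envelope check), the BlabBot/DumbBot gap can be checked directly:

```python
import math

# (mean score, SD of mean) taken from the table above
blab = (-0.502, 0.073)
dumb = (-0.559, 0.072)

diff = blab[0] - dumb[0]                      # 0.057
sd_diff = math.sqrt(blab[1]**2 + dumb[1]**2)  # about 0.103
print(f"difference = {diff:.3f}, SD of difference = {sd_diff:.3f}")
# The gap is only about 0.6 SD, i.e. well within sampling noise.
```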
The following table indicates the relative difficulty of playing each of the powers in the STANDARD variant, based on the bots in the same set of games as above. (No other variant has yet been trialled.) A higher mean score means the power was easier for the bots to play in these games. Once again, the fitness of the players is ignored, but no systematic bias is expected here, although different bots may have different preferences for powers. (Such an analysis may be presented in due course.)
Code | Plays | Mean Score | SD of Mean Score |
---|---|---|---|
FRA | 1905 | 0.497 | 0.065 |
ENG | 1905 | 0.348 | 0.062 |
TUR | 1905 | 0.341 | 0.062 |
ITA | 1905 | -0.023 | 0.055 |
RUS | 1905 | -0.295 | 0.047 |
GER | 1905 | -0.365 | 0.045 |
AUS | 1905 | -0.503 | 0.040 |
Here, FRA, ENG and TUR produced above-average scores; the rest, below average. However, the standard deviations (SD) of the mean scores indicate that some reordering would be likely in a different sample, especially between ENG and TUR, and to a lesser extent between RUS and GER.
See Tournament #9 for a comparison of the fitness in (level 8000) press games of all bots released in 2006. Extensive further games between all released bots, on all variants, with and without press (levels 8000 and 0), albeit with short time limits, have now been run and are available in the SAGA 2 database, but have yet to be properly analysed.
See Solo-Power SC Distribution for an Excel spreadsheet that shows counts of specific SC-power combinations when that power soloed in the Standard variant, from 10608 samples in 572 games (a mean solo SC count of 18.55). It was compiled from the logs of saved games 24599-41296 (5 missing in that range) as defined in the SAGA 2 database. (These logs are from the currently most recent games there, equally distributed amongst all current variants and levels 0 and 8000 of press, but biased towards selection of the better bot contestants.) You can sort this data in many useful ways; too many to display them all conveniently here.
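As a rough indication of how such counts could be produced (the game-record structure below is hypothetical; the real spreadsheet was compiled from the SAGA 2 saved-game logs), a tally of SC-power combinations for soloed games might be built like this:

```python
from collections import Counter

def solo_sc_counts(games):
    """games: iterable of records like
    {"solo": "FRA", "ownership": {"FRA": ["PAR", "MAR", ...], ...}}
    Returns a Counter of (supply centre, soloing power) pairs."""
    counts = Counter()
    for game in games:
        solo = game.get("solo")
        if solo is None:          # skip draws: only soloed games are counted
            continue
        for sc in game["ownership"][solo]:
            counts[(sc, solo)] += 1
    return counts

# The resulting Counter can then be sorted in many ways, e.g.:
# solo_sc_counts(games).most_common(10)   # most frequent SC-power pairs
```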