AI in the Game of Diplomacy: ARENA
John Newbury, 28 April 2013
ARENA (Analyses of Results of Experiments Now Available) contains the latest analyses and discussions of the SAGA database of games run by DTDT (the forerunner of MARS): the relative abilities of the various contestants in the various environments, the relative difficulties of those environments for the various competitors, and ideas and plans for future missions.
All the following analyses apply to the latest snapshot of SAGA, from which you can perform further analyses. As always, the results of each clone of a given bot add independently to any accumulated statistics, indicated as "plays". Version numbers are included only when more than one version has been released. See Bots for details of the players; only Stragotiator is yet to be included.
See Tournaments for detailed analyses of discrete tournaments between the bots that were available in 2006. Tournaments #3 and #4 contain graphs that show the progress of a "slow knockout", in which the probability, and hence frequency, of selection of each bot depends on some measure of its demonstrated ability. Tournament #9 was the latest and most complete tournament, including press by BlabBot (the only released bot that was then capable of it!); the cross tabs in that tournament give a quick summary of the mean scores of games between each pair of powers (only two kinds of bot played in any of those tournaments).
The following table shows the mean score ("fitness") of bots obtained from all 1905 no-press (level 0) games in SAGA (7*1905 = 13335 plays in total), irrespective of their Tournament name, up to the end of 2011-10-08. Here, the "score" is based on the DAIDE Standard method: each player pays one point to enter each game, and the "pot" is divided equally between all eventual winners (the soloist, or the survivors in a draw); the sum of scores in each game, and hence the overall mean, is therefore zero. All the bots used their default tuning parameters, except that Albert had a "-t" (tournament) argument. Real-time default deadlines were set (5, 2 and 3 minutes for movement, retreat and adjustment phases, respectively), although games were restarted – with the same set of powers, but fresh random power assignments – if any turn lasted more than 30 seconds, to avoid perpetual hanging due to faulty bots. In practice only Albert used significant time, and it was never (informally) observed to approach the 30-second timeout.
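As a concrete illustration of this scoring scheme (not code from DTDT or SAGA; the function and field names are invented for the example), the following Python sketch computes the per-player scores for one game:

```python
# Minimal sketch of the DAIDE Standard scoring described above.
# Names here (standard_scores, "winners") are illustrative, not the SAGA schema.

def standard_scores(num_players, winners):
    """Each player pays 1 point into the pot, which is split equally among
    the eventual winners (the soloist, or all survivors of a draw).
    Scores therefore sum to zero in every game."""
    share = num_players / len(winners)   # equal share of the pot
    return {p: (share - 1) if p in winners else -1
            for p in range(num_players)}

# Example: a 7-player game ending in a 3-way draw between players 0, 3 and 5.
scores = standard_scores(7, winners={0, 3, 5})
assert abs(sum(scores.values())) < 1e-9  # zero-sum check
print(scores)                            # winners get about +1.33 each, losers -1
```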
Note that this simple analysis ignores the mean fitness of each bot's opponents and the mean difficulty of the power played. No systematic bias due to power played is likely, since assignment of powers was random. The scores are, however, systematically compressed to some degree by the slow knockout method, which gradually makes the better bots play more games, and hence tends to pit the better bots against other good bots (including their own clones). (This effect could be eliminated in a future analysis, while continuing to collect more samples of the better, and more interesting, bots. Even now, though, the effect is unlikely to alter the ordering, because a bot's selection becomes biased only once it has demonstrated its fitness.)
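The exact selection rule used by DTDT is not described here, but a minimal sketch of a fitness-weighted ("slow knockout") selection, consistent with the description above, might look like this (the rank-based weighting is an assumption for illustration only):

```python
import random

def pick_players(mean_scores, num_powers=7):
    """Hypothetical slow-knockout selection: bots with higher demonstrated
    mean scores are chosen more often; sampling with replacement allows
    several clones of one bot to appear in the same game."""
    bots = sorted(mean_scores, key=lambda b: mean_scores[b])   # worst first
    weights = [rank + 1 for rank in range(len(bots))]          # worst=1 ... best=n
    return random.choices(bots, weights=weights, k=num_powers)

fitness = {"Albert": 1.57, "KissMyBot": 1.01, "DumbBot": -0.56, "RandBot": -0.99}
print(pick_players(fitness))  # e.g. ['Albert', 'KissMyBot', 'Albert', ...]
```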
None of the 2006 Tournament games are included, because the level was set to allow press (level 8000), even though in many games none of the players were capable of press. (In due course, the database will record the maximum press levels that each bot is capable of, thereby allowing inclusion of further applicable games.)
Name | Plays | Mean Score | SD of Mean Score |
---|---|---|---|
Albert 5.9 | 2118 | 1.568 | 0.073 |
KissMyBot 6 | 1672 | 1.013 | 0.077 |
Project20M | 971 | 0.082 | 0.079 |
Man'chi AngryBot | 828 | -0.079 | 0.080 |
Minerva | 821 | -0.268 | 0.072 |
DiploBot | 698 | -0.388 | 0.074 |
BlabBot 2.1 | 570 | -0.502 | 0.073 |
Diplominator | 553 | -0.514 | 0.075 |
HaAI | 577 | -0.538 | 0.070 |
DumbBot | 537 | -0.559 | 0.072 |
Brutus 0.8 | 522 | -0.588 | 0.071 |
Man'chi AttackBot | 447 | -0.699 | 0.061 |
Man'chi RevengeBot | 402 | -0.778 | 0.048 |
Man'chi DefenceBot | 393 | -0.822 | 0.045 |
Man'chi ChargeBot | 368 | -0.853 | 0.050 |
Man'chi ParanoidBot | 364 | -0.929 | 0.015 |
HoldBot | 383 | -0.954 | 0.012 |
Man'chi RandBot | 351 | -0.956 | 0.028 |
RubyBot | 366 | -0.972 | 0.010 |
RandBot | 394 | -0.994 | 0.004 |
The wide difference in numbers of plays is due to the slow knockout of the weaker bots. Albert is top, as expected. Only Albert, KissMyBot and Project20M achieved positive scores (the total score in any game being 0); only they would have won more than their "entry stakes"; the others would have lost. RubyBot is, disappointingly, comparable to RandBot and HoldBot.
BlabBot should be equivalent to DumbBot, since no press was allowed. The differences between the results for these two bots, and between RandBot and Man'chi RandBot, are within the noise due to the limited numbers of plays, as can be seen from the standard deviations (SD) of the mean scores.
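Treating the quoted SDs of the means as independent standard errors (an assumption made only for this back-of-envelope check), the BlabBot/DumbBot gap can be checked directly:

```python
import math

# (mean score, SD of mean) taken from the table above
blab = (-0.502, 0.073)
dumb = (-0.559, 0.072)

diff = blab[0] - dumb[0]                      # 0.057
sd_diff = math.sqrt(blab[1]**2 + dumb[1]**2)  # about 0.103
print(f"difference = {diff:.3f}, SD of difference = {sd_diff:.3f}")
# The gap is only about 0.6 SD, i.e. well within sampling noise.
```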
The following table indicates the relative difficulty of playing each of the powers in the STANDARD variant, based on the bots in the same set of games as above. (No other variant has yet been trialled.) A higher mean score means the power was easier for the bots to play in these games. Once again, the fitness of the players is ignored, but no systematic bias is expected here, although different bots may have different preferences for powers. (Such an analysis may be presented in due course.)
Code | Plays | Mean Score | SD of Mean Score |
---|---|---|---|
FRA | 1905 | 0.497 | 0.065 |
ENG | 1905 | 0.348 | 0.062 |
TUR | 1905 | 0.341 | 0.062 |
ITA | 1905 | -0.023 | 0.055 |
RUS | 1905 | -0.295 | 0.047 |
GER | 1905 | -0.365 | 0.045 |
AUS | 1905 | -0.503 | 0.040 |
Here, FRA, ENG and TUR produced above-average scores; the rest, below average. However, the standard deviations (SD) of the mean scores indicate that some reordering would be likely in a different sample, especially between ENG and TUR, and to a lesser extent between RUS and GER.
See Tournament #9 for a comparison of the fitness in (level 8000) press games of all bots released in 2006. Extensive further games between all released bots, on all variants, with and without press (levels 8000 and 0), albeit with short time limits, have now been run and are available in the SAGA 2 database, but have yet to be properly analysed.
See Solo-Power SC Distribution for an Excel spreadsheet that shows counts of specific SC-power combinations when that power soloed in the Standard variant, from 10608 samples in 572 games (a mean solo SC count of 18.55). It was compiled from the logs of saved games 24599-41296 (5 missing in that range) as defined in the SAGA 2 database. (These logs are from the currently most recent games there, equally distributed amongst all current variants and levels 0 and 8000 of press, but biased towards selection of the better bot contestants.) You can sort this data in many useful ways; too many to display them all conveniently here.
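As a rough indication of how such counts could be produced (the game-record structure below is hypothetical; the real spreadsheet was compiled from the SAGA 2 saved-game logs), a tally of SC-power combinations for soloed games might be built like this:

```python
from collections import Counter

def solo_sc_counts(games):
    """games: iterable of records like
    {"solo": "FRA", "ownership": {"FRA": ["PAR", "MAR", ...], ...}}
    Returns a Counter of (supply centre, soloing power) pairs."""
    counts = Counter()
    for game in games:
        solo = game.get("solo")
        if solo is None:          # skip draws: only soloed games are counted
            continue
        for sc in game["ownership"][solo]:
            counts[(sc, solo)] += 1
    return counts

# The resulting Counter can then be sorted in many ways, e.g.:
# solo_sc_counts(games).most_common(10)   # most frequent SC-power pairs
```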