Tournaments for AI in the Game of Diplomacy: Tournament #2

John Newbury        17 July 2012


Method

Tournament #2 finished on 17 November 2005. The same method was used as in Tournament #1, but with three additional bots: HoldBot, RandBot and Project20M. See Bots for details of the players.

The tournament was directed automatically by the (ad hoc) DeepLoamSea Tournament Director Tools (DTDT). 1000 games of STANDARD variant Diplomacy were controlled by David Norman's DAIDE Server, with results being saved to an Access database for later analysis.

For each game, the first bot selected was the one that had played the fewest times so far in the tournament (chosen arbitrarily when tied). Further bots for the game were selected uniformly at random from those available, each selection having no effect on the probabilities of later selections in the game; the procedure is sketched below. If a game appeared to hang (indicated by the turn not advancing for many seconds, depending on the stage of the game), it was terminated and rerun. The same selection of bots was used for the rerun, to avoid biasing the tournament against selecting error-prone bots. (There would otherwise have been a significant bias. However, the Server would generally assign the bots to different powers next time, so there remained a bias against any specific bot-power assignments that tended to hang.)

In this way, each game comprised a uniformly random mixture of bots, possibly including clones of a bot, while (because of how the first bot was selected in each game) the choice of games tended to minimise the variation in the number of times each bot was selected over the whole tournament. (As the current Server always assigns the specified bots to the available powers at random, it was not possible actively to minimise the variation in assignment of a given bot to a given power.)
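By way of illustration, the selection procedure amounts to something like the following sketch (hypothetical Python, not DTDT's actual code; the cohort list and the plays_so_far tally are assumed inputs):

    import random

    POWERS = 7  # powers per game in the STANDARD variant

    def select_bots(cohort, plays_so_far):
        # First seat: a bot that has played fewest times so far,
        # ties broken arbitrarily (here, at random).
        fewest = min(plays_so_far[b] for b in cohort)
        first = random.choice([b for b in cohort if plays_so_far[b] == fewest])
        # Remaining seats: independent uniform draws from the whole
        # cohort, so clones of a bot (even of the first) are possible.
        rest = [random.choice(cohort) for _ in range(POWERS - 1)]
        return [first] + rest

On a rerun after a hang, the same returned selection would be submitted again, although the Server would generally then assign the bots to different powers.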

Each game was terminated when a bot had gained more than half the supply centres (a normal finish), or when there had been no change to the supply-centre counts for a year (a potential stalemate); the rule is sketched below. (I believe that none of the bots used would ever request or offer any draw or solo win.)
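In outline, the termination rule amounts to the following sketch (hypothetical Python; in the STANDARD variant there are 34 supply centres, so a solo requires 18):

    TOTAL_CENTRES = 34  # STANDARD variant

    def game_over(centre_counts, years_unchanged):
        # Normal finish: some power owns more than half the centres.
        if max(centre_counts.values()) > TOTAL_CENTRES // 2:
            return True
        # Potential stalemate: no supply-centre change for a year.
        return years_unchanged >= 1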

Server

The following optional flags were set for David Norman's DAIDE Server, version 0.25:

NB: -ptl=1 (press time limit before deadline) would have been set, but caused Server failure (at least in version 0.24). However, as no bot here could use press, all press settings were irrelevant in this tournament.

Bots

The names and versions of the cohort of 13 competing bots were, alphabetically:

DiploBot v1.1
DumbBot 4
HaAI 0.64 Vanilla
HoldBot 2
Man'chi AngryBot 7
Man'chi AttackBot 7
Man'chi ChargeBot 7
Man'chi DefenceBot 7
Man'chi ParanoidBot 7
Man'chi RandBot 7
Man'chi RevengeBot 7
Project20M v 0.1
RandBot 2

See Bots for details.

Results

The tournament of 1000 games took about 8 hours to run on an AMD 3400 with 1 GB of RAM under Windows XP Home, with little other concurrent activity. 26% of the time was spent in game set-up, which included the time to start the Server and bots, and the time wasted on games that hung before completion; 9 set-ups took over a minute (the longest being 163 seconds) and must have taken exceptionally many restarts to succeed. Of the 74% of the time that was devoted to playing successful games, the mean time per game-year was 1.49 seconds. Although not directly measured, casual observation suggests that the set time limits rarely expired (except when a game hung completely) – turns rarely took more than a fraction of a second.
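(As a rough consistency check on the quoted figures: about 8 hours is roughly 28,800 seconds, of which 74% is about 21,300 seconds of play; 1000 games at a mean of 14.51 years each is 14,510 game-years; and 21,300 / 14,510 ≈ 1.47 seconds per game-year, in line with the 1.49 seconds quoted.)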

An analysis of the performance of each bot is shown in Tables 1a to 1d. Each is sorted in descending order of its last column. Plays is the number of times the bot played, multiple instances in a game counting separately; the average is therefore necessarily about 538 in this tournament (7000 plays shared between 13 bots). Score indicates the bot's strength relative to the average of the cohort, which is scaled to be 1; it is the average number of points that a given bot received per play. In each game, one point for each power (7 for STANDARD) was shared equally between all the leaders of the game. A leader is a bot that owned at least as many supply centres as any other when the game ended. A solo is a bot that finished with more than half the available supply centres (and hence must be the sole leader). A survivor is a bot that still owned some supply centres when the game ended. The Leader, Solo and Survivor percentages are relative to the plays by the given bot. 60.2% of games had a solo; the remainder were formally unresolved by the standard rules. Draws were not accepted (or requested), although the Server classed such games as DIAS (Draw Including All Survivors).
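For concreteness, the points awarded in one game could be computed as in the following sketch (hypothetical Python; the record format is assumed, not DTDT's actual database schema):

    def game_points(final_centres):
        # final_centres: bot instance -> supply centres at game end.
        # 7 points (one per power) are shared equally between the
        # leaders; a solo is necessarily the sole leader and takes all 7.
        most = max(final_centres.values())
        leaders = [b for b, sc in final_centres.items() if sc == most]
        return {b: (7.0 / len(leaders) if b in leaders else 0.0)
                for b in final_centres}

A bot's Score is then its mean points per play; since 7 points are distributed among the 7 plays of every game, the play-weighted average over the whole cohort is 1 by construction.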

Table 1a
Bot                    Plays  Score
Project20M v 0.1         534  2.925
Man'chi AngryBot 7       536  2.156
DiploBot v1.1            534  1.778
DumbBot 4                556  1.630
HaAI 0.64 Vanilla        533  1.470
Man'chi AttackBot 7      550  1.005
Man'chi ChargeBot 7      535  0.667
Man'chi RevengeBot 7     537  0.626
Man'chi DefenceBot 7     538  0.429
Man'chi ParanoidBot 7    534  0.243
Man'chi RandBot 7        535  0.050
RandBot 2                542  0.017
HoldBot 2                536  0.000

Table 1b
Bot                    Leader %
Project20M v 0.1          42.88
Man'chi AngryBot 7        32.09
DiploBot v1.1             26.97
DumbBot 4                 23.74
HaAI 0.64 Vanilla         22.33
Man'chi AttackBot 7       15.45
Man'chi ChargeBot 7        9.91
Man'chi RevengeBot 7       9.68
Man'chi DefenceBot 7       7.06
Man'chi ParanoidBot 7      4.49
Man'chi RandBot 7          0.93
RandBot 2                  0.55
HoldBot 2                  0.00

Table 1c
Bot                    Solo %
Project20M v 0.1        29.40
Man'chi AngryBot 7      19.96
DiploBot v1.1           19.10
DumbBot 4               15.65
HaAI 0.64 Vanilla       10.88
Man'chi ChargeBot 7      5.98
Man'chi AttackBot 7      4.73
Man'chi RevengeBot 7     3.91
Man'chi DefenceBot 7     1.67
Man'chi ParanoidBot 7    0.37
Man'chi RandBot 7        0.19
RandBot 2                0.00
HoldBot 2                0.00

Table 1d
Bot                    Survivor %
Project20M v 0.1            90.82
Man'chi AngryBot 7          87.50
Man'chi ParanoidBot 7       84.08
HaAI 0.64 Vanilla           81.80
DiploBot v1.1               80.90
Man'chi DefenceBot 7        80.86
Man'chi AttackBot 7         79.09
Man'chi RevengeBot 7        75.42
HoldBot 2                   73.32
DumbBot 4                   69.42
Man'chi ChargeBot 7         66.92
Man'chi RandBot 7           43.36
RandBot 2                   42.99

An analysis of the performance of each power is shown in Tables 2a to 2d. Each is sorted in descending order of its last column. The values in each table are analogous to those for the bots, above, but relate to the power rather than the bot concerned. Plays is not shown, as it was always necessarily equal to the total number of games (1000).

Table 2a
Power Score
RUS 1.490
ENG 1.082
TUR 1.065
FRA 1.048
ITA 0.979
GER 0.816
AUS 0.520
 
Table 2b
Power Leader %
RUS 22.4
ENG 16.4
TUR 15.8
FRA 15.7
ITA 14.8
GER 12.6
AUS 7.9
 
Table 2c
Power Solo %
RUS 12.6
FRA 9.8
ITA 9.3
TUR 8.4
ENG 7.7
GER 7.0
AUS 5.4
 
Table 2d
Power Survivor %
ENG 86.8
TUR 82.6
FRA 81.3
ITA 77.2
RUS 69.8
GER 60.2
AUS 56.9

Some further miscellaneous statistics are shown in Table 3.

Table 3: Miscellaneous Statistics

                           Minimum  Maximum    Mean
Set-up Seconds per Game        1.5    163.2     7.5
Playing Seconds per Game       2.5    159.4    21.6
Total Seconds per Game         7.9    670.0    29.0
Years per Game                   3       48   14.51
Supply Centres of Leaders        6       22   15.66
Leaders per Game                 1        4    1.06
Survivors per Game               2        7    5.15

Conclusions

Bearing in mind the three extra bots in this tournament, the Results here are comparable with those of Tournament #1. Consequently, the Conclusions of Tournament #1 also generally apply here, and are not reiterated. There is a little reordering within the corresponding tables, but only where the values were similar.

Note that Project20M leads in all the analyses (Tables 1a to 1d). As would be expected, David Norman's HoldBot comes last where it would need to gain the most supply centres (Tables 1a to 1c), which could never happen – except, in theory, as Russia, which starts with four supply centres, more than any other power, and so could lead without gaining any. But HoldBot has moderate survival ability (Table 1d), probably because it does not provoke anyone, always guards its supply centres (albeit non-optimally), and requires an attack with one support to dislodge each unit. David Norman's RandBot is always worse than Man'chi RandBot; this may be just chance, although there may be some systematic differences in their behaviour.

