Tournaments for AI in the Game of Diplomacy: Tournament #2
Tournament #2 finished on 17 November 2005. The same method was used as in Tournament #1, but with three additional bots: HoldBot, RandBot and Project20M. See Bots for details of the players.
The tournament was directed automatically by the (ad hoc) DeepLoamSea Tournament Director Tools (DTDT). 1000 games of STANDARD variant Diplomacy were controlled by David Norman's DAIDE Server, with results being saved to an Access database for later analysis.
For each game, the first bot selected was the one that had played the fewest times so far in the tournament (chosen arbitrarily in the event of a tie). The remaining bots for the game were selected uniformly at random from those available, each selection having no effect on the probabilities for later selections in the game. If a game appeared to hang (indicated by the turn not advancing for many seconds, the threshold depending on the stage of the game), it was terminated and rerun. The same selection of bots was used for the rerun, to avoid biasing against selecting error-prone bots. (There would otherwise have been a significant bias. However, the Server would generally assign the bots to different powers next time, so there remained a bias against any specific bot-power assignments that tended to hang.) In this way, each game comprised a uniformly random mixture of bots, possibly including clones of a bot, but (because of how the first bot was selected in each game) the choice of games to play tended to minimise the variation in the number of times each bot was selected over the whole tournament. (As the current Server always assigns the specified bots to the available powers at random, it was not possible actively to minimise the variation in assignment of a given bot to a given power.)
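For concreteness, a minimal sketch of this selection rule is given below; the function and variable names are illustrative only and are not part of the DTDT tools.

```python
import random

def select_bots_for_game(bot_names, play_counts, powers=7):
    """Choose the bots for one STANDARD game (illustrative sketch only).

    The first slot goes to a bot that has played fewest times so far
    (ties broken arbitrarily); the remaining slots are drawn uniformly
    at random and independently, so clones of a bot are possible.
    """
    fewest = min(play_counts[name] for name in bot_names)
    least_played = [name for name in bot_names if play_counts[name] == fewest]
    selection = [random.choice(least_played)]
    selection += [random.choice(bot_names) for _ in range(powers - 1)]
    for name in selection:
        play_counts[name] += 1
    return selection  # a hung game is rerun with this same selection
```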
Each game was terminated when a bot had gained more than half the supply centres (normal finish), or when there had been no change to supply centre scores for a year (potential stalemate). (I believe that none of the bots used would ever request or offer any draw or solo win.)
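The two stopping rules can be checked mechanically at the end of each game year. The sketch below assumes the 34 supply centres of the STANDARD map and a caller that tracks how long the centre counts have been static; the names are illustrative only.

```python
TOTAL_CENTRES = 34  # supply centres on the STANDARD map

def game_finished(centre_counts, years_without_change):
    """centre_counts maps each power to its current supply-centre total."""
    if max(centre_counts.values()) > TOTAL_CENTRES // 2:  # 18 or more: a solo
        return "solo"
    if years_without_change >= 1:                         # matches -kill=1 below
        return "potential stalemate"
    return None  # play on
```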
The following optional flags of David Norman's DAIDE Server version 0.25 were set (a sketch of the corresponding invocation follows the list):
-var=STANDARD
-lvl=8000 (press level 8000 = free text, but none play above level 0 = no press)
-kill=1 (ending game if no change in supply centre assignments for 1 year)
-mtl=2, -rtl=2, -btl=2 (movement, retreat and adjustment time limits of 2 seconds each; the unlimited default occasionally caused at least one bot to compute for ages, perhaps forever)
-npr and -npb (no press during retreats or adjustments)
-xlog (no logging, albeit Server 0.25 can log at acceptable speed)
NB: -ptl=1 (press time limit before deadline) would have been set, but caused Server failure (at least in version 0.24). However, as no bot here could use press, all press settings were irrelevant in this tournament.
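As a rough illustration, the flags above could be assembled into a server invocation along the following lines; the executable name and path are assumptions, not taken from the tournament report, and only the flags listed above are used.

```python
import subprocess

# Hypothetical path to David Norman's DAIDE Server; adjust to the real install.
SERVER_EXE = r"C:\DAIDE\AiServer.exe"

server_args = [
    SERVER_EXE,
    "-var=STANDARD",  # standard variant
    "-lvl=8000",      # press level 8000 (free text), though all bots here play level 0
    "-kill=1",        # end the game after 1 year with no supply-centre change
    "-mtl=2",         # movement time limit, seconds
    "-rtl=2",         # retreat time limit, seconds
    "-btl=2",         # build/adjustment time limit, seconds
    "-npr",           # no press during retreats
    "-npb",           # no press during builds/adjustments
    "-xlog",          # no logging
]

# Launch the server; the bots would then be started and connected to it.
server = subprocess.Popen(server_args)
```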
The names and versions of the cohort of 13 competing bots were, alphabetically:
DiploBot v1.1
DumbBot 4
HaAI 0.64 Vanilla
HoldBot
Man'chi AngryBot 7
Man'chi AttackBot 7
Man'chi ChargeBot 7
Man'chi DefenceBot 7
Man'chi ParanoidBot 7
Man'chi RandBot 7
Man'chi RevengeBot 7
Project20M
RandBot
See Bots for details.
The tournament of 1000 games took about 8 hours to run on an AMD 3400 with 1 GB of RAM under Windows XP Home, with little other concurrent activity. 26% of the time was spent in game set-up, which included the time to start the server and bots, and the time wasted on playing games that hung before completion; 9 games had set-up times of over a minute (the longest being 163 seconds) and must have taken exceptionally many restarts to succeed. Of the 74% of time actually devoted to playing successful games, the mean time per year was 1.49 seconds. Although not directly measured, casual observation suggests that the set time limits rarely expired (except when a game totally hung) – turns rarely took more than a fraction of a second.
An analysis of the performance of each bot is shown in Tables 1a to 1d. Each is sorted in descending order of its last column. Plays is the number of times the bot played, multiple instances in a game counting separately; the average is therefore necessarily about 538 in this tournament. Score indicates the bot's strength relative to the average of the cohort, which is scaled to be 1, and is the average number of points that a given bot received for its plays. In each game, one point for each power (7 for STANDARD) was shared between all the leaders of the game. A leader is a bot that owned at least as many supply centres as any other when the game ended. A solo is a bot that finished with more than half the available supply centres (and hence must be the sole leader). A survivor is a bot that still owned some supply centres when the game ended. The percentages of leaders, solos and survivors relate to the plays by a given bot. 60.2% of games had a solo; the remainder were formally unresolved by the standard rules; draws were not accepted (or requested), although the Server classed them as DIAS (Draw Including All Survivors).
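A small sketch of how the Score column can be derived from per-game results, assuming the 7 points of a game are shared equally between its leaders; the input format and names are illustrative, not the actual DTDT/Access analysis.

```python
from collections import defaultdict

POINTS_PER_GAME = 7  # one point per power in the STANDARD variant

def bot_scores(games):
    """games: one list per game of (bot_name, final_supply_centres) pairs,
    one pair per power, so clones of a bot appear more than once.
    Returns each bot's mean points per play; the play-weighted cohort mean
    is 1 by construction (7 points and 7 plays per game)."""
    points = defaultdict(float)
    plays = defaultdict(int)
    for game in games:
        top = max(centres for _, centres in game)
        leaders = sum(1 for _, centres in game if centres == top)
        for name, centres in game:
            plays[name] += 1
            if centres == top:
                points[name] += POINTS_PER_GAME / leaders
    return {name: points[name] / plays[name] for name in plays}
```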
An analysis of the performance of each power is shown in Tables 2a to 2d. Each is sorted in descending order of its last column. The values in each table are analogous to those for the bots, above, but relate to the power rather than the bot concerned. Plays is not shown, as it was necessarily always equal to the total number of games (1000).
Some further miscellaneous statistics are shown in Table 3.
Table 3: Miscellaneous Statistics

| | Minimum | Maximum | Mean |
|---|---|---|---|
| Set-up Seconds per Game | 1.5 | 163.2 | 7.5 |
| Playing Seconds per Game | 2.5 | 159.4 | 21.6 |
| Total Seconds per Game | 7.9 | 670.0 | 29.0 |
| Years per Game | 3 | 48 | 14.51 |
| Supply Centres of Leaders | 6 | 22 | 15.66 |
| Leaders per Game | 1 | 4 | 1.06 |
| Survivors per Game | 2 | 7 | 5.15 |
Bearing in mind the 3 extra bots in this Tournament, the Results here are comparable with the Results of Tournament #1. Consequently, the Conclusions of Tournament #1 also generally apply here, and are not reiterated. There is a little reordering within the corresponding tables, but only where values were similar.
Note that Project20M leads in all the analyses (Tables 1a to 1d). As would be expected, David's [sic] HoldBot comes last where it has to gain the most supply centres (Tables 1a to 1c), which could never happen except, in theory, when playing Russia; but it has moderate survival ability (Table 1d), probably because it does not provoke anyone, always guards its supply centres (albeit non-optimally), and requires an attack with one support to dislodge each unit. David Norman's RandBot is always worse than Man'chi RandBot; this may be just chance, although there may be some systematic differences in their behaviour.