
Tournament #6

John Newbury 17 July 2012


Tournaments for AI in the Game of Diplomacy: Tournament #6

Method

Tournament #6 finished on 2 March 2006. The bots and Server settings were as in Tournament #5, except that the server kill value was raised from 4 to 100. (A game is terminated as a draw if there is no change in supply centre scores for kill number of years.) So, once again, a slow-knockout of 2000 games was used, with the probability of a bot playing being proportional to its moving average of Strength. See Bots for details of the players.
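The kill rule described above can be illustrated with a minimal sketch. It is not the Server's actual code: the function name draw_year and the representation of a game as one tuple of supply centre scores per year are hypothetical, and it assumes that "no change" means every power's score is identical to the previous year's for kill years in a row.

```python
from typing import Optional, Sequence, Tuple


def draw_year(scores_by_year: Sequence[Tuple[int, ...]], kill: int) -> Optional[int]:
    """Return the 0-based index of the year in which the game would be
    terminated as a draw, or None if the kill limit is never reached.

    scores_by_year[y] holds the supply centre score of each power at the
    end of year y (hypothetical representation, for illustration only).
    """
    unchanged = 0  # consecutive years in which no score changed
    for year in range(1, len(scores_by_year)):
        if scores_by_year[year] == scores_by_year[year - 1]:
            unchanged += 1
            if unchanged >= kill:
                return year
        else:
            unchanged = 0  # any change restarts the count
    return None


# A short two-power example: scores are static from year 1 onwards,
# then change again in year 6.
history = [(3, 3), (4, 3), (4, 3), (4, 3), (4, 3), (4, 3), (5, 2)]
print(draw_year(history, kill=4))    # terminated before the late change
print(draw_year(history, kill=100))  # game is allowed to continue
```

With kill=4 the example game is cut off while scores are static, whereas with kill=100 it runs on to the later change, which is the effect this tournament set out to measure.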

Motivation

The motivation was to determine whether a high kill value is viable with the current collection of bots. Higher values are desirable, as they would tend to properly resolve a higher proportion of the games that would eventually have a solo winner, rather than assuming a draw whenever play merely seems to be stalemated; but this risks pathologically long games and an excessive overall tournament length. If a game were truly stalemated, but at least one bot did not accept this, then the game would continue until the kill value was reached. It is known that none of the current bots recognises a draw, but neither does any attempt to create a stalemate; however, stalemates could occur accidentally. Even if no games were truly stalemated, the distribution of resolution times might be pathologically skewed.

Results

Play Rank
Bot	Plays
Project20M v 0.1	2849
KissMyBot v1.0	2655
Man'chi AngryBot 7	2280
HaAI 0.64 Vanilla	1161
DiploBot v1.1	1159
DumbBot 4	977
Man'chi AttackBot 7	593
Man'chi RevengeBot 7	407
Man'chi DefenceBot 7	369
Man'chi ChargeBot 7	357
Man'chi ParanoidBot 7	310
RandBot 2	303
Man'chi RandBot 7	291
HoldBot 2	289

Solo Rank
Bot	Solo %
Project20M v 0.1	23.17
KissMyBot v1.0	21.32
Man'chi AngryBot 7	17.76
DiploBot v1.1	9.49
HaAI 0.64 Vanilla	8.61
DumbBot 4	8.09
Man'chi AttackBot 7	3.88
Man'chi RevengeBot 7	1.72
Man'chi DefenceBot 7	1.63
Man'chi ChargeBot 7	1.40
Man'chi ParanoidBot 7	0.65
Man'chi RandBot 7	0.00
RandBot 2	0.00
HoldBot 2	0.00


Leader Rank
Bot	Leader %
Project20M v 0.1	23.62
KissMyBot v1.0	21.39
Man'chi AngryBot 7	18.46
DiploBot v1.1	9.66
HaAI 0.64 Vanilla	8.87
DumbBot 4	8.09
Man'chi AttackBot 7	4.38
Man'chi RevengeBot 7	1.97
Man'chi DefenceBot 7	1.63
Man'chi ChargeBot 7	1.40
Man'chi ParanoidBot 7	0.65
Man'chi RandBot 7	0.00
RandBot 2	0.00
HoldBot 2	0.00


Survivor Rank
Bot	Survivor %
Project20M v 0.1	76.31
Man'chi AngryBot 7	68.20
KissMyBot v1.0	66.29
Man'chi ParanoidBot 7	65.48
HaAI 0.64 Vanilla	65.29
Man'chi DefenceBot 7	58.27
Man'chi AttackBot 7	57.17
DiploBot v1.1	54.70
Man'chi RevengeBot 7	48.16
HoldBot 2	47.06
Man'chi ChargeBot 7	44.54
DumbBot 4	41.86
Man'chi RandBot 7	20.96
RandBot 2	17.82

In the above tables, Plays is the total number of plays by the given bot, where each instance of a given bot in a game counts as a play; it shows the effect of slow-knockout. Solo %, Leader % and Survivor % are the percentages of plays by the given bot in which, at the end of the game, it owned more than half the supply centres, owned at least as many supply centres as any other power, or owned at least one supply centre, respectively. Each table is in descending order of the numeric field.
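The three measures defined above can be expressed as a short sketch. The function name classify and the dictionary of final supply centre counts are hypothetical, chosen only for illustration; the total of 34 supply centres is that of the standard Diplomacy map, so a solo requires 18 or more.

```python
from typing import Dict, Tuple

TOTAL_CENTRES = 34  # supply centres on the standard Diplomacy map


def classify(final_counts: Dict[str, int], power: str,
             total: int = TOTAL_CENTRES) -> Tuple[bool, bool, bool]:
    """Return (solo, leader, survivor) for one play, given the number of
    supply centres each power owned at the end of the game."""
    own = final_counts[power]
    solo = own > total / 2                       # more than half of all centres
    leader = own >= max(final_counts.values())   # at least as many as any other power
    survivor = own >= 1                          # at least one centre
    return solo, leader, survivor


# Invented end-of-game position, not taken from the tournament logs.
final = {"ENG": 18, "FRA": 10, "GER": 6, "RUS": 0}
print(classify(final, "ENG"))  # solo winner
print(classify(final, "FRA"))  # survivor only
print(classify(final, "RUS"))  # eliminated
```

Every solo is also a Leader, and every Leader is also a Survivor, which is why each bot's Solo % never exceeds its Leader % in the tables.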

Only 37 (1.85%) of the 2000 games were draws, of which just 6 were 3-way and 1 was 2-way. (No draws were actually offered by any of the bots.) This compares with previously unpublished figures for Tournament #5: 347 (17.35%) draws, of which 28 (1.4%) were 3-way and none were 2-way. The mean number of years per game rose from 22.0 to 31.3 (1.4 times as high); the maximum rose from 144 to 534 (3.7 times as high).
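The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers from this page; the variable and helper names are illustrative. It also assumes seven plays per game, as in standard Diplomacy, to check the Plays column against the 2000 games.

```python
GAMES = 2000
POWERS_PER_GAME = 7  # standard Diplomacy has seven powers

# Plays column of the Play Rank table, in table order.
plays = [2849, 2655, 2280, 1161, 1159, 977, 593,
         407, 369, 357, 310, 303, 291, 289]


def percent(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return 100 * count / total


total_plays = sum(plays)
draws_t6 = percent(37, GAMES)       # draws in Tournament #6
draws_t5 = percent(347, GAMES)      # draws in Tournament #5
three_way_t5 = percent(28, GAMES)   # 3-way draws in Tournament #5
mean_ratio = 31.3 / 22.0            # rise in mean years per game
max_ratio = 534 / 144               # rise in maximum years per game

print(total_plays, total_plays == GAMES * POWERS_PER_GAME)
print(round(draws_t6, 2), round(draws_t5, 2), round(three_way_t5, 2))
print(round(mean_ratio, 1), round(max_ratio, 1))
```

The plays sum to 14000, that is, seven per game over 2000 games, and the percentages and ratios agree with those quoted to the precision given.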

Conclusions

The same raw results were logged as in Tournament #5. The results were comparable, as shown above, except that this time Project20M appeared to be champion on all measures, whereas previously KissMyBot v1.0 had had the highest Solo %. Once again, this perhaps indicates a slightly higher tendency for KissMyBot to take chances: all or nothing. As before, Project20M tended to be Leader, perhaps making steady gains but often being beaten to a solo by a more erratic KissMyBot. But this time, with the much larger kill value, Project20M was better than KissMyBot at progressing from an apparent stalemate to a solo. So Project20M is back as champion. My previously expressed preference for using the Leader measure of ability was, perhaps, vindicated by the all-round more accurate measurements obtained when using a higher kill value.

Clearly a higher kill value (100) resolves a much higher proportion of games; that is, many games that were terminated by kill=4 probably would have soloed given enough time. A rise of about 40% in the mean number of years per game was no significant problem, yet it was large enough to leave only a negligible percentage of games unresolved, and some of those may have been forever irresolvable. So, to save much argument, I shall use kill=100 for the foreseeable future. However, the danger of pathologically long games remains; a more sophisticated approach might be needed in future, but the issue will be ignored for now.