
John Newbury       2 May 2012

AI in the Game of Diplomacy: DEMO


DEMO (DAIDE Evolving Model Organizer) is intended as a general paradigm for automating experimental work such as is needed for testing, tuning and comparing bots, and generally exploring Diplomacy game-space (DipSpace) – a sort of ongoing tournament. Initially this will just be at the whole-game level, but could, potentially, cover aspects of play within games, by use of an appropriate observer and/or cooperation by bots under test. (Here we are just considering a DAIDE-specific version of what I would call EMO (Evolving Model Organizer), which might prove to be applicable in wider contexts. I always like to keep in mind possible generalizations!)

The most important tangible components of DEMO are MARS (a program that selects, runs and monitors games), SAGA (a database of game results) and the ARENA (analyses of the games already run and plans for future missions).

In due course, one or more empirical models would be used to predict the expected game results for given (possibly fuzzy) environments (game parameter sets), which would include game objectives (scoring/utility formula), research objectives, specific sets of contestants (mainly bots, but potentially also humans, or teams of humans, bots and/or other, isolated teams), their power assignments, bot and server parameter settings, and hardware platform – using results of previous games, weighted according to the relevance of their parameters. An environment would be specified by as many parameters as required, but the more specific it is, the less relevant data would be available, and so the less accuracy. Where a parameter is omitted, any value would be accepted.
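
As a minimal illustration of such relevance weighting (not DEMO's actual design), the Python sketch below computes a relevance-weighted mean of past scores; the parameter names and mismatch discounts are purely hypothetical, and a parameter omitted from the query accepts any value:

    # Relevance-weighted prediction over past game records (illustrative only).
    # The parameter names ("variant", "press") and discount weights are hypothetical.

    def relevance(game_params, query, discounts):
        """Weight of one past game for a query; omitted query keys match anything."""
        w = 1.0
        for dim, wanted in query.items():
            have = game_params.get(dim)
            if have is None or have == wanted:
                continue                      # missing or matching: full weight
            w *= discounts.get(dim, 0.1)      # mismatch: per-dimension discount
        return w

    def predict_score(history, query, discounts):
        """Relevance-weighted mean score over all recorded games."""
        num = den = 0.0
        for params, score in history:
            w = relevance(params, query, discounts)
            num += w * score
            den += w
        return num / den if den else None

    history = [({"variant": "Standard", "press": 0},    0.21),
               ({"variant": "Standard", "press": 8000}, 0.35),
               ({"variant": "Hundred",  "press": 0},    0.10)]
    print(predict_score(history, {"variant": "Standard"}, {"variant": 0.2, "press": 0.5}))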

An estimate of accuracy of the prediction may also be made, depending on closeness and importance of each matching parameter (dimension) and observed variability. However, accuracy of prediction would probably be very poor unless a large number of very similar games had been observed. (Recall the high amount of noise in my Tournaments, typically taking 2000 games (between fast bots) and 24 hours to reasonably eliminate.) So, unless very similar questions had been posed before, DEMO would typically proceed to design and run further series (sets of games) to improve the accuracy as required. But games (especially in the future, with sophisticated press and deep analysis) are generally too slow to run all desirable combinations, so prioritization is needed. To improve expected prediction accuracy, for minimal cost (time), ideally one might like to specify the relative or absolute errors required for each mission (research objective). For simplicity, initially at least, this will be done indirectly, by changing default weights for various desired kinds of series, such as inclusion of specific entities, contestants, variants, assignments, press level, deadlines – or combinations thereof. Sometimes a series would be of significant value towards multiple missions. In any case, multiple objectives would generally be active contemporaneously; for example, the next game to run would tend to be the one that is expected to minimize the maximum deviation from desired relative accuracy of any goal.
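
One possible reading of that prioritization is sketched below, under the crude assumption that a mission's relative error shrinks as one over the square root of its sample size; the mission names, error figures and candidate series are invented:

    # Choose the next series so as to minimize the worst shortfall against each
    # mission's target relative error. The error model (1/sqrt of sample size)
    # and all names and numbers are assumptions for illustration.

    import math

    def worst_shortfall(errors, targets):
        return max(errors[m] / targets[m] for m in targets)

    def choose_next_series(candidates, errors, targets):
        """candidates: {series: {mission: games it would add}}."""
        best_name, best_score = None, None
        for name, added in candidates.items():
            projected = dict(errors)
            for mission, n_new in added.items():
                n_old = (1.0 / errors[mission]) ** 2        # implied current sample size
                projected[mission] = 1.0 / math.sqrt(n_old + n_new)
            score = worst_shortfall(projected, targets)
            if best_score is None or score < best_score:
                best_name, best_score = name, score
        return best_name

    errors  = {"DLS vs rivals": 0.10, "press level effect": 0.30}
    targets = {"DLS vs rivals": 0.05, "press level effect": 0.20}
    candidates = {"series A": {"DLS vs rivals": 200},
                  "series B": {"press level effect": 50}}
    print(choose_next_series(candidates, errors, targets))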

DEMO can be considered to be an ongoing tournament between all bots, humans and teams of interest, to which new contestants could be added at any time, and in which test criteria, and hence the competition rules, can be changed at any time in order to best investigate whatever is required at that time. (Finite tournaments would be superseded, replaced by an ongoing league table, albeit with many dimensions rather than a simple list.) All results would be stored in the SAGA database (initially populated with data from my current Tournaments, automatically expanded in due course to include symmetrical patterns to reduce systematic error). SAGA would eventually be made publicly readable via Web pages, and maybe directly readable via ODBC. DEMO would provide its best estimate of the results of a wide range of possible series, using its available data. If more accuracy were required it would attempt to organise the best sets of series to improve the accuracy, using the most appropriate type and minimum amount of resources, subject to competing demands for exploring different dimensions of DipSpace.

DEMO can also be considered to represent an extensible multiverse or ensemble of possibilities, any of which can be observed, and scrutinized in more detail as required.

I would be interested in results for the populations of contestants in many environments, but most interested in results for my bots (mainly DLS) and, especially, other comparable ones. (Others that are too weak, especially the likes of HoldBot and RandBot, are just boring; more advanced ones are always of interest in their own right, but if too far ahead it may be too difficult to perceive how to make progress with my bot – like a beginner trying to learn by studying how a chess grand master plays, or trying to learn calculus before learning arithmetic.) But any environment has many dimensions (including all server and client options and computing systems), not all combinations of which could be explored in significant depth (if at all); that is, few, if any, games would directly test some dimensions. Other DipAi members would have different interests and priorities, but on my computers priority would, of course, tend to be given to my interests – mainly developing DLS (but I might be persuaded to change them to a degree). Generally, other members would need to provide their own computing resources to explore areas only of interest to themselves, but big gains should be possible by sharing data, as there would probably be much overlap (for instance, when using commonly selected options).

NB: DEMO is open ended and much of what I describe here may never be done or even doable – it may become no more than a roadmap into the far future. However, DTDT already does much of what is required, and the essentials of an automated ongoing (rather than finite) dynamically changeable (rather than fixed) testing/tuning environment should be realizable during the planned tidy up for release in a future version of BBB.

Below are some of the main concepts to be incorporated, and the associated terminology that I shall use in the context of DEMO unless otherwise qualified. For simplicity, it is phrased as if now available.

MARS

MARS (Manage Acquisition of Required Samples) is a program that I am developing to deal with at least the top level automated aspects of DEMO, including the main user interface. It may, if convenient, handle all aspects, being switched into different modes, at the graphical and/or command line interface. It will select games according to the missions specified in the SAGA database, in which it records results of completed games. See main MARS page for details.

SAGA

SAGA (Samples of Acquired Game Attributes) is a database of results of Diplomacy games, run by MARS. For convenience, it currently also holds records that define the current mission for series (games) selected by MARS; analyses are shown in the ARENA. See main SAGA page for details.

ARENA

ARENA (Analyses of Results of Experiments Now Available) contains the latest analyses and discussions of the SAGA database of games run by MARS, such as the relative abilities of the various contestants in the various environments, and relative difficulties of the various environments for the various competitors, together with ideas and plans for future missions. See main ARENA page for details.

DipSpace

DipSpace is the mathematical space of Diplomacy game parameters, including server settings, available platforms and players that can be explored by the DEMO models.

Model

A model is a mathematical model of how Diplomacy produces what can be observed, including how different properties are related, for example, how ability in a given variant tends to affect ability in another variant or with various press levels. A model, however crude, is necessary for any averaging and making any predictions from past results if any parameters are changed – even assigning a given contestant to a different power or against a different set of contestants. Otherwise each series is in a distinct, non-commensurate domain; only averages from ideally prepared games would be meaningful, and only for predicting future identically prepared games – and only if no contestant could learn!

However, even without a model, some control of generality is possible by varying exactly which parameters are considered when preparing a game, that is, the reference class. A priori we would assume that past scores, say, for a given reference class define our expectation for future scores in that reference class. For instance, if variant, say, is ignored, so that all known ones are sampled equally often, then mean past scores give our expected future scores when all variants are chosen equally often. Or the distribution might be, say, 90% Standard and 10% equally from all the rest. Weighting could be applied to adjust for when existing results were not actually prepared in the required way for some different environment. But note that if there were few observations when using certain parameters (e.g., certain variants), then the uncertainty of their true values could dominate that of the mean – infinitely so if there were no observations with that parameter!
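
The sketch below illustrates such reweighting: scores observed under one mix of variants are adjusted to estimate the mean under a required mix, such as 90% Standard; the variants and scores are invented:

    # Reweight scores observed under one mix of a parameter (here, as an example,
    # the variant) to estimate the mean under a different required mix.

    from collections import Counter

    def reweighted_mean(games, target_mix):
        """games: list of (variant, score); target_mix: {variant: probability}."""
        counts = Counter(v for v, _ in games)
        total = len(games)
        num = den = 0.0
        for variant, score in games:
            observed_p = counts[variant] / total
            w = target_mix.get(variant, 0.0) / observed_p    # importance weight
            num += w * score
            den += w
        return num / den if den else None

    games = [("Standard", 0.30), ("Standard", 0.25), ("Hundred", 0.05), ("Ancient", 0.10)]
    print(reweighted_mean(games, {"Standard": 0.9, "Hundred": 0.05, "Ancient": 0.05}))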

The simplest (null) model assumes that game-parameters are irrelevant, so that all game-scores have the same weight for any purpose. But this is naive. Ideally, mean ability in one variant, say, should provide at least some prior expectation of ability in any other variant, even if never played, or for any other parameter, if part of a reference class that has been well sampled (such as the set of variants that someone has felt worth implementing). Arbitrarily complex models could be used, but, given the large amount of noise in Diplomacy (indicated by wide variations in results if a game is repeatedly rerun with identical parameters), a simple linear model, using correlations only between pairs of parameters, may be all that could be meaningfully used.

Scores should be weighted in inverse proportion to the variances of their correlations, which should tend to eliminate the effects of irrelevant games, that is, those with too disparate parameters. But we must be wary, as in many cases there would only be very few samples, and most of the more familiar statistical methods only become valid as the number of samples becomes large. (A fluke in a small sample could give misleading results; we need to know when we do not know!) We should use the principle of maximum entropy (maxent) – which is effectively the same as the principle of indifference – and assume all possibilities are equally likely (therefore infinite variance, implying no knowledge of the true value, so zero weight) – except to the extent that we have evidence to the contrary. So variance starts off purely theoretical, becoming ever more dominated by empirical observations as samples accumulate.
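
A minimal sketch of that weighting: estimates are combined in inverse proportion to their variances, and an estimate with no observations (unknown variance) gets zero weight, per the maxent starting point; the numbers are illustrative:

    # Inverse-variance combination with a maximum-entropy starting point: an
    # estimate with no observations has unknown (effectively infinite) variance
    # and so contributes nothing. The numbers are illustrative.

    def inverse_variance_combine(estimates):
        """estimates: list of (mean, variance); variance=None means 'no knowledge'."""
        num = den = 0.0
        for mean, var in estimates:
            if var is None:
                continue                      # maxent prior: unknown, so zero weight
            w = 1.0 / var
            num += w * mean
            den += w
        if den == 0.0:
            return None, None                 # still know nothing
        return num / den, 1.0 / den           # combined mean and its variance

    print(inverse_variance_combine([(0.3, 0.01), (0.2, 0.04), (0.9, None)]))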

Any models could be applied, retrospectively, to the raw data, at least while it is still manageable in quantity. If there is too much to analyse from scratch, then summaries from a limited number of models must be used, perhaps using factor analysis – but this is a big, partly subjective, topic.

Parameter

A parameter of a game could include game objectives (scoring/utility formula), specific sets of contestants (mainly bots, but also humans or teams), their power assignments, settings of bot and server options, and hardware platform. A parameter such as variant could be broken down into subsidiary parameters (which may be shared with, or similar to, other variants), such as number of powers, number of provinces, SCs, HCs, mean adjacent provinces, and so on – plus an indefinable something else. Some parameters have discrete, often Boolean, values, such as whether press is allowed during a retreat; others have a smooth (but not necessarily linear in effect) scale, such as a time limit on a given platform; yet others have an uneven, ad hoc, albeit still monotonic scale, such as press level. A set of (scalar) parameters that tends to be common to many games is called a setting (see SAGA.Settings table).

Environment

An environment is a weighted set of game parameters, representing a world in which a contestant can play (live). (So an environment may also be defined as a weighted set of previously defined environments.)  The population of contestants is distributed in a given environment in proportion to their mean scores in that environment, for the current distribution of other contestants in that environment, thereby making an ecosystem. (So the distribution is not fixed, but should converge in the absence of external influences, such as release of new bots or new versions.)

Each contestant has an optimum environment (its primary niche, giving highest score), considering all the other contestants in (known to) the lab, such that they will tend to obtain their highest score in that environment when playing against the evolved population of other contestants in the lab. It is not fixed, and only determined empirically. One contestant may have highest score (and so highest population) in one environment, but another contestant may be highest elsewhere. So there may well be no universally best contestant. Environments are important when awarding prizes. The population of each contestant is normalized to be in proportion to his (mean standard DAIDE) score. That is, the effective population of a given contestant is proportional to his mean score when competing with all other known contestants, selected in proportion to their effective populations (i.e., mean scores). So weak contestants only tend to have a small effect on the mean scores and population of all others in any environment.

DEMO optimizes the identification of all niches, and especially the primary one, for each contestant, as evidence accumulates, from an initial best estimate and specified adjustment controls. (The contestant, or his manager, may also control optimization and/or readjust automated optimization controls at any time.) Niches may or may not be very specific, and may be mere weighting factors rather than absolute. A precise setting can keep closest to optimum, but may not be justified. A less precise setting allows better exploration for the optimum and better keeps the opposition guessing. For example, if I am better than the rest at playing in an arbitrary game variant, I may do better to include all known variants in my niches (at least with some weight, even if not all equal), rather than just one or two, which other contestants might learn to play very well.

The mean normalized scores (or effective populations) rank the best (or fittest) bots in the model (with respect to other contestants known to the lab) – survival of the fittest. The top one would be due any prize I may offer in that environment, but in practice prizes would normally only be offered in the realms of my bots, which are not necessarily the same set. Also see Evaluation. These scores would also be used to weight evidence for any prediction. For example, if predicting which power (rather than contestant) will tend to win in a given (possibly vague) environment, games containing high-scoring contestants, representing high populations, will have higher weight. Games including (very low-scoring) HoldBot, say, will have little effect, unless the scenario explicitly or implicitly strongly biases selection to HoldBot games.

Note that although there is a unique ranking (at any given time) in a given environment, the ranking may be different in different environments, and there may be no universally best contestant. For example, A may beat B in environment #1, B beat C in environment #2 and C beat A in environment #3. There is a unique ranking in a given environment because only mean weighted scores between all known contestants are compared, rather than one-on-one challenges (which could lead to the non-transitive A>B>C>A). The scores are weighted in proportion to the probability of such a game occurring "naturally", given the populations of the contestants concerned, being the product of the probabilities that each player would have been randomly chosen from those available. So changes in population can change ranking of otherwise unchanged contestants, depending on how well they can cope with those changes.
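
For illustration only (the populations are invented), the weight of a single game might be computed as the product of its players' selection probabilities, so a game stuffed with HoldBots counts for very little:

    # Weight of one game for comparison purposes: the product of the probabilities
    # that each of its players would have been drawn from the current population.
    # The population figures and bot mix are invented.

    def game_weight(players, population):
        w = 1.0
        for p in players:
            w *= population[p]
        return w

    population = {"DLS": 0.45, "RandBot": 0.10, "HoldBot": 0.05, "OtherBot": 0.40}
    print(game_weight(["DLS", "OtherBot", "RandBot"] + ["HoldBot"] * 4, population))  # tiny
    print(game_weight(["DLS"] * 3 + ["OtherBot"] * 4, population))                    # much larger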

Ecosystem

An ecosystem comprises a given environment and population of contestants. The population would evolve in its environment from a given initial state. Unless otherwise stated, the ecosystem of an environment is that where its population has reached steady-state. However, it is plausible that there may be multiple steady-states in some environments, depending on the initial population and random effects. (Although there is no explicit randomness in Diplomacy, game theory often demands that players introduce randomness for optimal play; indeed they are usually somewhat unpredictable even when not explicitly randomizing. Choice of series (games played) also generally has a random element.) So, in general, an environment has an ensemble (weighted set) of discrete populations and associated ecosystems, even for a given initial state, let alone an ensemble of initial states. However, if there is sufficient randomness, a single steady state must result, albeit, if randomness is low enough, it may remain in a pseudo-stable state for long periods, before randomly flipping to another. For simplicity (to avoid the complication of a general ensemble), unless otherwise stated, it is assumed that there is sufficient randomness always to produce a single steady-state in a given environment, without undue pseudo-stable flipping, but otherwise minimal randomness to maximize the effect of the environment. Diplomacy probably tends to produce more than enough randomness in all but the most artificial of environments. Organizers can add more randomness if need be; in practice they will usually have to add yet more (weighted) randomness anyway, since there will generally be insufficient resource for series to explore more than a very small subset of the possibilities allowed, let alone adequately to represent the proportions required by the environment that they are exploring (each on behalf of its director).

Niche

A niche is any environment where a given contestant is expected to have the highest score (and hence population). Each is usually a realm of the contestant's too, unless other contestants have a very uneven distribution of scores. The primary niche of a contestant is the environment where he tends to have the highest score of all.

Realm

A realm is any environment where a given contestant is expected to have more chance of having a higher score (and hence population) than any other contestant there. Each is usually a niche of the contestant's too, unless other contestants have a very uneven distribution of scores. The primary realm of a contestant is the environment where he has the highest probability of having highest score of all.

Population

The population of a given environment is the distribution of contestants that would be expected if proportional to their expected mean (standard DAIDE) score when competing against all known contestants (including their own clones) drawn from that distribution. So population and score distribution are synonymous. This simulates survival of the fittest. It is dominated by the currently best contestants, reflecting the fact that we are generally not very interested in ability to beat, say, RandBot, nor ability to beat a bot that only breeds well when fed a diet of RandBots (since such food would be rare in nature).

Ideally, many samples of all the implied competitions would actually be tested, but in practice it would only be practicable to test a small fraction of possibilities in depth. However, additional tests would be run, automatically, to improve the accuracy with respect to specified contestants in specified environments, as prioritised by the director of the lab. (Such prioritization would not cause any systematic shift in population.)

Note that there is no sensible concept of overall population, except within a specified environment or a (possibly weighted) set of environments (which is still an environment), and no objective way to specify what that environment should be – any more than we can say, objectively, what (environment) best exemplifies the essence of Diplomacy.

Determining the population distribution requires solving a simultaneous equation for each contestant, but this could be done incrementally as new results are produced in a previously solved environment, which can be built up from nothing. Note that one environment may contain or overlap other environments, so in principle there is generally some prior knowledge about a newly defined environment. However, this involves entering a mathematical jungle of multivariate analysis, and similar, which is very dependent on the model of data used – choice of which seems somewhat subjective. See series and model.
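
A minimal sketch of that fixed-point calculation: each contestant's share is repeatedly recomputed in proportion to its mean score against opponents drawn from the current shares until the shares settle; the pairwise score matrix is invented:

    # Fixed-point view of a population: each contestant's share is proportional
    # to its mean score against opponents drawn from the current shares. The
    # pairwise score matrix below is invented for illustration.

    def steady_state_population(score, names, iterations=1000):
        """score[a][b]: mean score of contestant a against opponent b (illustrative)."""
        pop = {n: 1.0 / len(names) for n in names}              # start uniform
        for _ in range(iterations):
            fitness = {a: sum(score[a][b] * pop[b] for b in names) for a in names}
            total = sum(fitness.values())
            pop = {a: fitness[a] / total for a in names}        # renormalize shares
        return pop

    names = ["DLS", "RandBot", "HoldBot"]
    score = {"DLS":     {"DLS": 0.14, "RandBot": 0.30, "HoldBot": 0.60},
             "RandBot": {"DLS": 0.05, "RandBot": 0.14, "HoldBot": 0.40},
             "HoldBot": {"DLS": 0.01, "RandBot": 0.02, "HoldBot": 0.14}}
    print(steady_state_population(score, names))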

Series

A series is a set of games chosen to best obtain (or improve the accuracy of) certain information. A series may be continued for increasingly accurate measurement. Several series may be run contemporaneously, even within a given DEMO system, simultaneously or interleaved. This helps avoid the danger of players anticipating the pattern in use in a given game. Also, priorities can change, making completion of a given series more or less urgent. The parameters of a series may be adapted over time, according to results obtained and current goals. An exception is a pure series, where the operation of the parameters themselves is also of interest: in such cases the parameters are kept fixed.

The only objective approach seems to be to use a model that is no more complex than is warranted, bearing in mind the noise in the observations (as simple as possible, but no simpler, as Einstein said), and mainly to suggest what actual series ought to be performed. The subjective element then only affects the efficiency of use of available resources for experimentation, or accuracy for a given number of series – without systematically affecting the final conclusions, which would be based on actual results within the environment of interest.

Pattern

A pattern is the set of powers in each group in a given game, designed for a specific series; it is variant-specific. Most patterns would probably have exactly two groups and usually as equal as possible in size, but sufficient variation should be used to keep contestants guessing, that is, to avoid any tendency to optimize on a spurious constraint (analogous to including a good mix of non-Standard variants, even if Standard dominates).

Round

A (full) round (of games) comprises all (n!) combinations of contestant assignments for a given pattern of powers, thereby being fair to all contestants in the associated series (compare with duplicate bridge). (Typically, a pattern would only comprise two or three regions of powers, all those in a given region being played by (clones of) the same contestant, thereby requiring only two or six games in a round, respectively. But even using only a random subset of possible combinations (a partial round) should tend to reduce systematic bias.)
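
For illustration, the sketch below generates a full round from a hypothetical two-region pattern: every assignment of contestants to regions is played once, so two regions give two games and three regions would give six:

    # Generate a full round: every assignment of contestants to the regions of a
    # pattern is played once, as in duplicate bridge. The pattern, powers and
    # contestant names are invented.

    from itertools import permutations

    def full_round(pattern, contestants):
        """pattern: {region: [powers]}; contestants: one per region."""
        regions = list(pattern)
        games = []
        for order in permutations(contestants):
            assignment = {}
            for region, contestant in zip(regions, order):
                for power in pattern[region]:
                    assignment[power] = contestant   # clones of the same contestant
            games.append(assignment)
        return games

    pattern = {"West": ["ENG", "FRA", "GER"], "East": ["AUS", "ITA", "RUS", "TUR"]}
    for game in full_round(pattern, ["DLS", "RandBot"]):
        print(game)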

Variant

A variant is a specific combination of map topology, including home and supply centres, their initial distribution, and that of all units.

Assignment

An Assignment is a specific power in a specific variant.

Game

A game comprises a specific variant, all the contestant assignments, and all other (game) parameters that can be specified as options by the server, such as time limits and press level. Usually the parameters and results of a game comprise a basic unit of observation for DEMO.

Mission

A mission, specified by a director, constrains the choice of series run by MARS. At least initially, each will be defined in record(s) within SAGA.

Region

A region is a set of powers, used to specify specific power assignments. Patterns are generated from regions.

Persona

A persona is a specific setting of the attributes of a given entity. Bots may change persona according to, say, command-line arguments, control file and/or Registry entries. In principle, humans and teams may also play as different persona; for example, by deliberately and consistently acting more aggressively or trustingly than normal.

Version

A version of a bot is a manually or semi-manually modified instance (code and data), intended to correct or enhance it, or to explore other possibilities. Changing versions may or may not significantly change its persona. Even possible modifications due to "hints" should change (normally an insignificant part of) the version string, to indicate direct human assistance. An identifying string, clearly indicating which version supersedes which, if applicable, should be sent to the server in the version field of a NME message. (Strict superseding does not apply for a tree of versions.) A given version may have multiple adaptations; changing one may require a change to the other for optimal play.

Adaptation

An adaptation is an automatically modified instance (code and data) of a bot, intended, ultimately, to enhance it according to what has been observed from previous play or pre-computed analysis, whether or not it was playing in the games concerned or whether done by a separate program. (I say "ultimately", since, like manual changes, it may merely be exploring other possibilities, which may be unlikely to lead to an immediate improvement, albeit that should be the longer-term goal. Less targeted, lateral thinking can help avoid getting stuck in local optima.) No explicit human guidance is allowed, not even "hints" (only implicit guidance; for example, choice of games played or plays within games). No indication is given to the server in the NME message. Changing adaptations may or may not significantly change its persona. (Strict superseding does not apply for a tree of adaptations.) A given adaptation may apply to multiple versions; changing one may require a change to the other for optimal play.

Training

Contestants could use the DEMO model as a training environment, whether optimization is manual or automated. By default, contestants would compete against others selected in proportion to their mean normalized scores, that is, the ones that would have most effect on the score of the contestant being tested, with games tending to be between the strongest contestants. (Who cares exactly how strong RandBot is compared to HoldBot?) By specifying a bias towards selecting the contestants under test, resources can be devoted to training them more than others. By specifying a bias against the very strong contestants, say, a novice contestant may gain experience without being slaughtered in every game, so can sometimes score other than zero. (The scores of other contestants would also be adjusted during training, but not much when playing against weak (low population) contestants, since a given game then represents only a small ensemble of "natural" games. So contestants would not tend to learn to be sloppy by being exposed to weaker contestants.)
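
A minimal sketch of such biased selection (populations and bias factors invented): opponents are drawn in proportion to effective population multiplied by a bias factor, so contestants under test can be favoured or a novice shielded from the strongest opposition:

    # Biased opponent selection for training: draws are proportional to effective
    # population multiplied by a bias factor. All names and numbers are invented;
    # a factor below 1 shields a novice from the strongest contestants.

    import random

    def pick_opponents(populations, bias, k):
        names = list(populations)
        weights = [populations[n] * bias.get(n, 1.0) for n in names]
        return random.choices(names, weights=weights, k=k)

    populations = {"DLS": 0.44, "OtherBot": 0.30, "RandBot": 0.20,
                   "HoldBot": 0.04, "Novice": 0.02}
    bias = {"Novice": 5.0, "DLS": 0.5, "OtherBot": 0.5}
    print(pick_opponents(populations, bias, k=6))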

Interface

A windowing interface would normally be used, but all requests and responses would be possible in a formal language for interfacing further automation, normally on a permanent resource, such as Yahoo (not a user PC). Members log in to the facility.

A simple initial system would run all on one computer, initially in private. Ideally it should run on a dedicated computer to avoid interactive work adding uncertainty to the CPU power available in a given deadline. However, it could be useful to use spare CPU time during interactive sessions at least to obtain preliminary results. To avoid undue disruption to the interactive user, due to the autonomous creation and deletion of client main and taskbar windows, DEMO should run in a special account as a background login. To minimize disruption to DEMO, it should not be run at lower than normal priority (which could cause total CPU starvation and deadline overrun when the interactive user were CPU bound). Instead, the interactive user should have hotkeys to pause or unpause DEMO when he feels he is likely to be too disruptive to DEMO, or vice versa. Potentially, assuming clients are correctly implemented, the organizer could automatically instruct its umpires to pause or unpause their games according to how much CPU demand was present from other logins.

All aspects of results of a game would be retained (in a SAGA database), so any type of score could be determined retrospectively. DAIDE Standard Score is preferred and assumed unless otherwise stated.

Warnings are generally logged, rather than stopping the game or pausing to ask a question, which would cause a timeout when automated and crash the game.

Competition Rules

Unless specified otherwise, the following competition rules would apply to any environment declared eligible for a prize. Some explanation is given in square brackets.

Ground Rules

  1. A prize-worthy environment would normally be one that has been demonstrated, by a significant number of series, to be a realm of one of my contestants [normally some persona of DLS].
  2. Each entity may compete as any number of contestants, each of which shall have a unique name (sent to server by NME message).
  3. Any commonality should be summarised, but does not affect testing.
  4. The owner and manager of each contestant shall be identified by a unique Yahoo username that is registered on the DipAi Yahoo group. (Any or all may be the same person.)

Game Rules

  1. Level 8000 press would normally be allowed. [Allowed whether or not any humans were involved. Humans or bots may use DAIDE syntax, similar extensions using exotic press or equivalent, natural language, or any other syntax. The natural constraint to excessive use of such free forms is that, like all standard advanced or new forms of press, it would be of little value unless reasonably widely used. If a language extension does prove useful to its users it would provide evidence that it, or something similar, should be added to the DAIDE standard. Although an extension would provide an indication of known commonality (typically a clone) it would also, by a simple process of elimination, almost equally indicate commonality of the opponent, as usually two entities would dominate the powers in a given game. Anyway, I have spent so much effort on use of press in general that I would not usually be prepared to cripple it! Also, other press bots are starting to appear, so I think we are at the stage when it should be strongly encouraged. However, I may offer a prize in any realm of my bot, even if low or no-press.]
  2. Press would only be allowed on Movement turns, as in normal games (unless there is evidence that an entity is able to use it very productively in other phases).
  3. Time limits would be as short as practicable and reasonable for all players. [Not so small as to waste a significant proportion of time in game set up, nor unduly cripple any bot. Bots need time to do reasonable initialization and a few passes of their main methods. Probably not long enough for the likes of Seanail, I think! But it is up to the bot author to make best use of any time available, no matter how long or short.]
  4. Disconnection counts as elimination. [No reconnection allowed – tough – unless no players remain, in which case the game is replayed.]
  5. A short press-free period [PTL] shall be set [to allow final orders to be issued when it is clear that there can be no more press that turn].
  6. Games shall be killed after a pre-stated number of years with no change in SC assignments [-kill]. All remaining powers shall be considered to be in a draw.
  7. Partial draws shall be allowed (and encouraged to avoid boring long stalemated or near stalemated play). [That is, PDA. Presumably it is only rational to agree to a draw of which you are not a part if you believe your (close) kin or clones are included, and that you expect your kin or entity to get a lower score if the game continues.]
  8. If a game fails to complete for any reason it shall be restarted from scratch. It shall be restarted up to a pre-stated number of times if the failure is apparently due to a problem with a DEMO component, including a bot or server; with no limit for any other reason. [If it failed after reaching its restart limit, the round would have a bias, but as it should not be systematic, the rest of the round would be retained – we merely have another source of noise. The umpire can detect game hangs if the turn does not advance fast enough (as already done by DTDT).]
  9. A wide range of available variants would be used, chosen with pre-stated probabilities. [Or maybe exact proportions. No doubt biased towards Standard, but not normally near 100%, to encourage generality.]

Player Rules

  1. Standard DAIDE Etiquette, and any DEMO-specific extensions, shall be followed.
  2. Remote players shall be allowed, provided the hardware reasonably matches that specified in the environment.
  3. A bot should rarely have more than n CPU-bound threads, where n would be the effective processor count on my computer [currently 8]. [More would be considered cheating by trying to claim more than others – creating a silly arms race. Additional threads for CPU-bound work would be acceptable if extra ones tend to be suspended, but could risk disqualification due to accidental excessive use. Mostly, competition for CPU would make more than one such thread pointless, but as I have spent so much time developing it, a bot should be allowed to exploit any free time available, which would also tend to improve quality of the tournament for a given elapsed time. Note that sophisticated bots would not always converse as fast as possible: instead sometimes having to wait for others to respond; or use brinkmanship, hoping that others may accept or improve offers, before the first bot decides he prefers to deal with a third party, or the time limit is reached.]
  4. [Initially] bots only. No human assistance that may affect quality of play shall be allowed during a game or between any games that are used for the final evaluation. [See Version. Human assistance may occur during informal "seeding" periods, which are an ongoing feature of DEMO, but sufficient formal final tests should be done to eliminate any chance that human intervention could improve its ability against the current contestants. A bot is allowed to learn during a game and from previous games, including those in formal final tests, see Adaptation.]

Evaluation Rules

  1. The winner in a given environment shall be the bot with highest DAIDE Standard Score. In the unlikely event of equal leaders, rather than split the prize (and kudos), rounds (see below) shall be added until there is a unique winner.
  2. Only bots running on a comparable computer shall be eligible for a prize. [I want to compare software, not hardware.]
  3. No allowance shall be made for the language or implementation used. [It is up to the author to decide the best compromise between a more difficult-to-develop compiled language and an easier-to-develop but slower interpreted one, say.]
  4. After each game, the DAIDE Standard Score of each power shall be added to the accumulating score of its entity [type of bot and its settings for the environment, or individual human].
  5. A series of [whole] rounds shall be played, each comprising several games.
  6. Each round shall comprise a duplicate game pattern between all combinations of players. [This should give each entity an equivalent set of environments, albeit chance would still be an element as the game proceeded, as well as the fact that some environments may suit one entity more than another. Games where the same entity should play as all sets would not actually be played, as it could have no effect on the score for that entity.] "Duplicate" means a specific set of power assignments in one game, then permuted in partner games between the same set of entities, repeating the same sets of powers for all permutations. [David Norman's Server -fixedpowers option would enable assigning entities to specific powers. Given this, no reliance need be made on the NME string sent by the player for their identification, so no rules are required here.]
  7. The subset of contestants for each round [from the set selected to be in a series] would be selected by random draws, with probabilities proportional to their current scores. [Weaker entities therefore tend to play in fewer games, representing their reduced proportions in a simulated world due to lower survival rate. Competition should then tend to increase, but with some easy pickings (low grade "drones" being the main "food") while not being a totally predictable environment.]
  8. Contestants may be set to learn during the tournament, as their authors see fit, but all games by a given entity in a given round (see below) must use the same code and data. [The director may be able to assist; for example, by copying files. This rule is needed to ensure a consistent standard of play by and against each player as they cycle round a pattern; regrettably the rule cannot be applied to humans, even by repeated lobotomies!]
  9. Contestants that do not run reasonably reliably on my computer would be disqualified. [There are one or two!]

Roles

The following human and computer roles are formally defined. In many cases, one physical entity can have more than one role.

Analyst

An analyst is a person who analyses what is to be expected in a given environment in future series. The environment may or may not have been deliberately tested when earlier series were planned: the accuracy of an analysis depends on the number of earlier series, and how well they (happen to) correlate with the environment of interest. To improve accuracy, an analyst may request a director (maybe himself) to plan future series within that environment.

Author

The author of a bot (or part of one) is the person who designed and wrote it. He is the owner of the bot (or part) by default. There is typically a primary author who is responsible for the distinctive aspect of the whole bot, such as its AI, others playing a supporting role, such as provision of library functions or general framework.

Broker

A broker is a listener that may forward requests to another listener, according to demand, load and availability. A broker must monitor the listeners that it may use, by their sending periodic messages to the broker to inform it of their loading and (in the absence of a timeout) availability. In principle, an arbitrarily long sequence of brokers could be used, but must be rejected if looping occurs. Caching could be used, including in the original requestor; for instance, all requests after the first from a given player could be directed to the same non-broker listener (which might typically share a ghost anyway).

Captain

A captain is the person who is the spokesperson of a team. In case of conflict, the captain decides. The captain can be changed, even during a game, according to any rules that the team may devise.

Client

A client is a bot or interface for a human player, that is, a DAIDE client.

Contestant

A contestant is a specific entity, representing a specific persona. Each may represent any number of players in a given game, or in multiple games in parallel. Any contestant may play individually and/or in one or more teams.

Controller

A controller is a program that allows a director to control an organizer, such as viewing and adjusting controls, seeing what games are in progress and viewing any via an observer. A controller run on one computer may control organizers running on more than one computer. MARS can run as an organizer or a controller.

DemoCrossSea

DemoCrossSea (pronounced "democracy") or (less jokily) DEMO Combined Systems (DCS) is the collection of all known DEMO systems (labs) in the DAIDE community. They need not necessarily all have the same implementation, but need to use common formats and protocol to share data.

Demon

A demon is a DEMO node for hiving off work from the main node of the organizer; for example, to run games in parallel, to run a subset of the bots in a given game, or even to allow one bot to use several computers in a given game. To act as a demon it must run a listener.

Knowledgebase

A knowledgebase comprises all the persistent data of a lab, including that of all known variants, contestants and games played, parameters used and (by means of weights in various tables) the mission. "Strategic" data is stored in the SAGA (SQL) database, which is normally viewed and updated by the director using Microsoft Access; except for details of games played, which are inserted automatically by umpires of the organizer. "Tactical" data is stored elsewhere: parameters that control the loading, efficiency and reliability of the organizer and its umpires are stored in the Windows Registry key HKEY_CURRENT_USER\Software\MARS on the computer on which they are run, controllable by the director using a controller; the root of the folder subtree containing all DAIDE files used by DEMO is stored in the DAIDE Windows environment variable.

Director

The director is the person who controls a given lab.

Entity

An entity is a specific type and version of a bot or specific human, or a specific team of bots, humans and/or teams. (A team within a team is represented by its captain, rather than its original members. A bot or human may belong to zero or more teams, but teams containing members in common should not play in a given game, unless they will be oblivious to that fact: usually so for bots; dubious for humans.) Each may represent any number of contestants, by consistently presenting different persona.

Ghost

A ghost is a slave copy of a bot to which CPU-intensive tasks are delegated on another computer. Typically it would be a clone of the master bot. The master would forward selected DAIDE messages (probably everything except press) so that it knows the map and current position, but generally not press nor heuristic evaluations, and so forth. Non-DAIDE messages would be sent for any further set up and then to request CPU-intensive tasks to be done. A reply message (not a DAIDE reply) would be sent via the listener to the requestor after processing each received message, as confirmation and with any immediate results. Other messages may also be sent in the reverse; for example, with delayed results, such as those after completing a CPU-intensive task. (Using a ghost copy of the same program allows most of the same data structures and functions of the main bot to be used – simpler than developing a distinct program; redundant code and data should not cause significant overhead. One instance of a ghost could act as multiple slaves for a given requestor.)

Group

A group is a set of zero (normally two) or more entities. A group is similar to a team in most respects, except for members having no knowledge of others, nor prize (nor kudos). A group may be defined retrospectively, for instance to investigate how well certain sets of entities tend to play together; maybe to form future teams.

Lab

A lab is a complete DEMO system, comprising one or more nodes and associated programs and data, including an organizer, umpire, server, clients, listeners and SAGA database. There may be any number of labs. They may share data, but each generally has a different owner, with different objectives (typically preferring to evaluate and optimize their own bot). To share data, a lab fetches data from sources it trusts – no write access is needed or allowed, except to its director, or in a controlled automated way via its organizer.

Listener

The listener is a program on a demon that listens for requests for services from requestor programs on other nodes, normally from an organizer or a bot. Trivial requests are dealt with by the listener itself. But if, as would be usual, an ongoing, contemporaneous dialogue with several requestors may be required, the service is normally hived off to a slave thread or process, which is created if an idle one is not available, together with a dedicated input queue. In this case the requestor is termed the master, and further communication is directly between master and slave. The requestor creates its own temporary input queue for all messages from the listener and any slave, the name of which is passed to the listener in the first message. The listener sends an initial message to the slave, including the master's input queue name; the slave then sends an initial message to that queue, including its own input queue name. The listener sends the name of the slave's input queue in its run argument if a thread, or in a system message queue if a process.

A slave process dedicated to a given requestor may itself have threads acting as sub-slaves, thereby allowing data sharing and a single set up step. This would often be appropriate for a ghost.

A slave may even run on a different computer from the listener. Such a listener is a broker.

All communication would [initially] be via MSMQ, which is transaction-oriented (all or nothing) and organizes queuing, with timeout, when there is no connection. [An equivalent platform-independent queuing system would be used if and when non-Windows systems were to be incorporated.] The listener may have a permanent queue, so requests can be queued even if the listener is not running. (Although an immediate failure is probably appropriate here, so the queue could be temporary, it would still exist after a listener crash, so the requestor still needs its own timeout; the queue names certainly need to be fixed. Indeed, during initialization of the listener of the first requestor, any existing temporary queues should be deleted – or emptied pending reuse. For our purposes, any messages in a permanent queue should also be deleted, as they are likely to be out of date.) The listener is responsible for killing the slave and its input queue when no longer needed (either immediately, or before exit if retained for reuse), when no more service is required, or on an error indication, including timeout of response to a periodic poll.
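
The sketch below shows only the shape of that handshake, using Python's in-process queues as a stand-in for MSMQ (queue names and message fields are invented): the requestor names its own input queue in its first message, the listener hives the work off to a slave with a dedicated queue, and the slave introduces that queue back to the requestor, which then talks to it directly:

    # Illustrative handshake only (in-process queues standing in for MSMQ).

    import queue, threading

    queues = {}                                  # name -> queue (stand-in for MSMQ)

    def listener(listener_q):
        while True:
            msg = listener_q.get()
            if msg is None:
                break
            # Hive the request off to a dedicated slave thread with its own queue.
            slave_name = "slave-for-" + msg["reply_to"]
            queues[slave_name] = queue.Queue()
            threading.Thread(target=slave, args=(slave_name, msg["reply_to"]),
                             daemon=True).start()

    def slave(own_name, master_name):
        queues[master_name].put({"from": own_name, "kind": "hello"})   # introduce self
        while True:
            msg = queues[own_name].get()
            if msg["kind"] == "work":
                queues[master_name].put({"from": own_name, "kind": "result",
                                         "value": msg["value"] * 2})
            elif msg["kind"] == "done":
                break

    # Requestor (becomes the master once a slave has introduced itself).
    queues["listener"] = queue.Queue()
    queues["master-1"] = queue.Queue()           # requestor's own temporary queue
    threading.Thread(target=listener, args=(queues["listener"],), daemon=True).start()
    queues["listener"].put({"reply_to": "master-1"})
    hello = queues["master-1"].get()             # slave names its own input queue
    queues[hello["from"]].put({"kind": "work", "value": 21})
    print(queues["master-1"].get())              # result message from the slave
    queues[hello["from"]].put({"kind": "done"})
    queues["listener"].put(None)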

Manager

A manager is the person who manages a contestant, including announcing availability, publishing the EXE files, documentation, and so forth. (Any human is usually his own manager.) The manager decides objectives for the contestant, liaising with the directors of labs, normally via the director's automated organizer.

Master

A master is a program that delegates work to a slave, normally on a different node.

Member

A member of DipAi. Membership is a prerequisite to having any special access to DEMO or prizes.

Mongrel

A mongrel is a specific weighted mean of contestants.

Node

A node is one of a set of computers that can communicate for DEMO purposes. Communication is set up via a requestor on one node and the listener on the other, after which the requestor becomes a master, communicating directly with the assigned slave.

Observer

An observer is a program that observes, but does not play in, a game. It may allow a human to observe in real-time (like Mapper or my proposed Viewer would) and/or log for later analysis and/or updating the SAGA database (as Viewer may).

Organizer

An organizer is the program that organizes the running of games between suitable entities on suitable nodes of a lab, according to parameters set by the director via a controller and the database editor. The organizer defines future games in series, normally covering all symmetrical patterns – analogous to duplicate bridge – assigned to groups of interest. Results would be announced in a bulletin on the Web, including specific games for specific environments and current strengths of each entity and team. Details may be made available directly via ODBC from a read-only SAGA database. MARS can be run as an organizer or a controller.

Owner

An owner is the person who owns all rights to (selected parts of) a program (e.g., a bot) and to the overall configuration and its use (subject to permission from owners of components). He is normally the author, but rights could be transferred. There is typically a primary owner who owns the distinctive aspect of the whole bot, such as its AI, others playing a supporting role, such as provision of library functions or general framework. In such cases, any rights of the other, secondary, owners would have been subsumed in the standard conditions of use, or specific licences for use, of their parts.

Player

A player is a specific contestant in a specific game.

Requestor

A requestor is a program that communicates with a listener (normally on another computer). It is the organizer or a bot, say, that requires a remote service, not a separate program. Normally the listener assigns a slave thread or process to do the real work, but the main thread can act as its own slave if trivial and no ongoing communication is needed. If ongoing communication may be required, messages go directly between requestor and slave; the requestor is now termed the master and the slave's dedicated input queue is then used. Note that master or slave can initiate new messages, to which the other may send zero or more reply messages in response. The slave must inform the listener when it is free again. However, the listener should monitor its slaves and kill any that appear to have crashed.

Server

A server validates and routes DAIDE messages between bots, adjudicates final orders and terminates the game when appropriate, that is, a DAIDE server.

Slave

A slave is a program on one node that does work for a master program on another node.

Team

A team is a group where each member is aware of the others. Unlike a group, a team is a named entity and has a distinct captain. The score of a team is recorded similarly to that of an entity. A team is derived from a group. Groups and teams are also of value regarding commonality, that is, kinship and affinity. Unlike groups, teams are eligible for a prize. A team has its own assignment formula for selecting member entities in a given game, with duplication possible (and essential if too few members). For example, round robin from the leader or from the previous game of the team. The number of players depends on the variant and pattern of the current game.

Umpire

An umpire is the part of an organizer that controls a game, including starting the server and all the clients, aborting the game if it hangs, and logging final scores to the SAGA database.

