Cogitations on AI in the Game of Diplomacy: Cooperation
Here is a brain dump of the full details of my current Cogitations about Cooperation. (Later Cogitations override earlier ones if contradictory.)
Here is my latest attempt at a methodology for managing cooperation. It is not yet precise, nor properly analysed mathematically, but I hope that it (or something like it) would tend to lead to optimum cooperative play when appropriate (and when well tuned).
I estimate the utility of each member in a specific camp (set of powers) assuming he acts selfishly for the rest of the game (maximizing his own utility, ignoring the utilities of other members). Then I estimate the utility of the whole camp if all members cooperate for the rest of the game (maximizing the sum of the utilities – or minimizing the costs – of all members, ignoring each member's own utility as such). The estimated bond ("utility of cooperation") for the camp is the estimated camp utility minus the sum of the member utilities. Cooperation should be worthwhile for any camp with a positive bond; if the overall gains can be shared out satisfactorily, it should also be viable. I estimate the fraction of the total bond due to each member; I assume cooperation is not viable if any fraction is negative, even if the total is positive. Note that, by definition, each power only attempts to maximize his own utility; he does not explicitly try to minimize the utility of any other power, since any value in doing so is already incorporated in his own utility value.
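As a minimal sketch of the bond arithmetic above, assuming the utility estimates and each member's share of the bond are already available as plain numbers (the function names, power abbreviations and values are hypothetical, chosen only for illustration):

```python
def camp_bond(cooperative_camp_utility, selfish_utilities):
    """Bond = camp utility if all cooperate, minus the sum of members' selfish utilities."""
    return cooperative_camp_utility - sum(selfish_utilities.values())


def cooperation_status(shares):
    """Classify a camp from each member's estimated share of the bond.

    Worthwhile needs a positive total bond; viable additionally needs
    no member's share to be negative.
    """
    total = sum(shares.values())
    if total <= 0:
        return "not worthwhile"
    if any(share < 0 for share in shares.values()):
        return "worthwhile but not viable"
    return "viable"


# Hypothetical estimates for a three-power camp.
selfish = {"FRA": 10, "ENG": 8, "GER": 6}
bond = camp_bond(30, selfish)                                # 30 - 24
print(bond)                                                   # 6
print(cooperation_status({"FRA": 3, "ENG": 4, "GER": -1}))    # worthwhile but not viable
print(cooperation_status({"FRA": 2, "ENG": 2, "GER": 2}))     # viable
```

How the shares themselves are estimated is left open here, as it is in the text; they are simply inputs.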
Members have an incentive to make and keep their camps viable for cooperation, so that the bonds can be realized as tangible gains by their members. (Conversely, they have an incentive to make non-viable any camps of which they are not members, either by making the total bond negative or by making them unstable so that the bonds of some powers become negative.) However, in general, many individually viable camps may be incompatible, in that cooperation within one will necessarily, or probably, disrupt cooperation within another. I cooperate with the set of compatible camps that provides the greatest total bond for me, where "compatible" means not having members who are likely to be in (significant) conflict in practice (before the expected gains have been made, at which point cooperation can be ended). (NB: This is an NP-hard problem, of the knapsack class, to solve perfectly, though with six other powers, as in the Standard variant, there are only 62 potentially useful combinations of them – since I can ignore the combinations of all and of none – so it should be practicable in common instances. Conversely, if, naively, I expect to be in conflict with any non-member before I expect to have made gains, then all camps are mutually incompatible and I only need to find the single best one – only linear complexity.)
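The camp selection can be sketched as a plain exhaustive search, assuming my bond in each camp and a pairwise compatibility test are already estimated. Camps are identified by their other members; `best_compatible_set`, `CONFLICTS` and all values are hypothetical, and the search is exponential in the worst case, which is why it is only practicable for small numbers of positive-bond camps:

```python
from itertools import combinations

# The six other powers in the Standard variant (here I am assumed to be France).
OTHERS = ["ENG", "GER", "RUS", "TUR", "ITA", "AUS"]

# Every combination of other powers except "none" and "all": 2**6 - 2 = 62.
CANDIDATE_CAMPS = [frozenset(c) for r in range(1, 6) for c in combinations(OTHERS, r)]


def best_compatible_set(my_bond, compatible):
    """Return (total, camps) maximizing my summed bond over mutually compatible camps.

    my_bond maps camp -> my estimated bond there; compatible(a, b) -> bool.
    """
    camps = [c for c, b in my_bond.items() if b > 0]  # only camps worth joining
    best = (0.0, [])

    def search(start, chosen, total):
        nonlocal best
        if total > best[0]:
            best = (total, list(chosen))
        for i in range(start, len(camps)):
            camp = camps[i]
            if all(compatible(camp, other) for other in chosen):
                chosen.append(camp)
                search(i + 1, chosen, total + my_bond[camp])
                chosen.pop()

    search(0, [], 0.0)
    return best


# Hypothetical conflict: England and Germany are expected to fight soon.
CONFLICTS = {frozenset({"ENG", "GER"})}


def compatible(a, b):
    """Two camps are compatible if no member of one conflicts with a member of the other."""
    return not any(frozenset({x, y}) in CONFLICTS for x in a for y in b)


my_bond = {
    frozenset({"ENG"}): 3.0,
    frozenset({"GER"}): 2.0,
    frozenset({"ITA"}): 1.5,
    frozenset({"ENG", "ITA"}): 4.0,
}
print(len(CANDIDATE_CAMPS))                          # 62
print(best_compatible_set(my_bond, compatible)[0])   # 8.5
# Naive case: every pair of camps incompatible, so only the single best camp counts.
print(best_compatible_set(my_bond, lambda a, b: False)[0])  # 4.0
```

In the naive case the search degenerates to picking the maximum, matching the linear-complexity remark above.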
For each camp, I maintain an account for each member, initialized to zero, representing the degree of injustice to that member caused by other members of the camp. Each camp member's account is credited by the (signed) utility gain he produces for other members; their accounts are debited by the utility they so receive. So the sum of accounts in any camp remains zero. (Any account is only meaningful within its camp; credit/debit cannot be transferred to accounts in other camps.)
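A minimal sketch of one camp's zero-sum accounts, assuming utility gains are reported as signed numbers; `CampLedger` and the example figures are hypothetical. Each transfer credits the producer and debits the recipient by the same amount, so the balances always sum to zero, and gains involving non-members are rejected because accounts are only meaningful within their camp:

```python
class CampLedger:
    """Zero-sum accounts for one camp (positive = owed, negative = owes)."""

    def __init__(self, members):
        self.accounts = {m: 0.0 for m in members}  # all start at zero

    def record(self, producer, gains):
        """Producer generated signed utility gains[m] for each other member m."""
        members = self.accounts
        if producer not in members or any(m == producer or m not in members for m in gains):
            raise ValueError("gains must flow between distinct members of this camp")
        for member, gain in gains.items():
            members[producer] += gain  # credited for what he produced
            members[member] -= gain    # debited for what he received


ledger = CampLedger(["FRA", "ENG", "GER"])
ledger.record("FRA", {"ENG": 2.0, "GER": 1.0})  # France helps both
ledger.record("ENG", {"FRA": 0.5})              # England helps France a little
print(ledger.accounts)                # {'FRA': 2.5, 'ENG': -1.5, 'GER': -1.0}
print(sum(ledger.accounts.values()))  # 0.0
```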
Each account has two sets of bounds, inner and outer, each with an upper and a lower limit, all independent and adjusted empirically to optimal values. An account is considered sound if it is within its inner bounds, and pseudo-sound if it is only within its outer bounds. Too high, and its power is owed too much, so is being unfairly treated; too low, and its power owes too much, so has been unfairly treating others. If all accounts in a camp are sound, the distribution is considered fair (to all members), so cooperation is expected to be viable. (The bounds would be proportional to estimated individual bonds, which then would not need to be considered separately.) In that case I would try to cooperate, and expect the other members to do so too.
If I do not expect cooperation in a camp to be viable, but all accounts are within their outer bounds, then I expect cooperation to be pseudo-viable – I expect I can usefully pretend to believe it is viable, misleading enough other members of the camp to make a net gain. Otherwise it is non-viable – I expect I would not be able to mislead enough other members of the camp to make a net gain, since some members would have to be implausibly naive.
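The sound/pseudo-sound and viable/pseudo-viable/non-viable tests can be sketched as below. As a simplifying assumption, the bounds here are symmetric and proportional to the member's bond, using made-up factors (0.5 inner, 1.0 outer); in the text all four limits are independent and tuned empirically. `Bounds`, `bounds_from_bond` and the other names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Bounds:
    outer_low: float
    inner_low: float
    inner_high: float
    outer_high: float


def bounds_from_bond(bond, inner=0.5, outer=1.0):
    """Hypothetical bounds proportional to the member's bond (placeholder factors).

    A non-positive bond collapses every bound to zero.
    """
    scale = max(bond, 0.0)
    return Bounds(-outer * scale, -inner * scale, inner * scale, outer * scale)


def account_state(balance, b):
    if b.inner_low <= balance <= b.inner_high:
        return "sound"
    if b.outer_low <= balance <= b.outer_high:
        return "pseudo-sound"
    return "unsound"


def camp_state(accounts, bounds):
    states = [account_state(accounts[m], bounds[m]) for m in accounts]
    if all(s == "sound" for s in states):
        return "viable"
    if "unsound" not in states:
        return "pseudo-viable"
    return "non-viable"


bounds = {m: bounds_from_bond(4.0) for m in ("FRA", "ENG")}  # inner -2..2, outer -4..4
print(camp_state({"FRA": 1.0, "ENG": -1.0}, bounds))  # viable
print(camp_state({"FRA": 3.0, "ENG": -3.0}, bounds))  # pseudo-viable
print(camp_state({"FRA": 5.0, "ENG": -5.0}, bounds))  # non-viable
```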
In practice, different players would have different values for everyone's accounts and bounds, and the payoff from fooling each would differ – they would generally not be exactly as I expect – but I cannot observe these directly. My bounds are merely my empirically determined, heuristic estimates (given the game state and observed play) of the net break-even points for actual, pretended or non-cooperation, which I use to control my play.
If any member of a camp is apparently fooled (as indicated by final valid orders, or press) by my pretended cooperation, then due credit is moved from my account to theirs. This would probably include a penalty for disruptive effects, which would also be deducted from the camp's total bond and those of all members – thereby reducing all the bounds, making future (actual or pseudo) cooperation even more unlikely. This is why such lies should not be made lightly.
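One way to sketch this bookkeeping, under loudly stated assumptions: the "due credit" is taken to be my gain from the deception plus the disruption penalty, and the penalty is taken from the camp's total bond by spreading it equally over the members' bonds (the text does not say how it is divided). `record_deception` and all values are hypothetical:

```python
def record_deception(accounts, member_bonds, me, fooled, my_gain, penalty):
    """Move credit to a member fooled by my pretended cooperation (hypothetical sketch).

    Assumption: the fooled member is credited with my gain plus a disruption
    penalty, and the penalty is also spread equally over the members' bonds,
    which shrinks every bound proportional to those bonds.
    """
    transfer = my_gain + penalty
    accounts[me] -= transfer
    accounts[fooled] += transfer  # accounts still sum to zero
    cut = penalty / len(member_bonds)
    for m in member_bonds:
        member_bonds[m] -= cut    # total bond falls by exactly `penalty`
    return accounts, member_bonds


accounts = {"FRA": 0.0, "ENG": 0.0}
bonds = {"FRA": 3.0, "ENG": 3.0}
record_deception(accounts, bonds, "FRA", "ENG", my_gain=2.0, penalty=1.0)
print(accounts)  # {'FRA': -3.0, 'ENG': 3.0}
print(bonds)     # {'FRA': 2.5, 'ENG': 2.5}
```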
To be viable it is not necessary to believe that others are correct in their beliefs, only that they hold them! An account's bounds are proportional to the power's estimated bond in that camp and to other, empirically determined, heuristic factors. Estimated bonds and bounds will, of course, vary throughout the game, and even during a turn (though it may be too expensive to continually re-evaluate them properly).
Actual play by a given member will generally be neither perfectly cooperative nor perfectly selfish; these merely represent his extreme options with respect to the camp. Indeed, the estimated bonds apply to the rest of the game. This may seem to become academic after a stab, say, but negative bonds represent an estimate of what must be repaid to regain trust. In a given turn it may not be possible, or practicable, to play at either extreme for various reasons: for example, due to the conflicting requirements of different camps, or uncertainty about what others may do. The net degree of selfishness of actual play with respect to the camp is the reduction in the camp's utility (the total utility of all members, including self) compared to fully cooperative play (play that maximizes the total utility of all members, including self).
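In symbols (this notation is introduced here for illustration and is not the author's own), for a camp C, let u_i^sel be member i's estimated utility if he acts selfishly, u_i^coop his utility under fully cooperative play, and u_i^act his utility under actual play. The bond and the net degree of selfishness are then:

```latex
B_C = \sum_{i \in C} u_i^{\mathrm{coop}} - \sum_{i \in C} u_i^{\mathrm{sel}},
\qquad
S_C = \sum_{i \in C} u_i^{\mathrm{coop}} - \sum_{i \in C} u_i^{\mathrm{act}} \;\ge\; 0
```

S_C is non-negative by construction, since fully cooperative play is defined as the play that maximizes the camp's total utility, and it is zero exactly when actual play achieves that maximum.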
I may also credit or debit "interest" or "damages" due to "opportunity costs" and general disruption saved or caused. To do this, each account would be multiplied by a factor greater than one, thereby duly debiting and crediting each account while maintaining a zero sum. Prompt payment is thereby encouraged – ideally balanced each turn. Conversely, it could be desirable to forgive debt – otherwise cooperation may be unlikely ever to become viable again. (This is analogous to bankruptcy, or forgiving third-world debt. It would sometimes be desirable to correct for otherwise unstable mutual misjudgements of how other members have evaluated accounts, bonds and bounds.) To do this, each account would be multiplied by a factor less than one. The optimum factor at any time would have to be determined empirically – if in doubt assume one, which should anyway tend to be the geometric mean.
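A minimal sketch of the interest/forgiveness step, using a hypothetical `apply_factor` helper and made-up balances. Scaling every balance by the same factor multiplies their sum by that factor, so a zero-sum set of accounts stays zero-sum:

```python
def apply_factor(accounts, factor):
    """Scale every balance by one factor (> 1: interest/damages, < 1: forgiveness).

    Because every balance is scaled alike, a zero-sum set of accounts stays zero-sum.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return {m: balance * factor for m, balance in accounts.items()}


accounts = {"FRA": 2.0, "ENG": -2.0}
charged = apply_factor(accounts, 1.5)   # {'FRA': 3.0, 'ENG': -3.0}
forgiven = apply_factor(accounts, 0.5)  # {'FRA': 1.0, 'ENG': -1.0}
# Factors whose geometric mean is one cancel out over time: 2.0 then 0.5.
print(apply_factor(apply_factor(accounts, 2.0), 0.5))  # {'FRA': 2.0, 'ENG': -2.0}
```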
If play in a camp becomes so selfish that its bond becomes negative, then it ceases to be worthwhile – and hence no longer viable – but recall that it also ceases to be viable if any member's account is beyond his bounds, so balance between members is also important. It is to the advantage of all members of a camp where cooperation is viable to keep it so, thereby enabling all the remaining bond value to be extracted (and shared – not necessarily equally, but any positive amount is worthwhile to the power concerned). This means not being so greedy that any member is likely to have (my estimate of) his account go beyond (my estimate of) his bounds, and discouraging any other member from being so greedy, even if only to the detriment of another member rather than me. Conversely, it may be possible for me to pump-prime a camp to make it viable, with my expected bond big enough to give me a net expected profit.
Generally, it should be best to try to keep all accounts within their inner bounds when a long-term stable relationship is important. If any account is allowed to move between its inner and outer bounds, there will tend to be significant chances of big short-term gains or losses, and a significant risk of the relationship breaking up, though it may be possible to stabilize it again – so it is best to allow such divergence only when nothing but a gamble will do. If any account is allowed to move beyond its outer bounds, then the relationship is effectively non-existent – no gains or losses from it, since at least one party will distrust at least some of the other parties, or believe that others will so distrust – and the relationship will tend to be difficult to form or re-form.
The more a member is in credit (after all treaties are complied with, or things merely happen as each says he expects them to), the more readily he will expect treaties to be agreed, and actual orders to be issued, by camp members that improve his estimated utility. Conversely, the more he is in debit (after all treaties are complied with, and so on), the less readily he will expect such treaties to be agreed and such orders to be issued. Similarly, the more credit a member has, the less would his selfish behaviour be tolerated; the more debit, the more would it be tolerated. A zero balance has a neutral effect – it would be equivalent to a camp where cooperation is not worthwhile, except that in the latter case all the bounds would be at zero. Accounts should be assigned a proportion of their due credit and debit upon agreeing a treaty (the proportion being the estimated probability of it being fully complied with, which depends on trust), with the balance assigned when it is actually carried out. If it is not, the advance payment is reclaimed together with a penalty (equal to the expected loss due to the misinformation), this being the expected cost of incomplete compliance (which may or may not be the same for different degrees of non-compliance).
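The advance-and-settle scheme for treaties can be sketched as a small escrow object. This assumes a single giver producing a single known gain for a single receiver, and treats the compliance probability and penalty as given inputs; `TreatyEscrow` and all values are hypothetical, and partial compliance is not modelled:

```python
class TreatyEscrow:
    """Hypothetical sketch: advance credit on agreement, settle on outcome."""

    def __init__(self, accounts, giver, receiver, gain, p_comply):
        self.accounts = accounts
        self.giver, self.receiver = giver, receiver
        self.gain = gain
        self.advance = p_comply * gain  # paid up front, scaled by trust
        self.settled = False
        self._transfer(self.advance)

    def _transfer(self, amount):
        self.accounts[self.giver] += amount     # credit the member giving help
        self.accounts[self.receiver] -= amount  # debit the member receiving it

    def _settle(self, amount):
        if self.settled:
            raise RuntimeError("treaty already settled")
        self.settled = True
        self._transfer(amount)

    def complied(self):
        self._settle(self.gain - self.advance)   # pay the balance

    def defaulted(self, penalty):
        self._settle(-(self.advance + penalty))  # reclaim advance plus penalty


accounts = {"FRA": 0.0, "ENG": 0.0}
deal = TreatyEscrow(accounts, "FRA", "ENG", gain=4.0, p_comply=0.75)
print(accounts)  # {'FRA': 3.0, 'ENG': -3.0}
deal.complied()
print(accounts)  # {'FRA': 4.0, 'ENG': -4.0}
```

Had France reneged instead, `deal.defaulted(0.5)` would leave France at -0.5 and England at 0.5: the advance is undone and France owes the penalty on top.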
Although any member would, naively, like to behave selfishly, he should expect more net gain by cooperating when cooperation is expected to be viable. The bigger the total and individual bonds, and the correspondingly wider bounds, the bigger the tendency of all to cooperate, though the incentive would generally not be the same for all. As the members with more credit (or less debit) have tended to act less selfishly, there is an incentive for others to get them into more debit (or less credit) – which they can do by tending to help them – and their beneficiaries will tend to feel free to accept the help, and be less inclined to give more help until they are repaid.
The more credit (or less debit) a member has, the more other members will tend to trust him, and the more they will tend to allow him to go into debit in future. More importantly, the greater the scope for unapproved selfishness (and the bigger the risk) they will tolerate, thereby allowing a wider choice of deals, which should tend to improve opportunities for cooperation, albeit with more scope for selfishness that may never be repaid (a stab being the most extreme form). (The trust of one power for another would be the sum of trust in all camps in which they are both present; trust would not be camp-specific.)
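The cross-camp trust sum can be sketched as follows, assuming each camp's accounts are held as a dict of member to balance and, as an illustrative assumption not stated in the text, that the per-camp trust is simply the trustee's balance there. The `trust` helper and the figures are hypothetical:

```python
def trust(truster, trustee, camps):
    """Trust of `truster` for `trustee`: trustee's balance summed over shared camps.

    Each camp is a dict member -> balance. Using the balance as the per-camp
    trust is an assumption made for illustration.
    """
    return sum(
        accounts[trustee]
        for accounts in camps
        if truster in accounts and trustee in accounts
    )


camps = [
    {"FRA": 1.0, "ENG": -1.0},
    {"FRA": 0.5, "ENG": 2.0, "GER": -2.5},
    {"ENG": 1.0, "GER": -1.0},
]
print(trust("FRA", "ENG", camps))  # 1.0 (the third camp excludes France)
print(trust("ENG", "FRA", camps))  # 1.5
```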
The rate at which credit or debit affects a member's actions is for each to judge, according to experience against typical, hence expected, opponents in similar scenarios. The more tolerant the member, the greater his potential gains (due to the wider choice of play by honourable members) and losses (due to dishonourable members).