
Theory of Moves

John Newbury       24 February 2010


Cogitations on AI in the Game of Diplomacy: Theory of Moves


Here is a brain dump, in full detail, of my current Cogitations about the Theory of Moves (TOM). (Later Cogitations override earlier ones if contradictory.)

Contents

2010-02-24

Although it has evidently been around for a while, I only recently became aware of the Theory of Moves (TOM). See Theory of Moves: an Overview and Examples by Brams, which I refer to below. It complements classical Games Theory (GT): it requires only a ranking of utilities, not actual values, and incorporates look-ahead of rational responses to choices made. It requires and expects no mixed strategies. It can also give some insights into behaviours that GT cannot, including sometimes having equilibria that are not Nash Equilibria.
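To make that look-ahead concrete, here is a minimal Python sketch of TOM-style reasoning on a 2×2 ordinal game. Note the simplifications: I assume that declining to move ends play (Brams' actual termination rules are subtler), and the ranks below are an illustrative Prisoner's-Dilemma-style game of my own, not one of Brams' numbered games.

    from itertools import product

    # Ordinal payoffs for a 2x2 game: RANKS[state] = (row_rank, col_rank),
    # higher = preferred.  States are (row_choice, col_choice), choices 0 or 1.
    # Illustrative Prisoner's-Dilemma-style ranks; not one of Brams' games.
    RANKS = {
        (0, 0): (3, 3),
        (0, 1): (1, 4),
        (1, 0): (4, 1),
        (1, 1): (2, 2),
    }

    def flip(state, player):
        """State after `player` (0 = row, 1 = col) switches strategy."""
        s = list(state)
        s[player] = 1 - s[player]
        return tuple(s)

    def final_state(state, mover, depth=3):
        """TOM-style look-ahead: `mover` may switch strategy, anticipating the
        other player's rational response, up to `depth` further moves (with
        4 cells, a 4th move could only return to the start).  Declining to
        move ends play here: a simplification of Brams' termination rules."""
        if depth == 0:
            return state
        after_move = final_state(flip(state, mover), 1 - mover, depth - 1)
        # Move only if the eventual outcome ranks strictly higher for the mover.
        if RANKS[after_move][mover] > RANKS[state][mover]:
            return after_move
        return state

    for start, first_mover in product(RANKS, (0, 1)):
        print(start, "first mover", first_mover, "->", final_state(start, first_mover))

With these ranks, neither player moves from the mutually-cooperative (3,3) cell, even though it is not a Nash Equilibrium – the kind of insight, unavailable to one-shot GT, claimed above.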

Brams' introduction to TOM ends with an interesting comparison of how TOM and GT would each analyse President Carter's failure to use the best strategy against Ayatollah Khomeini during the Iranian hostage crisis. TOM explains why Carter was rational, given the outcomes he probably assumed Khomeini would prefer. Carter's mistake lay in misunderstanding Khomeini: he did not realize that, for rational reasons of Khomeini's own, rooted in internal power-politics, obstruction was Khomeini's dominant choice, independent of Carter's action. GT, whether applied to Carter's assumed game or to the true game, merely predicts what was actually the final state; it thereby seems to say that Carter was irrational to start as he did, threatening military force, only to be finally humiliated into accepting what Khomeini had originally demanded.

I was particularly intrigued when I heard about TOM, because I thought it might address my concerns about classical or Objective GT. It does not – see that link and below – but at least it gives some insights, and I am pleased to see that others have concerns too, albeit here they are different from mine.

TOM asserts that there is no point in worrying about exact utility values, at least during the initial evaluation; it is enough to be able to rank them. As Brams says, accurate utility values are difficult to calculate anyway. This is intuitively appealing: rankings are typically used iteratively in GT to eliminate dominated, and consequentially dominated, states, before mixed strategies are considered for whatever states remain once no more can be eliminated that way. However, if realistic, unbounded (e.g., Gaussian) noise is introduced – due to errors in the information available, in calculation of the resultant utilities, in calculation of the optimum solution, or in implementation of the solution, whether from bugs or from approximations necessitated by limited resources and an intractable problem – then nothing can be eliminated for sure; strictly speaking, every game would then require a mixed strategy. However, states that are probably dominated, given the estimated error, could still usefully be eliminated as a heuristic to tackle the intractability of GT.
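As an illustration of that heuristic, here is a Python sketch, with invented payoffs, of iterated elimination of strictly dominated strategies, extended with an error margin so that near-ties, which noise could plausibly reverse, are left in play. A full solver would alternate elimination between the players; this shows one side only.

    import numpy as np

    def eliminate_dominated(payoff, margin=0.0):
        """Iteratively delete the row player's strictly dominated strategies
        (rows = own strategies, columns = opponent strategies).  With noisy
        utility estimates nothing is dominated for certain, so `margin`
        (e.g. a few standard errors) turns elimination into a heuristic:
        drop row i only if some row j beats it by more than `margin` in
        every column.  Returns the indices of the surviving strategies."""
        alive = list(range(payoff.shape[0]))
        changed = True
        while changed and len(alive) > 1:
            changed = False
            for i in list(alive):
                for j in alive:
                    if j != i and np.all(payoff[j] > payoff[i] + margin):
                        alive.remove(i)
                        changed = True
                        break
        return alive

    # Invented payoffs for illustration:
    p = np.array([[4.0, 1.0],
                  [3.9, 0.9],   # dominated by row 0, but only by 0.1
                  [2.0, 3.0]])
    print(eliminate_dominated(p))              # -> [0, 2]: row 1 eliminated
    print(eliminate_dominated(p, margin=0.5))  # -> [0, 1, 2]: too close to call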


However, although TOM is of interest and gives some insights, I am not convinced of its wide applicability in the real world, or even in Diplomacy. Indeed, as Brams states, although the 2×2 matrices that he fully analyses there are fast to compute, general matrices would be intractable to compute exactly.

Furthermore, I believe that TOM has a trivially provable weakness in reality, due to the assumption that it is only necessary to know each player's preference ordering, never the actual utility values. For example, consider the main game that Brams analyses, #56, where there is a mutually worst outcome, ranked (1,1), where 1 indicates lowest utility. If that outcome represented mutual destruction, whereas the other payoffs gained each player only, say, $2, $3 or $4, then each rational player should virtually certainly avoid any possibility of the worst outcome if there were the slightest chance of any error in the apparent facts, in the reliability of calculations, or in the implementation of the conclusions, by either player – even though doing so almost certainly loses each player a few dollars! That logic would apply to the destroyed player even when only one player would be destroyed. It would apply to the other player too, if he has any moral sense, but that is already incorporated in his utility!
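A back-of-envelope calculation makes the point; the cardinal values below are, of course, my own assumptions – exactly the information that TOM discards:

    # Assumed cardinal values: the mutually worst, (1,1), outcome is mutual
    # destruction, not merely "lowest rank"; the others are worth a few dollars.
    destruction = -1_000_000.0
    safe_payoff = 2.0      # worst non-catastrophic outcome
    risky_payoff = 4.0     # best outcome, reachable only by risking destruction

    def expected_risky(eps):
        """Expected utility of the risky strategy if error (in the apparent
        facts, the calculations, or the implementation, by either player)
        triggers destruction with probability eps."""
        return (1 - eps) * risky_payoff + eps * destruction

    for eps in (0.0, 1e-6, 1e-4, 1e-2):
        print(f"eps={eps:g}: risky={expected_risky(eps):.2f} vs safe={safe_payoff}")

Already at a two-in-a-million chance of error, the risky strategy falls below settling for $2; yet the preference ordering, and hence TOM's analysis, is identical whether the (1,1) outcome costs a dollar or a civilisation.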

I am convinced that the assumption of perfect knowledge and rationality in Objective GT and TOM would often lead to inconsistent, and therefore sometimes very wrong, conclusions in Diplomacy and in the real world. At the very least, non-zero error terms in the payoff matrix seem to be needed for a self-consistent model; if well chosen, they would also tend to produce a more accurate solution, since the form of a model limits how well it can be tuned to represent reality.
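One simple way to make such error terms concrete, sketched below, is Monte-Carlo: treat each payoff as an estimate with (assumed) Gaussian error, re-solve over many samples, and read off the frequency of each best response. A mixed strategy then emerges from the noise itself, consistent with the earlier observation that, strictly, every noisy game requires one. The payoffs, the uniform-opponent simplification and the noise model are all illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    def noisy_best_responses(payoff, sigma, n_samples=10_000):
        """Treat each payoff entry as an estimate with Gaussian error of
        standard deviation `sigma` (the assumed error term).  Re-sample and
        take the row player's best response each time, simplifying by scoring
        each row against a uniform opponent; the resulting frequencies are an
        effective mixed strategy induced by the noise alone."""
        counts = np.zeros(payoff.shape[0])
        for _ in range(n_samples):
            sample = payoff + rng.normal(0.0, sigma, size=payoff.shape)
            counts[np.argmax(sample.sum(axis=1))] += 1
        return counts / n_samples

    p = np.array([[4.0, 1.0],
                  [3.9, 1.0]])  # nearly tied rows (invented values)
    print(noisy_best_responses(p, sigma=0.0))   # noiseless: always row 0
    print(noisy_best_responses(p, sigma=0.5))   # noisy: a genuine mixture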

I think that estimates of each player's knowledge and rationality, including recursive estimates of each other's estimates, rather than an assumption of common knowledge of your own estimates, could do better still. There would, of course, have to be a cut-off to the recursion, which has unbounded cost but diminishing returns. If it is worth going beyond assumed common knowledge of your own calculations of utilities and GT deductions, I think that using your estimates of how each player would perturb your estimates in their decisions would be worthwhile – whether or not it was also worth the complexity of generating a recursive tree of each player's world-view, as demanded by a more perfect model of minds, as I have discussed in Representing Beliefs.
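A minimal sketch of such bounded recursion, in the level-k style, using a matching-pennies game assumed purely for demonstration:

    import numpy as np

    # Matching pennies, purely for demonstration:
    # PAYOFF[player][own_action][other_action]; row wants a match, col a mismatch.
    PAYOFF = np.array([
        [[ 1, -1],
         [-1,  1]],
        [[-1,  1],
         [ 1, -1]],
    ])

    def predict_action(player, depth):
        """Level-k sketch of recursive estimates of each other's estimates:
        at depth 0, fall back on a naive default in place of full common
        knowledge; at depth k, best-respond to the depth k-1 prediction of
        the other player.  The cut-off bounds the unbounded recursion cost."""
        if depth == 0:
            return 0  # naive default action
        other = predict_action(1 - player, depth - 1)
        return int(np.argmax(PAYOFF[player][:, other]))

    for k in range(4):
        print("depth", k, "->", (predict_action(0, k), predict_action(1, k)))

Here the predictions never converge but cycle as the depth grows, which illustrates both the unbounded cost of the recursion and why the choice of cut-off is itself a modelling decision.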

I doubt that I would attempt a recursive model. I do plan to use a perturbation model in DeepLoamSea (DLS), where the probability of any simulated play would depend on what appear to be that player's heuristic weights, rather than on what DLS considers optimum, each player's weights being a different perturbation of DLS's own. Only one serial set of worlds would be considered and adjudicated, not a set of serial worlds for each player, let alone a tree of them, as would be required at each level of a recursive model of minds within minds. That would at least provide the possibility of modelling, and hence exploiting, simple apparent randomness or bias in weaker players.

By tending to adopt the heuristic weights and parameters for DLS's heuristic functions that appear to be used by the better players (including DLS itself when using different parameters and weights), and eschewing those of the weaker ones (even where DLS was a mere observer of some conflicts), he would tend to emulate the better players and distance himself from the play of the weaker ones – within the limits of the heuristic functions that he has been given, or can devise from given primitives. He would, presumably, tend to use classical, Objective GT against those players who exploit his attempts to deviate from it, but he could similarly exploit other players who systematically fail to use Objective GT, through plain randomness or bias.
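The following Python sketch shows the flavour of such a perturbation model. Everything here is assumed for illustration – the linear weights-times-features scoring, the softmax that turns scores into probabilities of simulated play, and the gradient update that nudges a player model towards weights that better explain the moves actually observed; none of it is DLS's actual design.

    import numpy as np

    rng = np.random.default_rng(1)

    def move_probabilities(weights, features):
        """Probability of each candidate move when a move is scored as
        weights . features(move) and the scores pass through a softmax: the
        'probability of any simulated play'.  Both the linear scoring and
        the softmax are assumptions of this sketch."""
        scores = features @ weights
        scores -= scores.max()          # numerical stability
        p = np.exp(scores)
        return p / p.sum()

    def update_player_model(est_weights, features, observed_move, rate=0.1):
        """Nudge the estimated per-player weights (a perturbation of DLS's
        own) towards values that better explain the move actually played:
        one gradient step on the log-likelihood of the observation."""
        p = move_probabilities(est_weights, features)
        grad = features[observed_move] - p @ features
        return est_weights + rate * grad

    # Toy run: own weights, plus a perturbed model of one opponent.
    own_weights = np.array([1.0, -0.5, 0.2])
    opponent_model = own_weights + rng.normal(0.0, 0.3, size=3)
    features = rng.normal(size=(4, 3))   # 4 candidate moves, 3 heuristic features
    opponent_model = update_player_model(opponent_model, features, observed_move=2)
    print(move_probabilities(opponent_model, features))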

