Matt Corks
mvcorks [AT] alumni [dot] uwaterloo [dot] ca
The Prisoner’s Dilemma is a game devised by Merrill Flood & Melvin Dresher in 1950 to model human interaction and altruism. It is played between two players, who must choose simultaneously between two possible alternatives. The possible outcomes are as follows:
| The Prisoner’s Dilemma | ||
| C | C | |
| C | R,R | S,T |
| D | T,S | P,P |
Here, R denotes the reward for co-operation, P the punishment for mutual defection, T the temptation to defect, and S the sucker’s payoff. In the Prisoner’s Dilemma, T > R > P > S, and 2R > T + S. When playing only one game, it is clearly in your interest to defect regardless of your opponent’s move, because you will be better off either way. In game theory terms, you have a dominant strategy, and since your partner is in precisely the same position as you, the game has a dominant strategy equilibrium— mutual defection. This outcome, however, is Pareto-inferior to mutual co-operation: it is worse for all the players to end up in that state. The Prisoner’s Dilemma is the only strictly ordinal 2×2 game that has a dominant strategy equilibrium which is not Pareto-efficient, or is Pareto-inferior to some other state. This becomes important in the Iterated Prisoner’s Dilemma (IPD), in which you play several rounds of this game with your opponent. This talk will discuss optimal approaches for the IPD.
Although IPD has been widely discussed in psychology and game theory since 1950, it was not until Axelrod’s Evolution of Cooperation in 1984 that a theorist had attempted to find a best approach to the game. Axelrod noted that this had not been discussed in the literature of game theory, and decided to hold a computer tournament between programs which employed different strategies. He requested submissions from several prominent researchers in game theory from various disciplines, and received 13 responses. He played all of these against each other and a random strategy in a round robin tournament, and averaged their scores. The winner, famously, was the strategy tit-for-tat (TFT), as submitted by Anatol Rapoport of the University of Toronto. This strategy co-operates in the first move, and thereafter echos their opponents’ move (hence the name).
| Axelrod’s payoff values | ||
| C | C | |
| C | 3,3 | 0,5 |
| D | 5,0 | 1,1 |
Every round in the tournament consisted of 200 moves, and if two programs co-operated with each other in every round they would each obtain a score of 600 points. TFT won with an average score of 504.5, and the random strategy came last with an average score of 276.3. Many of the strategies were intended to exploit their opponents, but these fared poorly. The best strategies in this tournament were those which shared one or more of the main features of TFT: they were nice, meaning that they began by co-operating in the first move, they co-operated well, meaning that they were never the first to defect on an opponent, they retaliated, meaning that they did not ignore defection but responded in kind, and they were forgiving, meaning that they would resume co-operating if their opponent made just one move in that direction. However, Axelrod stressed that TFT was the best strategy merely in the context of that tournament. He points out three other strategies which would have performed better, including tit-for-two-tats, which will forgive one defection before retaliating.
Axelrod then held a second computer tournament, to which 63 people submitted entries. The contestants had clearly been influenced by the first tournament: Axelrod says that about half thought that the moral of the first tournament was that a strategy had to be sneakier about defecting, and half thought that the moral was that a strategy should be more forgiving and co-operative. Nevertheless, TFT was still the superior strategy. Axelrod then evaluated these 63 strategies by using a genetic algorithms approach. These are modeled after biological evolution. He started with a population of strategies, with one representative of each species, or competitor. If a strategy performed well, in the next generation it would be represented more than once, and if a strategy did poorly, it would die off. There were actually six of these ‘ecological’ tournaments held under different conditions (such as different values of R), and TFT eventually dominated five of these, and placed second in the sixth. Again, however, TFT is only performing well in the context of this tournament, relative to the other entries. It also performed much worse when an error rate of 1% was added to the tournament. Although the entries were written by a wide variety of game theoreticians, they represent only a small number of the possible strategies to the IPD. This genetic algorithms approach does suggest another way to determine an optimal approach to IPD, however.
In 1993, a biologist named Nowak and a mathematician named Sigmund began studying the IPD using models of biological evolution. In the terms of evolutionary biology, a strategy M can invade a population of strategy N if v(M|N) > v(N|N), that is, if the characteristic value (expected payoff) of a game between M and N is greater than the characteristic value of a game between N and N. Note that no single mutant strategy can do better than TFT against TFT if w>max{((T-R)÷(T-P)),((T-R)÷(R-S))}, where w is the factor by which future payoffs are discounted1, but AllC had this property as well.
Nowak and Sigmund decided to hold a computer tournament similar to Axelrod’s ‘ecological’ tournament, with the difference that the starting strategies would not be those they had predetermined. Instead, they described a strategy with two parameters, (p,q), which determined the probability that the player would co-operate given that their opponent had co-operated or defected with them in the previous round. Here, TFT would be represented as (1,0), AllC by (1,1), and a generous TFT (GTFT) by (1,⅓). Their playing field would be seeded with such strategies, with p and q randomly chosen. They reasoned that this was a better model of evolution, since their players had short memories and only considered the previous move, and since their decisions were at least partially determined by chance, mimicking imperfect communication systems which exist between biological organisms. In the real world, they reasoned, players could easily misinterpret each other’s moves, misidentify players, or misimplement their own intentions. The starting move of each player was unimportant, since their tournaments lasted for at least 107 moves, and used noisy communication.
They found that initially, players close to (0,0) would perform well, since there were many opponents for them to play which were close to (1,1). These would soon die off. If a strategy near (1,0) was present, it would begin to grow, since these players would profit from co-operation with each other, but be resistant to invasion by defectors. After the TFT-like strategies had cleared out the defectors, a different strategy would predominate: that closest to the theoretically optimal reactive strategy, (1,q), with q=min{1-((T-R)÷(R-S)),((R-P)÷(T-P))}, or q = ⅓ for Axelrod’s payoff values. Although this strategy was the most stable, since it was able to correct for mistakes in communication well, no co-operation could occur without the initial presence of a strategy similar to TFT.
Nowak & Sigmund continued their research by holding a similar tournament with organisms that had four parameters: (p1,p2,p3,p4). These determine the probability that the strategy will co-operate with another player after receiving the payoff R, S, T, or P. In addition to seeding the playing field with random players, they would replace a player with a random strategy in each round with probability p=0.01 to simulate genetic drift, and they required that none of the four parameters of any organism be closer to 1 or 0 than ε, with ε=0.001, to better model imperfect communication in nature. As in their previous study, the opening move of each strategy was unimportant.
In this simulation, a different strategy was found to be the most stable in the evolutionary sense. This strategy was approximately (1,0,0,1). It had previously been named ‘Pavlov’ by mathematicians David & Vivian Kraines, since it continued any action which resulted in a high (T or R) payoff, but otherwise changed its approach. Unlike TFT, it cannot invade the strategy AllD, and like GTFT it is tolerant of mistakes in communication. Its main advantage is that it cannot be invaded by a gradual drift of organisms close to AllC, unlike TFT and GTFT, since after a single mistake in communication Pavlov will exploit and kill the overly-generous co-operator. This keeps the population resistant to attack by AllD.
After seeding their playing field with the strategy (0.5,0.5,0.5,0.5), Nowak & Sigmund watched their organisms evolve. In most cases, the average payoff of each round remained very close to either R or P, meaning that all the organisms were co-operating or defecting with each other. After 104 generations, 27.5% of the tournaments had an average payoff greater than 2.95, meaning that co-operators predominated. 10% of these were dominated by the strategy Pavlov. After 107 generations, 90% of tournaments had an average payoff above 2.95, and 82.5% were dominated by Pavlov.
Nowak & Sigmund also ran tournaments in which R was varied, and in which strategies had access to the results of the past several moves. Pavlov still dominated these games. It should be noted that the Pavlov strategy does very poorly against the strategy AllD, given by (0,0,0,0), since it co-operates with every second move and gets the sucker’s payoff. Even this can be overcome: when the value of R is set to 3, the Pavlov-like strategy (0.999,0.001,0.001,0.995) cannot be invaded by AllD. As their earlier experiment showed, however, co-operation still evolved soonest when a TFT-like strategy cleared out the consistent defectors.
It is usually assumed that the best overall approach to the IPD is TFT, as
Axelrod’s tournaments imply. However, as Axelrod emphasized, TFT was merely
the best strategy to use against the other entries he received. The genetic
algorithms approach offers a less biased approach, and it implies that in a
population of organisms with strategies randomly distributed between
co-operators and defectors, TFT is a necessary catalyst to bring about
co-operation. In the long term, the most evolutionarily stable strategy is
Pavlov. Whether this means it is a better approach to the general problem of
the Iterated Prisoner’s Dilemma game, however, is for the reader to decide.
The Evolution of Cooperation. Robert Axelrod. Basic Books, 1985.
“Tit-For-Tat in a Heterogeneous Population”. Martin A. Nowak and Karl Sigmund in Nature, vol. 355, no. 6357, pp 250-253; January 16, 1992.
“A Strategy of Win-Stay, Lose-Shift That Outperforms Tit-For-Tat in the Prisoner’s Dilemma Game”. Martin A. Nowak and Karl Sigmund in Nature, vol. 364, no. 6432, pp 56-58; July 1, 1993.
“The Arithmetics of Mutual Help”. Martin A. Nowak, Robert M. May, and Karl Sigmund in Scientific American, pp 76-81; June 1995.