Prerequisites: All Theory Chapters, in particular Chapter 7, Chapter 8, Chapter 9, and the first part on poker.
Pre-Class Activity: Play ten rounds of KUHNPOKER(3,4,2,3) in this applet to refresh your familiarity with the game.
In the previous chapter on VNM POKER and KUHN POKER, we gave a description of the games, provided extensive forms, looked at pure strategies and reduced pure strategies, found that some are always weakly dominated, and described the Normal Form for a few cases. Although some small cases of the games had pure Nash equilibria, this is no longer true for KUHN POKER and S=3. In this Chapter we find a pure Nash equilibrium in mixed strategies of KUHNPOKER(3,4,2,3). Since for this game, behavior strategies are much more natural than mixed ones, we also discuss the relation between both.
Recall that after eliminating weakly dominated strategies, we arrive at the following matrix:
A slightly larger 18 × 16 matrix appears in the sheet "Start" in the file KUHN3.xls. In this Excel file, which we will use heavily throughout this Chapter, you can automatically look for domination, pure Nash equilibria, and also perform Brown's fictitious play.
Every semester a poker tournament is part of class. The announcement looks like this:
Class Activity: Create your own robot. What 12 numbers would you choose, and why?
In Spring 2011, students and teacher submitted the following robots:
|.||When having the first move (playing as Ann)||When having the second move move (playing as Beth)|
|.||probability for raising||probability for calling||probability for calling||probability for raising|
|Rupert the robot||0.3||0.8||1||0||0.75||1||0||0.75||1||0.5||0.8||1|
Since the string of these 12 numbers uniquely determines the robots behavior, we call it the DNA of the robot. To get 16 players for a knock-out tournament, three more robots were added. Two identical "Random" robots, who always decide randomly with 50% probability, and another one, called "SophRandom", which always raises and calls when facing a king, but otherwise always decides randomly with 50% probability.
To make the outcome in our robot tournament a little less dependent on luck and more on the quality of the robot, in each pairing 200 rounds were played, with the robot having increased the amount of money being the winner. You find all robots and the playground here. Who do you think did win the tournament. And who should have won?
The first six numbers and also the second six numbers the DNA form behavior strategies for the Ann player respectively the Beth player: For every information set, probabilities for the different options are given. First we translate these behavior strategies into mixed strategies, a mix of some pure strategies. The method is rather straightforward. For any given pure strategy, as for instance RCRFCC for the Ann (first moving) player, we just look at the six information sets and multiply the probabilities for the chosen options in the pure strategy as they appear in the behavior strategy. In case of a "•" in a reduced strategy, we multiply by 1.
Let me illustrate this on a concrete example of "Rupert the robot". Since Rupert playing first always raises with a king, never calls with a jack, and always calls with a king, all pure strategies of the form xyCzvw, xyzCvw, and xyzvwF are played with probability 0. For the other pure strategies, examples for the calculations are p(CCRFF•) = 0.7 · 0.2 · 1 · 1 · 0.25 · 1 = 0.035, and p(CRRF••) = 0.7 · 0.8 · 1 · 1 · 1 · 1 = 0.56, and so on. All together Rupert has nonzero probability only for the six reduced pure strategies CCRFF•, CCRFC•, CRRF••, RCR•F•, RCR•C•, and RRR•••. If Rupert is playing the Beth role, then it is never calling with a jack, always calling with a king, and always raising with a king, meaning that only pure strategies of the form FxCyzR have nonempty probability. Examples are p(FFCRRR) = 1 · 0.25 · 1 · 0.5 · 0.8 · 1 = 0.01, and p(FCCCCR) = 1 · 0.75 · 1 · 0.5 · 0.2 · 1 = 0.075, and so on. Rupert has nonempty probabilities for the eight pure strategies FFCCCR, FFCCRR, FFCRCR, FFCRRR, FCCCCR, FCCCRR, FCCRCR, and FCCRRR of Beth.
On the sheet "Behavior" in the Excel file KUHN3.xls these calculations are done automatically if you input the behavior strategies for two players on the top. The probabilities appear in the green and blue cells. However, this sheet does not use the full matrix. Remember that weakly dominated strategies have already been removed. Of course there are behavior strategies that use some weakly dominated pure strategies with nonempty probabilities---in such cases you would need to use the sheet "BehFull", which uses the full 27 × 64 matrix.
If we have found a corresponding mixed strategy for every behavior strategy of our robots, it is fairly easy to calculate the expected payoff for a play with two robots. As an expected value, it is the sum of products of probabilities of an outcome and payoff of that outcome. Each pair of a pure strategy SA of Ann and a pure strategy SB of Beth defines such an outcome, a cell in the payoff matrix corresponding to row SA and column SB, and the probability for that outcome, the probability for Ann choosing SA and Beth choosing SB is just the product of the probabilities of Ann choosing SA in her mix, and Beth choosing SB.
All the expected number of points for each pairing (for 200 rounds, with first mover role alternating) are displayed in the table below. Cells are colored green or red depending on the sign of the value. The next to last column contains the average of all values in the row, the average payoff if this robot plays against any other robot. The robots in the table are already sorted by these numbers. The last column counts the number of wins in each row, the number of other robots that this particular robot would beat. Note that there is a correlation between average expectation and number of wins, but only to some extend.
Of course, these expectations were not always met by reality. For instance, The Hierophant played against Bill Nye in the first round, and although the expected outcome of the 200 round game was 16.22, the actual outcome was -10, meaning that Bill Nye did win. So we see a deviation of -26,22 here. The deviations of the other plays in the first and second round of the knock-out tournament were -13.6, -46.2, -36.9, 52.8, -84.7, -71.5, 2.4, -32.1, and -26.5. Thus, although higher deviations occurred, the "typical" deviation was between about -50 and 50. That also implies that if the expectations are much higher than that, as the 167 of Rupert the robot versus the Random robot, the outcome seems to be rather clear.
According to Nash's Theorem, this game must have at least one Nash equilibrium in mixed strategies. Since it is a zero-sum game, it can be found by Brown's fictitious play. In the Excel file, this can be found on the sheet called "Brown". We get a pair of 2/3 of CCRFC• and 1/3 of RCR•C• for the first moving player, Ann, and of 2/3 of FCCCCR and 1/3 of FCCRCR for the second moving player, Beth. The expected payoff for Ann equals -0.02, thus KUHN POKER(3,4,2,3) has a very slight advantage for Beth.
Which one of the robot behavior strategies corresponds to these mixes, which one was the teacher's robot? To answer this question, we need the more complicated method to transform mixed strategies into corresponding behavior strategies explained in the Chapter on Behavior Strategies.
This is easier for Beth, since her information sets don't have a history for Beth: In each one of Beth's information sets, Beth is about to move the first time. In this case, the probability Beth for a move in such an information set is just the probability of mixed strategies doing that move. Beth would fold with a jack and call with queen or king when Ann has raised, and when Ann has checked, Beth would raise in 1/3 of the cases with a jack, never with a queen, and always with a king. Therefore the behavior strategy for Beth is 0,1,1,1/3,0,1.
The same can be done for Ann in her first three information sets. Therefore Ann would raise in 1/3 of the cases with a jack, never with a queen, but always with a king. The other three information sets of Ann have an information set history---Ann knows that she checked the move before. Therefore only those pure strategies in the Nash mix are considered that do that. Then Ann's percentage for calling or folding are the percentages of the pure strategies considered that call respectively fold in this information set. For instance, in Ann's fourth information set, if Ann has a jack and has to call or fold, we only consider CCRFC•, since RCR•C• will never result in this fourth information set. Therefore Ann folds here. In the fifth information set where Ann holds a queen, we consider both pure strategies CCRFC• and RCR•C• and since both these strategies always call with a queen, Ann will always call in this fifth information set. Finally, if Ann has a king and is about to call or fold, none of the two pure strategies agrees with the information set history (of checking with a king), therefore we consider no pure strategy, and we can choose the probability as we want, since we will never face this information set anyway. Therefore behavior strategies for the Ann part are 1/3,0,1,0,1,x with x any number 0 ≤ x ≤ 1. It is interesting, but a coincidence, that both recommendations to Ann and Beth look just the same---for other parameters this is no longer true.
These two behavior strategies for the Ann and the Beth roles describe Amadeus's behavior. Amadeus is the robot corresponding to the Nash equilibrium mix, and therefore it is clear that and why Amadeus has a nonnegative expectation against every other robot in the symmetric game consisting of an even number of rounds with alternating Ann/Beth roles.
One word about the most fun part of poker, the bluffing. In Kuhn poker, bluffing consists of raising with a jack, and both Ann and Beth can and should bluff occasionally. Note also that one should raise only with low (occasionally) and high (always) cards, but never for middle value cards, a pattern that von Neumann and Morgenstern already observed in their analysis of variants of VNM POKER.
Remember that usually, if we have a Nash equilibrium of MixA versus MixB, then MixB is not the only mixed strategy of Beth that behaves optimally against MixA. According to the Indifference Theorem, every one of Beth's pure strategies that occurs with positive probability in MixB is a best response for Beth against MixA. Also every mix (with different probabilities) of these strategies works. And there may be even more, even strategies that have been already eliminated as weakly dominated may be among them. In our example, the only best responses to Ann's Nash equilibrium mix of 2/3 of CCRFC• and 1/3 of RCR•C• are the ingredients FCCCCR and FCCRCR of Beth's Nash mix, but also FCCCRR and FCCRRR. The only best responses to Beth's Nash mix are the ingredients FCCCCR and FCCRCR of Ann's corresponding Nash mix.
Looking closely at these mixed strategies, and what behavior strategies can create them, we see that the only optimal behavior strategies against the Nash Mix (the Amadeus robot), the only ones achieving an expected payoff of 0 against Amadeus have the DNA of x,0,1|y,1,z|0,1,1|u,v,1 with y ≤ x. In the list above, none of the robots except Amadeus has this form. Therefore, although there are many robots possible that would on the long run tie against the Nash mix, in reality such strategies are rare.
The expected payoff for 200 rounds Amadeus versus Max is 10.3. Probably this implies some advantage for Amadeus, but what are the odds for Amadeus winning? Remember that typical deviations in 200 rounds range between about -50 and 50. In this section we will calculate the probabilities of Amadeus winning, a draw, and Max winning, in the 200 rounds play, and demonstrate how to do it for every pairing of robots.
What we need first are the probabilities for a pure Ann strategy like CCRFC• winning 3 units, 2 units, nothing, losing 2 units, and 3 units, against a pure Beth strategy like FFCCCR. We consider the 9 cases of card distributions to Ann and Beth separately, and check what the outcome will be if both players play according to their pure strategy. In our concrete example, CCRFC• versus FFCCCR, Ann's payoffs in the 0 cases are as follows:
In our example of Amadeus versus Max, if Amadeus plays the Ann role, Amadeus wins 3, 2, 0, -2, -3 with probabilities a3=0.097, a2=0.305, a0=0.242, a-2=0.162, and a-3=0.194. So Amadeus wins with probability 0.097+0.305 = 0.402, draws with probability 0.242, and loses with probability 0.162+0.194=0.356 when Amadeus moves first. If Max plays the Ann role, then Max wins 3, 2, 0, -2, -3 with probabilities b3=0.130, b2=0.230, b0=0.242, b-2=0.287, and b-3=0.110.
What about 2 rounds, where in the first round Amadeus moves first (has the Ann role), and in the second round Beth moves first? This is a multi-step experiment which is best described by the probability digraph to the right. The probabilities for the different payoffs for Amadeus can be expressed in terms of the ai and bj as shown in the digraph. the probabilities for Amadeus winning 6, 5, ... -5, -6 units in these two rounds are 0.011, 0.062, 0.088, 0.050, 0.144, 0.040, 0.209, 0.095, 0.095, 0.078, 0.037, 0.066, and 0.025. These numbers are displayed in the chart below. The probability for Amadeus winning in two rounds is the sum of the probabilities of the states 1, ... 6, which equals 39.4% in our example. It is the sum of the areas of the six rectangles on the left. The probability for a draw in two rounds equals 20.9%, and the probability for Max winning equals 39.7%, the sum of the areas of the six right rectangles. Although Amadeus has a higher expectation in the two-round game, Max is a tiny little bit more likely to win!
But wasn't Amadeus supposed to beat or draw every other robot? Well, only as far as the expected value of money won is concerned. Although the chart has more area to the right, the area to the left extends a little more from the middle, so that the schwerpunkt of the figure is a little left from the middle.
Of course, the probability digraph can be extended to more than 2 rounds. Such digraphs are implemented on the sheet "Odds" in the KUHN3Odds.xls file. Put behavior strategy values into the black rectangles in the blue area, and the winning, drawing, and losing probabilities for round numbers between 1 and 250 are shown on the left border (cells B, C, and D). The chart below was created by this Excel sheet and shows the probabilities for Amadeus winning, drawing, or losing against Max, for different number of rounds. With increasing number of rounds, the probability for Amadeus winning exceeds the probability for losing, and also increases overall. What do you think, will the probability for Amadeus winning ever hit the 70% line, for some very large number of rounds?
"The longer you play, the less luck is a factor. A single hand of poker is about 99 percent luck. After eight hours of playing, it is about 88 percent luck" was Chris Ferguson's answer on luck in poker [PP]. But although the numbers are surely not meant to be precise, isn't it interesting how little even this professional, who is well-known for his analytical approach to poker, estimates the skill factor even after 8 hours? What if we would play much longer than that, could the influence of luck be almost completely reduced to zero?
Let me explain things at a simpler example, a so-called Bernoulli experiment. Assume a coin falls "head" with 51% probability and "tail" with 49%. Ann wins one dollar from Beth if heads shows, and otherwise Ann has to give one dollar to Beth. Obviously the expected payoff for Ann in one round is 0.51·1 + 0.49·(-1) = 0.02. Then in 100 rounds, Ann is expected to win 2 dollars, in 400 rounds 8 dollars, and in 1600 rounds 32 dollars. But of course, in reality the payoffs will deviate again. In the 100 round case, the typical deviation from the expected value lies between -5 and 5, (meaning that in 70% of the cases the deviation lies between these numbers). In the 400 round case, the typical deviations are larger, between -10 and 10, but the size of the deviations are growing slower than the expected values. In the 1600 round case, the typical deviations double again and lie between -20 and 20, but since the expected value increased by a factor of 4 to 32, in more than 70% of the cases Ann will have won between 12 and 52 dollars. If we proceed to larger and larger number of rounds, we can increase the winning probability for Ann as close to 1 as we desire.
The same pattern occurs in our slightly more complicated case. Provided a robot has a positive expected payoff against another player, we can force the winning probability of that robot against the other to be close to 1 by (heavily) increasing the number of rounds played. Not only does the curve above hit 70%, it will even approach approach 100%, if we increase the number of rounds dramatically.
Let's return to our question at the end of section 1: Who should have won? If we would have played a very high number of rounds, as 1000000 rounds, the results from section 4 imply that the Nash mix robot, Amadeus, would win with very high probability against each one of the other robots, since it has a positive expectation against each of them. Therefore Amadeus is the favorite in that case. However, for smaller number of rounds as 200, this ain't necessarily so. Instead it depends on the population of the robots in the tournament.
Let us look at a a smaller case---a knock-out tournament of only four players, Amadeus, Rupert, and two clones of Max, called Max1 and Max2. Since draws are impossible in such tournaments, we assume that if a draw would occur, then 200 more rounds are played. Therefore odds of 56,3%:1.3%:42.3% for Ann winning, drawing, losing, may change into odds of 57%:43%. We divide the probabilities up, proportional to the probabilities for winning and losing. Now the probabilities for the row player to win in the 200 round version are given in the table below:
|..||Amadeus||Max1||Max2||Rupert the robot|
|Rupert the robot||42.7%||66.6%||66.6%||-|
How likely is the winning of the tournament for each one of the four players?
There are three possible pairings: Amadeus-Max1 and Max2-Rupert, or Amadeus-Max2 and Max1-Rupert, and Amadeus-Rupert and Max1-Max2. Each one of the three pairings is equally likely.
For knock-out tournaments of 8 or 16 players, the same analysis is possible once all odds are available. However, a 8 player tournament has 7 · 5 · 3 · 1 · 3 = 315 different pairing patterns for the first and second round, which makes analysis without the help of computers fairly tedious.
The sequence of the 12 probabilities that creates the behavior strategies for a player is the "DNA" of a robot. Such strands could be close or far apart, depending on how many numbers are identical, or at least close, in both strands. More precisely, one could calculate the sum of the 12 squares of the corresponding differences as a measure of closeness. For instance, the DNA of MP Hamster and Jew Bear are rather close, the distance is (0.75-0.75)2 + (1-0.99)2 + (1-1)2 + (0.5-0)2 + (1-1)2 + (1-1)2 + (0.45-0)2 + (1-0.67)2 + (1-1)2 + (0.3-0.1)2 + (0.8-1)2 + (1-1)2. This is equal to 0 + 0.0001 + 0 + 0.25 + 0 + 0 + 0.2025 + 0.1089 + 0 + 0.04 + 0.04 + 0 = 0.642. This sum, this measure of similarity, could actually be interpreted as a distance in 12-dimesional space. It is always between 0 and 12, with 0 meaning that both robots behave totally identical, whereas a distance of 12 would indicate that both play a pure strategy and both play the opposite in each information set. While one would, playing Ann, always raise with a jack, the other would always check with a jack, and so on. For another example, the distance of the DNAs of MP Hamster and Jocker equals 3.745. It may be interesting to investigate whether closeness of the DNA also implies similar results against other robots. For instance, in the expected payoff table above, there is some similarity between Amadeus and Jocker, since both beat the first three robots in the list. It turns out that the DNA of Amadeus is closest to the DNA of Jocker, having a distance of 1.22, whereas robo-boogie's DNA is farest apart with a value of 6.625. Closest to robo-boogie is the DNA of the random robot, having all 12 probabilities equal to 0.5. And both these robots are also close as far as performance is concerned.
This is the example analyzed by Kuhn in his 1950 paper. Ann has a contiuunm of optimal strategies, but Beth has only one. The game is slightly in favor of Beth, since the expected payoff for Ann is -1/18 when both play optimally.
In this variant, Ann and Beth both have just one optimal strategy: If Ann has a king she always raises. With a queen she never raises. All this seems reasonable. But now the "bluffing" occurs if Ann has a jack. Then she would raise (bluff) in 1/5 of the cases and check in the remaining 4/5 of the cases. If Ann has checked and Beth has raised, Ann would always fold with a jack, but will call in 4/5 of the cases with a queen, and she will not face this position with a king (since she has raised then at the very beginning), therefore it doesn't matter how she decides for this information set. As for Beth, if Ann raises, then Beth would always call with a king, in 3/5 of the cases with a queen, and never with a jack. If Ann checks in the first round, then Beth would always raise with a king, never with queen, and in 1/5 of he cases with a jack. Thus Ann's behavior strategy DNA is 0.2,0,1|0,0.8,x while Beth's one is 0,0.6,1|0.2,0,1. The two strategies differ for Ann and Beth how they decide between calling and folding when holding a queen. The game is still a little unfair to Ann, since he expected payoff for Ann when both play optimally is -1/30.
SIMULTANEOUS VNMPOKER(S,r,m,n): In the simultaneous version both players decide simultaneously whether they want to raise for m or for n. If one of them raises for n and the other for m, the one daring the higher amount (n) wins m from the other one, regardless what the cards show. If both raise the same amount, the one with the higher card wins n respectively m from the other one, again, no win for draws of identical cards.
Both player have four pure strategies: Betting low in both cases ("LL"), Betting low with a card of value "1" and high with a card of value "2" ("LH"), the counterintuitive strategy of raising with a card of value "1" and low with a card of "2" ("HL"), and raising high for both cards ("HH"). Remember that the probability of both having a card of calue "1" is (r-1)/(2(2r-1)), wheras the probability for Ann getting a "1" and Beth getting a "2" is the slightly higher r/(2(2r-1)). Then for each pair of strategies, the four cases of card distributions have to be condsidered, the payoffs noted, and the expected value as sum of probability multiplied by payoff, for all these four cases has to be computed. We get the following payoff matrix:
Choose m=1, n=3, r=4: