Two Envelope Paradox Solution
There are two envelopes containing money. One has twice as much money as the other. You choose an envelope, open it, and can keep it, or swap it for the other. Should you keep it or switch?
Method 1. Let the envelopes contain x and 2x. If you find x in your envelope then by switching to the other envelope you gain x. If you find 2x then switching loses x. The net gain is zero. Therefore switching is of no benefit.
Method 2. If your envelope contains x then the other one must have 2x or x/2. Switching means either a gain of x or a loss of x/2. Since these are equally probable you should switch.
For a discussion of the problem and a review of the literature see Discussion.
Part 1 demonstrates that 2x and x/2 are not equiprobable, though this is not enough to resolve the paradox. Part 2 revives the paradox by presenting a specific distribution that appears to give positive gain for switching for all x. This gain calculation is shown to be invalid. Part 3 generalises this result to show that switching is of no benefit in the general case, regardless of whether an envelope is opened or not. The Discussion that follows shows what happens in the finite case and attempts to explain the paradox intuitively. Essentially, the two envelope paradox is a zero-sum game in which the choice of the order of summation can give a false positive value for the gain.
Part 1: The original paradox undermined
Surprisingly, it is not true that all numbers can occur with equal probability in the envelopes. If it were so, and the probability of any number x was P( x ) = b, a constant value, then because there are infinitely many real numbers, the sum of all the probabilities would be b times infinity, whereas probabilities must add up to 1. So we know that the probability of large values must fall off. More precisely, P( x ) must approach 0 as x approaches infinity. This undermines the switching argument, which posited that the probability of 2x in the other envelope is the same as that of x/2.
You may find this explanation unsatisfactory. The paradox arises because we have the notion that the chain of pairs of envelopes can stretch forever, with each succeeding pair having equal probability but with double the values. In fact, either the chain must be finite or else the probabilities must diminish.
Does this dispose of the two envelope paradox? It doesn't, because the paradox can be resuscitated, as shown below.
Part 2: A special case of the paradox
Consider the distribution of envelopes containing the pairs ( 1, 2 ), ( 2, 4 ), ( 4, 8 ), ( 8, 16 ) and so on, where each such pair ( 2n, 2n+1 ) occurs with probability 2n/3n+1 for integer n >= 0. It is not hard to show that the probability sum, 1/3 + 2/9 + 4/27 + 8/81 + ... converges to a finite value (actually 1), so there is no problem with infinite probabilities. (Note also that large numbers occur less often than small ones.) The surprising feature of this distribution of envelopes is that the gain appears to be positive for every value of n. We use method 2 to calculate the net gain in switching from the value 2n to 2n+1 in the pair ( 2n, 2n+1 ) minus the loss in switching from 2n to 2n-1 in ( 2n-1, 2n ).
Since there is a 50% chance of picking either of the two values in each pair, each term begins with ½. The next part is the probability of this pair of envelopes being chosen and the third part is the gain of switching. The net gain is the sum of the two values.
1/2( 2n/3n+1 )( 2n+1 - 2n ) + 1/2( 2n-1/3n )( 2n-1 - 2n )
= 1/2( 2n/3n+1 )( 2n ) + 1/2( 2n-1/3n )( -2n-1 )
= 22n-1/3n+1 - 22n-3/3n
= ( 22n-3/3n )( 4/3 - 1 )
= 22n-3/3n+1 for n > 0 (G)
Because this is positive for every positive value of n, we are back with the original paradox. For n = 0 the gain is 1/6, but this case has to be treated separately since ‘1’ only appears in the pair ( 1, 2 ). It seems we should switch even without knowing the value of the contents of our envelope. Some people argue that the way out is to observe that the distribution has an infinite mean, so that the expected gain from switching is ∞ − ∞, which is not defined. The flaw in this argument is that we are not interested in the average gain but in the gain for each value of the distribution. Suppose that you could choose between the first and the second value in the pairs ( 1, 3 ), ( 10, 30 ), ( 100, 300 ), ( 1000, 3000 ), ... You certainly would choose the second in every pair. The fact that both have infinite means is irrelevant.
Let's look at the raw gain for each value in every pair of envelopes, ie the gain in switching from 1 to 2, the loss in switching from 2 to 1, the gain in going from 2 to 4, the loss in switching from 4 to 2, etc. Note that G gives us the net gain after these positive and negative values have been summed for each value of n. The raw gain is 1/6( 2 – 1 ) for the first, 1/6( 1 – 2 ) for the second, and 1/9( 4 – 2 ) for the third, giving:
1/6 - 1/6 + 2/9 - 2/9 + 8/27 - 8/27 + 32/81 - 32/81 + 128/243 - 128/243 + ... (S)
The terms in this infinite series of alternating values repeat with opposite sign then increase. It is a well-known property of such series that they cannot be summed. See Footnote 1. Note that the individual terms of S are given by (4/3)n/6, which increases steadily. So we can obtain nonsensical results by summing S in various ways. When we calculate the gain using method 2 we are in effect summing the terms of the series S by bracketing its terms as in (b) in the footnote:
1/6 + ( -1/6 + 2/9 ) + ( -2/9 + 8/27 ) + ( -8/27 + 32/81 ) + ( -32/81 + 128/243 ) + ... (A)
The sums of the bracketed terms (ie 1/18, 2/27, 8/81 etc) are given by the formula G for n > 0, ie each term gives the gain for a particular value of n. Of course, one is tempted to say that 1/6 - 1/6 + 2/9 - 2/9 + 8/27 - 8/27 + 32/81 - 32/81 + 128/243 - 128/243 + ... is simply zero, but in truth the sum is undefined. In fact, getting zero is exactly what we do when we apply method 1, ie observing that if we have x in our envelope then we gain x by going to 2x, and that if we have 2x then we lose x by going to x. Yet method 1 also sums A, and hence is as invalid as method 2 (the gain calculated from x to 2x vs from x to x/2), though unlike method 2, it does not lead us to a paradoxical conclusion - the insatiable itch to switch.
My contention is that taking the sum of an increasing oscillating series, which is what we do to calculate the individual gains in A, is invalid. Logically it makes sense to bracket the series as in A because this should give us the net gain for each value of n. The catch is that mathematically this is invalid due to the weird behaviour of infinite increasing series whose terms have alternating signs. To understand this intuitively, imagine that the "payback" grows, is being pushed ever further away, and is never realised due to the infinite length of the series.
Of course the problems with oscillating series only occur if they are infinite. Any finite series such as 1 - 1 + 2 - 2 + 3 - 3 + 4 - 4 + 5 - 5 (ie with only 10 terms), can be summed in any order and will always give the same result. Likewise, no finite distribution can generate the paradox because the value at the end of a finite chain of envelopes acts as the payback. This is shown in the postscript.
What about the open envelope version of the paradox? If we open an envelope and find that it contains 128, should we switch? Substituting n = 5 in G gives the gain of 27/36 = 128/729. It seems that we can validly extract just one bracketed element from the oscillating series A. Or can we? Recall that the reason we cannot sum all of A is that changing the order of the terms changes the sum. By extracting a single bracketed term out of A we are instantiating the same problem, ie we are choosing a specific order so as to obtain a specific result. This is arbitrary, and hence is just as fallacious in the case of a single term as in the infinite one.
Part 3: The general case
What of the general case? This involves every possible distribution as the source of the two envelopes. If you can see why I invoke the notion of distributions then read on, else see Footnote 2. Finite chains present no problems, so let's look at an infinite chain, where the pair ( 2n, 2n+1 ) occurs with probability P(n) for n >= 0. What is the gain for switching for each value of n? Here is the net gain for n = 0, 1, 2 and 3 (for simplicity I omit the factor of ½ that should precede every term):
P(0) * 1 = P(0)
P(0) ( 20 - 21 ) + P(1) ( 22 - 21 ) = - P(0) + P(1) * 2
P(1) ( 21 - 22 ) + P(2) ( 23 - 22 ) = - P(1) * 2 + P(2) * 4
P(2) ( 22 - 23 ) + P(3) ( 24 - 23 ) = - P(2) * 4 + P(3) * 8
Writing out the total gain for all values we see that our friend, the alternating series, is back:
P(0) - P(0) + P(1) * 2 - P(1) * 2 + P(2) * 4 - P(2) * 4 + P(3) * 8 - P(3) * 8 + P(4) * 16 + …
d0 - d0 + d1 - d1 + d2 - d2 + d3 - d3 + ... (A1)
where d0 ( = P(0) ) is the gain in going from n = 0 to n = 1, -d0 is the loss in going from n = 1 to n = 0, d1 is the gain in going from n = 1 to n = 2 and so on. A1 gives the gain calculated using method 1. The gain using method 2 is found by bracketing the series as we did in Part 2 (when we got the series A):
d0 + ( - d0 + d1 ) + ( - d1 + d2 ) + ( - d2 + d3 ) + ... (A2)
Is it valid to sum the above? It may be valid or not, depending on the nature of the series. Essentially there are two possibilities, the series is well-behaved or it isn't. By "well-behaved" I mean that its terms can be summed in any order to produce the same result (which may be ∞, -∞, or a real number). One thing we know for sure: if A2 is well-behaved then it will sum to zero, in which case there is no paradox, as 0 = (d0 - d0) + (d1 - d1) etc. If it is not well-behaved then we should not try to sum it. RIP.
Or are you still unconvinced? You may think that if the series converges when summed in a particular order then we should accept the total it calculates and hence that the paradox is unresolved. Clark and Shackel present a conditionally convergent series for the gain that is perhaps the sharpest instantiation of the paradox. My answer to this is in Footnote 3.
What about if we open an envelope in the case where A2 is well-behaved? We can choose to use the term ( - dn - 1 + dn ) for the gain for the value 2n. Does this revive the paradox? No. Some of these terms will give gains, others losses, and all will exactly balance out, since A2 is well-behaved. Finally, if we open an envelope in the case where A2 is not well-behaved then the argument at the end of Part 2 tells us that we cannot choose an arbitrary grouping of part of A1 and proclaim this to give a valid answer.
You may object that I have not taken the general case. For instance, the distribution could be a decreasing one, such as ( 2-n, 2-n-1 ) with probability P(n) for n >= 0. It could also be double-ended - an increasing distribution appended to a decreasing one. I claim that the points made in the preceding two paragraphs are still valid.
If the open envelope contains 100 you might counter that we are not faced with infinitely many cases but with just two possibilities: ( 100, 50 ) and ( 100, 200 ). This is the very heart of the paradox: the simple situation just described derives from the general case of infinitely many possible distributions of pairs of envelopes, and we cannot ignore the probabilistic origins of our deceptively simple two cases. To see it as an each way bet on ( 100, 50 ) vs ( 100, 200 ) is equivalent to saying that the original distribution consisted of only the two pairs ( 100, 50 ) and ( 100, 200 ) and that one of these has been randomly chosen. You cannot throw away the history of the problem. Still not convinced? Suppose you had a box containing two white pearls and one black one. You withdraw two pearls without looking at them, then you pick one of these and note that it is black. What is the chance that the second pearl is also black? Ms Maple, your sixth form teacher was right, history matters!
Whether we open an envelope or not, we should be indifferent to switching because of symmetry: nothing suggests that one envelope is preferable to the other.
It is sad that such a childishly simple problem has no simple solution. I believe that the two envelope paradox is the simplest logical puzzle that is (a) solvable and (b) has no easy solution, ie no solution that can be readily explained to people without mathematical training. Even now I find the solution unsatisfying.
A paradox such as the two envelope puzzle consists of two arguments which lead to contradictory conclusions. To resolve such a paradox it is not sufficient to claim that one argument is totally sound and therefore that the other must be wrong. The only way to resolve such a paradox is to show where there is a mistake in one of the competing arguments. We must show exactly at what point the argument in favour of switching goes wrong, without reference to the competing argument and even without reference to the contradictions that result from the switching argument. In this case the mistake is the assumption that it is valid to calculate the gain. Method 1 and method 2 sum the same series but in a different way, producing different sums. Both methods are actually invalid. This puzzle is an infinity paradox, though the infinity is hidden.
Essentially, the two envelope paradox is a zero-sum game in which the choice of the order of summation can give a false positive value for the gain.
It is vaguely analogous to Bertrand's Paradox  , in which the question asked can be answered in logically correct ways that give different answers. This forces us to conclude that Bertrand’s question is not sufficiently precise. In the two envelope paradox we are able to calculate the gain in a number of ways, each of which is mathematically sound, yet which give different answers for what must be a unique value. In fact we know that the value is actually zero by symmetry. From this we conclude that we cannot make the calculation. Specifically, that we cannot sum an oscillating or conditionally convergent series to give us the answer.
To understand the solution intuitively let’s look at a real-world version of the puzzle, where only finite amounts are possible. Let's assume that we know in advance that the envelopes can only contain one of the pairs ( 1, 2 ), ( 2, 4 ), ( 4, 8 ) up to ( 299, 2100 ) and that the chance of getting each pair is the same, ie 1/100. In the general case it did not matter whether we opened the first envelope or not. In this particular finite version it does. If the paradox statement does not include opening the envelope then indifference is still the correct solution. It is true that we will gain, statistically speaking, if our envelope contains 299 or less, but all these gains are exactly balanced by the loss if we happen to pick the envelope that contains 2100. Here is the calculation where we add the gain in going from x to 2x and subtract the loss in going from x to x/2 (method 2):
If our envelope contains 1 then the gain is 1/200( (2 - 1) ) = 1/200
If our envelope contains 2 then the gain is 1/200( (1 - 2) + (4 - 2) ) = 1/200
If our envelope contains 4 then the gain is 1/200( (2 - 4) + (8 - 4) ) = 2/200
If our envelope contains 8 then the gain is 1/200( (4 - 8) + (16 - 8) ) = 4/200
If our envelope contains 299 then the gain is 1/200( (298 - 299) + (2100 - 299) )
= 1/200( -298 + 299 ) = 298/200
If our envelope contains 2100 then the gain is 1/200( 299 - 2100 ) = -299/200
Summing, we find that the expected gain = 1/200( 1 + 1 + 2 + 4 + 8 + ... + 298 - 299 ) = 0.
We could have avoided this work. Summing a finite series is not affected by the order in which we add its terms, ie we know that the answer has to be zero because of method 1. Applying method 2 above we also got 0 gain because of the boundary conditions. By contrast, in the infinite case there is no payback, as there is in the last term of the calculation above. This is because in the general case, no matter what our envelope contains, the other one could contain twice as much. I assert that the general case of the finite version of the two envelope paradox (ie where the highest value is at most N, where N is a fixed number, though its value need not be known) will exhibit payback behaviour as above. Furthermore, I assert that the gain using both method 2 and method 1 will be zero.
If the paradox statement for the specific finite case above does include the extra information that we opened an envelope and found 128 then we should switch. Our expected gain is 1/2( (64 - 128) + (256 - 128) ) = 32. What about the symmetry argument? It no longer applies because we now know where we are in the distribution. Specifically we know that our envelope does not contain the maximum amount of 2100, so switching is statistically beneficial. Had we opened our envelope and found 2100 we would not switch. Finally, what about method 1? This too no longer applies because it only gives us the overall answer for all cases, whereas we are no longer interested in the solution for all values (which is described above), but only in the case where our envelope contains 128 and the other holds 64 or 256.
Posted 23 Oct 2006