Sequences of Coin Flips and Hamming Cubes
Published:
My colleagues and I were discussing the following probability brain teaser at lunch: Suppose I have a a fair coin and I use it to write down a 100-character sequence of heads and tails. I don’t show this process to you at all nor tell you any part of the sequence but you are allowed to ask me one yes/no question which I’ll answer truthfully. Then, you will write down your own 100-character sequence with the goal of maximizing the number of matches between our sequences. e.g. if my sequence is $THH…$ and you submit $HTH…$, the first two are not a match but the third is. What yes/no question would you ask and what would you then submit (and why)? What is the expected value of your strategy?
First, let’s just make this more general and assume there are $2n$ characters.
Claim 1: An optimal strategy is asking, “Are there at least $n$ heads?” If the answer is “yes”, submit the sequence of all heads. If the answer is no, submit the sequence of all tails.
Claim 2: The worst question is asking, “Are there an even number of heads?” Or well, another equally bad question is asking if there is an odd number of heads.
Two sequences are “close” to each other if they differ by only a few bits as opposed to many. For example, $THHTT$ is closer to $THTTT$ than to $HTTHT$ since there’s only a difference of 1 bit in the first comparison and 4 in the second. This reminds me of an idea from error correction codes: Hamming cubes. For sequences of length $k$, consider a $k$-dim hypercube where we label the vertices with $k$-character sequences of tails and heads such that immediate neighbors only differ by one bit. By neighbor, I mean that they are connected by one edge.
We can define a metric $d$ on the cube using this notion of distance and thus, define Hamming balls as well. A closed ball of radius $r$ centered at some vertex is simply the set of all vertices that are a distance of at most $r$ from the center.
Now, a yes/no question splits the vertices into two disjoint sets, $Y$ and $N$, where $Y$ contains the sequences to which an answer “yes” is supplied to your question and similar for $N$ and an answer of “no.” Well, okay, this isn’t quite true. Let’s suppose you ask, “Is the sky blue?” and I say “yes.” This question did not give you any information about my sequence and does not split the cube into two disjoint sets. Despite this, we still see that: each coin flip to generate my sequence was independent so the expected value of getting the first match is $\dfrac{1}{2}(1+0)$ and the same is true for all the rest. Thus, the expected value is $n$ in the situation where no information is gained from a question.
Now consider the question Q2 from Claim 2 above. Note that for every element of $Y$ (has an even number of $H$), all of its nearest neighbors (vertices of distance 1 away) are in the set $N$. And vice versa: every element of $N$ has nearest neighbors being elements of $Y$. In other words, $Y$ and $N$ are completely mixed together and each is completely disconnected in the sense that no element of $Y$ is connected to another element of $Y$ by a single edge (same for $N$). If we think about it, the even/odd parity of a sequence depends only on the last coin flip which is 50/50 for heads or tails. The expected number of matches for any submitted sequence is $n$! Though we do actually gain information about my secret sequence, it doesn’t help at all with increasing the expected value.
Why is this partition of $Y$ and $N$ not useful? We’re trying to have useful information about what my secret sequence is but the sequences most similar to any guess you make based on my “yes” are in the set $N$.But you want to have a good chance of getting as far as you can from the elements of $N$ and be close to elements of $Y$. Of course, this is framed a bit strangely since the very question you ask will determine $Y$ and $N$; they aren’t pre-fixed. But the idea is to find $Y$ and $N$ that are separated and then in $Y$, find the sequence that maximizes distance away from $N$ while minimizing distance to other elements of $Y$. Q2 fails this miserably.
But the question Q1 for Claim 1 succeeds. We can see that $Y$ and $N$ for Q1 are Hamming balls of radius $n$ with centers at the sequence of all heads and the sequence of all tails, respectively. These two centers are maximally far apart, a distance of $k=2n$. And if we placed embedded the cube as $[0,1]^k \subset \mathbb{R}^k$, then the two sets are completely separable by some connected surface. Of course, this idea of separation isn’t so intrinsic to the cube but the notion the two balls are each connected at least.
So after Q1 has split up the cube into two Hamming balls $Y$ and $N$, we want to pick an element in $Y$ so to maximize distance away from $N$ and minimize distance to other elements of $Y$. The center of the ball is the best choice for that.
Remark: Suppose we have an isometry of the cube; i.e. some bijection which preserves the distances of the vertices. Then that would map Hamming balls of radius $r$ to Hamming balls of radius $r$. In particular, there are many yes/no questions we could ask where $Y$ and $N$ are essentially an isometry applied to our original balls of radius $n$ centered at the all heads and all tails sequences. And then the sequence to submit is the center of the ball. Note that the two centers will still be distance $k=2n$ apart.
Expected Value
Let $T = \sum^{2n}_{i=n}i {2n \choose i}$ and $S=\sum^{2n}_{i=n} {2n \choose i}$. If the answer is “yes”, then the expected number of matches using the strategy S1 of Claim 1 is $T/S$. Note that the probability of being in $Y$ is $S/2^{100}$. If “no”, then we need to start the indexing at $n+1$. We then need to take the appropriate average to combine these two to get an expected value.
If $2n=100$, for example, the probability we’re in $Y$ is around $0.5398$ and the conditional expected value there is $53.69$. The probability of being in $N$ is $1-0.5398 =0.4602$ and the conditional expected value is $54.32$. So the weighted sum of these is about $53.9795$.
Now, for large $n$, this becomes unwieldly to calculate. But we can use the central limit theorem for this. The binomial distribution $B(2n,0.5)$ converges to a continuous normal distribution $N(\mu,\sigma^2)$ as $n \to \infty$. The mean is $\mu = n$, variance is $\sigma^2 = n/2$, and the condition $i \geq n$ cuts this in half. We want the positive half.
For the standard normal distribution $Z \sim N(0,1)$, the expected value of the positive half is $E[Z|Z\geq 0] = \dfrac{1}{\sqrt{2\pi}}\int^\infty_0 z e^{-z^2/2}\,dz \cdot \frac{1}{P(Z\geq 0)}$. Since $P(Z \geq 0) = \dfrac{1}{2}$ and the using $u=-z^2/2$ to compute the integral to find that it’s equal to 1, the expected value here is $\sqrt{\dfrac{2}{\pi}}$.
Rescaling back to our original distribution with $\mu+Z\sigma$, we have $E[i|i \geq n] \approx n + \sqrt{\dfrac{n}{2}} \cdot \sqrt{\dfrac{2}{\pi}} = n + \sqrt{\dfrac{n}{\pi}}$. Since a continuous normal distribution has $E[i|i<n] = E[i|i \geq n]$, then we won’t compute that. But note that in the discrete setting, these two are not equal though they will be close for large $n$.
Infinite Sequences
One interesting idea is to try to extend this to infinite sequences of $T$ and $H$. The issue is first that we can’t count the number of heads or tails but rather, try to get a percentage of them. But also, since the generation of the sequence is with a fair coin, then the expected proportion of $H$ and $T$ should be $1/2$. Perhaps one way to make this precise is if $h_n$ is the number of heads in the first $n$ coin flips, then $\limsup \dfrac{h_n}{n} = \liminf \dfrac{h_n}{n}=\dfrac{1}{2}$. So the probability of other sequences with different “density of heads” is 0.
What we want is still to be able to split the infinite dimensional hypercube into two “halves” in some sense. I think there is a way to do this with ultrafilters which are a part of set theory that I am not familiar with. A filter is a family of subsets of a set $X$ that is closed under superset and finite intersection. A filter $F$ is maximal if there does not exist a strictly larger collection of subsets $E$ of $X$ that is also a filter. By larger, I just mean with inclusion: $F \subset E$. If $F$ is maximal, we call it an ultrafilter.
An interesting thing is that ultrafilters can be used to prove there exists a winning strategy but in a sort of tautological way: we need to use them to define what winning even means. Moreover, we cannot actually write down the sequence constructively. The partitioning into two pieces entirely depends on the Axiom of Choice.
To see how ultrafilters and the Axiom of Choice interact here, we must transition from maximizing a count of matches (which doesn’t make sense if there’s infinitely many matches) looking at the set of matching indices, modeled as a game on the Cantor space $2^\omega$ (the set of all infinite binary sequences).
First, for infinite sequences $x, y \in 2^\omega$, since we cannot count matches, let the set of matching indices be $M(x,y) = {n \in \omega : x_n = y_n}$. We want a notion of whether $y$ is a good guess for $x$.
A free ultrafilter $\mathscr{U}$ on $\omega$ is a collection of subsets of natural numbers that acts as a finitely additive, ${0,1}$-valued measure. It satisfies:
- $\emptyset \notin \mathscr{U}$
- If $A \in \mathscr{U}$ and $A \subseteq B$, then $B \in \mathscr{U}$.
- For any $A \subseteq \omega$, either $A \in \mathscr{U}$ or the complement $\omega \setminus A \in \mathcal{U}$ (but not both)
- No finite set is in $\mathscr{U}$ (it is free).
We then define the winning condition of our game using a free ultrafilter $\mathscr{U}$. A sequence $y$ is a “winning match” for $x$ if $M(x,y) \in \mathscr{U}$. If you like, what this defines for us is a measure $\mu:\mathscr{P}(\omega) \to {0,1}$ where $\mu(A) = 1$ if $A \in \mathscr{U}$ and is 0 otherwise.
Let $\mathbf{1}$ be the sequence of all $1$s (heads) and $\mathbf{0}$ be the sequence of all $0$s (tails). Let $H(x) = {n \in \omega : x_n = 1}$ be the indices where my secret sequence $x$ has heads.
Question: Is $H(x) \in \mathscr{U}$?
- If Yes: Submit $y = \mathbf{1}$. Then $M(x, \mathbf{1}) = H(x) \in \mathscr{U}$. You win.
- If No: Submit $y = \mathbf{0}$. By property (3) of ultrafilters, if $H(x) \notin \mathscr{U}$, then its complement $\omega \setminus H(x)) \in \mathscr{U}$. Since $M(x, \mathbf{0}) = \omega \setminus H(x)$, your match set is in $\mathcal{U}$. You win.
Because property (3) forces an absolute dichotomy for any arbitrary subset of $\omega$, this question bisects the infinite Hamming cube $2^\omega$ into two pieces. It guarantees a winning submission for every possible $x$.
Remark 1: The question is equivalent to “Does your hidden sequence fall into the exact set of sequences for which submitting a sequence of all heads define a win?” So it’s tautological. Before you get upset, read on.
Remark 2: It may seem weird that the “density” of heads is 50% but we submit all heads or all tails. But this is not the right kind of comparison. It’s for the very fact that everything is expected to have a 50% density of heads that there is no way to get a “closer” match when there’s still an expected infinitely many mismatches. We can’t define winning/losing to mean “more” (mis)matches as infinite counts can’t be compared.
So we need something that makes it so that certain questions actually lead to a “competitive” strategy while being fair. An ultrafilter does this for us, even though it’s rather tautological to use it to define what winning means. There are some features that we want to preserve from the finite case:
- A complete mismatch cannot win since a match set in $\mathscr{U}$ implies the complete mismatch set is outside of it.
- There is consistency. Note that finite changes don’t matter to this measure but also, if $A$ is a winning match for $x$ and $B$ matches $x$ everywhere that $A$ does and then some, $B$ is also a winning match (by closure under supersets). This is certainly a property we wanted in the finite case: if $A$ is a decent match for $x$ and $B$ matches $x$ where $A$ does plus some more, then $B$ is also a decent match (and is considered better). Here, there’s no sense of something as better when trying to add a finite number to infinity.
- There is decidability. Every sequence has a defininitive win/loss verdict.
Remark 3: The strategy exists, but you cannot explicitly write down $\mathscr{U}$ in order to phrase the question. To construct a free ultrafilter, you must apply Zorn’s Lemma (which is equivalent to the Axiom of Choice) to extend the Fréchet filter (the filter of cofinite sets) to a maximal filter. Zorn’s Lemma is inherently non-constructive; it asserts the existence of a maximal element without providing an algorithm or formula to define it.
In fact, this limitation is a provable set-theoretic fact:
Sierpiński’s Theorem: One cannot prove the existence of a free ultrafilter on $\omega$ using only the Zermelo-Fraenkel axioms without the Axiom of Choice.
The moment you drop the Axiom of Choice, the ultrafilter vanishes, and with it, the guaranteed partition of the space.
