Sequences of Coin Flips, Hamming Cubes, and Ultrafilters

23 minute read

Published: June 26, 2026

My colleagues and I were discussing the following probability brain teaser at lunch: Suppose I have a fair coin and I use it to write down a 100-character sequence of heads and tails. I don’t show you this process or tell you any part of the sequence, but you are allowed to ask me one yes/no question, which I’ll answer truthfully. Then, you will write down your own 100-character sequence with the goal of maximizing the number of matches between our sequences. For example, if my sequence is $THH…$ and you submit $HTH…$, the first two are not a match but the third is. What yes/no question would you ask and what would you then submit (and why)? What is the expected number of matches for your strategy? If you want to pause to think about this (and other questions), don’t read on just yet. Instead, here is a photo of Lofoten.

[Image: a photo of Lofoten]

First, let’s just make this more general and assume there are $2n$ characters.

Claim 1: An optimal strategy is asking, “Are there at least $n$ heads?” If the answer is “yes”, submit the sequence of all heads. If the answer is no, submit the sequence of all tails.

Claim 2: The worst question is asking, “Are there an even number of heads?” Or well, another equally bad question is asking if there is an odd number of heads.

Two sequences are “close” to each other if they differ in only a few positions as opposed to many. For example, $THHTT$ is closer to $THTTT$ than to $HTTHT$ since the first pair differs in only 1 position and the second in 4. This reminds me of an idea from error-correcting codes: Hamming cubes. For sequences of length $k$, consider a $k$-dimensional hypercube whose vertices are labeled with the $k$-character sequences of tails and heads such that immediate neighbors differ in exactly one position. By neighbor, I mean that they are connected by one edge.
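As a small illustration of this distance, here is a minimal Python sketch. The function names `hamming_distance` and `neighbors` are my own choices for the example, not anything standard.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(ca != cb for ca, cb in zip(a, b))


def neighbors(seq):
    """All vertices of the Hamming cube that are one edge away from seq."""
    flip = {"H": "T", "T": "H"}
    return [seq[:i] + flip[seq[i]] + seq[i + 1:] for i in range(len(seq))]


print(hamming_distance("THHTT", "THTTT"))  # 1
print(hamming_distance("THHTT", "HTTHT"))  # 4
print(neighbors("THH"))                    # ['HHH', 'TTH', 'THT']
```

Every neighbor is at distance exactly 1, which is the edge relation of the cube.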

We can define a metric $d$ on the cube using this notion of distance and thus, define Hamming balls as well. A closed ball of radius $r$ centered at some vertex is simply the set of all vertices that are a distance of at most $r$ from the center.

Now, a yes/no question splits the vertices into two disjoint sets, $Y$ and $N$, where $Y$ contains the sequences for which the answer to your question is “yes” and similarly for $N$ and an answer of “no.” Well, okay, this isn’t quite true. Let’s suppose you ask, “Is the sky blue?” and I say “yes.” This question did not give you any information about my sequence: every sequence lands in $Y$ and $N$ is empty, so the cube isn’t really split. Despite this, we still see that each coin flip used to generate my sequence was independent, so the expected number of matches in the first position is $\dfrac{1}{2}(1+0)$ and the same is true for all the rest. Thus, the expected number of matches for any submission is $n$ in the situation where no information is gained from a question.

Now consider the question from Claim 2 above; call it Q2. Note that for every element of $Y$ (an even number of $H$), all of its nearest neighbors (vertices at distance 1) are in the set $N$. And vice versa: every element of $N$ has all of its nearest neighbors in $Y$. In other words, $Y$ and $N$ are completely mixed together and each is completely disconnected in the sense that no element of $Y$ is connected to another element of $Y$ by a single edge (same for $N$). If we think about it, once the other $2n-1$ flips are fixed, the parity of a sequence is decided by the one remaining flip, which is 50/50 for heads or tails. So knowing the parity leaves every individual position 50/50, and the expected number of matches for any submitted sequence is still exactly $n$. Though we do actually gain information about my secret sequence, it doesn’t help at all with increasing the expected value.

Why is this partition into $Y$ and $N$ not useful? We’re trying to get useful information about what my secret sequence is, but the sequences most similar to any guess you make based on my “yes” are in the set $N$. You want to have a good chance of getting as far as you can from the elements of $N$ while staying close to elements of $Y$. Of course, this is framed a bit strangely since the very question you ask will determine $Y$ and $N$; they aren’t pre-fixed. But the idea is to find $Y$ and $N$ that are separated and then in $Y$, find the sequence that maximizes distance away from $N$ while minimizing distance to other elements of $Y$. Q2 fails this miserably.

But the question from Claim 1, call it Q1, succeeds. We can see that $Y$ and $N$ for Q1 are Hamming balls centered at the sequence of all heads and the sequence of all tails, respectively: $Y$ is the closed ball of radius $n$ and $N$ is the closed ball of radius $n-1$ (sequences with exactly $n$ heads go to $Y$). These two centers are maximally far apart, at distance $k=2n$. And if we embed the vertices of the cube in $[0,1]^k \subset \mathbb{R}^k$, then the two sets are completely separable by some connected surface (a hyperplane will do). Of course, this idea of separation isn’t so intrinsic to the cube, but at least each of the two balls is connected.

So after Q1 has split up the cube into two Hamming balls $Y$ and $N$, we want to pick an element of $Y$ so as to maximize its distance from $N$ and minimize its distance to the other elements of $Y$. The center of the ball is the best choice for that.
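The contrast between the two questions can be checked by brute force on a short sequence. The sketch below is only an illustration under my own assumptions: it uses length $2n=6$, writes 1 for heads, and has the guesser submit the most common value in each position of the answer class (which maximizes the expected matches for that class, since matches add up position by position). The name `expected_matches` is hypothetical.

```python
from itertools import product


def expected_matches(question, length):
    """Expected number of matches when the guesser asks `question` (a function
    from a 0/1 tuple to True/False) and then submits, position by position,
    the most common value in the answer class."""
    classes = {True: [], False: []}
    for seq in product((0, 1), repeat=length):
        classes[bool(question(seq))].append(seq)

    total = 0
    for members in classes.values():
        for i in range(length):
            ones = sum(s[i] for s in members)
            # Guessing the majority value in slot i matches this many members.
            total += max(ones, len(members) - ones)
    return total / 2 ** length


n = 3  # sequences of length 2n = 6


def q1(s):
    return sum(s) >= n       # "Are there at least n heads?"


def q2(s):
    return sum(s) % 2 == 0   # "Is there an even number of heads?"


print(expected_matches(q1, 2 * n))  # 3.9375
print(expected_matches(q2, 2 * n))  # 3.0
```

The parity question does no better than guessing blind ($n=3$ matches), while the threshold question gains almost a full match.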

Remark 1: Suppose we have an isometry of the cube, i.e. a bijection on the vertices which preserves distances. It maps Hamming balls of radius $r$ to Hamming balls of radius $r$. In particular, there are many yes/no questions we could ask where $Y$ and $N$ are the images under an isometry of our original two balls centered at the all heads and all tails sequences. The sequence to submit is then the center of the ball that the answer picks out. Note that the two centers will still be distance $k=2n$ apart.

Remark 2: We’ll see below that for 100 flips the expected number of matches when using Q1 is close to 54. In machine learning contexts, we often want to do binary classification and a common type of model to use is a tree-based model. Decision trees typically use variables to split the data into two groups, much like what a yes/no question does. The goal is to choose the split so as to gain the most information and thus increase some quantity (or equivalently, decrease the loss function).

Expected Value

Let $T = \sum^{2n}_{i=n}i {2n \choose i}$ and $S=\sum^{2n}_{i=n} {2n \choose i}$, where $i$ counts heads. If the answer is “yes”, then the expected number of matches using the strategy of Claim 1 is $T/S$. Note that the probability of being in $Y$ is $S/2^{2n}$. If the answer is “no”, we submit all tails, so $i$ should count tails instead and the indexing in both sums starts at $n+1$. We then take the average of these two conditional expectations, weighted by the probabilities of “yes” and “no”, to get the overall expected value.

If $2n=100$, for example, the probability we’re in $Y$ is around $0.5398$ and the conditional expected value there is about $53.69$. The probability of being in $N$ is $1-0.5398 =0.4602$ and the conditional expected value there is about $54.32$. So the weighted average of these is about $53.9795$.
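These figures can be reproduced with exact integer arithmetic. The following is a minimal sketch using only the Python standard library; the variable names are mine, and the sums follow the definitions of $T$ and $S$ above (with the index counting tails in the “no” case).

```python
from fractions import Fraction
from math import comb

two_n = 100
n = two_n // 2
total = 2 ** two_n

# "Yes" class: i heads with i >= n; submitting all heads scores i matches.
S_yes = sum(comb(two_n, i) for i in range(n, two_n + 1))
T_yes = sum(i * comb(two_n, i) for i in range(n, two_n + 1))

# "No" class: j tails with j >= n + 1; submitting all tails scores j matches.
S_no = sum(comb(two_n, j) for j in range(n + 1, two_n + 1))
T_no = sum(j * comb(two_n, j) for j in range(n + 1, two_n + 1))

p_yes = Fraction(S_yes, total)
p_no = Fraction(S_no, total)
e_yes = Fraction(T_yes, S_yes)
e_no = Fraction(T_no, S_no)
overall = p_yes * e_yes + p_no * e_no

print(f"P(yes) = {float(p_yes):.4f}, E[matches | yes] = {float(e_yes):.2f}")
print(f"P(no)  = {float(p_no):.4f}, E[matches | no]  = {float(e_no):.2f}")
print(f"overall expected matches = {float(overall):.4f}")
```

Since `p_yes * e_yes` is just `T_yes / total`, the overall value is simply $(T_{\text{yes}} + T_{\text{no}})/2^{2n}$.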

Now, for large $n$, this becomes unwieldy to calculate. But we can use the central limit theorem for this. The binomial distribution $B(2n,0.5)$ is well approximated by a normal distribution $N(\mu,\sigma^2)$ as $n \to \infty$. The mean is $\mu = n$, the variance is $\sigma^2 = n/2$, and the condition $i \geq n$ cuts this in half. We want the upper half.

For the standard normal distribution $Z \sim N(0,1)$, the expected value of the positive half is $E[Z|Z\geq 0] = \dfrac{1}{\sqrt{2\pi}}\int^\infty_0 z e^{-z^2/2}\,dz \cdot \frac{1}{P(Z\geq 0)}$. We have $P(Z \geq 0) = \dfrac{1}{2}$, and the substitution $u=-z^2/2$ shows that the integral is equal to 1, so the expected value here is $\sqrt{\dfrac{2}{\pi}}$.
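To spell out the substitution (just a worked version of the step above): with $u=-z^2/2$ we have $du = -z\,dz$, and $z$ running from $0$ to $\infty$ corresponds to $u$ running from $0$ to $-\infty$, so

\[\int_0^\infty z e^{-z^2/2}\,dz = \int_{-\infty}^{0} e^{u}\,du = 1, \qquad E[Z \mid Z \geq 0] = \frac{1}{\sqrt{2\pi}} \cdot 1 \cdot 2 = \sqrt{\frac{2}{\pi}} \approx 0.798.\]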

Rescaling back to our original distribution with $\mu+Z\sigma$, we have $E[i|i \geq n] \approx n + \sqrt{\dfrac{n}{2}} \cdot \sqrt{\dfrac{2}{\pi}} = n + \sqrt{\dfrac{n}{\pi}}$. For the “no” case, the number of matches is the number of tails $2n-i$, and by the symmetry of the normal distribution $E[2n-i|i<n] = E[i|i \geq n]$ in the continuous approximation, so we won’t compute that separately. But note that in the discrete setting, these two are not equal, though they will be close for large $n$.
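To get a feel for how good the approximation $n + \sqrt{n/\pi}$ is, here is a small sketch comparing it with the exact value for a few $n$. The function names are hypothetical, and the exact value is computed by direct summation as in the definitions of $T$ and $S$ above.

```python
from math import comb, pi, sqrt


def exact_expected_matches(n):
    """Exact expected matches for the 'at least n heads?' strategy on 2n flips."""
    two_n = 2 * n
    yes = sum(i * comb(two_n, i) for i in range(n, two_n + 1))     # submit all heads
    no = sum(j * comb(two_n, j) for j in range(n + 1, two_n + 1))  # submit all tails
    return (yes + no) / 2 ** two_n


def normal_approximation(n):
    """The central limit theorem estimate n + sqrt(n / pi)."""
    return n + sqrt(n / pi)


for n in (5, 50, 500):
    print(n, round(exact_expected_matches(n), 4), round(normal_approximation(n), 4))
```

For $n=50$ the two values agree to within about $0.01$.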

$k$ Questions

Suppose now we get to ask $k$ yes/no questions, not sequentially, but all at once and we get a $k$-vector of yes/no in response (which I’ll represent with 1/0). This breaks the cube into $2^k$ disjoint subsets, though it is possible that some of the subsets are actually empty. For example, say my first question is, “Are there an even number of heads?” and the second is, “Are there an odd number of heads?” Then the subset corresponding to $(1,1,…)$ is empty. Also, note that though there are infinitely many questions to ask, there are only finitely many ways in which we can split the cube into $2^k$ subsets. It was never really necessary that $N=2n$ be an even number so let’s just work with any positive integer $N$ as the length of the sequence. Each set of $k$ questions $Q$ defines a map $\{0,1\}^N \to \{0,1\}^k$ by sending each sequence to its vector of answers, and since the target and domain are both finite, there are finitely many such maps.

Now, for a given $Q$, consider $S$, one of the subsets it defines. Let $v_S$ be the sequence that is constructed by popularity among sequences of $S$. That is, the first term will be 1 if most of the sequences of $S$ start with 1, 0 if most are 0. If it’s a tie, we can just pick one. In a sense, $v_S$ is like the centroid of $S$. The value we want to minimize is thus:

$V(Q) = \sum_S \sum_{s \in S} d(s,v_S)$

since if $d$ is large, this means there are more mismatches and we want to be increasing matches.

The idea, like in the $k=1$ case, is to find $v_S$ that are as far apart as possible so that within each $S$, there is maximum matching. Let’s consider using linear equations to form questions when $k \geq 2$. Form a $k \times N$ matrix $A$ with entries in $\{0,1\}$ where each row marks a set of bit positions. The $i$th question is: “Is the sum of the bits in the positions indicated by row $i$ of matrix $A$ odd?”

Example: $N=7,k=3$.

\[A =\begin{pmatrix} 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{pmatrix}\]

There are $128=2^7$ sequences. For a “yes” to the first question $Q_1$, the 4th, 5th, 6th, 7th terms must contain exactly one 1 or exactly three 1s. That’s $4+4=8$ possibilities and since the first three terms can be anything, we have $8 \cdot 8 = 64$ sequences in the affirmative for $Q_1$. Similarly for $Q_2,Q_3$. If we look at sequences in the affirmative for all three questions, there are 16 of them. In fact, every one of the $8=2^3$ answer vectors occurs, so we have $8$ disjoint subsets, each of size 16.

Note that the sequence $(1,0,1,0,1,1,0)$ is “no” for $Q_1, Q_2$; i.e. the sum of the relevant bits is even. But it is “yes” for $Q_3$. If we multiply this sequence, as a column vector, by $A$ and reduce mod 2, the result is the answer $(0,0,1)$ or (no,no,yes).
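The example can be checked directly. This is a minimal sketch in plain Python (no linear algebra library); `answers` is a hypothetical helper that computes $Ax$ mod 2 for the matrix $A$ above.

```python
from collections import Counter
from itertools import product

A = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]


def answers(x):
    """Answer vector A x (mod 2): entry i is 1 when question i gets a 'yes'."""
    return tuple(sum(a * b for a, b in zip(row, x)) % 2 for row in A)


print(answers((1, 0, 1, 0, 1, 1, 0)))  # (0, 0, 1)

# Sort all 2^7 sequences into classes by their answer vector.
class_sizes = Counter(answers(x) for x in product((0, 1), repeat=7))
print(len(class_sizes), set(class_sizes.values()))  # 8 {16}
```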

So all of that is very nice and convenient. On the other hand, note that among the sequences answering “yes” to $Q_1$ (and likewise among those answering “no”), for each entry $x_i$, exactly half of the sequences have $x_i = 0$ and the other half have $x_i=1$. So everything is a tie for computing $v_S$ and hence, we’d have to choose it completely arbitrarily! There are also linear questions whose row has a single 1; each of these reveals exactly one bit of my sequence. So the claim is: within each answer class of a set of linear questions, each coordinate is either perfectly determined or a tie, in which case that entry of $v_S$ is arbitrary.

So the best expected value we can do with linear codes is simply to ask about individual bits and we can only achieve $\dfrac{1}{2}(N+k)$.
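As a sanity check on the example (not a proof of the general claim), the sketch below compares the three parity questions from the matrix $A$ with three questions that each ask about a single bit, for $N=7$ and $k=3$. The helper name and the choice of which three bits to ask about are my own assumptions.

```python
from collections import defaultdict
from itertools import product


def expected_matches(rows, length):
    """Expected matches when question i asks whether the bits marked by
    rows[i] have an odd sum and the guesser submits the coordinatewise
    majority of the answer class (ties broken arbitrarily)."""
    classes = defaultdict(list)
    for x in product((0, 1), repeat=length):
        key = tuple(sum(a * b for a, b in zip(row, x)) % 2 for row in rows)
        classes[key].append(x)

    total = 0
    for members in classes.values():
        for i in range(length):
            ones = sum(x[i] for x in members)
            total += max(ones, len(members) - ones)
    return total / 2 ** length


parity_rows = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]
single_bit_rows = [
    [1, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
]

print(expected_matches(parity_rows, 7))      # 3.5
print(expected_matches(single_bit_rows, 7))  # 5.0
```

The parity questions leave every coordinate tied in every class, so they do no better than $N/2 = 3.5$, while the single-bit questions reach $\frac{1}{2}(N+k) = 5$.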

How do we do better? Let’s shift gears a bit. Let $v=v_S$ be a coordinatewise majority vector (or centroid as we’ve been calling it). Let $p_i= P(x_i=1|x \in S)$, the probability the $i$th coordinate is 1, given $x$ is in the subset $S$.

Note that $d(x,v) = \sum^N_{i=1}\mathbf{1}\{x_i \neq v_i\}$. This is maybe not standard notation, but it’s 1 whenever the two do not equal. Therefore, $\sum_{x \in S}d(x,v) = \sum_{x \in S} \sum^N_{i=1}\mathbf{1}\{x_i \neq v_i\} = \sum^N_{i=1}\sum_{x \in S}\mathbf{1}\{x_i\neq v_i\}$. We can swap the sums since this is a finite sum and now, we can analyze one coordinate at a time.

If $v_i = 0$, then the inner sum is simply the number of $x \in S$ whose $i$th coordinate is 1. This is exactly equal to $p_i|S|$. If $v_i=1$, then the contribution is $(1-p_i)|S|$. The centroid chooses the majority value: if $p_i>1/2$, we choose $v_i=1$ and the contribution is $(1-p_i)|S|$, and if $p_i<1/2$, we choose $v_i=0$ and the contribution is $p_i|S|$. So in either case, the $i$th coordinate contributes $|S|\min(p_i,1-p_i)$.

So $\sum_{x \in S}d(x,v_S) = |S|\sum^N_{i=1}\min(p_i,1-p_i)$ and so our $V(Q) = \sum_S |S|\sum^N_{i=1}\min(p_i,1-p_i)$, where the $p_i$ are computed separately for each $S$. From this point of view, perhaps centroids shouldn’t be the focus. The focus might be trying to create these answer classes $S$ in which each coordinate has as much bias as possible. In the case with a single question, we framed it as the two Hamming balls being biased towards a particular vector but we can also work coordinatewise.
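The identity can be verified numerically on small cases. In this sketch the two question sets are hypothetical examples of my own on sequences of length 6 (with 1 for heads), and `mismatch_totals` is a made-up name; it computes $V(Q)$ both directly from distances and from the $\min(p_i, 1-p_i)$ formula.

```python
from fractions import Fraction
from itertools import product


def mismatch_totals(questions, length):
    """Return (direct, formula) for a list of yes/no questions on 0/1 tuples.

    direct  = sum over classes S and x in S of d(x, v_S)
    formula = sum over classes S of |S| * sum_i min(p_i, 1 - p_i)
    """
    classes = {}
    for x in product((0, 1), repeat=length):
        key = tuple(bool(q(x)) for q in questions)
        classes.setdefault(key, []).append(x)

    direct, formula = 0, Fraction(0)
    for members in classes.values():
        size = len(members)
        # p[i] = fraction of the class with a 1 in coordinate i
        p = [Fraction(sum(x[i] for x in members), size) for i in range(length)]
        # Coordinatewise majority vector; ties are broken towards 0.
        v = [1 if 2 * p_i > 1 else 0 for p_i in p]
        direct += sum(sum(a != b for a, b in zip(x, v)) for x in members)
        formula += size * sum(min(p_i, 1 - p_i) for p_i in p)
    return direct, formula


# Hypothetical question sets, chosen only for illustration.
one_question = [lambda x: sum(x) >= 3]
two_questions = [lambda x: sum(x[:3]) >= 2, lambda x: sum(x[3:]) >= 2]

for qs in (one_question, two_questions):
    direct, formula = mismatch_totals(qs, 6)
    print(direct, formula, direct == formula)
```

Dividing $V(Q)$ by $2^N$ gives the expected number of mismatches, so a smaller total means a better set of questions.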

Infinite Sequences

We might try to extend this game to infinite sequences of $T$ and $H$. The first issue is that we can’t count the number of heads or tails; at best we can look at their proportion. But also, since the sequence is generated with a fair coin, the proportion of $H$ and of $T$ should each be $1/2$. One way to make this precise: if $h_n$ is the number of heads in the first $n$ coin flips, then by the strong law of large numbers, $\limsup \dfrac{h_n}{n} = \liminf \dfrac{h_n}{n}=\dfrac{1}{2}$ with probability 1. So the set of sequences with any other “density of heads” (or none at all) has probability 0.

We still want to split the infinite dimensional hypercube into two “halves” in some sense. There is a way to do this with ultrafilters which are a part of set theory that I am not familiar with. A filter is a nonempty family of subsets of a set $X$ that is closed under supersets and finite intersections; we also require that it be proper, meaning it does not contain $\emptyset$. A filter $F$ is maximal if there does not exist a strictly larger collection of subsets $E$ of $X$ that is also a (proper) filter. By larger, I just mean with respect to inclusion: $F \subsetneq E$. If $F$ is maximal, we call it an ultrafilter.

Ultrafilters can be used to prove there exists a winning strategy but in a tautological way: we need to use them to define what winning even means. This may seem odd but once we have infinite counts of (mis)matches, the game changes. So we model this as a game on the Cantor space $2^\omega$ (the set of all infinite binary sequences) and look at the set of matching indices. Here, $\omega$ denotes the first infinite ordinal, which we identify with the set of natural numbers.

First, for infinite sequences $x, y \in 2^\omega$, since we cannot count matches, let the set of matching indices be $M(x,y) = \{n \in \omega : x_n = y_n\}$. We want a notion of whether $y$ is a good guess for $x$.

A free ultrafilter $\mathscr{U}$ on $\omega$ is a collection of subsets of the natural numbers that acts as a finitely additive, $\{0,1\}$-valued measure. Besides being closed under finite intersections (it is a filter), it satisfies:

1. $\emptyset \notin \mathscr{U}$.
2. If $A \in \mathscr{U}$ and $A \subseteq B$, then $B \in \mathscr{U}$.
3. For any $A \subseteq \omega$, either $A \in \mathscr{U}$ or the complement $\omega \setminus A \in \mathscr{U}$ (but not both).
4. No finite set is in $\mathscr{U}$ (it is free).

We then define the winning condition of our game using a free ultrafilter $\mathscr{U}$. A sequence $y$ is a “winning match” for $x$ if $M(x,y) \in \mathscr{U}$. If you like, what this defines for us is a measure $\mu:\mathscr{P}(\omega) \to \{0,1\}$ where $\mu(A) = 1$ if $A \in \mathscr{U}$ and is 0 otherwise.

Here are some observations of what $\mathscr{U}$ affords us.

- A complete mismatch cannot win since that would give the empty set as the match set, which is not in $\mathscr{U}$ by property (1).
- Finite changes don’t matter by property (4), which makes sense for this situation. Say the guesses $A$ and $B$ are identical except in 3 indices and $A$ matches $x$ in infinitely many places (and hence, so does $B$). There is a bijection between the two sets of matching indices; we can’t say which has “more” matches as counting doesn’t have the same meaning in the infinite setting.
- But also, if $A$ is a winning match for $x$ and $B$ matches $x$ everywhere that $A$ does and then some, then $B$ is also a winning match. This is property (2): closure under supersets. This is certainly a property we wanted in the finite case: if $A$ is a decent match for $x$ and $B$ matches $x$ where $A$ does plus some more, then $B$ is also a decent match (and is considered better).
- There is decidability. Every sequence has a definitive win/loss verdict due to property (3).

So an ultrafilter provides us some of the structures of an infinite game that we want to have, inspired by the finite game. Next, let $\mathbf{H}$ be the sequence of all heads and $\mathbf{T}$ be the sequence of all tails. Let $H(x) = \{n \in \omega : x_n = H\}$ be the indices where my secret sequence $x$ has heads.

Question: Is $H(x) \in \mathscr{U}$?

If Yes: Submit $y = \mathbf{H}$. Then $M(x, \mathbf{H}) = H(x) \in \mathscr{U}$. You win.
If No: Submit $y = \mathbf{T}$. By property (3) of ultrafilters, if $H(x) \notin \mathscr{U}$, then its complement $\omega \setminus H(x) \in \mathscr{U}$. Since $M(x, \mathbf{T}) = \omega \setminus H(x)$, your match set is in $\mathscr{U}$. You win.

Because property (3) forces an absolute dichotomy for any arbitrary subset of $\omega$, this question bisects the infinite Hamming cube $2^\omega$ into two pieces. It guarantees a winning submission for every possible $x$.

Remark 1: Note that the question is equivalent to “Does your hidden sequence fall into the exact set of sequences for which submitting a sequence of all heads defines a win?” So as I said, this is tautological but also necessary in order for the infinite game to be, in fact, a game where matches dictate wins/losses.

Remark 2: Now, the strategy exists but you cannot explicitly write down $\mathscr{U}$ in order to phrase the question. The standard way to construct a free ultrafilter is to apply Zorn’s Lemma (which is equivalent to the Axiom of Choice) to extend the Fréchet filter (the filter of cofinite sets) to a maximal filter. Zorn’s Lemma is inherently non-constructive; it asserts the existence of a maximal element without providing an algorithm or formula to define it.

In fact, this limitation is a provable set-theoretic fact:

Sierpiński’s Theorem: One cannot prove the existence of a free ultrafilter on $\omega$ using only the Zermelo-Fraenkel axioms without the Axiom of Choice.

So the moment you drop the Axiom of Choice, the ultrafilter is no longer guaranteed to exist, and with it goes the guaranteed partition of the space.

An interesting question we might ask is whether there are “isometries” like in the finite case so that we may find some other ultrafilter for which it makes sense to then submit sequences different from all heads or all tails. Let’s switch to using 0 for tails and 1 for heads so that we can do bitwise operations more easily.

Let $z \in 2^\omega$ be a target sequence we’d like to submit and define $T_z:2^\omega \to 2^\omega$ to be bitwise addition of $z$ mod 2 (this is equivalent to bitwise XOR with $z$). We’ll write it as $T_z(w) = w\oplus z$. So then the sequence of all 0’s is mapped to $z$ itself. We’ll let $\neg z$ mean the bitwise complement of $z$.

The new question we ask is then: Is $H(x \oplus \neg z) \in \mathscr{U}$? If yes, submit $z$. If no, submit $\neg z$. Note that $H(x \oplus \neg z) = \{n \in \omega: x_n + z_n +1 \equiv 1 \pmod{2}\}$, and the condition simplifies to $x_n = z_n$. So this is the match set $M(x,z)$, and a “no” means its complement $M(x,\neg z)$ is in $\mathscr{U}$.
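The ultrafilter itself can’t be written down, but the identity $H(x \oplus \neg z) = M(x,z)$ is purely bitwise and can be checked on finite prefixes. The sketch below does only that, on random prefixes of an arbitrary length that I picked; all function names are hypothetical.

```python
import random


def heads_set(x):
    """Indices n with x_n = 1 (heads)."""
    return {n for n, bit in enumerate(x) if bit == 1}


def match_set(x, y):
    """Indices n with x_n = y_n."""
    return {n for n, (a, b) in enumerate(zip(x, y)) if a == b}


def xor(x, y):
    """Bitwise addition mod 2."""
    return [a ^ b for a, b in zip(x, y)]


def complement(z):
    """Bitwise complement of a 0/1 list."""
    return [1 - bit for bit in z]


random.seed(0)
length = 32  # length of the finite prefix; an arbitrary choice
x = [random.randint(0, 1) for _ in range(length)]
z = [random.randint(0, 1) for _ in range(length)]

# H(x XOR not-z) is exactly the set of indices where x and z agree.
print(heads_set(xor(x, complement(z))) == match_set(x, z))  # True
# The match sets for z and for its complement partition the index set.
print(match_set(x, z) | match_set(x, complement(z)) == set(range(length)))  # True
```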

Appendix

So in this problem, even/odd parity has no saving grace. But what is a problem in which it does? There’s the brain teaser with a mad prison warden who places 100 prisoners in a row, all facing the same way, and then places a red or blue hat on each of their heads. Each prisoner is able to see the color of the hats of everyone in front of them but not their own hat nor the hats of those behind them. The warden says that each person has only one chance to name the color of the hat on their own head and he will start with the person in the very back (who can see 99 hats), and then go down the line in order. If they’re correct, they go free; if not, they are imprisoned forever. The prisoners can plan ahead of time to maximize the number of people who can be freed. What’s the strategy?

The prisoners agree ahead of time that the 1st prisoner will say red if he sees an even number of red hats and blue if he sees an odd number of red hats. Suppose the 1st person says, “Red.” Then the 2nd prisoner can count the number of red hats ahead of him. If he sees an odd number, he knows that he must have a red hat so that the 1st person could see an even number. If an even number, he knows he has a blue hat. The 2nd person will say the correct color and the 3rd person, having listened in, will have kept track of the parity of red hats in front of the 2nd person and therefore deduce his own hat color. And so on. The 1st prisoner’s own answer is right only half the time, so the strategy guarantees that 99 prisoners go free.
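Here is a short simulation of this strategy, as a sketch under my own encoding (1 for red, 0 for blue, with index 0 being the prisoner at the very back). The function name `play` and the random seed are arbitrary.

```python
import random

RED, BLUE = 1, 0


def play(hats):
    """hats[0] is the prisoner at the back of the line, who speaks first and
    sees hats[1:]. Returns the list of colors the prisoners call out."""
    calls = []
    # Back prisoner: "red" for an even number of red hats ahead, else "blue".
    calls.append(RED if sum(hats[1:]) % 2 == 0 else BLUE)
    # known_parity = parity of red hats among everyone who has not yet spoken,
    # as deduced purely from what has been called out so far.
    known_parity = 0 if calls[0] == RED else 1
    for i in range(1, len(hats)):
        seen_parity = sum(hats[i + 1:]) % 2
        own = (known_parity - seen_parity) % 2  # 1 exactly when the parities differ
        calls.append(own)
        known_parity = (known_parity - own) % 2
    return calls


random.seed(2026)
hats = [random.choice((RED, BLUE)) for _ in range(100)]
calls = play(hats)
freed = sum(c == h for c, h in zip(calls, hats))
print(freed)  # 99 or 100, depending on the back prisoner's luck
```

Everyone except possibly the prisoner at the back calls out their own color correctly, whatever the hats are.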

Let’s now extend it to a countably infinite number of prisoners, labeled $1, 2, 3,…$ Assume each prisoner can process any amount of information instantly. What is the strategy now? Pause here to think about it; in the meantime, a photo of distant galaxies.

[Image: a photo of distant galaxies]

Consider the equivalence relation on the set of sequences of 1’s and 0’s: $x \sim y$ if they eventually agree. That is, there exists $N$ such that for all $n \geq N$, $x_n = y_n$. There are uncountably many equivalence classes $[x]$. The prisoners agree ahead of time on a representative $x$ for every class $[x]$ (this definitely needs the Axiom of Choice). Every prisoner sees all but finitely many of the hats, which is enough to tell which equivalence class $[x]$ the actual sequence of hats is in. The first prisoner names the color in the first slot of the agreed upon representative for that class, and in general the $n$th prisoner names $x_n$ as the color. Since $x$ eventually agrees with the sequence of prisoners’ hats, all but finitely many prisoners go free.

Sam Auyeung