High Dimensional Phenomena
Published:
This post is largely based on Richard Hamming’s The Art of Science and Engineering. The book has a large technical focus on error correcting codes, including Hamming codes but this part is just about high dimensional phenomena that may break our intuition.
Stirling’s Approximation
We’ll begin with Stirling’s Approximation as a warmup. We want to approximate $n!$ which grows quickly. If we take the natural log, we have: $\ln n! = \sum^n_{k=1}\ln k$. The sum reminds us of the integral $\int^n_1 \ln x \, dx$ and if we use integration by parts, we get $\int^n_1 \ln x \, dx = (x \ln -x)|^n_1 = n \ln n -n + 1$.
On the other hand, we can use the trapezoid rule to approximate this integral. Recall that the area of a trapezoid is the height times the average of the two base lengths. Or in our case, the trapezoids will have baselength 1 and the two sides will have different heights of the form $\ln k$. So the first trapezoid degenerates to a triangle as it has area $\frac{1}{2}(\ln 1 + \ln 2) = \frac{1}{2} \ln 2$. The next one has area $\frac{1}{2}(\ln 2 + \ln 3)$ and so on until $\frac{1}{2}(\ln(n-1) + \ln n)$. So in total after combining like terms, we have the integral is approximately $\ln 2 + \ln 3 +.. + \ln(n-1) + \frac{1}{2}\ln n$. So we now have:
$n \ln n -n + 1 = \int^n_1 \ln x \, dx \approx (\sum^n_{k=1}\ln k) - \frac{1}{2}\ln n$. Adding $\frac{1}{2}\ln n$ to both sides and then exponentiating, we have $n! \approx C n^n e^{-n}\sqrt{n}$. One can prove that $C =\sqrt{2\pi}$ and the ratio between the $n$th Stirling number, call it $S_n$, and $n!$ satisfies $\lim_{n \to \infty}S_n/n! =1$. On the other hand, the difference diverges: $\lim_{n \to \infty}(S_n -n!) =\infty$ similar to how $f_n = n$ and $g_n = n -\sqrt{n}$ satisfy $\lim_{n \to \infty}f_n/g_n = 1$ but $\lim_{n\to \infty}(f_n-g_n) = \infty$.
Gamma Function
Define $\Gamma(n) = \int^\infty_0 x^{n-1}e^{-x}\, dx$; this converges for all $n>0$. For $n>1$, we can integrate by parts using $u=x^{n-1}, dv = e^{-x}\, dx$. At the two limits, the $uv$ part is 0. So we’re left with the recursive relation $\Gamma(n) = (n-1)\Gamma(n-1)$ and $\Gamma(1) = 1$. So we see that if $n$ is a positive integer, then $\Gamma(n)=n!$.
Let’s compute $\Gamma(1/2) = \int^\infty_0 x^{-1/2}e^{-x}\, dx$ by setting $x = t^2, dx = 2t \, dt$. Our integral becomes $2\int^\infty_0 e^{-t^2}\, dt = \int^\infty_{-\infty}e^{-t^2}\,dt$ which is the Gaussian, well-known to equal $\sqrt{\pi}$. To obtain this, we switch to polar coordinates and integrate $\int^{2\pi}_0 \int^\infty_0 re^{-r^2}\, dr \, d\theta$.
Volume of $n$-Balls
We expect that the volume of a radius $r$ $n$-ball to be of the form $C_n r^n$. For example, $C_1 = 1$, $C_2 = \pi$ and $C_3 = 4\pi/3$. We know that the volume of a 2-ball incrementally increases by a thin shell whose volume is almost just equal to its surface area times a small thickness. That is, surface area equals the derivative of the volume: $\frac{dV_n(r)}{dr} = nC_n r^{n-1}$.
If we integrate this, it looks a lot like the integral of the Gaussian from above. Indeed, if we integrate $e^{-r^2}$ from 0 to $\infty$ ($r$ is the radius for spherical coorinates), we just have the multivariate Gaussian so the result should be $\Gamma(1/2)^n = \pi^{n/2}$.
On the other hand, this integral equals: $\dfrac{n C_n}{2}\int^\infty_0 e^{-r^2} r^{n-1}\, dr$; set $t = r^2$, $dt = 2r \,dr$ and now we have $\dfrac{n C_n}{2}\int^\infty_0 e^{-t}t^{n/2 - 1}\, dt = \dfrac{n C_n}{2}\Gamma(n/2) = C_n \Gamma(n/2 + 1)$ when we use the recursive identity. Thus, $C_n = \dfrac{\pi^{n/2}}{\Gamma(n/2 +1)}$ and this satisfies $C_n = 2\pi C_{n-2}/n$. Interestingly, $C_5 = 8\pi^2/15$ is the largest and then we see that $C_n$ decreases to 0: $C_{2k} r^{2k} = (\pi r^2)^k/k! \to 0$ as $k \to \infty$.
So for example, the volume of the unit $n$-ball goes to 0 as $n$ increases!
Moreover, take a ball of radius $1-\epsilon$. Then the volume of the thin shell of the unit ball is $C_n -C_n(1-\epsilon)^n$. The percentage of this volume (so divide this number by $C_n$) is $1-(1-\epsilon)^n \to 1$ as $n \to \infty$. So the “thin” shell is where the majority of the volume is when $n$ is large! Even for $n=3$, if $\epsilon = 1/2$, then $7/8$ of the volume is in the outer “half” shell.
This is important because say we’re trying to optimize a problem in high dimensions with some kind of constraint such as wanting a vector of length at most 1. Then it’s rather likely that the optimal solution lies in a thin shell of radius almost 1. The best design is when we push one or more parameters to the extreme.
$n$-Hypercube
We know that the length of the $v=\langle 1,1,1,…,1 \rangle$ is $\sqrt{n}$ in $\mathbb{R}^n$. If we pick any of the unit vectors on the axes, the dot product of $v$ with them is simply 1. Dividing by the norms of the vectors, we can find the cosine of the angle between them: $\cos \theta = 1/\sqrt{n} \to 0$ as $n \to \infty$. This means that $\theta \to \pi/2$. Thus, the diagonal is almost perpendicular to all of the axes!!
In fact, take the vectors of the form $\langle \pm 1, \pm 1,…,\pm 1\rangle$. All of these, when we take the dot product with unit vectors on the axes give us $\pm 1$ and the cosine of the angle is again nearly 0 for large $n$. So the $2^n$ vectors here are all almost perpendicular to the axes.
Additionally, let’s take dot products of these vectors with each other but we’ll pick out two randomly. That is, we can flip a fair coin for each coordinate to assign $\pm 1$. The cosine of the angle between two such vectors is $\frac{1}{n}\sum^n(\pm 1)$. The product for each of the coordinates has 50% chance of being $+1$ and 50% chance of being $-1$. For large $n$, by the weak law of large numbers, this should approach the expected value of 0. So almost surely, all these random diagonal vectors are also almost perpendicular to each other! We have found $2^n +n$ vectors that are almost all almost perpendicular to each other! This is a phenomenon that might explain why Large Language Models are so effective with embedding tokens into high dimensional space. There’s a lot of room for “almost orthogonal” semantic meaning.
Inner Tangent Sphere
Consider a $n$-hypercube where each side length is 4. There are naturally $2^n$ smaller cubes of side length 2 that make up this larger hypercube and we can place $2^n$ spheres of radius one into these smaller hypercubes. Then, all the spheres are tangent to each other and the cube. At the center of all the spheres is enough space to put a smaller sphere that’s tangent to all $2^n$ unit spheres.
For example, when $n=2$, we have a smaller circle. The distance from the center of this smaller circle to any of the centers of the larger spheres is $\sqrt{2}$; that’s the length of the diagonal of a unit square. Thus, we see that the radius of the smaller circle is $\sqrt{2}-1$ since we subtract off the radius of the unit circle.
In general, the radius of the smaller sphere is $\sqrt{n}-1$. This is very strange! We see that the radius of this sphere increases unbounded as $n$ increases and yet it is always contained inside of a hypercube of side length 4. Indeed, for $n=10$, we see that $\sqrt{10}-1 > 2$. Now, 2 is the distance from the center of the hypercube to any one of its sides. So it seems we must conclude that for $n \geq 10$, the inner tangent sphere is actually not an inner sphere at all because it is not contained in the hypercube!! And yet the ball that it bounds is convex and it is tangent to all $2^n$ unit spheres which are contained inside of large hypersphere.
