The Entropic Central Limit Theorem and Berry-Esseen Theorem

In this post, I’ll revisit the Central Limit Theorem, this time with a focus on entropy. I’ll state a few results at the end without proof, including the Berry-Esseen Theorem, which quantifies the rate of convergence in the usual Central Limit Theorem. We’ll work with continuous random variables and their density functions. Recall that for a random variable $X:(\Omega,\mu) \to \mathbb{R}$ with density $f$, the entropy is defined as $H[f] = -\int_\Omega f(x) \ln f(x)\,d\mu$. We’ll just work with $\Omega = \mathbb{R}$ and the usual Lebesgue measure.

Entropy Fundamentals

Proposition: Entropy is a concave functional.

Note that the function $x \ln x$ is convex (concave up) on $(0,\infty)$, so $-x \ln x$ is concave, which suggests that the functional $H[f] = -\int_\mathbb{R} f(x) \ln f(x) \, dx$ is concave. We’ll use calculus of variations to check. I’ll just give the second variation below and the reader may verify the calculation; we’ll assume the derivative operators commute with integration.

\[\delta^2 H[f] = \left. \dfrac{d^2}{d\epsilon^2} H[f+\epsilon h] \right|_{\epsilon=0} = -\int \dfrac{h(x)^2}{f(x)}\, dx \leq 0.\]

So the functional is concave (in fact strictly concave, since the second variation is negative unless $h = 0$ almost everywhere), and hence, when restricted to a convex set, any local maximum is the unique global maximum.
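
As a quick sanity check (not a proof), here’s a small numerical sketch: for two example densities $p, q$ on a grid, the entropy of the mixture $tp + (1-t)q$ should be at least $t\,H[p] + (1-t)\,H[q]$. The grid, the Riemann-sum entropy estimate, and the choice of a normal and a Laplace density are all just illustrative.

```python
import numpy as np

x = np.linspace(-10, 10, 20001)
dx = x[1] - x[0]

p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)  # N(0,1) density
q = 0.5 * np.exp(-np.abs(x))                # Laplace(0,1) density

def entropy(f):
    # differential entropy -∫ f ln f dx, approximated by a Riemann sum
    mask = f > 0
    return -np.sum(f[mask] * np.log(f[mask])) * dx

for t in (0.25, 0.5, 0.75):
    mix = t * p + (1 - t) * q
    lhs = entropy(mix)
    rhs = t * entropy(p) + (1 - t) * entropy(q)
    print(f"t={t}: H[mix] = {lhs:.4f} >= {rhs:.4f} = t H[p] + (1-t) H[q]")
```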

Let’s figure out what our convex set is and find a local maximum. Let’s work with continuous random variables with mean 0 (for simplicity) and variance $\sigma^2$. So if such a random variable has PDF $f$, then $\int f(x)\, dx = 1, \int x f(x)\,dx = 0, \int x^2 f(x)\, dx = \sigma^2$. If $f,g$ satisfy these conditions, then it’s easy to check that so does $tf + (1-t)g$ for $t \in [0,1]$. So the set of all such densities is a convex set.

Next, we form the following Lagrangian; we’ll be using Lagrange multipliers to deal with optimization given some constraints.

\[\mathcal{L}[f] = H[f] -\lambda_0\left(\int f(x) \, dx -1\right) - \lambda_1 \left(\int x f(x)\, dx\right) - \lambda_2\left(\int x^2 f(x) \, dx - \sigma^2 \right)\]

Next, we vary using $f+\epsilon h$ where the 0th, 1st, and 2nd moments of $h$ are all 0 so that $f+\epsilon h$ stays in the set. We find that

\[\delta \mathcal{L}[f](h) = \left.\dfrac{d}{d\epsilon}\right|_{\epsilon=0} \mathcal{L}[f+\epsilon h] = -\int (\ln f +1)h\,dx -\lambda_0\int h\, dx - \lambda_1 \int x h\,dx -\lambda_2 \int x^2 h\, dx.\]

For this to equal 0 for all the admissible $h$, we need $\ln f +1 +\lambda_0 + \lambda_1 x + \lambda_2 x^2 = 0$. For convenience, let’s write this as $\ln f(x) = A + Bx + Cx^2$, i.e. $f(x) = e^A e^{Bx} e^{Cx^2}$. For this to be integrable, we need $C<0$. If $B \neq 0$, completing the square shows that $f$ is a Gaussian centered at $-B/(2C) \neq 0$, which would violate the mean-0 constraint, so it must be that $B=0$. The variance constraint then forces $C = -\dfrac{1}{2\sigma^2}$, and normalization gives $e^A = \dfrac{1}{\sqrt{2\pi}\sigma}$. In other words, $f$ is the PDF of the normal distribution with mean 0 and variance $\sigma^2$.

Putting things together, we have found the normal distribution $N(0,\sigma^2)$ to be a local maximum for entropy among densities with mean 0 and a given variance. Since entropy is concave on the relevant convex set, we know that $N(0,\sigma^2)$ is the global maximum. One can compute this entropy to be $\dfrac{1}{2}\left(1 + \ln(2\pi \sigma^2)\right)$. Note that for small $\sigma$, the entropy is actually negative, so $H$ takes values in all of $\mathbb{R}$. This is different from the entropy of discrete random variables, which is bounded below by 0.
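
Here’s a short numerical sketch checking both the closed form and the maximality claim: for a fixed small $\sigma$, compare the entropy of $N(0,\sigma^2)$ against a Laplace and a uniform density with the same mean and variance. The comparison densities, the value of $\sigma$, and the grid are illustrative choices.

```python
import numpy as np

sigma = 0.1                      # small enough that the entropy comes out negative
x = np.linspace(-5, 5, 200001)
dx = x[1] - x[0]

def entropy(f):
    mask = f > 0
    return -np.sum(f[mask] * np.log(f[mask])) * dx

normal  = np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
b = sigma / np.sqrt(2)           # Laplace scale giving variance sigma^2
laplace = np.exp(-np.abs(x) / b) / (2 * b)
a = sigma * np.sqrt(3)           # Uniform(-a, a) has variance sigma^2
uniform = np.where(np.abs(x) <= a, 1 / (2 * a), 0.0)

print("closed form:", 0.5 * (1 + np.log(2 * np.pi * sigma**2)))
print("H[normal]  :", entropy(normal))   # matches the closed form
print("H[laplace] :", entropy(laplace))  # strictly smaller
print("H[uniform] :", entropy(uniform))  # strictly smaller
```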

Proposition: Let $X,Y$ be two independent random variables with densities $p,q$. It is well known that the density of $X+Y$ is the convolution $p*q$. Moreover, $H[p*q]\geq \max(H[p],H[q])$.

Proof: We noted that $\phi(x) = x \ln x$ is a convex function (so $-\phi(x)$ is concave). Jensen’s inequality tells us that for any random variable $Z$ and any convex function $\phi$, $\phi(E[Z]) \leq E[\phi(Z)]$.

Now, $(p*q)(x) = \int p(x-y)q(y)\, dy = E_Y[p(x-Y)]$ can be expressed as an expectation over $Y$ (which has density $q$). For fixed $x$, we can think of $Z=p(x-Y)$ as a random variable. So by Jensen’s inequality, $\phi(E_Y[p(x-Y)]) \leq E_Y[\phi(p(x-Y))]$.

If we integrate both sides with respect to $x$, the left hand side is $\int \phi\big((p*q)(x)\big)\, dx = -H[p*q]$.

The right hand side unpacked is $\int \int \phi(p(x-y))q(y)\, dy \, dx$. By Fubini, we can switch the order of integration to integrate with respect to $x$ first. Since $q(y)$ doesn’t depend on $x$, we can move it out of the inner integral: \(\int q(y) \int \phi(p(x-y))\, dx \, dy.\)

Using the substitution $u=x-y, du=dx$, the inner integral is just $\int \phi(p(u))\, du = -H[p]$, which is just a number. So the RHS equals $-H[p]\int q(y)\, dy$. Since $q$ is a density, $\int q(y)\,dy =1$, so we end up with $-H[p*q] \leq -H[p]$, i.e. $H[p*q] \geq H[p]$. The same argument also shows $H[p*q] \geq H[q]$, and hence $H[p*q]\geq \max(H[p],H[q])$. $\square$
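
Here’s a quick numerical sketch of the inequality: convolve two example densities on a grid and compare entropies. The two densities (an exponential recentered to mean 0 and a uniform) are illustrative choices.

```python
import numpy as np

x = np.linspace(-15, 15, 30001)
dx = x[1] - x[0]

def entropy(f):
    mask = f > 0
    return -np.sum(f[mask] * np.log(f[mask])) * dx

p = np.where(x >= -1, np.exp(-(x + 1)), 0.0)  # Exp(1) shifted to have mean 0
q = np.where(np.abs(x) <= 2, 0.25, 0.0)       # Uniform(-2, 2)

# discrete approximation of (p*q)(x) on (roughly) the same symmetric grid
pq = np.convolve(p, q, mode="same") * dx

print("H[p]   =", entropy(p))    # exact value: 1
print("H[q]   =", entropy(q))    # exact value: ln 4 ≈ 1.386
print("H[p*q] =", entropy(pq))   # should exceed both
```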

Entropic Central Limit Theorem

Let $X_1, X_2, \dots$ be iid continuous random variables with mean $\mu$ and variance $\sigma^2$, and let $S_n = \sum_{i=1}^n X_i$. Let $Y_n = (S_n - n\mu)/(\sigma\sqrt{n})$. Then we can check that the mean of $Y_n$ is 0 and the variance is 1.

We’d like to show that $H[Y_n] \leq H[Y_{n+1}]$, where equality holds if and only if the $X_i$ are normally distributed. Also, we know $H[Y_n] \leq H[N(0,1)]$. If we can show this, then, since monotonically increasing sequences bounded above converge, $H[Y_n]$ converges to some $L \leq H[N(0,1)]$. If we can then show that $L$ is in fact equal to $H[N(0,1)]$, we’ll have proven a kind of entropic Central Limit Theorem. This should require more machinery because I think this is a stronger type of convergence than the usual CLT, which is about convergence in distribution. There is a 1986 paper of Barron with a result along these lines.

Definition: Let $D(f,\phi) = \int f \ln(f/\phi)$ be the relative entropy, where $\phi$ is the standard normal density and $f$ is a density with mean 0 and variance 1.

Observe $D(f,\phi) = \int f \ln f - \int f \ln \phi = -H[f] - \int f \ln\phi$. Since $\phi = \frac{1}{\sqrt{2\pi}} \exp(-x^2/2)$, its logarithm is $-\frac{1}{2}(\ln(2\pi) + x^2)$. Plugging this in, the integral $-\int f \ln\phi$ is then $\frac{1}{2}\left(\ln(2\pi) + \int x^2 f\right)$. But the integral $\int x^2 f$ is the variance, which is 1. So all told, $D(f,\phi) = H[\phi] - H[f]$, the difference in entropies.
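
A quick numerical check of this identity, using a standardized Laplace density as an illustrative choice of $f$:

```python
import numpy as np

x = np.linspace(-20, 20, 40001)
dx = x[1] - x[0]

phi = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)  # standard normal density
b = 1 / np.sqrt(2)                            # Laplace scale giving variance 1
f = np.exp(-np.abs(x) / b) / (2 * b)          # mean 0, variance 1

def entropy(g):
    mask = g > 0
    return -np.sum(g[mask] * np.log(g[mask])) * dx

D = np.sum(f * np.log(f / phi)) * dx          # relative entropy D(f, phi)
print("D(f, phi)     =", D)
print("H[phi] - H[f] =", entropy(phi) - entropy(f))  # should agree
```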

Theorem: Let $f_n$ be the density for $Y_n$ as defined earlier. Then $D(f_n,\phi) \to 0$ (equivalently, $\lim_{n \to \infty} H[f_n] = H[\phi]$) if and only if $H[f_n]$ is finite for some $n$.
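
We can at least watch this happen numerically in a case where the density of $Y_n$ is explicit. If the $X_i$ are $\mathrm{Exp}(1)$ (mean 1, variance 1), then $S_n \sim \mathrm{Gamma}(n,1)$, so $Y_n = (S_n - n)/\sqrt{n}$ has a closed-form density and we can watch $H[Y_n]$ climb toward $H[\phi] = \frac{1}{2}(1+\ln(2\pi)) \approx 1.4189$. The $\mathrm{Exp}(1)$ choice and the grid are illustrative.

```python
import numpy as np
from scipy.stats import gamma

y = np.linspace(-10, 10, 20001)
dy = y[1] - y[0]

def entropy(f):
    mask = f > 0
    return -np.sum(f[mask] * np.log(f[mask])) * dy

for n in (1, 2, 5, 10, 50, 200):
    # density of Y_n = (S_n - n)/sqrt(n) with S_n ~ Gamma(n, 1), by change of variables
    f_n = np.sqrt(n) * gamma.pdf(n + y * np.sqrt(n), a=n)
    print(f"n = {n:4d}   H[Y_n] ≈ {entropy(f_n):.4f}")

print("H[N(0,1)] =", 0.5 * (1 + np.log(2 * np.pi)))
```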

Berry–Esseen Theorem

Above, we see that we have convergence in relative entropy (and hence, by Pinsker’s inequality, $L^1$ convergence of the densities) when we assume finite entropy for at least one standardized sum. In a similar vein, if we assume a finite 3rd absolute moment, we can also arrive at a theorem about the rate of convergence in the Central Limit Theorem.

Theorem (Berry, Esseen): Let $X_1, X_2, \dots$ be i.i.d. random variables with $E[X_i] = 0$, $E[X_i^2] = \sigma^2 > 0$, and $E[|X_i|^3] = \rho < \infty$. Let $F_n$ be the cumulative distribution function of the standardized sum $\dfrac{X_1 + \dots + X_n}{\sigma\sqrt{n}}$ and $\Phi$ the cumulative distribution function of the standard normal distribution. There exists a universal constant $C>0$ such that for all $x$ and $n$, $|F_n(x)-\Phi(x)|\leq \dfrac{C\rho}{\sigma^3\sqrt{n}}$.

Remark: So the rate of convergence is $O(n^{-1/2})$: the worst-case error in the CDF decreases on the order of $n^{-1/2}$. More interestingly, the constant $C>0$ is universal: it works for any distribution with a finite 3rd absolute moment, and later refinements have shown that one can take $C < 0.4748$.
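
To see the $n^{-1/2}$ rate in action, here’s a numerical sketch with $X_i = \mathrm{Exp}(1) - 1$ (so the mean is 0, $\sigma = 1$, and $\rho$ is finite). Since the sum of the underlying exponentials is $\mathrm{Gamma}(n,1)$, the CDF $F_n$ is available exactly, and we can approximate $\sup_x|F_n(x) - \Phi(x)|$ on a grid and compare it against the bound with $C = 0.4748$. The exponential example and the grids are illustrative choices.

```python
import numpy as np
from scipy.stats import gamma, norm

x = np.linspace(-8, 8, 80001)

# third absolute moment rho = E|Exp(1) - 1|^3, approximated by a Riemann sum
t = np.linspace(0, 60, 600001)
rho = np.sum(np.abs(t - 1) ** 3 * np.exp(-t)) * (t[1] - t[0])

for n in (4, 16, 64, 256, 1024):
    F_n = gamma.cdf(n + x * np.sqrt(n), a=n)  # CDF of the standardized sum
    ks = np.max(np.abs(F_n - norm.cdf(x)))
    bound = 0.4748 * rho / np.sqrt(n)         # sigma = 1 here
    print(f"n = {n:5d}   sup|F_n - Phi| ≈ {ks:.4f}   bound ≈ {bound:.4f}")
```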