Existence and Uniqueness of Solutions to 1st Order ODEs and the Central Limit Theorem
I’ll be teaching a class on differential equations and another on statistics this coming Spring 2024. I’ve been wondering what similarities or relations there are between the two classes. One of the important theorems of differential equations is that, given a 1st order differential equation $\frac{dy}{dt} = f(t,y)$ where $f$ and $\partial_y f$ are continuous, together with an initial condition, there exists a unique solution on a small neighborhood of the initial condition. One of the important theorems of statistics is the Central Limit Theorem which, in a very weak form, says: Suppose that $X_i$ are independent, identically distributed random variables with zero mean and variance $\sigma^2$. Then $\frac{1}{\sqrt{N}}\sum^N_{i=1}X_i \to \mathcal{N}(0,\sigma^2)$ as $N \to \infty$. Here, $\mathcal{N}(0,\sigma^2)$ means a normal distribution with mean $\mu = 0$ and variance $\sigma^2$. It’s not so obvious, but there are some similarities between these two theorems: both have proofs which involve the notion of a fixed point. The goal of this post is to spell this out a bit more (though not fully rigorously).
Existence and Uniqueness Theorem for 1st Order Ordinary Differential Equations
Let’s begin with differential equations. The existence and uniqueness theorem is more precisely as follows. Let $f(t,y)$ be a continuous function on $(a,b) \times (c,d)$, an open rectangle in the $ty$-plane. If $(t_0,y_0)$ is a point in this rectangle, there exists an $\epsilon > 0$ and a function $y(t)$, defined for $t$ such that $|t-t_0|<\epsilon$, that solves the initial value problem $\frac{dy}{dt} = f(t,y)$ with initial condition $y(t_0) = y_0$. Furthermore, if $\partial_y f$ is also continuous on the rectangle and $y_1$ and $y_2$ are solutions to the above initial value problem for $t$ satisfying $|t-t_0|<\epsilon$, then $y_1(t) = y_2(t)$ on this domain.
So without the continuity condition on $\partial_y f$, we certainly can have nonuniqueness of solutions, but often we assume that $f$ is at least once differentiable, and so we get both existence and uniqueness. One proof of this theorem has the following sketch. We note that a solution $y(t)$ to the differential equation would also satisfy the integral equation $y(t)-y_0 = \int^t_{t_0} f(s,y(s))\, ds$. We let $\phi_0(t) = y_0$ and $\phi_{k+1}(t) = y_0+\int^t_{t_0} f(s,\phi_k(s))\, ds$. These $\phi_k$ are called Picard iterates and can be viewed as being obtained by iteratively applying the same integration operation to $\phi_0$. In other words, we have an operator $T$ on some space of continuous functions, defined by $(T\phi)(t) = y_0 + \int^t_{t_0} f(s,\phi(s))\, ds$, and $\phi_{k+1} = T\phi_k$. We emphasize that this operator depends on the function $f$.
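To see the iterates in action, here is a minimal numerical sketch (my own illustration; the function names and the trapezoidal quadrature are choices, not part of the theorem) for the IVP $y' = y$, $y(0) = 1$, whose exact solution is $e^t$. The $k$-th Picard iterate turns out to be the degree-$k$ Taylor polynomial of $e^t$.

```python
import numpy as np

# Picard iteration for y' = f(t, y), y(0) = y0, on a grid over [0, 1].
# Here f(t, y) = y, so the exact solution is e^t.
t = np.linspace(0.0, 1.0, 1001)
y0 = 1.0

def f(t, y):
    return y

def picard_step(phi):
    """(T phi)(t) = y0 + integral_0^t f(s, phi(s)) ds, via cumulative trapezoids."""
    integrand = f(t, phi)
    trapezoids = (integrand[1:] + integrand[:-1]) / 2 * (t[1] - t[0])
    return y0 + np.concatenate([[0.0], np.cumsum(trapezoids)])

phi = np.full_like(t, y0)  # phi_0(t) = y0
for k in range(1, 9):
    phi = picard_step(phi)
    print(k, "max |phi_k(t) - e^t| on [0,1]:", np.max(np.abs(phi - np.exp(t))))
```

The printed errors shrink rapidly with $k$, which is exactly the convergence of the iterates that the proof formalizes.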
This space of continuous functions has a topology coming from a metric; let’s denote it by $(X,d)$. One can show that the operator $T:X \to X$ we have is a contraction map; i.e. if $x,y\in X$, then the distance $d(Tx,Ty)$ is less than $d(x,y)$. Put another way, the operator always sends two elements to points that are closer to each other than they were previously. Then, as you iterate $T$ more and more, one expects $T^kx$ and $T^ky$ to converge to the same point as $k \to \infty$. One important thing to say is that it’s not necessarily the case that $T$ contracts by the same factor everywhere. For example, for some $x,y$, perhaps $d(Tx,Ty) < \frac{1}{2}d(x,y)$, while for another pair the new distance is only a quarter of the original distance; what matters is a uniform bound on the contraction factor. The Banach Fixed Point Theorem says that if we’re dealing with a complete metric space $(X,d)$ and a contraction map $T:X \to X$, i.e. there is a $q \in [0,1)$ such that $d(Tx,Ty) \le qd(x,y)$ for all $x,y\in X$, then there exists a unique fixed point. That is, we have a point $z \in X$ so that $Tz = z$. The way to find this $z$ is to begin with any point and iteratively apply $T$ to it many, many times. When applied to our initial value problem, we find that the Picard iterates converge to a unique fixed point in the space of functions. So we get a unique function which satisfies the integral equation and hence the differential equation with initial condition.
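As a toy example of the theorem (my own snippet, purely for illustration), take $X = [0,1]$ with the usual distance and $T(x) = \cos(x)$. This is a contraction since $|T'(x)| = |\sin x| \le \sin 1 < 1$ on $[0,1]$, so iteration from any starting point converges to the unique fixed point:

```python
import math

# Banach fixed point iteration: T(x) = cos(x) maps [0, 1] into itself and
# is a contraction there, since |T'(x)| = |sin(x)| <= sin(1) < 1.
def fixed_point(T, x0, tol=1e-12, max_iter=1000):
    x = x0
    for _ in range(max_iter):
        x_next = T(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

z = fixed_point(math.cos, 0.5)
print(z, math.cos(z))  # both ~0.739085..., the unique solution of cos(z) = z
```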
Why do we lose uniqueness if $\partial_y f$ isn’t continuous? I suspect the key point is that continuity of $\partial_y f$ is what gives a local Lipschitz bound $|f(t,y_1)-f(t,y_2)| \le K|y_1-y_2|$, and it is this bound that makes $T$ a contraction; without it, the space of functions is still complete, but the fixed point argument breaks down. Nonuniqueness does in fact happen: if we let $f(t,y) = 3y^{2/3}$, then $\partial_y f = 2y^{-1/3}$, which is not continuous at $y = 0$. Then, the ODE $y' = 3y^{2/3}$ with $y(0) = 0$ has both $y_1 \equiv 0$ and $y_2 = t^3$ as solutions, but they are evidently not equal. So the uniqueness of solutions can certainly fail when $\partial_y f$ is discontinuous on the domain of interest.
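A quick sanity check of this example (illustration only):

```python
import numpy as np

# Both y1(t) = 0 and y2(t) = t^3 satisfy y' = 3 y^(2/3) with y(0) = 0,
# yet they differ for every t != 0.
t = np.linspace(0.0, 2.0, 9)
y1 = np.zeros_like(t)
y2 = t**3
print(np.allclose(np.zeros_like(t), 3 * y1**(2/3)))  # y1' == f(t, y1)
print(np.allclose(3 * t**2, 3 * y2**(2/3)))          # y2' == f(t, y2)
```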
The Central Limit Theorem
Let’s now turn to the Central Limit Theorem (CLT) from statistics. Above, I stated a weak version of it which I’ll repeat here. Suppose that $X_i$ are independent, identically distributed random variables with zero mean and variance $\sigma^2$. Then $\frac{1}{\sqrt{N}}\sum^N_{i=1}X_i \to \mathcal{N}(0,\sigma^2)$ as $N \to \infty$. Here, $\mathcal{N}(0,\sigma^2)$ means a normal distribution with mean $\mu = 0$ and variance $\sigma^2$, so the probability density function looks like $Ae^{-Bx^2}$ with $A = \frac{1}{\sqrt{2\pi\sigma^2}}$ and $B = \frac{1}{2\sigma^2}$.
The convergence here is “in distribution” (also called weak convergence): the cumulative distribution functions of the normalized sums converge pointwise to that of $\mathcal{N}(0,\sigma^2)$ at every continuity point, which here means everywhere since the normal CDF is continuous. Now, if we want to deal with a nonzero mean, that’s fine because we can always normalize things by subtracting off the expectation from the variables. Then we’d be back to the situation of this weak version of the CLT.
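Here is a quick simulation of this weak version (an empirical illustration, not a proof; the Uniform$(-1,1)$ distribution and all the parameters are arbitrary choices of mine):

```python
import numpy as np

# Normalized sums of iid Uniform(-1, 1) variables (mean 0, sigma^2 = 1/3)
# should be approximately N(0, 1/3) for large N.
rng = np.random.default_rng(0)
N, trials = 200, 20_000
sums = rng.uniform(-1.0, 1.0, size=(trials, N)).sum(axis=1) / np.sqrt(N)

sigma2 = 1.0 / 3.0
edges = np.linspace(-2.5, 2.5, 51)
hist, _ = np.histogram(sums, bins=edges, density=True)
centers = (edges[:-1] + edges[1:]) / 2
density = np.exp(-centers**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
print("max |histogram - N(0, 1/3) density|:", np.max(np.abs(hist - density)))
```

The discrepancy is on the order of the sampling noise, and shrinks as `trials` grows.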
Now, suppose we want a proof of the CLT that involves some kind of contraction map and fixed point. What might we do? First, we need a space of probability density functions to work with. So we want functions $f$ such that $\int_\mathbb{R} f(x) \, dx = 1$, and we also want them to have finite variance. We might want a more specific space, but let’s think about what kind of contracting map we’d want. Well, both for discrete and continuous random variables, when we add two independent ones, the probability density function of the sum is the convolution of the individual densities. The nice thing about convolution is that, first, it puts an algebra structure on functions (a new kind of multiplication, if you will). Second, it has a smoothing property: even if a density is very jagged, as we convolve, it becomes more and more differentiable. So one might imagine a contracting map built from convolutions, since this process of smoothing things out seems to have a contracting feature. One would then need to prove that the fixed point of the map is the normal distribution $\mathcal{N}(0,\sigma^2)$. But this isn’t so difficult, in fact. The convolution of two normal distributions is again a normal distribution. One approach to proving this is to use the Fourier transform. If $\hat{f}$ is the Fourier transform of $f$, then $\widehat{f*g} = \hat{f}\hat{g}$; i.e. the Fourier transform is an algebra homomorphism from the algebra of functions under convolution to the algebra of functions under the usual multiplication. But now, the Fourier transform of $Ae^{-Bx^2}$ is again a function of essentially the same form, and it’s easy to multiply two functions of this form.
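As a numerical check of this closure property (my own snippet, not the Fourier argument itself; the variances are arbitrary), the convolution of two centered Gaussian densities with variances $\sigma_1^2$ and $\sigma_2^2$ is the centered Gaussian density with variance $\sigma_1^2 + \sigma_2^2$:

```python
import numpy as np

# Convolving N(0, s1) and N(0, s2) densities should give the N(0, s1 + s2)
# density, up to discretization error on the grid.
x = np.linspace(-20.0, 20.0, 4001)
dx = x[1] - x[0]

def gaussian(x, var):
    return np.exp(-x**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

s1, s2 = 1.0, 2.0
conv = np.convolve(gaussian(x, s1), gaussian(x, s2), mode="same") * dx
print("max pointwise error:", np.max(np.abs(conv - gaussian(x, s1 + s2))))
```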
So as long as our contracting map is properly normalized somehow, we can make the normal distribution a fixed point! A natural candidate, suggested by the $1/\sqrt{N}$ scaling in the CLT, is the map $(Tf)(x) = \sqrt{2}\,(f*f)(\sqrt{2}\,x)$, which is the density of $(X_1+X_2)/\sqrt{2}$ for independent $X_1,X_2$ with density $f$; it preserves a zero mean and the variance, and $\mathcal{N}(0,\sigma^2)$ is a fixed point. Okay, but how do we make sure the map is actually contracting? I’m not sure, but I read here that there may be a way to use the notion of entropy. The entropy functional is defined by $S(\rho) = -\int_\mathbb{R} \rho(x) \log \rho(x) \, dx$, and the normal distribution’s PDF maximizes entropy among all PDFs with variance $\sigma^2$. This integrand is fascinating and was first studied by Claude Shannon. It’s a fact (that I don’t know how to prove) that convolving increases entropy; so there’s a monotonic behavior here which one can then try to translate into contraction (another monotonic kind of behavior, as distances only decrease).
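Here is a minimal numerical experiment with this candidate map (assuming, as above, that $(Tf)(x) = \sqrt{2}(f*f)(\sqrt{2}x)$ is the right normalization; the grid and starting density are arbitrary choices of mine). Starting from a uniform density with variance 1, the iterates appear to approach $\mathcal{N}(0,1)$ in $L^1$ while the entropy increases toward the Gaussian maximum $\frac{1}{2}\log(2\pi e)$:

```python
import numpy as np

# Iterate Tf(x) = sqrt(2) * (f*f)(sqrt(2) x) from a uniform density with
# mean 0 and variance 1, tracking the L1 distance to the N(0,1) density
# and the entropy S(rho) = -integral of rho log rho.
L, n = 12.0, 4001
x = np.linspace(-L, L, n)
dx = x[1] - x[0]

def T(f):
    conv = np.convolve(f, f, mode="same") * dx               # density of X1 + X2
    return np.sqrt(2) * np.interp(np.sqrt(2) * x, x, conv)   # density of (X1+X2)/sqrt(2)

def entropy(f):
    g = np.maximum(f, 1e-300)  # avoid log(0); the f = 0 terms contribute 0
    return -np.sum(f * np.log(g)) * dx

gauss = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)                     # N(0,1) density
f = np.where(np.abs(x) < np.sqrt(3), 1 / (2 * np.sqrt(3)), 0.0)    # uniform, var 1

print("Gaussian entropy:", 0.5 * np.log(2 * np.pi * np.e))
for k in range(1, 7):
    f = T(f)
    print(k, "L1 dist to N(0,1):", np.sum(np.abs(f - gauss)) * dx,
          "entropy:", entropy(f))
```

Of course, watching a few iterations converge is not a proof that $T$ contracts, but it does match the monotone entropy picture described above.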
And that’s it! We can then use the Banach Fixed Point Theorem to get the result of CLT. I think this rough idea of a proof exists in the literature but I haven’t been able to pinpoint it. If someone knows, please tell me. I should also say that I started thinking about a fixed point style proof for the CLT after watching this video about Gaussian distributions.