Existence and Uniqueness of Solutions to 1st Order ODEs and the Central Limit Theorem
I’ll be teaching a class on differential equations and another on statistics this coming Spring 2024. I’ve been wondering what similarities or relations there are between the two classes. One of the important theorems of differential equations is that, given a 1st order differential equation $\frac{dy}{dt} = f(t,y)$ where $f$ and $\partial_y f$ are continuous, together with an initial condition, there exists a unique solution on a small neighborhood of the initial condition. One of the important theorems of statistics is the Central Limit Theorem which, in a very weak form, says: Suppose that $X_i$ are independent, identically distributed random variables with zero mean and variance $\sigma^2$. Then $\frac{1}{\sqrt{N}}\sum^N_{i=1}X_i \to \mathcal{N}(0,\sigma^2)$ as $N \to \infty$. Here, $\mathcal{N}(0,\sigma^2)$ means a normal distribution with mean $\mu = 0$ and variance $\sigma^2$. It’s not so obvious, but there are some similarities between these two theorems: both have proofs which involve the notion of a fixed point. The goal of this post is to spell this out a bit more (though not fully rigorously).
Existence and Uniqueness Theorem for 1st Order Ordinary Differential Equations
Let’s begin with differential equations. The existence and uniqueness theorem is more precisely as follows. Let $f(t,y)$ be a continuous function on $(a,b) \times (c,d)$, an open rectangle in the $ty$-plane. If $(t_0,y_0)$ is a point in this rectangle, there exists an $\epsilon > 0$ and a function $y(t)$, defined for $t$ such that $|t-t_0|<\epsilon$, that solves the initial value problem $\frac{dy}{dt} = f(t,y)$ with initial condition $y(t_0) = y_0$. Furthermore, if $\partial_y f$ is also continuous on the rectangle and $y_1$ and $y_2$ are solutions to the above initial value problem for $t$ satisfying $|t-t_0|<\epsilon$, then $y_1(t) = y_2(t)$ on this domain.
So without the continuity condition on $\partial_y f$, we certainly can have nonuniqueness of solutions, but often we assume that $f$ is at least once differentiable, and so we get both existence and uniqueness. One proof of this theorem has the following sketch. We note that a solution $y(t)$ to the differential equation would also satisfy the integral equation $y(t)-y_0 = \int^t_{t_0} f(s,y(s))\, ds$. We let $\phi_0(t) = y_0$ and $\phi_{k+1}(t) = y_0+\int^t_{t_0} f(s,\phi_k(s))\, ds$. These $\phi_k$ are called Picard iterates and can be viewed as being obtained by iteratively applying the same integration operation to $\phi_0$. In other words, we have an operator $T$ on some space of continuous functions, defined by $(T\phi)(t) = y_0 + \int^t_{t_0} f(s,\phi(s))\, ds$, and $\phi_{k+1} = T\phi_k$. We emphasize that this operator depends on the function $f$.
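To see the iterates in action, here is a minimal numerical sketch (my own illustration; the function names and the trapezoidal quadrature are choices, not part of the theorem) for the IVP $y' = y$, $y(0) = 1$, whose exact solution is $e^t$. The $k$-th Picard iterate turns out to be the degree-$k$ Taylor polynomial of $e^t$.

```python
import numpy as np

# Picard iteration for y' = f(t, y), y(0) = y0, on a grid over [0, 1].
# Here f(t, y) = y, so the exact solution is e^t.
t = np.linspace(0.0, 1.0, 1001)
y0 = 1.0

def f(t, y):
    return y

def picard_step(phi):
    """(T phi)(t) = y0 + integral_0^t f(s, phi(s)) ds, via cumulative trapezoids."""
    integrand = f(t, phi)
    trapezoids = (integrand[1:] + integrand[:-1]) / 2 * (t[1] - t[0])
    return y0 + np.concatenate([[0.0], np.cumsum(trapezoids)])

phi = np.full_like(t, y0)  # phi_0(t) = y0
for k in range(1, 9):
    phi = picard_step(phi)
    print(k, "max |phi_k(t) - e^t| on [0,1]:", np.max(np.abs(phi - np.exp(t))))
```

The printed errors shrink rapidly with $k$, which is exactly the convergence of the iterates that the proof formalizes.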
This space of continuous functions has a topology coming from a metric; let’s denote it by $(X,d)$. One can show that the operator $T:X \to X$ we have is a contraction map; i.e. if $x,y\in X$, then the distance $d(Tx,Ty)$ is less than $d(x,y)$. Put another way, the operator always sends two elements to points that are closer to each other than they were previously. Then, as you iterate $T$ more and more, one expects $T^kx$ and $T^ky$ to converge to the same point as $k \to \infty$. One important thing to say is that it’s not necessarily the case that $T$ contracts by the same factor everywhere. For example, for some $x,y$, perhaps $d(Tx,Ty) < \frac{1}{2}d(x,y)$, while for another pair the new distance is only a quarter of the original distance; what matters is a uniform bound on the contraction factor. The Banach Fixed Point Theorem says that if we’re dealing with a complete metric space $(X,d)$ and a contraction map $T:X \to X$, i.e. there is a $q \in [0,1)$ such that $d(Tx,Ty) \le qd(x,y)$ for all $x,y\in X$, then there exists a unique fixed point. That is, we have a point $z \in X$ so that $Tz = z$. The way to find this $z$ is to begin with any point and iteratively apply $T$ to it many, many times. When applied to our initial value problem, we find that the Picard iterates converge to a unique fixed point in the space of functions. So we get a unique function which satisfies the integral equation and hence the differential equation with initial condition.
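As a toy example of the theorem (my own snippet, purely for illustration), take $X = [0,1]$ with the usual distance and $T(x) = \cos(x)$. This is a contraction since $|T'(x)| = |\sin x| \le \sin 1 < 1$ on $[0,1]$, so iteration from any starting point converges to the unique fixed point:

```python
import math

# Banach fixed point iteration: T(x) = cos(x) maps [0, 1] into itself and
# is a contraction there, since |T'(x)| = |sin(x)| <= sin(1) < 1.
def fixed_point(T, x0, tol=1e-12, max_iter=1000):
    x = x0
    for _ in range(max_iter):
        x_next = T(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

z = fixed_point(math.cos, 0.5)
print(z, math.cos(z))  # both ~0.739085..., the unique solution of cos(z) = z
```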
Why do we lose uniqueness if $\partial_y f$ isn’t continuous? I suspect the key point is that continuity of $\partial_y f$ is what gives a local Lipschitz bound $|f(t,y_1)-f(t,y_2)| \le K|y_1-y_2|$, and it is this bound that makes $T$ a contraction; without it, the space of functions is still complete, but the fixed point argument breaks down. Nonuniqueness does in fact happen: if we let $f(t,y) = 3y^{2/3}$, then $\partial_y f = 2y^{-1/3}$, which is not continuous at $y = 0$. Then, the ODE $y' = 3y^{2/3}$ with $y(0) = 0$ has both $y_1 \equiv 0$ and $y_2 = t^3$ as solutions, but they are evidently not equal. So the uniqueness of solutions can certainly fail when $\partial_y f$ is discontinuous on the domain of interest.
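A quick sanity check of this example (illustration only):

```python
import numpy as np

# Both y1(t) = 0 and y2(t) = t^3 satisfy y' = 3 y^(2/3) with y(0) = 0,
# yet they differ for every t != 0.
t = np.linspace(0.0, 2.0, 9)
y1 = np.zeros_like(t)
y2 = t**3
print(np.allclose(np.zeros_like(t), 3 * y1**(2/3)))  # y1' == f(t, y1)
print(np.allclose(3 * t**2, 3 * y2**(2/3)))          # y2' == f(t, y2)
```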
The Central Limit Theorem
Let’s now turn to the Central Limit Theorem (CLT) from statistics. Above, I stated a weak version of it which I’ll repeat here. Suppose that $X_i$ are independent, identically distributed random variables with zero mean and variance $\sigma^2$. Then $\frac{1}{\sqrt{N}}\sum^N_{i=1}X_i \to \mathcal{N}(0,\sigma^2)$ as $N \to \infty$. Here, $\mathcal{N}(0,\sigma^2)$ means a normal distribution with mean $\mu = 0$ and variance $\sigma^2$, so the probability density function looks like $Ae^{-Bx^2}$ with $A = \frac{1}{\sqrt{2\pi\sigma^2}}$ and $B = \frac{1}{2\sigma^2}$.
The convergence here is “in distribution” (also called weak convergence): the cumulative distribution functions of the normalized sums converge pointwise to that of $\mathcal{N}(0,\sigma^2)$ at every continuity point, which here means everywhere since the normal CDF is continuous. Now, if we want to deal with a nonzero mean, that’s fine because we can always normalize things by subtracting off the expectation from the variables. Then we’d be back to the situation of this weak version of the CLT.
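Here is a quick simulation of this weak version (an empirical illustration, not a proof; the Uniform$(-1,1)$ distribution and all the parameters are arbitrary choices of mine):

```python
import numpy as np

# Normalized sums of iid Uniform(-1, 1) variables (mean 0, sigma^2 = 1/3)
# should be approximately N(0, 1/3) for large N.
rng = np.random.default_rng(0)
N, trials = 200, 20_000
sums = rng.uniform(-1.0, 1.0, size=(trials, N)).sum(axis=1) / np.sqrt(N)

sigma2 = 1.0 / 3.0
edges = np.linspace(-2.5, 2.5, 51)
hist, _ = np.histogram(sums, bins=edges, density=True)
centers = (edges[:-1] + edges[1:]) / 2
density = np.exp(-centers**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
print("max |histogram - N(0, 1/3) density|:", np.max(np.abs(hist - density)))
```

The discrepancy is on the order of the sampling noise, and shrinks as `trials` grows.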
Now, suppose we want a proof of the CLT that involves some kind of contraction map and fixed point. What might we do? First, we need a space of probability density functions to work with. So we want functions $f$ such that $\int_\mathbb{R} f(x) \, dx = 1$, and we also want them to have finite variance. We might want a more specific space, but let’s think about what kind of contracting map we’d want. Well, both for discrete and continuous random variables, when we add two independent ones, the probability density function of the sum is the convolution of the individual densities. The nice thing about convolution is that, first, it puts an algebra structure on functions (a new kind of multiplication, if you will). Second, it has a smoothing property: even if a density is very jagged, as we convolve, it becomes more and more differentiable. So one might imagine a contracting map built from convolutions, since this process of smoothing things out seems to have a contracting feature. One would then need to prove that the fixed point of the map is the normal distribution $\mathcal{N}(0,\sigma^2)$. But this isn’t so difficult, in fact. The convolution of two normal distributions is again a normal distribution. One approach to proving this is to use the Fourier transform. If $\hat{f}$ is the Fourier transform of $f$, then $\widehat{f*g} = \hat{f}\hat{g}$; i.e. the Fourier transform is an algebra homomorphism from the algebra of functions under convolution to the algebra of functions under the usual multiplication. But now, the Fourier transform of $Ae^{-Bx^2}$ is again a function of essentially the same form, and it’s easy to multiply two functions of this form.
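As a numerical check of this closure property (my own snippet, not the Fourier argument itself; the variances are arbitrary), the convolution of two centered Gaussian densities with variances $\sigma_1^2$ and $\sigma_2^2$ is the centered Gaussian density with variance $\sigma_1^2 + \sigma_2^2$:

```python
import numpy as np

# Convolving N(0, s1) and N(0, s2) densities should give the N(0, s1 + s2)
# density, up to discretization error on the grid.
x = np.linspace(-20.0, 20.0, 4001)
dx = x[1] - x[0]

def gaussian(x, var):
    return np.exp(-x**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

s1, s2 = 1.0, 2.0
conv = np.convolve(gaussian(x, s1), gaussian(x, s2), mode="same") * dx
print("max pointwise error:", np.max(np.abs(conv - gaussian(x, s1 + s2))))
```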
So as long as our contracting map is properly normalized somehow, we can make the normal distribution a fixed point! A natural candidate, suggested by the $1/\sqrt{N}$ scaling in the CLT, is the map $(Tf)(x) = \sqrt{2}\,(f*f)(\sqrt{2}\,x)$, which is the density of $(X_1+X_2)/\sqrt{2}$ for independent $X_1,X_2$ with density $f$; it preserves a zero mean and the variance, and $\mathcal{N}(0,\sigma^2)$ is a fixed point. Okay, but how do we make sure the map is actually contracting? I’m not sure, but I read here that there may be a way to use the notion of entropy. The entropy functional is defined by $S(\rho) = -\int_\mathbb{R} \rho(x) \log \rho(x) \, dx$, and the normal distribution’s PDF maximizes entropy among all PDFs with variance $\sigma^2$. This integrand is fascinating and was first studied by Claude Shannon. It’s a fact (that I don’t know how to prove) that convolving increases entropy; so there’s a monotonic behavior here which one can then try to translate into contraction (another monotonic kind of behavior, as distances only decrease).
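Here is a minimal numerical experiment with this candidate map (assuming, as above, that $(Tf)(x) = \sqrt{2}(f*f)(\sqrt{2}x)$ is the right normalization; the grid and starting density are arbitrary choices of mine). Starting from a uniform density with variance 1, the iterates appear to approach $\mathcal{N}(0,1)$ in $L^1$ while the entropy increases toward the Gaussian maximum $\frac{1}{2}\log(2\pi e)$:

```python
import numpy as np

# Iterate Tf(x) = sqrt(2) * (f*f)(sqrt(2) x) from a uniform density with
# mean 0 and variance 1, tracking the L1 distance to the N(0,1) density
# and the entropy S(rho) = -integral of rho log rho.
L, n = 12.0, 4001
x = np.linspace(-L, L, n)
dx = x[1] - x[0]

def T(f):
    conv = np.convolve(f, f, mode="same") * dx               # density of X1 + X2
    return np.sqrt(2) * np.interp(np.sqrt(2) * x, x, conv)   # density of (X1+X2)/sqrt(2)

def entropy(f):
    g = np.maximum(f, 1e-300)  # avoid log(0); the f = 0 terms contribute 0
    return -np.sum(f * np.log(g)) * dx

gauss = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)                     # N(0,1) density
f = np.where(np.abs(x) < np.sqrt(3), 1 / (2 * np.sqrt(3)), 0.0)    # uniform, var 1

print("Gaussian entropy:", 0.5 * np.log(2 * np.pi * np.e))
for k in range(1, 7):
    f = T(f)
    print(k, "L1 dist to N(0,1):", np.sum(np.abs(f - gauss)) * dx,
          "entropy:", entropy(f))
```

Of course, watching a few iterations converge is not a proof that $T$ contracts, but it does match the monotone entropy picture described above.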
And that’s it! We can then use the Banach Fixed Point Theorem to get the result of CLT. I think this rough idea of a proof exists in the literature but I haven’t been able to pinpoint it. If someone knows, please tell me. I should also say that I started thinking about a fixed point style proof for the CLT after watching this video about Gaussian distributions.