Algorithmic Complexity of Newton’s Method for Computing $\sqrt{x}$
Suppose I want to solve the following Leetcode problem: Given a non-negative integer $x$, return the square root of $x$ rounded down to the nearest integer. The returned integer should be non-negative as well. You must not use any built-in exponent function or operator; for example, do not use x ** 0.5 in Python.
In general, if you have a positive integer $n$ and want to find out what factors divide it, you only need to check up to $\sqrt{n}$ for potential factors. We can use that same idea here and write the following code.
def mySqrt(x):
    # Brute force: increment y until y*y exceeds x.
    y = 0
    while y * y <= x:
        y += 1
    # The loop exits as soon as y*y > x, so the floor of sqrt(x) is y - 1.
    return y - 1
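For example, mySqrt(8) increments y to 1, 2, and then 3, at which point 3*3 = 9 > 8, so it returns 2.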
As one expects, the time complexity is $O(\sqrt{x})$. We’ll briefly review big and little O-notation below.
Newton’s Method
On the other hand, if we’re looking for roots of a differentiable function $f:\mathbb{R} \to \mathbb{R}$, then Newton’s Method has us begin with an initial guess $x_0$. The equation of the tangent line through $(x_0,f(x_0))$ is $y-f(x_0) = f'(x_0)(x-x_0)$. This line crosses the $x$-axis at $x_1 = x_0 - \dfrac{f(x_0)}{f'(x_0)}$, so for this to work we need $f'(x_0) \neq 0$. We then iterate this process, $x_{k+1} = x_k - \dfrac{f(x_k)}{f'(x_k)}$, and we hope that $x_k \to r$ where $r$ is the true root.
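As a quick illustration, here is a minimal sketch of this iteration in Python. The names newton, f, df, x0, tol, and max_iter are my own choices for the sketch, not anything from the problem.

def newton(f, df, x0, tol=1e-12, max_iter=100):
    """Iterate x_{k+1} = x_k - f(x_k)/df(x_k) until successive guesses agree to within tol."""
    x = x0
    for _ in range(max_iter):
        dfx = df(x)
        if dfx == 0:
            raise ZeroDivisionError("Newton's method needs f'(x) != 0")
        x_next = x - f(x) / dfx
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

For example, newton(lambda x: x*x - 2, lambda x: 2*x, 2.0) converges to roughly 1.41421356 in a handful of steps.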
In our situation, we want to approximate $\sqrt{a}$. Then consider $f(x) = x^2 -a$; we want its positive root, so we let $x_{k+1} = x_k - \dfrac{x_k^2-a}{2x_k} = \dfrac{1}{2}\left(x_k + \dfrac{a}{x_k}\right)$. Since we only care about rounding down to an integer in the Leetcode problem, we can stop the iteration once $|x_{k+1}-x_k| < 1$.
As it turns out, the convergence is quadratic for twice-differentiable functions, in the sense that roughly with each step we double the number of accurate digits (we’ll explain this below). More precisely, let $e_k$ be the error at the $k$th step; quadratic convergence means that $e_{k+1} \approx Ce^2_k$ for some constant $C$ (see below). So $e_1 \approx Ce^2_0$, $e_2 \approx C(Ce^2_0)^2 = C^3 e^4_0$, and in general $e_k \approx C^{2^k-1} e^{2^k}_0$. So if we want an answer within some error $\epsilon <1$ and we ignore the constant, we need $e^{2^k}_0 \approx \epsilon$, i.e. $2^k \ln(e_0) \approx \ln(\epsilon)$.
Since $\ln \epsilon < 0$ when $\epsilon < 1$ (and treating $\ln(1/e_0)$ as a constant), we want $2^k \approx \ln(1/\epsilon)$ and $k \approx \log_2 \ln(1/\epsilon) \approx \log_2 \log_2(1/\epsilon)$ by a change of logarithmic base. If we ask for about as many correct digits as the answer has, so that $1/\epsilon$ is on the order of $a$, the number of iterations needed is $O(\log_2 \log_2 a)$, provided the initial guess is already within a constant factor of $\sqrt{a}$ so that the quadratic regime applies. In the code below we use the cruder initial guess $x_0 = a$ (on the interval $[0,a]$, $x^2-a$ is strictly increasing, and the Newton iterates for this $f$ decrease monotonically toward $\sqrt{a}$). The early iterations then roughly halve the guess, which adds an extra $O(\log_2 a)$ burn-in phase before the rapid convergence kicks in, but this is still exponentially faster than the $O(\sqrt{x})$ loop. Here’s a much faster algorithm for finding the integer square root.
def mySqrt(a):
    if a < 2:
        return a              # 0 and 1 are their own integer square roots
    a0 = a                    # initial guess
    a1 = (a0 + a / a0) / 2
    # Newton iteration for f(x) = x^2 - a; stop once successive
    # guesses are less than 1 apart.
    while abs(a0 - a1) >= 1:
        a0 = a1
        a1 = (a0 + a / a0) / 2
    return int(a1)
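Out of curiosity, here is a small sketch (my own instrumentation, not part of the solution) that counts how many iterations the loop above performs for various inputs. With the crude starting guess $a_0 = a$, the count grows roughly like $\log_2 a$ because of the halving burn-in phase discussed above, which is still tiny compared to $\sqrt{a}$.

def newton_steps(a):
    # Same loop as mySqrt above, but counting iterations instead.
    if a < 2:
        return 0
    steps = 0
    a0 = a
    a1 = (a0 + a / a0) / 2
    while abs(a0 - a1) >= 1:
        a0 = a1
        a1 = (a0 + a / a0) / 2
        steps += 1
    return steps

for a in [10, 10_000, 100_000_000, 1_000_000_000_000]:
    print(a, newton_steps(a))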
Let’s go back to the more general situation of a twice-differentiable function $f$. The quadratic convergence of Newton’s Method for such a function $f$ comes from the Taylor expansion. Again, let $r$ be the true root (so $f(r) = 0$) and $x_k$ our guess, and set $e_k = x_k - r$. The Taylor series expansion around $r$ gives $f(x_k) = f'(r)e_k + \dfrac{f''(r)}{2!}e^2_k+\cdots$
Thus, $e_{k+1} = x_{k+1} - r = x_k - r - \dfrac{f(x_k)}{f'(x_k)} = e_k -\dfrac{f(x_k)}{f'(x_k)}$. Subbing in $f(x_k) \approx f'(r)e_k + \dfrac{f''(r)}{2!}e^2_k$
and assuming $f'(x_k) \approx f'(r) \neq 0$, we have $e_{k+1} \approx e_k - \dfrac{f'(r) e_k}{f'(r)} - \dfrac{f''(r)}{2f'(r)} e^2_k$.
The first two terms cancel, leaving $e_{k+1} \approx -\dfrac{f''(r)}{2f'(r)} e^2_k$. (A slightly more careful expansion, which also expands $f'(x_k) \approx f'(r) + f''(r)e_k$ in the denominator, flips the sign and gives the usual form $e_{k+1} \approx \dfrac{f''(r)}{2f'(r)} e^2_k$; either way, $|e_{k+1}| \approx C e^2_k$ with $C = \left|\dfrac{f''(r)}{2f'(r)}\right|$, which is exactly the quadratic convergence used above.)
Note that if $f$ is three or more times differentiable, we could keep more terms in the Taylor expansion, but those terms are of order $e^3_k$ and smaller, so they are dominated by the $e^2_k$ term. As long as $f''(r) \neq 0$, the convergence is therefore quadratic, not cubic.
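As a numerical sanity check (my own illustration; math.sqrt is used only as a reference value for measuring the error, not as part of the Leetcode solution), the following snippet prints the error of the Newton iterates for $f(x) = x^2 - 2$. The error is roughly squared at each step, i.e. the number of correct digits roughly doubles, until we hit the limits of floating-point precision.

import math

a = 2.0
r = math.sqrt(a)           # reference value, only for measuring the error
x = 1.5                    # an initial guess already in the quadratic regime
for k in range(5):
    print(k, abs(x - r))
    x = (x + a / x) / 2    # Newton step for f(x) = x^2 - a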
Appendix: Big/Little O-Notation
We say $f(n)$ is $O(g(n))$ if there exist $n_0, C > 0$ such that for all $n \geq n_0$, $|f(n)| \leq C |g(n)|$. So $g(n)$ provides an asymptotic upper bound (up to constants) on the growth of $f(n)$.
We say that $f(n)$ is $o(g(n))$ if $\lim_{n \to \infty} \dfrac{f(n)}{g(n)} =0$.
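For example, both $\log_2 n$ and $\log_2 \log_2 n$ are $o(\sqrt{n})$, since $\lim_{n \to \infty} \dfrac{\log_2 n}{\sqrt{n}} = 0$; this is the precise sense in which the Newton iteration count above is asymptotically negligible compared to the $O(\sqrt{x})$ brute-force loop.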