Algorithmic Complexity of Newton’s Method for Computing $\sqrt{x}$
Published:
Suppose I want to solve the following Leetcode problem: Given a non-negative integer x, return the square root of x rounded down to the nearest integer. The returned integer should be non-negative as well. You must not use any built-in exponent function or operator. For example, do not use x**0.5 in Python.
In general, if you have a positive integer $n$ and want to find out what factors divide it, you only need to check up to $\sqrt{n}$ for potential factors. We can use that same idea here and write the following code.
def mySqrt(x):
y = 0
while y*y <= x:
y+=1
if y*y > x:
return y-1
else:
return y
As one expects, the time complexity is $O(\sqrt{x})$. We’ll briefly review big and little O-notation below.
Newton’s Method
On the other hand, if we’re looking for roots to a differentiable function $f:\mathbb{R} \to \mathbb{R}$, then Newton’s Method has us begin with an initial guess $x_0$. Then the equation of the tangent line through $(x_0,f(x_0))$ is $y-f(x_0) = f’(x_0)(x-x_0)$. This line crosses the $x$-axis at $x_1 = x_0 - \dfrac{f(x_0)}{f’(x_0)}$. So for this to work, we need $f’(x_0) \neq 0$. We then iterate this process $x_{k+1} = x_k - \dfrac{f(x_k)}{f’(x_k)}$ and we hope that $x_k \to r$ where $r$ is the true root.
In our situation, we want to approximate $\sqrt{a}$. Then consider $f(x) = x^2 -a$; we want its positive root so we can let $x_{k+1} = x_k - \dfrac{x_k^2-a}{2x_k} = \dfrac{1}{2}(x_k - \dfrac{a}{x_k})$. Since we only care about rounding to the nearest integer in the Leetcode problem, we can stop our iteration once $|x_{k+1}-x_k| < 1$.
As it turns out, the convergence is quadratic for twice-differentiable functions in the sense that roughly with each step, we would get double the number of accurate digits as before (we’ll explain this below). More precisely, let $e_k$ be the error at the $k$ th step which means that $e_{k+1} \approx Ce^2_k$ for some constant $C$. So $e_1 \approx Ce^2_0$ and $e_2 \approx C(Ce^2_0)^2 = C^3 e^4_0$ and $e_k \approx C^N e^{2^k}_0$ where $N$ is some number. So if we want an answer within some error $\epsilon <1$, we have $e^{2^k}_0 \approx \epsilon$ and so $2^k \ln(e_0) \approx \ln(\epsilon)$.
Put in simpler terms in the context of finding the integer square root, when the error shrinks quadratically, we can think of this as doubling the number of correct digits (if we want to work base 10) or doubling the number of correct bits (if we want to work in base 2). Let’s use bits (so this will all be log base 2) and say we need $d$ bits to be correct in order to have the correct integer part. So we want to iterate $k$ times and roughly we’ll have $2^k$ correct bits after $k$ iterations; we want to find a $k$ such that $2^k > d$. But also, $d = \log(\sqrt{a}) = \log(a)/2$, the number of necessary bits to get the integer part correct. So $k \approx \log_2(\log_2(a)/2)$ is about the number iterations we need.
Let’s write up this much faster algorithm in Python.
def mySqrt(a):
if a < 2:
return a
a0 = a
a1 = (a0 + a / a0) / 2
while abs(a0 - a1) >= 1:
a0 = a1
a1 = (a0 + float(a) / a0) / 2
return int(a1)
Returning to the more general situation of a twice-differentiable function $f$, the quadratic convergence of Newton’s Method for such a function $f$ comes from the Taylor expansion. Again, let $r$ be the true root (so $f(r) = 0$) and $x_k$ our guess. So $e_k = x_k - r$. The Taylor series expansion around $r$ gives $f(x_k) = f’(r)e_k + \dfrac{f’‘(r)}{2!}e^2_k+…$
Thus, $e_{k+1} = x_{k+1} - r = x_k -r - \dfrac{f(x_k)}{f’(x_k)} = e_k -\dfrac{f(x_k)}{f’(x_k)}$. Subbing in $f(x_k) \approx f’(r)e_k + \dfrac{f’‘(r)}{2!}e^2_k$
and assuming $f’(x_k) \approx f’(r) \neq 0$, then we have $e_{k+1} \approx e_k - \dfrac{f’(r) e_k}{f’(r)} + \dfrac{f’‘(r)}{2f’(r)} e^2_k$.
The first two terms cancel and so we have $e_{k+1} \approx \dfrac{f’‘(r)}{2f’(r)} e^2_k$. In the case of $f(x) = x^2-a$, algebra shows that $e_{k+1} = \dfrac{1}{2(r+e_k)}e^2_k$ and we can assume that $r\geq 1$ since we were only looking for the integer square root.
Note that if $f$ is 3-times differentiable or more, we could expand to have more terms in the Taylor expansion but because Newton’s method only uses the 1st derivative, the 2nd derivative is the only relevant derivative in our estimate of the error and so the convergence is still quadratic, not cubic.
Appendix: Big/Little O-Notation
We say $f(n)$ is $O(g(n))$ if there exists $n_0,C$ such that for all $n \geq n_0$, $|f(n)| \leq C |g(n)|$. So $g(n)$ provides an asympotic upperbound (up to constants) on the growth of $f(n)$.
We say that $f(n)$ is $o(g(n))$ if $\lim_{n \to \infty} \dfrac{f(n)}{g(n)} =0$.
