If you want to have a nicer theme for your Jupyter notebook, download the cascading style sheet file tma4320.css and execute the next cell:
or systems of equations, for example $$ \begin{align*} xe^y &= 1, \\ -x^2+y &= 1. \end{align*} $$
NB!
A scalar equation is given by $$ \begin{equation} f(x) = 0, \label{_auto1} \end{equation} $$
where \( f\in C[a,b] \) for some interval \( [a,b] \). A solution \( r \) of the equation is called a zero or a root of \( f \). A nonlinear equation may have one, more than one or no roots.
Example 1: Consider the function \( f(x)=x^3+x^2-3x-3 \) on the interval \( [-2,2] \). Plot the function on this interval, and locate possible roots.
See the function example1() in nonlinearequations.py.
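The plot can be produced with a few lines of matplotlib. The sketch below only illustrates what example1() might do; the actual function in nonlinearequations.py may differ in its details.

```python
import numpy as np
import matplotlib.pyplot as plt

def f(x):
    return x**3 + x**2 - 3*x - 3

x = np.linspace(-2, 2, 400)          # evaluation points in [-2, 2]
plt.plot(x, f(x), label='f(x)')
plt.axhline(0.0, color='k', lw=0.8)  # the x-axis, to spot sign changes
plt.xlabel('x')
plt.legend()
plt.grid(True)
plt.show()
```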
According to the plot, the function \( f \) has three real roots in the interval \( [-2, 2] \).
The function can be rewritten as $$ f(x) = x^3+x^2-3x-3 = (x+1)(x^2-3). $$
Thus the roots of \( f \) are \( -1 \) and \( \pm \sqrt{3} \). We will use this example as a test problem later on.
The first condition guarantees that the graph of \( f \) crosses the \( x \)-axis at some point \( r \); the second guarantees that the function is either strictly increasing or strictly decreasing.
We note that the condition that \( f(a) \) and \( f(b) \) are of opposite sign can be summarised in the condition that \( f(a)\cdot f(b) < 0 \).
Use \( c_k \) as the approximation to the root. Since \( c_k \) is the midpoint of the interval \( [a_k,b_k] \), the error satisfies \( |c_k-r|\leqslant (b_k-a_k)/2 \). The loop is terminated when \( (b_k-a_k)/2 \) is smaller than some user-specified tolerance. Since the length of the interval \( I_k \) satisfies \( (b_k - a_k) = 2^{-k} (b-a) \), we also have an a priori estimate of the error after \( k \) bisections, $$ |c_k-r| \leqslant \dfrac{1}{2}(b_k-a_k) \leqslant \dfrac{1}{2^{k+1}}(b-a). $$ Consequently, we can estimate how many bisections we have to perform to compute a root up to a given tolerance \( \texttt{tol} \) by simply requiring that $$ \begin{align*} \dfrac{1}{2^{k+1}}(b-a) \leqslant \texttt{tol} \Leftrightarrow \dfrac{b-a}{2\;\texttt{tol}} \leqslant 2^k \Leftrightarrow \dfrac{ \log \Bigl( \dfrac{b-a}{2\;\texttt{tol}} \Bigr) }{\log 2} \leqslant k. \end{align*} $$ We will of course also terminate the loop if \( f(c_k) \) is very close to 0.
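As a small illustration (not part of nonlinearequations.py), the bound can be evaluated directly, here for the interval \( [1.5,2] \) and the tolerance \( 10^{-6} \) used in Example 2 below:

```python
import numpy as np

a, b, tol = 1.5, 2.0, 1.0e-6
k_min = np.log((b - a) / (2 * tol)) / np.log(2)   # smallest k satisfying the bound
print(f"at least {int(np.ceil(k_min))} bisections are needed")   # prints 18
```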
The algorithm is implemented in the function bisection(). See the function bisection() in nonlinearequations.py.
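For reference, a minimal sketch of such a bisection routine is given below. It is not the actual bisection() from nonlinearequations.py, whose interface and output may differ.

```python
def bisection(f, a, b, tol=1.0e-6, max_iter=100):
    """Find a root of f in [a, b] by bisection.

    Assumes f(a)*f(b) < 0. Returns the midpoint approximation
    and the number of bisections performed.
    """
    fa = f(a)
    if fa * f(b) > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for k in range(max_iter):
        c = 0.5 * (a + b)                 # midpoint of the current interval
        fc = f(c)
        if abs(fc) < 1.0e-14 or 0.5 * (b - a) < tol:
            return c, k                   # f(c) almost zero, or interval small enough
        if fa * fc < 0:                   # sign change in [a, c]: keep the left half
            b = c
        else:                             # otherwise keep the right half [c, b]
            a, fa = c, fc
    return 0.5 * (a + b), max_iter
```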
Example 2: Use the code above to find the root of $$f(x)=x^3+x^2-3x-3$$ in the interval \( [1.5,2] \). Use \( 10^{-6} \) as the tolerance.
See the function example2 in nonlinearequations.py.
Check that the numerical result is within the error tolerance:
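For instance, using the hypothetical bisection sketch above (the notebook's own check may look different), we can compare the result with the exact root \( \sqrt{3} \):

```python
import numpy as np

tol = 1.0e-6
c, k = bisection(lambda x: x**3 + x**2 - 3*x - 3, 1.5, 2.0, tol=tol)
print(f"c = {c:.8f} after {k} bisections")
print(f"|c - sqrt(3)| = {abs(c - np.sqrt(3)):.2e}, within tolerance: {abs(c - np.sqrt(3)) <= tol}")
```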
The method is implemented in the function fixedpoint. The iterations are terminated when either the error estimate \( |x_{k+1}-x_k| \) is less than a given tolerance tol, or the number of iterations reaches some maximum number max_iter.
See the function fixpoint in nonlinearequations.py.
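A minimal sketch of such a routine is shown below; the actual fixpoint function in nonlinearequations.py may differ in interface and output.

```python
def fixpoint(g, x0, tol=1.0e-8, max_iter=30):
    """Solve x = g(x) by fixed point iteration, starting from x0.

    Stops when the error estimate |x_{k+1} - x_k| is below tol,
    or when max_iter iterations have been performed.
    """
    x = x0
    for k in range(max_iter):
        x_new = g(x)                 # one fixed point iteration
        if abs(x_new - x) < tol:     # error estimate |x_{k+1} - x_k|
            return x_new, k + 1
        x = x_new
    return x, max_iter
```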
Example 3: The equation $$ x^3+x^2-3x-3=0 \qquad \text{can be rewritten as} \qquad x=\frac{x^3+x^2-3}{3}. $$ The fixed points are the intersections of the two graphs \( y=x \) and \( y=\frac{x^3+x^2-3}{3} \), as can be demonstrated by the following script:
See the function example3 in nonlinearequations.py.
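A sketch of such a script (the actual example3 in nonlinearequations.py may differ) could be:

```python
import numpy as np
import matplotlib.pyplot as plt

g = lambda x: (x**3 + x**2 - 3) / 3

x = np.linspace(-2, 2, 400)
plt.plot(x, x, label='y = x')        # the line y = x
plt.plot(x, g(x), label='y = g(x)')  # the graph of g
plt.xlabel('x')
plt.legend()
plt.grid(True)
plt.show()
```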
We observe that the fixed points of \( g \) are the same as the zeros of \( f \).
Apply fixed point iterations on \( g(x) \). Aim for the fixed point between 1 and 2, so choose \( x_0=1.5 \).
Do the iterations converge to the root \( r=\sqrt{3} \)?
See the function example3_iter in nonlinearequations.py.
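The experiment can be carried out with a simple loop (a sketch; example3_iter in nonlinearequations.py may print more information):

```python
g = lambda x: (x**3 + x**2 - 3) / 3

x = 1.5
for k in range(10):
    x = g(x)
    print(k + 1, x)    # inspect the iterates: do they approach sqrt(3) = 1.7320...?
```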
In this section we will discuss the convergence properties of the fixed point iterations, under the conditions for existence and uniqueness given above.
The error after \( k \) iterations is given by \( e_k = r-x_k \). The iterations converge when \( e_k \rightarrow 0 \) as \( k\rightarrow \infty \). Under which conditions is this the case?
This is the trick: For some arbitrary \( k \) we have $$ \begin{align*} x_{k+1}&=g(x_k), && \text{the iterations}\\ r &= g(r). && \text{the fixed point} \end{align*} $$ Take the difference between those and use the Mean Value Theorem (see Preliminaries), and finally take the absolute value of each expression in the sequence of equalities: $$ \begin{equation} |e_{k+1}| = |r-x_{k+1}| = |g(r)-g(x_k)| = |g'(\xi_k)|\cdot|r-x_k| = |g'(\xi_k)|\cdot |e_k|. \label{_auto2} \end{equation} $$
Here \( \xi_k \) is some unknown value between \( x_k \) (known) and \( r \) (unknown). We can now use the assumptions from the existence and uniqueness result.
and \( L^k \rightarrow 0 \) as \( k\rightarrow\infty \) when \( L < 1 \).
In addition, we have that $$ \lvert x_{k+1} - x_{k}\rvert = \lvert g(x_{k}) - g(x_{k-1})\rvert = \lvert g'(\xi_{k})\rvert \cdot \lvert x_{k}-x_{k-1}\rvert $$ for some \( \xi_k \) between \( x_{k-1} \) and \( x_{k} \), which shows that $$ \lvert x_{k+1} - x_{k} \rvert \le L \lvert x_{k}-x_{k-1}\rvert. $$ Repeating this argument \( k \) times yields the estimate $$ \lvert x_{k+1} - x_k\rvert \le L^k \lvert x_1 - x_0 \rvert. $$ As a consequence, since \( r = \lim_{k\to\infty} x_k \), we obtain that $$ \lvert x_1 - r \rvert = \Bigl\lvert \sum_{k=0}^\infty (x_{k+1}-x_{k+2})\Bigr\rvert \le \sum_{k=0}^\infty \lvert x_{k+1}-x_{k+2}\rvert \le \sum_{k=0}^\infty L^{k+1} \lvert x_1 - x_0\rvert = \frac{L}{1-L} \lvert x_1 - x_0 \rvert. $$ A similar argument (but now starting at \( x_k \) instead of \( x_1 \)) yields that $$ \lvert x_{k+1} - r \rvert \le \frac{L}{1-L}\lvert x_{k+1}-x_k\rvert \le \frac{L^{k+1}}{1-L} \lvert x_1 - x_0\rvert. $$
In summary we have:
If there is an interval \( [a,b] \) such that \( g\in C^1[a,b] \), \( g([a,b])\subset (a,b) \) and there exists a positive constant \( L < 1 \) such that \( |g'(x)|\leqslant L < 1 \) for all \( x\in[a,b] \), then
From the discussion above, we can draw some conclusions:
It is clear that \( g \) is differentiable. We already know that \( g \) has three fixed points, \( r=\pm\sqrt{3} \) and \( r=-1 \). For the first two, we have \( g'(\sqrt{3}) = 3+\frac{2}{3}\sqrt{3} \approx 4.15 \) and \( g'(-\sqrt{3})=3-\frac{2}{3}\sqrt{3}\approx 1.85 \), so the fixed point iterations will never converge towards those roots. But \( g'(-1)=1/3 \), so we get convergence towards this root, given sufficiently good starting values. The figure below demonstrates that the assumptions of the theorem are satisfied in some region around \( x=-1 \), for example \( [-1.2, -0.8] \).
Let us take, for example, the initial guess \( x_0 = -0.9 \), and find an upper bound for the number of iterations \( \hat{k} \) needed to meet a tolerance of \( 10^{-3} \). You can verify that the maximum value of \( |g'(x)| \) for \( x\in[-1.2,-0.8] \) is attained at \( x=-1.2 \) and equals \( 0.64 \), so we can use \( L=0.64 \) in the error estimate. The equation to solve is $$ \frac{0.64^{\hat{k}+1}}{1-0.64}\lvert g(-0.9) - (-0.9)\rvert = 10^{-3}\, , $$ which gives us $$ 0.64^{\hat{k}+1}= \frac{0.36\times 10^{-3}}{|-0.073|} \approx 4.93\times 10^{-3}\quad \Rightarrow \quad \hat{k} \approx \frac{\log (4.93\times 10^{-3})}{\log (0.64)} - 1 \approx 10.9\, , $$ that is, we will need at most \( 11 \) iterations (but probably fewer). Note that this bound grows rapidly as \( L \) approaches \( 1 \); ideally, we would like to have \( |g'(x)| \) (and hence \( L \)) as low as possible, so that few iterations are required.
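This estimate is easy to verify numerically (a small illustrative check, not part of nonlinearequations.py):

```python
import numpy as np

g  = lambda x: (x**3 + x**2 - 3) / 3
dg = lambda x: (3 * x**2 + 2 * x) / 3        # g'(x)

x = np.linspace(-1.2, -0.8, 1001)
L = np.max(np.abs(dg(x)))                    # maximum of |g'| on [-1.2, -0.8]
x0 = -0.9
tol = 1.0e-3
k_hat = np.log((1 - L) * tol / abs(g(x0) - x0)) / np.log(L) - 1
print(f"L = {L:.2f}, k_hat = {k_hat:.1f}")   # L = 0.64, k_hat = 10.9
```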
See example3_revisited in nonlinearequations.py.
The plot to the left demonstrates the assumption \( g([a,b])\subset(a,b) \), as the graph \( y=g(x) \) enters at the left boundary and exits at the right and does not leave the region \( [a,b] \) anywhere in between. The plot to the right shows that \( |g'(x)|\leqslant |g'(a)| = L < 1 \) is satisfied in the interval.
As we have seen in the previous section, fixed point iteration can be used to solve nonlinear equations. However, the method comes with potential issues:
Our idea is to set up an iterative method for the solution of this equation. That is, we assume that we have already found some \( x_k \) that is reasonably close to the solution. We want to find an update \( \Delta_k \) such that \( x_{k+1} := x_k + \Delta_k \) satisfies \( f(x_{k+1}) \approx 0 \). Using a first order Taylor series expansion, we now obtain that $$ 0 \approx f(x_k + \Delta_k) = f(x_k) + \Delta_k f'(x_k) + O(\Delta_k^2) \approx f(x_k) + \Delta_k f'(x_k). $$ Solving this (approximate) equation for the update \( \Delta_k \), we see that we should choose it as $$ \Delta_k = - \frac{f(x_k)}{f'(x_k)}. $$ Or, using that \( x_{k+1} = x_k + \Delta_k \), we get the iterations $$ x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}. $$ This yields the following method:
The method is implemented in the function newton(). The iterations are terminated when \( |f(x_k)| \) is less than a given tolerance.
See newton() in nonlinearequations.py.
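A minimal sketch of such a routine is given below; the actual newton() in nonlinearequations.py may differ, for example in how the termination criteria are chosen.

```python
def newton(f, df, x0, tol=1.0e-10, max_iter=30):
    """Solve f(x) = 0 by Newton's method, starting from x0.

    df is the derivative of f. The iterations stop when |f(x)| < tol
    or after max_iter iterations.
    """
    x = x0
    for k in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x, k               # |f(x)| is small enough
        x = x - fx / df(x)            # one Newton step
    return x, max_iter
```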
Example 4: Solve \( f(x)=x^3+x^2-3x-3=0 \) by Newton's method. Choose \( x_0=1.5 \). The derivative of \( f \) is \( f'(x)=3x^2+2x-3 \).
See example4 in nonlinearequations.py.
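Using the hypothetical newton sketch above, the experiment could look like this (example4 in nonlinearequations.py may differ):

```python
f  = lambda x: x**3 + x**2 - 3*x - 3
df = lambda x: 3*x**2 + 2*x - 3

x, n = newton(f, df, x0=1.5)
print(f"x = {x:.10f} after {n} iterations")   # expect x close to sqrt(3) = 1.7320508...
```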
For the error analysis, we will start by trying to use the results of the previous section on fixed point iterations.
Assume to that end that \( r \) is a solution of the equation \( f(x) = 0 \). Since Newton's method requires a division by \( f'(x) \), we have to assume that \( f'(x) \neq 0 \), at least if \( x \) is sufficiently close to \( r \). Then we can regard Newton's method as a fixed point iteration with the fixed point function \( g \) defined as $$ g(x) := x - \frac{f(x)}{f'(x)}. $$ Indeed, we see that the fixed point equation \( g(x) = x \) holds, if and only if \( f(x) = 0 \). Now we can apply the fixed point theorem from the previous section, which states that this iteration will, for initial values \( x_0 \) sufficiently close to \( r \), converge to \( r \), provided that \( \lvert g'(r) \rvert < 1 \). For the derivative of \( g \) we get $$ g'(x) = 1 - \frac{f'(x)}{f'(x)} + \frac{f(x) f''(x)}{f'(x)^2} = \frac{f(x) f''(x)}{f'(x)^2}. $$ Since \( f(r) = 0 \) (and \( f'(r) \neq 0 \)), we obtain in particular that $$ g'(r) = 0, $$ which obviously satisfies \( \lvert g'(r) \rvert < 1 \). Thus Newton's method converges towards \( r \) if \( x_0 \) is sufficiently close to \( r \).
In addition, the fixed point theorem would give us an (at least) linear convergence rate and error estimates. However, since \( g'(r) = 0 \), we can actually do better than that:
Denote by \( e_k := r-x_k \) the error after the \( k \)-th Newton iteration. Then we can perform a Taylor series expansion of \( f(r)=0 \) around \( x_k \) and obtain $$ 0 = f(r) = f(x_k) + f'(x_k)(r-x_k) + \frac{1}{2}f''(\xi_k)(r-x_k)^2 $$ for some \( \xi_k \) between \( x_k \) and \( r \). Now recall that, by the definition of Newton's method, $$ f(x_k) + f'(x_k)(x_{k+1}-x_k) = 0. $$ Inserting this expression in the Taylor expansion above, we obtain that $$ 0 = f(x_k) + f'(x_k)(r-x_{k+1}+x_{k+1}-x_k) + \frac{1}{2}f''(\xi_k)(r-x_k)^2 = f'(x_k)(r-x_{k+1}) + \frac{1}{2}f''(\xi_k)(r-x_k)^2. $$ Finally we divide by \( f'(x_k) \) and use the definition of \( e_k = r-x_k \) and \( e_{k+1} = r-x_{k+1} \), which yields the equality $$ \begin{equation}\label{eq:error_Newton} e_{k+1} = -\frac{1}{2}\frac{f''(\xi_k)}{f'(x_k)} e_k^2. \end{equation} $$
Thus, if there exists a constant \( M \ge 0 \) such that $$ \frac{1}{2}\frac{\lvert f''(\xi_k)\rvert}{\lvert f'(x_k)\rvert} \le M $$ for all \( k \), then we have that $$ \lvert e_{k+1}\rvert \le M \lvert e_k \rvert^2, $$ which means that the sequence \( x_k \) converges quadratically to \( r \) (see the note Preliminaries, section 3.1 on "Convergence of an iterative process").
Finally, we can also see from \eqref{eq:error_Newton} how close we have to be for the whole method to actually converge: If \( \lvert e_0\rvert < 1/M \), then it follows that $$ \lvert e_1 \rvert \le M \lvert e_0 \rvert^2 < \lvert e_0 \rvert, $$ which shows that the error after the first step actually decreases. By repeating this argument (or applying induction), we see that the same holds in each step, and the error actually always decreases.
All these considerations together prove the following theorem:
Assume that the function \( f \) has a root \( r \), and let \( I_\delta=[r-\delta, r+\delta] \) for some \( \delta > 0 \). Assume further that
or, in short, $$ \mathbf{f}(\mathbf{x}) = \mathbf{0}, $$ where \( \mathbf{f}:\mathbb{R}^n \rightarrow \mathbb{R}^n \).
Example 5: We are given the two equations $$ \begin{align*} x_1^3-x_2 + \frac{1}{4} &= 0 \\ x_1^2+x_2^2 - 1 &= 0 \nonumber \end{align*} $$
This can be illustrated by example5_graphs in nonlinearequations.py.
The solutions of the two equations are the intersections of the two graphs. So there are two solutions, one in the first and one in the third quadrant.
We will use this as a test example in the sequel.
Let \( \mathbf{r}= [r_x, r_y]^T \) be a solution of these equations, and let \( \hat{\mathbf{x}}=[\hat{x},\hat{y}]^T \) be a known approximation to \( \mathbf{r} \). We search for a better approximation. This can be done by replacing the nonlinear equation \( \mathbf{f}(\mathbf{x})=\mathbf{0} \) by a linear approximation, obtained from a multidimensional Taylor expansion around \( \hat{\mathbf{x}} \): $$ \begin{align*} f(x, y) &= f(\hat{x}, \hat{y}) + \frac{\partial f}{\partial x}(\hat{x}, \hat{y})(x - \hat{x}) + \frac{\partial f}{\partial y}(\hat{x}, \hat{y})(y - \hat{y}) + \dotsc \\ g(x,y) &= g(\hat{x}, \hat{y}) + \frac{\partial g}{\partial x}(\hat{x}, \hat{y})(x - \hat{x}) + \frac{\partial g}{\partial y}(\hat{x}, \hat{y})(y - \hat{y}) + \dotsc \end{align*} $$ where the \( \dotsc \) represent higher order terms, which are small if \( \hat{\mathbf{x}}\approx \mathbf{x} \). By ignoring these terms we get a linear approximation to \( \mathbf{f}(\mathbf{x}) \), and rather than solving the original nonlinear system, we can solve its linear approximation: $$ \begin{align*} f(\hat{x}, \hat{y}) + \frac{\partial f}{\partial x}(\hat{x}, \hat{y})(x - \hat{x}) + \frac{\partial f}{\partial y}(\hat{x}, \hat{y})(y - \hat{y}) &= 0\\ g(\hat{x}, \hat{y}) + \frac{\partial g}{\partial x}(\hat{x}, \hat{y})(x - \hat{x}) + \frac{\partial g}{\partial y}(\hat{x}, \hat{y})(y - \hat{y}) &= 0 \end{align*} $$
or, more compactly, $$ \mathbf{f}(\hat{\mathbf{x}}) + J(\hat{\mathbf{x}})(\mathbf{x}-\hat{\mathbf{x}}) = \mathbf{0}, $$
where the Jacobian \( J(\mathbf{x}) \) is given by $$ J(\mathbf{x}) =\left(\begin{array}{cc} \frac{\partial f}{\partial x}(x, y) & \frac{\partial f}{\partial y}(x, y) \\ \frac{\partial g}{\partial x}(x, y) & \frac{\partial g}{\partial y}(x, y) \end{array} \right) $$
The hope is that the solution \( \mathbf{x} \) of the linear equation provides a better approximation to \( \mathbf{r} \) than our initial guess \( \hat{\mathbf{x}} \), so the process can be repeated, resulting in
The strategy can be generalized to systems of \( n \) equations in \( n \) unknowns, in which case the Jacobian is given by: $$ J(\mathbf{x}) = \left(\begin{array}{cccc} \frac{\partial f_1}{\partial x_1}(\mathbf{x}) & \frac{\partial f_1}{\partial x_2}(\mathbf{x}) & \dotsm & \frac{\partial f_1}{\partial x_n}(\mathbf{x}) \\ \frac{\partial f_2}{\partial x_1}(\mathbf{x}) & \frac{\partial f_2}{\partial x_2}(\mathbf{x}) & \dotsm & \frac{\partial f_2}{\partial x_n}(\mathbf{x}) \\ \vdots & \vdots & & \vdots \\ \frac{\partial f_n}{\partial x_1}(\mathbf{x}) & \frac{\partial f_n}{\partial x_2}(\mathbf{x}) & \dotsm & \frac{\partial f_n}{\partial x_n}(\mathbf{x}) \end{array} \right) $$
The method is implemented in the function newton_system. The numerical solution is accepted when all components of \( \mathbf{f}(\mathbf{x}_k) \) are smaller than a tolerance in absolute value, that is, when \( \|\mathbf{f}(\mathbf{x}_k)\|_{\infty} < \) tol. See Preliminaries, section 1 for a description of norms.
See the function newton_sys in nonlinearequations.py.
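A minimal sketch of Newton's method for systems is given below; the actual routine in nonlinearequations.py may differ in interface and stopping criteria.

```python
import numpy as np

def newton_system(f, jac, x0, tol=1.0e-10, max_iter=30):
    """Solve the system f(x) = 0 by Newton's method.

    f(x) returns a vector, jac(x) the Jacobian matrix. The solution is
    accepted when the max-norm of f(x) is below tol.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        fx = f(x)
        if np.linalg.norm(fx, np.inf) < tol:
            return x, k
        delta = np.linalg.solve(jac(x), -fx)   # solve J(x) * delta = -f(x)
        x = x + delta                          # Newton update
    return x, max_iter
```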
Example 6: Solve the equations from Example 5 by Newton's method. The vector valued function \( \mathbf{f} \) and the Jacobian \( J \) are in this case $$ \mathbf{f}(\mathbf{x}) = \left(\begin{array}{c} x_1^3-x_2 + \frac{1}{4} \\ x_1^2+x_2^2 - 1 \end{array} \right) \qquad \text{and} \qquad J(\mathbf{x}) = \left( \begin{array}{cc} 3x_1^2 & -1 \\ 2x_1 & 2x_2 \end{array} \right) $$
We already know that the system has two solutions, one in the first and one in the third quadrant. To find the first one, choose \( \mathbf{x}_0=[1,1]^T \) as the starting value.
See the function example6 in nonlinearequations.py.
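With the hypothetical newton_system sketch above, Example 6 could be run as follows (example6 in nonlinearequations.py may differ):

```python
import numpy as np

f = lambda x: np.array([x[0]**3 - x[1] + 0.25,
                        x[0]**2 + x[1]**2 - 1])
J = lambda x: np.array([[3 * x[0]**2, -1.0],
                        [2 * x[0],     2 * x[1]]])

x, n = newton_system(f, J, x0=[1.0, 1.0])
print(x, n)    # converges to the solution in the first quadrant
```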
Finding solutions of nonlinear equations is difficult. Even if the Newton iterations will in principle converge, it can be very hard to find sufficiently good starting values. Nonlinear equations can have no solutions or many solutions, and even when there are solutions, there is no guarantee that the solution you find is the one you want.
If \( n \) is large, each iteration is computationally expensive, since the Jacobian is evaluated in each iteration. In practice, some modified and more efficient version of Newton's method will be used, perhaps together with more robust but slower algorithms for finding sufficiently good starting values.