---
jupytext:
  formats: ipynb,md:myst
  notebook_metadata_filter: all
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.13.0
kernelspec:
  display_name: Python 3 (phys-581)
  language: python
  name: phys-581
---

```{code-cell} ipython3
:cell_style: center
:hide_input: false

import mmf_setup;mmf_setup.nbinit()
import logging;logging.getLogger('matplotlib').setLevel(logging.CRITICAL)
%pylab inline --no-import-all
```

(sec:Assignment1)=
# Assignment 1: Recursion and Invariants

## Fibonacci Numbers

At the end of {ref}`sec:RecursionAndInvariants`, we showed how to use invariants to
prove that a simple iterative approach for finding the n'th Fibonacci number works.  Do
the same with the following more sophisticated algorithm.

First note that we can write the Fibonacci recurrence in terms of a matrix equation
\begin{gather*}
  \begin{pmatrix}
    F_{n+1} \\
    F_{n}
  \end{pmatrix}
  =
  \underbrace{
    \begin{pmatrix}
      1 & 1 \\
      1 & 0
    \end{pmatrix}
  }_{\mat{M}}
  \begin{pmatrix}
    F_{n} \\
    F_{n-1}
  \end{pmatrix}.
\end{gather*}
The idea is to use this to express the final answer in as a product of some matrix power
$\mat{M}^{p}$ times the initial conditions: (You must determine exactly what power $p$
you need to compute $F_{n}$.)
\begin{gather*}
  \begin{pmatrix}
    F_{n} \\
    F_{n-1}
  \end{pmatrix}
  =
  \mat{M}^{p}
  \begin{pmatrix}
    F_{1} \\
    F_{0}
  \end{pmatrix} =
  \begin{pmatrix}
    1 \\
    0
  \end{pmatrix}.
\end{gather*}
The strategy is to compute $\mat{M}^{p}$ efficiently in order $\log_2(p)$ operations.
This is most easily done when $p = 2^{q}$ is a power of 2:
The idea is based on the fact that we can construct $\mat{M}^{2^{q}}$ by repeatedly
squaring the result:
\begin{gather*}
  \mat{M}^{2^{q}} = (\cdots ((\mat{M}\overbrace{^{2})^{2})^{\cdots 2}}^{q = \log_{2}p \times}.
\end{gather*}

To make this work for general $p$, you will need to keep track of more information.
*(Hint, consider the binary representation of $p$.)*

```{code-cell}
:tags: [margin]
import numpy as np
M = np.array([[1, 1],
              [1, 0]])
M2 = M @ M
M4 = M2 @ M2
print(M2)
print(M4)
```
:::{admonition} Assignment: Efficient Fibonacci Numbers
Write a simple program to compute $F_{n}$ based on this algorithm, and prove that it is
correct using invariants.  For matrix multiplication, you can use {py:func}`numpy.array`
and the matrix multiplication {py:func}`operator.matmul` which you use as `A @ B`.  (See
code in the margin.)

*If you want to use this code in the 1ms challenge, then you need to implement your own
version of matrix multiplication.*
:::

:::{solution}

First we solve the problem of raising $\mat{M}^{p}$.  The key is to consider the binary
representation of $p$.  For example, suppose $p = 10110 = 2^4 + 2^2 + 2^1 = 22$: we can
express
\begin{gather*}
  \mat{M}^{22} = \mat{M}^{2^4+2^2 + 2^1}=\mat{M}^{2^4}\mat{M}^{2^2}\mat{M}^{2^1}.
\end{gather*}
The strategy is to iterate through the powers of $2^{q}$ and then accumulate those that
are needed.

Using a custom implementation of matrix multiplication, I can compute up to $F_{16383}$ on
CoCalc, depending on the server load.
:::
```{code-cell}
:tags: [hide-cell]

class Mat2:
    """Class for 2D matrices."""
    def __init__(self, M):
        self.M = M
    
    @property
    def M(self):
        return [self.f00, self.f01, self.f10, self.f11]
    
    @M.setter
    def M(self, M):
        self.f00, self.f01, self.f10, self.f11 = M
        
    def __repr__(self):
        return repr([self.M[:2], self.M[2:]])
        
    def __imatmul__(self, M):
        self.M = (
            self.f00 * M.f00 + self.f01 * M.f10,
            self.f00 * M.f01 + self.f01 * M.f11,
            self.f10 * M.f00 + self.f11 * M.f10,
            self.f10 * M.f01 + self.f11 * M.f11)
        return self
        
    def pow(self, p):
        res = Mat2([1, 0, 0, 1])
        Mp = Mat2(self.M)
        while p > 0:
            if p % 2:
                res @= Mp
            Mp @= Mp
            p //= 2
        return res
    
def pow(M, p):
    """Return M^p."""
    res = np.eye(len(M), dtype=M.dtype)
    Mp = M.copy()
    while p > 0:
        if p % 2:
            res = res @ Mp
        Mp = Mp @ Mp
        p //= 2
    return res
    
rng = np.random.default_rng(seed=2)
M = rng.random(size=(3,3))
M_to_the_p = np.eye(len(M))
for p in range(10):
    assert np.allclose(pow(M, p), M_to_the_p)
    M_to_the_p = M_to_the_p @ M


def fib0(n):
    M = np.array([[1, 1], [1, 0]], dtype=object)
    return pow(M, n)[0, 1]

def fib(n):
    M = Mat2([1, 1, 1, 0])
    return M.pow(n).f01


import timeit

def mytimeit(n, fib=fib, number=10, samples=5):
    """Return an estimate of the time it takes to compute `fib(n)`"""
    ts = [
        timeit.timeit('fib(n)', globals=dict(fib=fib, n=n), number=number)
        for _n in range(samples)
    ]
    return min(ts) / number

def get_n(fib, target=1e-3):  # 1ms
    """Return the largest `n` where `fib(n)` takes < target."""
    n0 = 1
    t0 = mytimeit(n0, fib=fib)
    n1 = 2
    t1 = mytimeit(n1, fib=fib)
    while t1 < target:
        n0, t0 = n1, t1
        n1 *= 2
        t1 = mytimeit(n1, fib=fib)
    
    # Now n0 and n1 should bracket our desired n.  We can use bisection.
    while n1 - n0 > 1:
        n = (n1 + n0)//2
        t = mytimeit(n, fib=fib)
        if t < target:
            n0 = n
        else:
            n1 = n
    return n0

print(get_n(fib))
```

## Evaluating Functions

### Lambert W function
Write a function {func}`phys_581.assignment_1.lambertw` that computes  $w = W_k(x)$, the
[Lambert W function](https://en.wikipedia.org/wiki/Lambert_W_function), for the two
branches $k=0$ and $k=-1$.

This function satisfies:
\begin{gather*}
  z = we^w
\end{gather*}
and $k$ determines which branch you should compute as shown below:

```{code-cell} ipython3
w0 = np.linspace(-1, 1.2, 100)
w1 = np.linspace(-5, -1, 100)
fig, ax = plt.subplots()
for w, k, ls in [(w0, 0, '-'), (w1, -1, '--')]:
    z = w*np.exp(w)
    ax.plot(z, w, linestyle=ls, 
            label=f"Branch $k={k}$: $w=W_{{{k}}}(z)$")
ax.legend()
ax.set(xlabel='z', ylabel='w');
```

### Riemann zeta function

Write a function {func}`phys_581.assignment_1.zeta` that computes the [Riemann zeta
function](https://en.wikipedia.org/wiki/Riemann_zeta_function)
\begin{gather*}
  \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^{s}}.
\end{gather*}
Make sure your function returns a result with relative accuracy within a few orders of
magnitude of machine precision for all values of $s > 1$ where the result does not
overflow.

## Differentiation
Write a function {func}`phys_581.assignment_1.derivative` that numerically computes the
derivatives of a function $f(x)$ at a point $x$: 
```{code-cell}
from phys_581.assignment_1 import derivative

x = 1.0
for d, exact in [
    (0, np.sin(x)),
    (1, np.cos(x)),
    (2, -np.sin(x)),
    (3, -np.cos(x)),
    (4, np.sin(x))
]:
    res = derivative(np.sin, x=x, d=d)
    rel_err = abs(res-exact)/abs(exact)
    print(f"d={d}: numeric={res:.8g}, exact={exact:.8g}, rel_err={rel_err:.4g}")
```

The default code carefully chooses an optimal value of $h$ as discussed below, then uses
a recursive approach to compute higher derivatives.  Can you improve the accuracy of
this?

**Bonus**: Why does the following trick work to machine precision?

```{code-cell} ipython3
def derivative1(f, x):
    """Compute the first 1 derivative extremely accurately for *some* functions."""
    h = 1e-64j
    return ((f(x+h) - f(x))/h).real

print(f"f=sin(x): err = {abs(derivative1(np.sin, x) - np.cos(x))}")
print(f"f=exp(x): err = {abs(derivative1(np.exp, x) - np.exp(x))}")
f = lambda x: np.exp(-x**2)
df = lambda x: -2*x*np.exp(-x**2)
print(f"f=exp(-x**2): err = {abs(derivative1(f, x) - df(x))}")
```

## More Details 
### Roundoff vs Truncation Error

Consider computing the derivative of a function using finite-difference techniques:

\begin{gather*}
  f'(x) = \lim_{h\rightarrow 0} \frac{f(x+h) - f(x)}{h}.
\end{gather*}

We can estimate the error computing this by using the Taylor series for the function:

\begin{gather*}
  f(x\pm h) = f(x) \pm h f'(x) + \frac{h^2}{2}f''(x) \pm \frac{h^3}{3!}f'''(x) +  \sum_{n=4}^{\infty} \frac{(\pm h)^n}{n!}f^{(n)}(x),\\
  D^+_h f(x) = \frac{f(x+h) - f(x)}{h} = f'(x) + \frac{h}{2}f''(x) + \order(h^2).
\end{gather*}

This *forward-difference* approximation thus has a **truncation-error** that scales as $hf''(x)/2$.  We can do a bit better if we take the centered-difference approximation:

\begin{gather*}
  D_h f(x) = \frac{f(x+h) - f(x-h)}{2h} + \frac{h^2}{6}f'''(x) + \order(h^3). 
\end{gather*}

However, this is not the only source of error.  If all goes well, the values $f(x\pm h)$ will be computed with a relative error of machine precision $\epsilon$, which means they will have have an absolute error of $\approx \epsilon f(x)$.  Dividing this error by $2h$, however, means that the whole expression will have an absolute error of about $\epsilon f(x)/2h$ because of **roundoff error**.  Notice that this gets significantly **worse** as $h$ gets small.

With this formula, the best we can do is to choose

\begin{gather*}
  h \approx \sqrt[3]{3\epsilon \left\lvert\frac{f(x)}{f'''(x)}\right\rvert}
\end{gather*}


The following plot shows that these estimates are accurate.

```{code-cell} ipython3
:hide_input: false

eps = np.finfo(float).eps
h = 10 ** np.linspace(-12, 1, 100)
x = 1.0
f = np.sin
df = np.cos
d3f = lambda x: -np.sin(x)
Df_x = (f(x + h) - f(x - h)) / 2 / h  # Centered difference approx
err = abs(Df_x - df(x))
truncation_err = abs(h ** 2 * d3f(x) / 6)
roundoff_err = abs(eps * f(x) / 2 / h)
h_opt = (3*eps*abs(f(x)/d3f(x)))**(1/3)
fig, ax = plt.subplots()
ax.loglog(h, err)
ax.plot(h, truncation_err, "--", label="Truncation error: $h^2 f'''(x)/6$")
ax.plot(h, roundoff_err, ":", label="Roundoff error: $f(x)/2h$")
ax.axvline(h_opt, c='y', ls='-.', label='optimal $h$')
ax.set(xlabel="h", ylabel="abs err", ylim=(1e-16, 1))
ax.legend()
```

```{code-cell}
:tags: [margin, hide-input]
eps = np.finfo(float).eps
h = np.linspace(0.99979, 0.99983, 100)*1e-11
Df_x = (f(x + h) - f(x - h)) / 2 / h
err = abs(Df_x - df(x))

# Slightly better approach.  Compute denominator to include some roundoff error.
h2 = (x+h) - (x-h)
Df2_x = (f(x + h) - f(x - h)) / h2
err2 = abs(Df2_x - df(x))

fig, axs = plt.subplots(3, 1, figsize=(4, 6), constrained_layout=True)
ax = axs[0]
ax.semilogy(h, err)
ax.semilogy(h, err2)
ax.plot(h, 2 * abs(f(x)*eps) / 2 / h, "--y")
ax.plot(h, abs(f(x)*eps/2) / 2 / h, ":y")
ax.set(xlabel="h", ylabel="abs err", ylim=(1e-7, 3e-5));

ax = axs[1]
ax.plot(h, (x + h) - x, label="(x+h) - x")
ax.plot(h, x - (x - h), '--', label="x - (x-h)")
ax.set(xlabel="h");
ax.legend()

ax = axs[2]
ax.semilogy(h,f(x + h) - f(x - h))
ax.set(xlabel="h", ylabel="$f(x+h) - f(x-h)$");
```
Notice that the truncation error is smooth, but the roundoff error appears random.  As
Alex discusses {cite:p}`Gezerlis:2020` (see Fig. 2.3), roundoff errors are not random.
We can see this by zooming in:

```{code-cell}
eps = np.finfo(float).eps
h = 10 ** np.linspace(-11, -11.0002, 1000)
Df_x = (f(x + h) - f(x - h)) / 2 / h  # Centered difference approx
err = abs(Df_x - df(x))

# Slightly better approach.  Compute denominator to include some roundoff error.
h2 = (x+h) - (x-h)
Df_x2 = (f(x + h) - f(x - h)) / h2  # Centered difference approx
err2 = abs(Df_x2 - df(x))

fig, ax = plt.subplots(figsize=(10,2))
ax.semilogy(h, err, label="$(f(x+h) - f(x-h))/2/h$")
ax.semilogy(h, err2, label="$(f(x+h) - f(x-h))/((x+h) - (x-h))$")
ax.plot(h,  abs(f(x)*eps) / 2 / h, "--y", label="$|f|\epsilon/2h$")
ax.plot(h, abs(f(x)*eps / 2) / 2 / h, ":y", label="$|f|\epsilon/4h$")
ax.set(xlabel="h", ylabel="abs err", ylim=(1e-7, 3e-5))
ax.legend();
```

:::{margin}
Note that $h\sim 10^{-12}$ is not at machine precision, but the range explored is small:
`h.max() - h.min()`$= 4\times 10^{-16}$.  Hence, only a few different floating point
numbers $x+h$ and $x-h$ are actually explored here.
:::
If you want to understand this in detail, consider the structure near $h=0.9998\times
10^{-11} \approx 2^{-36}$ shown in the margin.  This shows that $h$ is small enough that
$x+h$ and $x-h$ step through only a few different values due at machine precision.

:::{admonition} Assignment: Floating Point Representation

Write a function that prints a summary of the floating point representation of a given
number.  See the code below for some hints (and useful python tools).
:::

```{code-cell}
import struct   # Tools for unpacking bits of data
import numpy as np

pi = np.pi

print(np.finfo(pi))  # Floating point info about the variable.
```

Here are some ways of manipulating data and types.

```{code-cell}

# Convert the type of a numpy scalar so we can get the dtype:
pi_ = np.array(pi)
pi_.dtype
```

You can query the {py:class}`numpy.dtype` object for information.  Once you have an
array, you can {py:meth}`~numpy.ndarray.view` it as an integer:

```{code-cell}
pi_.view(np.uint64)
```

This can then be converted to a binary number with the {py:func}`bin` function.  The
result is a string that we might need to zero-pad to get the full 64-bit representation:
```{code-cell}
bin_str = bin(pi_.view(np.uint64)) 
print(bin_str)
print(f"{bin_str[2:].zfill(64)}")  # Strip of `0b` prefix, then zero-pad for 64 bits
```

Which portion of these bits are the mantissa? Sign? Exponent? etc.  Hint:

```{code-cell}
bin(int(pi * 2**53))
```

## References
* <https://en.wikipedia.org/wiki/IEEE_754>.
* {py:func}`numpy.format_float_scientific`.
* <https://github.com/numpy/numpy/issues/10288>: Meaning of `np.float128`... it is not
  IEEE 128.