Hide code cell content

import mmf_setup

mmf_setup.nbinit()
import logging

logging.getLogger("matplotlib").setLevel(logging.CRITICAL)
%matplotlib inline
import numpy as np, matplotlib.pyplot as plt

This cell adds /home/docs/checkouts/readthedocs.org/user_builds/wsu-phys-581-computation/checkouts/latest/src to your path, and contains some definitions for equations and some CSS for styling the notebook. If things look a bit strange, please try the following:

  • Choose "Trust Notebook" from the "File" menu.
  • Re-execute this cell.
  • Reload the notebook.

Assignment 4: Modeling Data#

In this assignment you will test your model against sample “experimental” in order to try to make a “discovery” with properly quantified errors and confidence values. In these notes, we will perform a sample analysis with a periodic signal, but you should use your chosen model for your analysis.

Scenario#

Imagine that your colleagues have performed an expensive experiment and obtained a small number \(N_d\) of sets of data consisting of tabulated data \(D_{d} = \{(x_n, y_n)\}\) which you believe are generated from your model

\[\begin{gather*} y_n = f(x_n, \vect{a}) + e_n(x_n) \end{gather*}\]

where \(\vect{a}\) are unknown parameters, and \(e_{n}\) are the experimental errors.

Your task is to learn as much as possible about the parameters \(\vect{a}\) from the data, and to express this succinctly and accurately. In particular, you should provide:

  1. An estimate of the “best fit” values of the parameters \(\vect{a}\).

  2. An estimate of the errors/uncertainties in these parameters.

  3. A statistical measure of the goodness-of-fit.

(See the discussion at the end of section 15.0.0 of [Press et al., 2007].)

Here we will use the least squares approach, attempting to find parameters \(\vect{a}\) that minimize

\[\begin{gather*} \chi^2(\vect{a}) = \sum_{n} \left(\frac{f(x_n,\vect{a})-y_n}{\sigma_n}\right)^2, \qquad \min_{\vect{a}} \chi^2(\vect{a}) = \chi^2(\bar{\vect{a}}). \end{gather*}\]

The minimum \(\bar{\vect{a}}\) here will provides the first objective, the parameter estimates. Near this, \(\chi^2(\vect{a})\) will be quadratic:

\[\begin{gather*} \chi^2(\vect{a}) = \chi^2(\bar{\vect{a}}) + (\vect{a} - \bar{\vect{a}})^T\cdot \mat{C}^{-1}\cdot (\vect{a} - \bar{\vect{a}}) + \order(\delta\vect{a}^3). \end{gather*}\]

The matrix \(\mat{C}\) is called the covariance matrix. This will form the basis for our characterization of the errors or uncertainties in the parameters. The uncertainties in the parameters will be characterized by ellipsoids of constant \(\chi^2(\vect{a})\).

Finally, we shall see that under appropriate conditions – namely, that the \(e_n\) be normally distributed and that the model \(f(x, \vect{a})\) is approximately linear over the region of parameter uncertainties – then we can characterize the distribution of \(\chi^2(\vect{a})\) to determine the confidence region. Specifically, in this case, the reduced chi-square statistic \(\chi^2_r \approx 1\) should be close to 1 if the model is good and the errors have been appropriately characterized:

\[\begin{gather*} \chi^2_r(\vect{a}) = \frac{\chi^2(\vect{a})}{\nu}, \qquad \nu = N_{\mathrm{data}} - N_{\mathrm{parameters}}. \end{gather*}\]

Important

If the errors \(e_n\) are normally distributed, and the model is linear, then you can use the results in section 15.6.5 of [Press et al., 2007] to determine which contours of constant \(\chi^2(\vect{a})\) correspond to which confidence regions.

If the errors are not normally distributed, then you should not rely on \(\chi^2_r(\vect{a})\) to characterize your goodness-of-fit, or your confidence regions. Instead you must perform a Monte Carlo simulation to see how \(\chi^2(\vect{a})\) is distributed, then from this determine the contours of \(\chi^2(\vect{a})\) which correspond to which contour region.

If you model is not sufficiently linear, then you likely need to consider more advanced techniques such as MCMC to characterize your uncertainties, which will likely be more complicated regions than ellipsoids in parameter space.

Model: Periodic Signals#

Here we will work with the following model, looking for periodic signals in data:

\[\begin{gather*} y = c + A\cos(\omega t + \phi) = f(t,\vect{a}), \qquad \vect{a} = (\omega, c, A, \phi). \end{gather*}\]

Our primary focus here will be to find the frequency \(\omega\) or period \(T=2\pi / \omega\) (and possibly the phase \(\phi\)). The other parameters \(c\), \(A\), (and \(\phi\)) are called nuisance parameters: we need to include them in our analysis, but we don’t really care about their values.

Obtaining Fits#

Before looking at real data, it is a good idea to generate some sample data with known errors, and see what this means. Here we start without thinking too much, applying the standard approach of least squares fitting.

from collections import namedtuple
from uncertainties import correlated_values, unumpy as unp

Nt = 100
t_max = 10.0

# Exact parameter values
Params = namedtuple("Params", ["w", "c", "A", "phi"])
w, c, A, phi = a_exact = Params(w=2 * np.pi / 5, c=1.2, A=3.4, phi=5.6)


def f(t, w, c, A, phi):
    """Model function."""
    return c + A * np.cos(w * t + phi)


def f_wrong(t, w, c, A, phi):
    """Wrong model function."""
    return c + A * np.cos(w * t + phi) ** 3


# Exact data and experimental errors
t = np.linspace(0, t_max, Nt)
y = f(t, *a_exact)
sigmas = 0.5 * np.ones(Nt)
wrong_sigmas = sigmas / 2.0

rng = np.random.default_rng(seed=2)
ydata = rng.normal(loc=y, scale=sigmas)

Hide code cell source

# Best fit of wrong model for plotting
from scipy.optimize import curve_fit

a_wrong = curve_fit(f=f_wrong, xdata=t, ydata=ydata, p0=a_exact)[0]

fig, axs = plt.subplots(2, 1, figsize=(10, 10))
ax = axs[0]
ax.errorbar(t, ydata, yerr=wrong_sigmas, fmt="C0.", ecolor="C1", label="data")
ax.plot(t, y, "-C2", label="exact")
ax.plot(t, f_wrong(t, *a_wrong), "--C3", label="wrong model")
ax.set(xlabel="$t$", ylabel="$f(t)$", title="Wrong (underestimated) errors")
ax.legend()

ax = axs[1]
ax.errorbar(t, ydata, yerr=sigmas, fmt="C0.", ecolor="C1", label="data")
ax.plot(t, y, "-C2", label="exact")
ax.plot(t, f_wrong(t, *a_wrong), "--C3", label="wrong model")
ax.set(xlabel="$t$", ylabel="$f(t)$", title="Correctly estimated errors")
ax.legend()
<matplotlib.legend.Legend at 0x756301017d90>
../_images/7e8e729e998aa2f532e666bc2afe51903e620fe500009f57a4bfb20f94a0b593.png

Important

Look closely at the errors in the bottom figure and how they do not all overlap the exact model. These are \(1\sigma\) error bars, meaning that about 68% of the time, the data should lie within this distance from the model, but that 32% of the data should lie further away. As we have carefully generated the deviations here, this gives a visual idea of what a model and consistent data (with errors) should look like. Can you “see” that the wrong model is inconsistent with the data and errors?

We perform three fits to demonstrate the different functions available in SciPy. Note: we use the wrong_sigmas here which underestimate the errors by a factor of 2. This will result in an excessively large \(\chi^2_r \gg 1\), and demonstrate the meaning of the absolute_sigma=False flag provided by curve_fit that effectively scales the errors so that \(\chi^2_r = 1\) before providing the covariance matrix \(\mat{C}\).

Hide code cell source

def show(p):
    """Nicer displaying of parameters with covariance.
    
    Assumes `p` is a `namedtuple` of values with correlated uncertanties::
    
        p = Params(*correlated_values(a, covariance_mat=C))
    """
    # Format .2uS means show 2 digits of precision in the uncertainties with the
    # short-form: e.g. 0.12(34)
    values = ",".join(map("{0[0]}={0[1]:.2uS}".format, zip(p._fields, p)))
    print(f"{p.__class__.__name__}({values})")


print("Defined function show() for displaying parameter estimates")
Defined function show() for displaying parameter estimates

curve_fit#

scipy.optimize.curve_fit is a good choice for fitting data to a curve. It provides the covariance matrix with the option of automatically scaling of the errors to get a reduced \(\chi^2_r = 1\) if you don’t know the absolute magnitude of the errors.

This routine takes your model function f(x, p1, p2, ...) and your data as an input. If you do not provide a guess, it will try a guess of ones.

from scipy.optimize import curve_fit

nu = len(t) - len(a_exact)
a, C = curve_fit(
    f=f, xdata=t, ydata=ydata, p0=a_exact, sigma=wrong_sigmas, absolute_sigma=True
)
a_ = Params(*correlated_values(a, covariance_mat=C))
chi2_r = (((f(t, *a) - ydata) / wrong_sigmas) ** 2).sum() / nu

print("Fit with wrong errors (too small)")
show(a_)
print(f"chi^2_r = {chi2_r:.2g}")

print("\nAfter scaling C*chi^2_r - same as absolute_sigma=False")
a_ = Params(*correlated_values(a, covariance_mat=C * chi2_r))
show(a_)

a, C = curve_fit(
    f=f, xdata=t, ydata=ydata, p0=a_exact, sigma=wrong_sigmas, absolute_sigma=False
)
a_ = Params(*correlated_values(a, covariance_mat=C))
show(a_)

print("\nFit with wrong model.")
a_wrong, C_wrong = curve_fit(
    f=f, xdata=t, ydata=ydata, p0=a_exact, sigma=sigmas, absolute_sigma=True
)
chi2_r_wrong = (((f_wrong(t, *a_wrong) - ydata) / sigmas) ** 2).sum() / nu
a_wrong_ = Params(*correlated_values(a_wrong, covariance_mat=C_wrong))
show(a_wrong_)
print(f"chi^2_r = {chi2_r_wrong:.2g}")

print("\nFit with correct errors and model")
a, C = curve_fit(
    f=f, xdata=t, ydata=ydata, p0=a_exact, sigma=sigmas, absolute_sigma=True
)
a_ = Params(*correlated_values(a, covariance_mat=C))
chi2_r_correct = (((f(t, *a) - ydata) / sigmas) ** 2).sum() / nu
show(a_)
print(f"chi^2_r = {chi2_r_correct:.2g}")
Fit with wrong errors (too small)
Params(w=1.2542(38),c=1.201(26),A=3.415(35),phi=5.613(23))
chi^2_r = 3.8

After scaling C*chi^2_r - same as absolute_sigma=False
Params(w=1.2542(73),c=1.201(51),A=3.415(69),phi=5.613(44))
Params(w=1.2542(73),c=1.201(51),A=3.415(69),phi=5.613(44))

Fit with wrong model.
Params(w=1.2542(76),c=1.201(52),A=3.415(71),phi=5.613(46))
chi^2_r = 3.6

Fit with correct errors and model
Params(w=1.2542(76),c=1.201(52),A=3.415(71),phi=5.613(46))
chi^2_r = 0.94

Hide code cell source

fig, axs = plt.subplots(1, 3, figsize=(10, 3), sharey=True)

ax = axs[0]
ax.errorbar(t, f(t, *a) - ydata, yerr=wrong_sigmas, fmt="C0.", ecolor="C1")
ax.set(
    xlabel="$t$",
    ylabel="$f(t_n) - y_n$",
    title=fr"$\chi^2_r={chi2_r:.2g}$" + "\nCorrect model, wrong errors",
)

ax = axs[1]
ax.errorbar(t, f_wrong(t, *a_wrong) - ydata, yerr=sigmas, fmt="C0.", ecolor="C1")
ax.set(
    xlabel="$t$",
    title=rf"$\chi^2_r={chi2_r_wrong:.2g}$" + "\nWrong model, correct errors",
)

ax = axs[2]
ax.errorbar(
    t, f(t, *a) - ydata, yerr=sigmas, fmt="C0.", ecolor="C1", label="correct model"
)
ax.set(
    xlabel="$t$",
    title=rf"$\chi^2_r={chi2_r_correct:.2g}$" + "\nCorrect model and errors",
)
for ax in axs:
    ax.grid(True)
../_images/b6491e2661288f2ea3400bf8c441e1a7040d05dbf255f3f0c4c8d1585eb2e357.png

Hide code cell content

# Glue value so we can use it in the documents
from myst_nb import glue

glue("wrong_chi2_r", chi2_r)
glue("correct_chi2_r", chi2_r_correct)
np.float64(3.7651084919443574)
np.float64(0.9412771229860893)

The last result is what you should be aiming for. Here we have used a correct estimate of the errors, and found that \(\chi^2_r = \) . This gives us the third component of model fitting – a statistical measure of the goodness-of-fit that is valid if the errors are small enough that a linear approximation to the model is valid. We should check this (we will below), but if it is valid, then we can trust the other parts – the parameter estimates and their correlated uncertainties.

Important

In the other cases, the large value \(\chi^2_r = \) indicates that something is wrong, and we must be very careful interpreting the results. If (and only if) we are confident in the errors (as we should be) then we can use this to reject the model as an exceedingly unlikely explanation of the data, even though the model “looks” good (“chi-by-eye”).

However, if, after plotting the residuals, it looks like there are no systematic deviations and the residuals look flat as they do on the left, then one might reasonably suspect that the errors have been underestimated. Go back to the experimentalists and see if something is wrong. In the meantime, you might consider using the absolute_sigma=False option to get estimates of the parameter uncertainties while you wait, but don’t publish until you understand why your \(\chi^2_r\) is so large.

If the model is incorrect, then one typically sees some sort of systematic deviations like the periodic peak structure in the middle plot. In this case, the parameter estimates and values do not really have any meaning, and should be discarded until a better model is found.

least_squares#

scipy.optimize.least_squares is a bit more general. This function needs to be provided with the list of weighted residuals:

\[\begin{gather*} r_n = \frac{y_n - f(x_n, \vect{a})}{\sigma_n} \end{gather*}\]

and will the minimize the cost function

\[\begin{gather*} \frac{\chi^2}{2} = \frac{1}{2}\sum_{n} r_n^2. \end{gather*}\]

You could customize this, for example, to allow fitting of complex-valued data. You must provide an initial guess for the parameter values here.

To construct the covariance matrix, you can use the returned Jacobian:

\[\begin{gather*} \mat{C}^{-1} = \mat{J}^T\cdot \mat{J}. \end{gather*}\]

To reproduce the results of curve_fit with absolute_sigma=False, you can normalize this by \(\chi^2_r\):

\[\begin{gather*} \mat{C} \rightarrow \chi^2_r \mat{C}. \end{gather*}\]
from scipy.optimize import least_squares


def get_errors(a):
    return (f(t, *a) - ydata) / wrong_sigmas


res = least_squares(fun=get_errors, x0=a_exact)

a = res.x
J = res.jac
C = np.linalg.inv(J.T @ J)
a_ = Params(*correlated_values(a, covariance_mat=C))
nu = len(t) - len(a)
chi2_r = (get_errors(a) ** 2).sum() / nu

print("Fit with wrong errors (too small)")
show(a_)
print(f"chi^2_r = {chi2_r:.2g}")

a_ = Params(*correlated_values(a, covariance_mat=C * chi2_r))
print()
print("After scaling errors to get chi^2_r=1")
show(a_)
Fit with wrong errors (too small)
Params(w=1.2542(38),c=1.201(26),A=3.415(35),phi=5.613(23))
chi^2_r = 3.8

After scaling errors to get chi^2_r=1
Params(w=1.2542(73),c=1.201(51),A=3.415(69),phi=5.613(44))

minimize#

scipy.optimize.minimize is a generic minimizer. This provides you with the most flexibility, but you must provide the objective function and carefully scale the results to get the covariance matrix. As can be deduced from least_square, you will get the correct covariance matrix \(\mat{C} = \mat{H}^{-1}\) where \(\mat{H}\) is the hessian (second-derivative matrix) of the objective. However, if you do this, you will likely have issues with tolerances because minimize expects the objective function to have an optimal value close to 1.

Instead, we should use the reduced \(\chi^2_r\) as an objective, which will require scaling the hessian to get the parameter covariance matrix:

\[\begin{gather*} \min_{\vect{a}} \chi^2_r(\vect{a}) = \frac{1}{\nu}\sum_{n}\left(\frac{f(x_n;\vect{a}) - y_b}{\sigma_n}\right)^2, \\ \nu = N_{\text{data}} - N_{\text{parameters}},\qquad \mat{C} = \frac{\mat{H}^{-1}}{\nu/2}. \end{gather*}\]
from scipy.optimize import minimize


def get_chi2_r(a):
    errors = (f(t, *a) - ydata) / wrong_sigmas
    nu = len(t) - len(a)
    return (abs(errors) ** 2).sum() / nu


res = minimize(fun=get_chi2_r, x0=a_exact, method="BFGS", tol=1e-6)

print(res.message)
assert res.success
a = res.x
nu = len(t) - len(a)
C = res.hess_inv / (nu / 2)
a_ = Params(*correlated_values(a, covariance_mat=C))
chi2_r = get_chi2_r(a)

print("Fit with wrong errors (too small)")
show(a_)
print(f"chi^2_r = {chi2_r:.2g}")

print()
a_ = Params(*correlated_values(a, covariance_mat=C * chi2_r))
print("After scaling errors to get chi^2_r=1")
show(a_)
Desired error not necessarily achieved due to precision loss.
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[9], line 13
     10 res = minimize(fun=get_chi2_r, x0=a_exact, method="BFGS", tol=1e-6)
     12 print(res.message)
---> 13 assert res.success
     14 a = res.x
     15 nu = len(t) - len(a)

AssertionError: 

Changing Variables#

Once we have established the best fit parameters \(\vect{a}\) and covariance matrix \(\mat{C} = \mat{C}^T\) we have the following model about the minimum \(\bar{\vect{a}}\):

\[\begin{gather*} \overbrace{\delta\chi^2(\vect{a})}^{\chi^2(\vect{a}) - \chi^2(\bar{\vect{a}})} = \delta\vect{a}^T\cdot \mat{C}^{-1}\cdot \overbrace{\delta\vect{a}}^{\vect{a} - \bar{\vect{a}}} + \order(\delta\vect{a}^3) \end{gather*}\]

Once we have these results, we might want to consider other parameter combinations \(\vect{b} = \vect{g}(\vect{a})\). We can immediately transform these results if the \(\vect{g}(\vect{a})\) is sufficiently linear about \(\bar{\vect{a}}\):

\[\begin{gather*} \vect{b} = \vect{g}(\vect{a}) \approx \overbrace{\vect{g}(\bar{\vect{a}})}^{\bar{\vect{b}}} + \mat{J}\cdot\overbrace{\delta \vect{a}}^{\vect{a} - \bar{\vect{a}}},\\ \qquad \delta\vect{a} = \mat{J}^{-1}\cdot \delta\vect{b}, \qquad [\mat{J}]_{ij} = \pdiff{g_i(\bar{\vect{a}})}{\bar{a}_i},\\ \delta\chi^2 = \delta\vect{b}^T\cdot\mat{C}_{b}^{-1}\cdot\delta\vect{b} + \order(\delta\vect{b}^3), \qquad \mat{C}_{b} = \mat{J}\cdot\mat{C}\cdot\mat{J}^T. \end{gather*}\]

This type of transformation can be done automatically by the uncertainties package, which propagates the correlated errors through any arithmetic operations and non-linear functions in the uncertainties.umath or uncertainties.unumpy modules. For example, in our model, we might want to consider the time at which the periodic signal is a maximum, rather than the phase:

\[\begin{gather*} c + A\cos(\omega t + \phi) = c + A\cos\big(\omega (t - t_0)\big), \qquad t_0 = -\phi / \omega. \end{gather*}\]

If \(\phi\) and \(\omega\) had independent gaussian errors, then we could use standard error analysis techniques to deduce that the relative error in \(t_0\) would be the sum of the squares of the relative errors in \(\phi\) and \(\omega\):

\[\begin{gather*} \delta t_0 = \abs{t_0}\sqrt{\left(\frac{\delta \phi}{\phi}\right)^2 + \left(\frac{\delta \omega}{\omega}\right)^2}. \end{gather*}\]
from uncertainties import ufloat, covariance_matrix, correlated_values

a_ = Params(*correlated_values(a, covariance_mat=C))

w = a_.w
# Make phi lie between -2*np.pi and 0 so t0 is positive
phi = a_.phi % (2 * np.pi) - 2 * np.pi
t0 = -phi / w
print(f"Correlated   w={w:.3uS}, phi={phi:.3uS}, t0={t0:.3uS}")

# Now compute result as if errors were uncorrelated
w_ = ufloat(w.n, w.s)
phi_ = ufloat(phi.n, phi.s)
t0_ = -phi_ / w_
print(f"Uncorrelated w={w_:.3uS}, phi={phi_:.3uS}, t0={t0_:.3uS}")
dt0 = t0.n * np.sqrt((w.s / w.n) ** 2 + (phi.s / phi.n) ** 2)
print(f"Error from uncorrelated error formula dt0 = {dt0:.3g}")
Correlated   w=1.25424(378), phi=-0.6702(228), t0=0.5343(168)
Uncorrelated w=1.25424(378), phi=-0.6702(228), t0=0.5343(183)
Error from uncorrelated error formula dt0 = 0.0183
/tmp/ipykernel_4099/1820333103.py:7: FutureWarning: AffineScalarFunc.__mod__() is deprecated. It will be removed in a future release.
  phi = a_.phi % (2 * np.pi) - 2 * np.pi

We see here that the computed error in \(t_0\) from the actual parameters is comparable but slightly less than the estimate given by the uncorrelated error formula above. This is because the errors in \(\omega\) and \(\phi\) have some correlations as can be seen in the off-diagonal entries of the 2-parameter covariance matrix:

covariance_matrix([phi, w])
[[np.float64(0.000521378072136006), np.float64(-7.69544723296118e-05)],
 [np.float64(-7.69544723296118e-05), np.float64(1.4304206417483636e-05)]]

Principal Component Analysis (PCA)#

By diagonalizing the symmetric covariance matrix \(\mat{C} = \mat{C}^T\), we can perform a [principal component analysis] (PCA). We note that about the minimum \(\bar{\vect{a}}\), we have the model:

\[\begin{gather*} \overbrace{\delta\chi^2(\vect{a})}^{\chi^2(\vect{a}) - \chi^2(\bar{\vect{a}})} = \delta\vect{a}^T\cdot \mat{C}^{-1}\cdot \overbrace{\delta\vect{a}}^{\vect{a} - \bar{\vect{a}}} + \order(\delta\vect{a}^3). \end{gather*}\]

By diagonalizing the covariance matrix, we can find the “principal components” – those linear combinations of the original parameters that are most tightly constrained by the data:

\[\begin{gather*} \mat{C} = \mat{V}\cdot\mat{D}\cdot\mat{V}^T, \qquad \mat{C}\cdot\vect{v}_i = \vect{v}_i \sigma_i^2, \\ \mat{D} = \diag(\vect{\sigma}^2), \qquad \mat{V} = \begin{pmatrix} \vect{v}_0 & \vect{v}_1 & \cdots \end{pmatrix}. \end{gather*}\]

In this new basis, we have the model

\[\begin{gather*} \delta\chi^2(\vect{a}) = (\mat{V}^T\cdot\delta\vect{a})^T \cdot\mat{D}^{-1}\cdot(\mat{V}^T\cdot \delta\vect{a}) = \sum_{i}\frac{\lambda_i^2}{\sigma_i^2}, \qquad \lambda_i = \vect{v}_{i}^T\cdot\delta\vect{a}. \end{gather*}\]

The columns of \(\vect{V}\) thus define an orthonormal basis (hence \(\mat{V}^T = \mat{V}^{-1}\)) for parameter space such that the linear combinations of parameters along these directions \(\lambda_i\) are independent, and normally distributed with variance \(\sigma_i^2\).

Note that the \(\lambda_i\) parameters represent linear combinations of the physical parameters \(\vect{a}\). While these are well defined, the eigenvectors themselves \(\vect{v}_i\) may not make much sense unless the original variables \(\vect{a}\) are appropriately scaled to make the components of the eigenvectors compatible. One such option is to scale each parameter by its best fit value \(\bar{\vect{a}}\), then we can write:

\[\begin{gather*} \lambda_i = \overbrace{\vect{v}_i \bar{\vect{a}}}^{\tilde{\vect{v}}_i} \cdot \frac{\delta \vect{a}}{\bar{\vect{a}}} \end{gather*}\]

where the \(\vect{v}_i \bar{\vect{a}}\) and \(\delta \vect{a}/\bar{\vect{a}}\) denote element-wise multiplication and division respectively. The vectors \(\tilde{v}_i\) now provide a PCA of the relative errors, and it is often instructive to plot the components of these vectors:

from uncertainties.unumpy import nominal_values


def num2tex(x, digits=2, min_power=-2, max_power=3):
    """Return a nice LaTeX string for the number."""
    power = int(np.floor(np.log10(abs(x))))
    mantissa = x / 10 ** power
    if min_power <= power and power <= max_power:
        mantissa = mantissa * 10 ** power
        format = f"{{:.{digits}g}}"
    elif power == 1:
        format = rf"{{:.{digits-1}f}}\times 10"
    else:
        format = rf"{{:.{digits-1}f}}\times 10^{{{{{{}}}}}}"
    return format.format(mantissa, power)


def plot_pca(a, C, labels=None, relative=True):
    """Draw a PCA eigenvector plot.
    
    Arguments
    ---------
    a : Params
        Parameter values
    C : array
        Covariance matrix
    labels : [str]
        Labels for the parameters (will use `a._fields` if `None`)
    """
    if labels is None:
        labels = p._fields
    d, V = np.linalg.eigh(C)
    assert np.allclose((V * d[np.newaxis, :]) @ V.T, C)
    if relative:
        V *= a[:, np.newaxis]
        labels = [fr"${_l}/\bar{{{_l}}}$" for _l in labels]
    else:
        labels = list(map("${}$".format, labels))

    fig, axs = plt.subplots(len(a), 1, gridspec_kw=dict(hspace=0), sharex=True)
    is_ = np.arange(len(a_))
    for i, ax, sigma2, v, label in zip(is_, axs, d, V.T, labels):
        # Normalize so maximum component is 1
        ind = np.argmax(abs(v))
        _fact = v[ind]
        sigma = np.sqrt(sigma2 / _fact ** 2)
        ax.bar(is_, v / _fact)
        ax.set(ylim=(-1.2, 1.2))
        ax.yaxis.set_label_position("right")
        ax.set_ylabel(fr"$\sigma_{i} = {num2tex(sigma)}$", ha="left", rotation=0)
        ax.grid(True)
    ax.set_xticks(is_)
    ax.set_xticklabels(labels)


plot_pca(a, C, labels=[r"\omega", r"c", r"A", r"\phi"])
display(plt.gcf())

# New parameters with t0 instead of w.  Note that the uncertainties package
# can construct a covariance matrix from any set of parameters.
p_new = np.concatenate([a_[:3], [t0]])
C_new = covariance_matrix(p_new)
plot_pca(nominal_values(p_new), C_new, labels=[r"\omega", r"c", r"A", r"t_0"])
display(plt.gcf())
plt.close("all")

show(a_)
show(Params(*(a_ / a)))
../_images/8a69e4c0048809ca0df7e5a648829e8bbe1eccab0e25de773223db71971867a0.png ../_images/b124635c371eb72a1863d0ad398e1bf1b785488c144aa6a11f4c90a803b9b54f.png
Params(w=1.2542(38),c=1.201(26),A=3.415(35),phi=5.613(23))
Params(w=1.0000(30),c=1.000(22),A=1.000(10),phi=1.0000(41))
show(Params(*correlated_values(a, C)))
np.sqrt(np.diag(C / a[:, None] / a[None, :]))
np.sqrt(np.diag(C))
Params(w=1.2542(38),c=1.201(26),A=3.415(35),phi=5.613(23))
array([0.00378209, 0.02621777, 0.03534004, 0.0228337 ])

Correlated Errors and the Covariance Matrix#

The covariance matrix \(\mat{C}\) describes the local dependence of \(\chi^2\) on the deviations \(\delta\vect{a} = \vect{a} - \bar{\vect{a}}\) from the best fit values:

(1)#\[ \delta\chi^2 \approx \delta\vect{a}^T\cdot\mat{C}^{-1}\cdot\delta\vect{a}\]

We should check this to be sure. Note that if we vary a single parameter \(a_i\) while holding the others fixed, then we should find:

\[\begin{gather*} \delta\chi^2 \approx \delta a_i^2 [\mat{C}^{-1}]_{ii}. \end{gather*}\]

This is easy to check numerically

a, C = curve_fit(
    f=f, xdata=t, ydata=ydata, p0=a_exact, sigma=sigmas, absolute_sigma=True
)
a_ = Params(*correlated_values(a, covariance_mat=C))

def get_chi2(a, i=0, dai=0):
    a_ = np.copy(a)
    a_[i] += dai
    return (((f(t, *a_) - ydata) / sigmas)**2).sum()

Cinv = np.linalg.inv(C)

fig, ax = plt.subplots()
chi2 = get_chi2(a=a)
for i in range(len(a)):
    sigma_i = 1/np.sqrt(Cinv[i,i])
    dais = np.linspace(-sigma_i, sigma_i, 30)
    chi2s = [get_chi2(a=a, i=i, dai=dai) - chi2 for dai in dais]
    l, = ax.plot(dais, chi2s, "--", label=f"$a_i$ = {Params._fields[i]}")
    ax.plot(dais, dais**2 / sigma_i**2, ".", c=l.get_c())
ax.set(xlabel="$a_i$", ylabel=r"$\delta\chi^2$")
ax.legend();
../_images/67a17dbbe1c8987b7bc1a2527d4c1faf473cc1bbaaa6522d13686fc96f384d69.png

Here we see that our interpretation of the diagonals of \(\mat{C}^{-1}\) correspond with the behavior of \(\chi^2\). This supports the following interpretation of the diagonal elements of \(\mat{C}^{-1}\) and \(\mat{C}\):

  1. The diagonal elements \(\sigma'_i = 1/\sqrt{[\mat{C}^{-1}]_{ii}}\) tell us how much we can vary the parameter \(a_i\) away from the best fit value before \(\chi^2\) changes by 1 while holding the other parameters fixed. I.e. \(a_i \in (\bar{a}_i \pm \sigma'_{i})\).

  2. The diagonal elements \(\sigma_i = \sqrt{\mat{C}_{ii}}\) tell us how much we can vary the parameter \(a_i\) away from the best fit value before \(\chi^2\) changes by 1 while re-minimizing over the other parameters fixed.

Thus, we can describe the distribution of parameters as ellipsoids of constant \(\delta\chi^2\):

\[\begin{gather*} \delta\chi^2 \approx \delta\vect{a}^T\cdot\mat{C}^{-1}\cdot\delta\vect{a}. \end{gather*}\]

To plot these in 2D, we can factor \(\mat{C} = \mat{L}\cdot\mat{L}^T\) using either a Cholesky decomposition or the same diagonalization as performed above in the PCA: \(\mat{C} = \mat{V}\cdot\mat{D}\cdot\mat{V}^T\), \(\mat{L} = \mat{V}\cdot\sqrt{\mat{D}}\). We then have:

\[\begin{gather*} \delta\chi^2 \approx \vect{x}^T\cdot\vect{x}, \qquad \delta\vect{a} = \mat{L}\cdot \vect{x}. \end{gather*}\]

Thus, the ellipsoids of constant \(\chi^2\) are spheres in the space \(\vect{x}\). We can now plot these pairwise in a “corner plot”:

a, C = curve_fit(
    f=f, xdata=t, ydata=ydata, p0=a_exact, sigma=sigmas, absolute_sigma=True
)
a_ = Params(*correlated_values(a, covariance_mat=C))

def corner_plot(a, C, labels=[r"\omega", r"c", r"A", r"\phi"], axs=None, fig=None):
    if axs is None:
        fig, axs = plt.subplots(len(a), len(a), sharex=True, sharey=True,
            gridspec_kw=dict(hspace=0, wspace=0),
            figsize=(10, 10))
    for i, ai in enumerate(a):
        for j, aj in enumerate(a):
            if i <= j:
                if fig is not None:
                    axs[i, j].set(visible=False)
                continue
            ax = axs[i, j]
            inds = np.array([[i, j]])
            C2 = C[inds.T, inds]
            sigma_i, sigma_j = np.sqrt([C[i,i], C[j,j]])
            dai = np.linspace(-3*sigma_i, 3*sigma_i, 100)
            daj = np.linspace(-3*sigma_j, 3*sigma_j, 102)
            das = np.meshgrid(dai, daj, indexing='ij', sparse=False)
            dchi2 = np.einsum('xij,yij,xy->ij', das, das, np.linalg.inv(C2))
            ax.contour(daj, dai, dchi2,
                       colors='C0', 
                       linestyles=['-', '--', '-', '-'],
                       levels=[1.0, 2.30, 2.71, 6.63]) 
            
            if j == 0:
                ax.set(ylabel=rf"${labels[i]}$")
            if i == len(a) - 1:
                ax.set(xlabel=rf"${labels[j]}$")
    return locals()
    
locals().update(corner_plot(a, C))
dchi2.max()
np.float64(18.352862708980922)
../_images/6dc5b50f80686d244ba7dd9a0ec312974c4b6ccc88b16b4eace22144afd187c7.png

Here we play with a multi-normal distribution.

import corner
from scipy.stats import chi2

rng = np.random.default_rng(seed=2)
L = rng.random((2, 2))
C = L @ L.T

X = rng.multivariate_normal(mean=[0, 0], cov=C, size=10000)
a0, a1 = X.T

dchi2 = np.einsum('ai,aj,ij->a', X, X, np.linalg.inv(C))

fig = plt.figure(figsize=(10,10))
fig = corner.corner(X, fig=fig);
axs = np.array([[fig.axes[0], fig.axes[1]],
                [fig.axes[2], fig.axes[3]]])
ax = axs[1,1]
corner_plot([0, 0], C, axs=axs, labels=['a0', 'a1']);

#ax.plot(chi2.ppf(0.683, df=1), 0, 1, color='y')
../_images/b127e25f2606c7675e9e003756d4dd34ba68d8a76d1107491f7ff0b289c246d5.png
chi2s = np.linspace(0, 10, 100)
plt.hist(dchi2, bins=100, density=True);
plt.plot(chi2s, chi2.pdf(chi2s, df=2))
plt.vlines(chi2.ppf([0.683, 0.90, 0.9545, 0.99, 0.9973, 0.9999], df=2), 0, 1, color='y')
<matplotlib.collections.LineCollection at 0x7562f6ec8f50>
../_images/56ad27f4da3c9b4b48f9eefc58f1f9032f1d6f4ec32a9ed0f6ffd9bd02f8d5b5.png
chi2.ppf(0.683, df=2)   # Value of chi^2 with ellipse containing 68.3% of the data
np.float64(2.2977070102097144)
a
array([1.25423761, 1.20066782, 3.41499784, 5.61299667])

Confidence Region#

To describe the

If our errors are indeed gaussian and the model is a good fit, then we can consider the parameters \(\vect{a}\) to be distributed with a multivariate normal distribution about their mean \(\bar{\vect{a}}\):

\[\begin{gather*} P(\delta\vect{a}) = \frac{1}{\sqrt{(2\pi)^{N} \det \mat{\Sigma}}}\exp\left( -\frac{1}{2}\delta\vect{a}^T\cdot\mat{\Sigma}^{-1}\delta\vect{a}\right), \\ \mat{\Sigma} = \mat{C}, \qquad \delta\vect{a} = \vect{a} - \bar{\vect{a}}. \end{gather*}\]

Consider \(\chi^2(\vect{a})\) as a function of the parameters \(\vect{a} = \bar{\vect{a}} + \delta\vect{a}\) where \(\bar{\vect{a}}\) are the exact parameter values for the model with experimental errors \(e_n\) of a known distribution:

\[\begin{gather*} y_n = f(x_n, \bar{\vect{a}}) + e_n. \end{gather*}\]

If the errors are sufficiently small we can expand the model about the exact parameters in powers of \(\delta\vect{a} = \vect{a} - \bar{\vect{a}}\):

\[\begin{gather*} f(x_n, \vect{a}) = f(x_n, \bar{\vect{a}}) + \vect{J}_n^T\cdot\delta\vect{a} + \frac{1}{2}\delta\vect{a}^T\cdot \mat{H}_n\cdot\delta\vect{a} + \order(\delta\vect{a}^2), \\ [\vect{J}_n]_{i} = \pdiff{f(x_n, \bar{\vect{a}})}{\bar{a}_i}, \qquad [\mat{H}_n]_{ij} = \frac{\partial^2 f(x_n, \bar{\vect{a}})}{\partial\bar{a}_i\partial\bar{a}_j}. \end{gather*}\]

Confidence Levels

If the experimental errors \(e_n\) are independent and normally distributed, and your model is sufficiently linear, then you can use the formulae discussed in section 15.6.5 of [Press et al., 2007] to determine the confidence limits of your parameter estimates and the corresponding covariance matrix \(\mat{C}\). The procedure in this case is:

  1. Let \(\nu\) be the number of fitted parameters whose

If we treat the errors \(e_n\) as independent random variables with distribution \(P_n(e)\), then we can consider the \(\chi^2\) as a random variable:

\[\begin{align*} \chi^2(\vect{a}) &= \sum_n\left(\frac{f(x_n, \vect{a}) - y_n}{\sigma_n}\right)^2\\ &= \delta\vect{a}^T\left(\sum_n\frac{\vect{J}^T_n\vect{J}_n - \mat{H}_n e_n}{\sigma_n^2}\right)\delta\vect{a} + \\ &\qquad - 2\sum_n\frac{e_n\vect{J}_n^T}{\sigma_n^2}\cdot\delta\vect{a} + \sum_n\frac{e_n^2}{\sigma_n^2} + \order(\delta\vect{a}^3). \end{align*}\]

Consider taking many measurements and averaging. Each measurement will give data \(\{(x_n, y_n)\}\) with a different realization of the errors \(e_n\) sampled from whatever distribution is relevant for the experiment. The resulting \(\chi^2\) will thus have the following distribution The

\[\begin{gather*} \begin{aligned} \braket{\chi^2(\vect{a})} &= \left\langle\sum_n\left(\frac{f(x_n, \vect{a}) - y_n}{\sigma_n}\right)^2\right\rangle\\ &= \delta\vect{a}^T\left(\sum_n\frac{\vect{J}^T_n\vect{J}_n}{\sigma_n^2}\right)\delta\vect{a} + \left(2\sum_n\frac{\braket{e_n}\vect{J}_n}{\sigma_n^2}\right)\cdot\delta\vect{a} + \sum_n\frac{\braket{e_n^2}}{\sigma_n^2}, \end{aligned}\\ y_n = f(x_n, \bar{\vect{a}}) + e_n, \\ f(x_n, \vect{a}) = f(x_n, \bar{\vect{a}}) + \mat{J}_n\cdot\delta\vect{a} + \order(\delta\vect{a}^2),\\ \end{gather*}\]

for some

In the previous section, we computed the best fit parameters \(\bar{\vect{a}}\), and estimated the covariance matrix \(\mat{C}\).

From the least_squares and minimize models, we see that \(\chi^2_r\) depends on the parameters as:

\[\begin{gather*} \chi^2_r(\vect{a}) = \chi^2_r(\bar{\vect{a}}) + \frac{1}{2} \delta\vect{a}^T \cdot \overbrace{\frac{\mat{C^{-1}}}{\nu/2}}^{\mat{H}} \cdot \delta\vect{a} + \order(\delta\vect{a}^2), \qquad \delta\vect{a} = \vect{a} - \bar{\vect{a}}. \end{gather*}\]
\[\begin{gather*} \ln P(\vect{a}) \approx \end{gather*}\]

Extensions#

One can extend this discussion in several ways. For example:

  1. What happens if the errors depend on the parameters?

    \[\begin{gather*} y_n = f(x_n; \vect{a}) + e_n(x_n; \vect{a}) \end{gather*}\]
  2. What about if the model gives a relationship between \(x_n\) and \(y_n\), but that each has its own errors?

    \[\begin{gather*} y_n - e^{(y)}_n = f(x_n - e^{(x)}_n; \vect{a}). \end{gather*}\]

    This is discussed in section 15.3 of [Press et al., 2007].

References#