Model Fitting E.g. 1: Cosine

Model Fitting E.g. 1: Cosine#

Here we go through the step-by-step procedure for fitting data to the following model:

\[\begin{gather*} y_n = f(t_n, \vect{a}) + e_n, \qquad f(t, \vect{a}) = A\cos(\omega t + \phi) + c, \qquad \vect{a} = \begin{pmatrix} \omega\\ c\\ A\\ \phi \end{pmatrix}. \end{gather*}\]

We shall assume that the errors are independently distributed, considering both the case of gaussian errors, and non-gaussian:

\[\begin{gather*} p_n(e_n) = \frac{p_e(e_n/\sigma_n)}{\sigma_n}, \qquad p_e(x) = \overbrace{\frac{e^{-x^2/2}}{\sqrt{2\pi}}}^{\text{gaussian}}. \end{gather*}\]

We start with the gaussian case, and assume that the errors are sufficiently small that the standard analysis techniques based on \(\chi^2(\vect{a})\) apply as discussed in chapter 15 of [Press et al., 2007]. We then test these using Monte Carlo (MC) and then use an MCMC approach to solve the general case.

Linear Gaussian Approximation#

We start with an “exact” solution, then generate some data to analyze at \(N_t\) equally spaced points for \(t \in [0, t_\max]\).

from IPython.display import Latex
from collections import namedtuple
from uncertainties import correlated_values, unumpy as unp
from scipy.optimize import least_squares
from scipy.stats import chi2

Nt = 12
#Nt = 100
t_max = 10.0

# Exact parameter values
Params = namedtuple("Params", ["w", "c", "A", "phi"])
a_exact = Params(w=2 * np.pi / 5, c=2.1, A=3.4, phi=5.6)
w, c, A, phi = a_exact


def f(t, w, c, A, phi, np=np):
    """Model function."""
    return c + A * np.cos(w * t + phi)


# Exact data and experimental errors
t = np.linspace(0, t_max, Nt)
y = f(t, *a_exact)
sigmas = 0.5 * np.ones(Nt)


# Randomly generated data... An "Experiment"
rng = np.random.default_rng(seed=2)
ydata = rng.normal(loc=y, scale=sigmas)

ts = np.linspace(0, t_max)  # Many points for a smooth curve.
fig, ax = plt.subplots(figsize=(10, 5))
ax.errorbar(t, ydata, yerr=sigmas, fmt="C0.", ecolor="C1", label="data")
ax.plot(ts, f(ts, *a_exact), "-C2", label="exact")
ax.set(xlabel="$t$", ylabel="$f(t)$")
ax.legend();

../_images/1c2b9817b0ce79c680e7a032cc151fb0e3d3969d0b81478b649ff2366b5c1fbc.png

Chi Square Fit#

Now we do the fit. We use scipy.optimize.least_squares() which requires a function that returns the list of weighted residuals:

\[\begin{gather*} r_n = \frac{y_n - f(x_n, \vect{a})}{\sigma_n}. \end{gather*}\]

This minimizes the following cost function, for which the Hessian is the inverse covariance matrix:

\[\begin{gather*} \frac{\chi^2}{2} = \frac{1}{2}\sum_{n} r_n^2, \qquad \mat{C}^{-1} = \mat{J}^T\cdot \mat{J}. \end{gather*}\]

def get_residuals(p, xdata=t, ydata=ydata, sigmas=sigmas):
    """Return residuals for least_squares."""
    return (ydata - f(xdata, *p))/sigmas
    

# By using simple numpy functions, we can compute the jacobian
# with the complex-step method.
kw = dict(jac='cs')

res = least_squares(fun=get_residuals, x0=a_exact, **kw)
p = Params(*res.x)
C = np.linalg.inv(res.jac.T @ res.jac)
r = get_residuals(p)
nu = len(r) - len(p)
chi2_r = np.sum(abs(r)**2) / nu
Q = 1 - chi2.cdf(chi2_r*nu, df=nu)

Latex(rf"$\chi^2_r = {chi2_r:.2g}, \qquad Q = {Q:.2g}$")

\[\chi^2_r = 1.2, \qquad Q = 0.28\]

Here we have used the Chi-Squared Distribution \(P_{\nu,\chi^2}(\chi^2)\) to calculate the chi-square probability \(Q\), sometimes called the tail distribution:

\[\begin{gather*} Q &= \int_{\chi^2}^{\infty} P_{\nu, \chi^2}(\chi^2)\d{\chi^2} = 1 - \underbrace{\int_{0}^{\chi^2} P_{\nu, \chi^2}(\chi^2)\d{\chi^2}}_{\text{CDF}} &= \int_{\chi^2_r}^{\infty} P_{\nu, \chi^2_r}(\chi^2_r)\d{\chi^2_r}. \end{gather*}\]

This is the probability that \(\chi^2\) exceeds the value found, and is the complement of the cumulative distribution function (CDF). This is sometimes called a one-sided \(p\)-value and is used as a test statistic to assess whether or not the model is reasonable. As discussed in section 15.1 of [Press et al., 2007], values of \(Q> 0.001\) are not too bad:

Truly wrong models have values will often be rejected with vastly smaller values of \(Q\), \(10^{-18}\), say.

Here we plot the distribution of the reduced chi squared for our dataset with \(\nu = N - M\) degrees of freedom:

\[\begin{gather*} \chi^2_r = \frac{\chi^2}{\nu},\qquad P_{\nu,\chi^2_r}(\chi^2_r) = \nu P_{\nu, \chi^2}(\nu\chi^2_r). \end{gather*}\]

../_images/becdee0a0b3707e1ef3d9bf557f22e3b682b2cee5e4f84ab46e5902852d22074.png

Parameter Estimates and Uncertainties#

The results of the fit are characterized by the best-fit parameter values \(\bar{\vect{a}}\) and the covariance matrix \(\mat{C}\) such that, to lowest order, \(\chi^2(\vect{a})\) is quadratic:

\[\begin{gather*} \overbrace{\Delta \chi^2(\vect{a})}^{\chi^2(\vect{a})- \chi^2(\bar{\vect{a}})} \approx \delta\vect{a}^T \mat{C}^{-1} \overbrace{\delta\vect{a}}^{\vect{a}-\bar{\vect{a}}}. \end{gather*}\]

The confidence region corresponding to confidence level \(p\) is given by the region within the contour

\[\begin{gather*} \Delta \chi^2(\vect{a}) \approx \delta\vect{a}^T \mat{C}^{-1} \delta\vect{a} < \chi^2_{p} \end{gather*}\]

where the contour \(\chi^2_{p}\) is carefully chosen so that this region contains fraction \(p\) of the samples. This can bed computed from the inverse of the cumulative distribution function (CDF) for the chi squared distribution with \(\nu\) degrees of freedom:

\[\begin{gather*} \int_{\rlap{P(\vect{a}) < P_p}}\d^{\nu}\vect{a}\; P(\vect{a}) = \int_0^{\chi^2_p}\d{\chi^2}P_{\nu, \chi^2}(\chi^2) = p(\chi^2_p) \end{gather*}\]

if:

the fit is good,
the posterior distribution is well approximated by a multivariate normal distribution (i.e. if the experimental errors are gaussian and the model is linear in the region of these errors), and
you use the correct value of \(\nu\) corresponding to the number of free parameters you wish to consider. If you want to consider a marginal distribution of \(\tilde{\nu} < \nu\) parameters, then just extract the corresponding \(\tilde{nu}\) rows and columns from \(\mat{C}\) (not \(\mat{C}^{-1}\)).

In our case, the errors are gaussian as we saw above, so we can proceed with the simplified approach. What we do here is use the uncertainties package to perform a forward error propagation of the parameter covariance matrix \(\mat{C}\) through to the actual function values for the model \(y = f(x, \vect{a})\). We then treat each \(y\) as a single parameter (\(\tilde{\nu}=1\)) and use the \(n\sigma\) values where \(\sigma\) is the standard deviation computed by the uncertainties package to produce a \(1\sigma\) and \(2\sigma\) uncertainty band corresponding to our parameter values:

# Here we use the uncertainties package to do linear
# error propagation of are parameter covariances
p_ = Params(*correlated_values(p, C))
pr_ = Params(*correlated_values(p, C*chi2_r))


def plot_band(t, y_, sigma=1, ax=None, **kw):
    """Plot a band using correlated errors in y_."""
    
    ### To Do: These bands are valid in the Gaussian case, but should
    ### be properly scaled using the proper confidence analysis.
    if ax is None:
        ax = plt.gca()
    y = unp.nominal_values(y_)
    dy = unp.std_devs(y_)
    ax.fill_between(t, y-dy*sigma, y+dy*sigma, **kw)
    return ax
    

ts = np.linspace(0, t_max)  # Many points for a smooth curve.
fig, axs = plt.subplots(2, 1, figsize=(10, 10))
for _p_, ax in zip((p_, pr_), axs):
    ax.errorbar(t, ydata, yerr=sigmas, fmt="C0.", ecolor="C1", label="data")
    ax.plot(ts, f(ts, *a_exact), "-C2", label="exact")
    ax.plot(ts, f(ts, *p), "--C3", label="best fit")
    plot_band(t=ts, y_=f(ts, *_p_, np=unp), ax=ax, fc="C3", alpha=0.5, 
              sigma=1, label=r"$1\sigma$ band")
    plot_band(t=ts, y_=f(ts, *_p_, np=unp), ax=ax, fc="C3", alpha=0.2, 
              sigma=2, label=r"$2\sigma$ band")
    ax.set(xlabel="$t$", ylabel="$f(t)$")
    ax.legend();
axs[0].set(title=fr"$\chi^2_r={chi2_r:.2f}$, $Q={Q:.2g}$");
axs[1].set(title=r"Rescaled covariance so $\chi^2_r=1$");

../_images/5d98231c7899ba8d85a7fa528b89ee4832f768efd4783b0a22fa976e09d7516a.png

In the lower plot, we have rescaled the covariance matrix \(\mat{C} \rightarrow \chi^2_r \mat{C}\) as if we did not trust our error estimates, and interpreted the large \(\chi^2_r\) value as a sign that we underestimated the errors.

We can now look at some of the marginal distributions. One can use a set of corner plots to summarize this information, but we first do this explicitly.

Single Parameter Distributions#

We start with the individual parameter marginal distributions. In this case \(\tilde{\nu} = 1\). Since our posterior is well approximated by a gaussian, we can simply look at the standard deviations given as the square root of the corresponding diagonal entry of the covariance matrix: \(\sigma_i = \sqrt{C_{ii}}\). This is nicely summarized by printing the parameters using the uncertainties package as we show below:

from scipy.stats import norm

fig, ax = plt.subplots()
_x = np.linspace(0, 7, 200)
for _name, _p, _Cii in zip(Params._fields, p_, np.diag(C)):
    print(f"{_name}: {_p:.2uS}, (sqrt(C_ii) = {np.sqrt(_Cii):.2g})")
    ax.plot(_x, norm.pdf(_x, loc=_p.n, scale=_p.s), label=_name)
ax.legend();

w: 1.237(21), (sqrt(C_ii) = 0.021)
c: 2.14(15), (sqrt(C_ii) = 0.15)
A: 3.47(20), (sqrt(C_ii) = 0.2)
phi: 5.79(12), (sqrt(C_ii) = 0.12)

../_images/9e26c0a3421d9f26069b24dcc34b6de76d0615af43927e375e2c9e7a989a3ceb.png

Pairwise Distributions#

Supposed we are most interested in the frequency \(\omega\) and phase \(\phi\). We can consider the marginal distribution of these two parameters by extracting the appropriate columns and rows. Note that we must now choose our contours using \(\tilde{\nu} = 2\). We display these as contours of \(\chi^2\):

from scipy.stats import norm, chi2

i, j = map(Params._fields.index, ['w', 'phi'])
w_, phi_ = p_.w, p_.phi

# Extract rows and colums.  Note: C[[i, j], [i, j]] does not work.
Cij = C[[i, j], :][:, [i, j]]

# We can also do this with the uncertainties package
from uncertainties import covariance_matrix
assert np.allclose(Cij, covariance_matrix([w_, phi_]))

_sigma = 5
ws, phis = np.meshgrid(
  np.linspace(w_.n-_sigma*w_.s, w_.n+_sigma*w_.s, 200),
  np.linspace(phi_.n-_sigma*phi_.s, phi_.n+_sigma*phi_.s, 200))

# Deviation matrix to calculate chi2
dws_phis = np.array([ws-w_.n, phis-phi_.n])

# This is just the matrix product, but we want to do this
# over a bunch of parameter values, so we use einsum.
dchi2 = np.einsum('ij,ixy,jxy->xy', 
                   np.linalg.inv(Cij), dws_phis, dws_phis)

# Use a normal distribution to compute the confidence levels
sigmas_ = np.array([1, 2, 3, 4])
ps = norm.cdf(sigmas_) - norm.cdf(-sigmas_)

# Now use chi2 distribution to get the corresponding contours
levels = chi2.ppf(ps, df=2)
fig, ax = plt.subplots(figsize=(10, 6))
_cs = ax.contour(ws, phis, dchi2, levels=levels, cmap="winter")
fmt = dict([(_l, fr"${_n}\sigma$")
            for _l, _n, _p in zip(levels, sigmas_, ps)])

# Draw single-parameter confidence limits to show that
# they are smaller.
for _n, _sigma in enumerate(sigmas_):
    kw = dict(c=_cs.collections[_n].get_edgecolor(), 
              alpha=0.5, zorder=-100)
    for _s in [1, -1]:
        ax.axhline(phi_.n + _s*_sigma*phi_.s, ls=":", **kw)
        ax.axvline(w_.n + _s*_sigma*w_.s, ls="--", **kw)
ax.clabel(_cs, _cs.levels, inline=True, fmt=fmt, fontsize=10)
ax.set(xlabel="$\omega$", ylabel="$\phi$");

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[7], line 40
     37 # Draw single-parameter confidence limits to show that
     38 # they are smaller.
     39 for _n, _sigma in enumerate(sigmas_):
---> 40     kw = dict(c=_cs.collections[_n].get_edgecolor(), 
     41               alpha=0.5, zorder=-100)
     42     for _s in [1, -1]:
     43         ax.axhline(phi_.n + _s*_sigma*phi_.s, ls=":", **kw)

AttributeError: 'QuadContourSet' object has no attribute 'collections'

../_images/1f490526b687960d02572a4272d3c88f2367f8f9d6ec8f86ca69bc94f3d1a30d.png

We show the \(1\sigma\), \(2\sigma\), \(3\sigma\), and \(4\sigma\) \(\tilde{\nu} = 2\)-parameter confidence regions demonstrating the correlations between \(\omega\) and \(\phi\). For comparison, we show the corresponding 1-parameter regions as dotted lines. E.g. for the \(1\sigma\) confidence regions, the same amount of posterior probability (68.27%) lies within the center blue ellipse as lies between the central vertical blue dashed lines, or between the central horizontal blue dotted lines.

In other words, the single-parameter confidence regions include points outside of the 2-parameter ellipse, hence must be smaller as shown to keep the same probability.

Here is the full corner plot using phys_581.plotting.corner_plot():

from phys_581.plotting import corner_plot

fig, axs = corner_plot(
    p_, labels=[r"$\omega$", r"$c$", r"$A$", r"$\phi$"]);

../_images/ca91fc3d1c415ba2f14e5198ee15a7e178926c4248d4c6da61a4b50e36484840.png

Change of Variables#

To compute a change of variables, we need to keep track of the Jacobian of the transformation so we can properly transform the covariance matrix. The uncertainties package does this automatically, computing the derivatives with automatic differentiation. This is only valid if the errors are small enough that the model is approximately linear, but is very convenient. Here we replace \(\phi\) with the location of the peak \(t_0\):

\[\begin{gather*} \omega t + \phi \equiv \omega(t - t_0) \mod 2\pi, \qquad t_0 = -\phi/\omega. \end{gather*}\]

# make sure t0_ is positive and small by making phi0_ between -2pi and 0
phi0_ = phi_ % (2*np.pi) - 2*np.pi
t0_ = - phi0_ / w_

Cij = covariance_matrix([w_, t0_])

_sigma = 5
ws, t0s = np.meshgrid(
  np.linspace(w_.n-_sigma*w_.s, w_.n+_sigma*w_.s, 200),
  np.linspace(t0_.n-_sigma*t0_.s, t0_.n+_sigma*t0_.s, 200))

# Deviation matrix to calculate chi2
dws_t0s = np.array([ws-w_.n, t0s-t0_.n])
dchi2 = np.einsum('ij,ixy,jxy->xy', 
                   np.linalg.inv(Cij), dws_t0s, dws_t0s)

# Now use chi2 distribution to get the corresponding contours
fig, ax = plt.subplots(figsize=(10, 6))
_cs = ax.contour(ws, t0s, dchi2, levels=levels, cmap="winter")
fmt = dict([(_l, fr"${_n}\sigma$")
            for _l, _n, _p in zip(levels, sigmas_, ps)])

# Draw single-parameter confidence limits to show that
# they are smaller.
for _n, _sigma in enumerate(sigmas_):
    kw = dict(c=_cs.collections[_n].get_edgecolor(), 
              alpha=0.5, zorder=-100)
    for _s in [1, -1]:
        ax.axhline(t0_.n + _s*_sigma*t0_.s, ls=":", **kw)
        ax.axvline(w_.n + _s*_sigma*w_.s, ls="--", **kw)
ax.clabel(_cs, _cs.levels, inline=True, fmt=fmt, fontsize=10)
ax.set(xlabel="$\omega$", ylabel="$t_0$");

/tmp/ipykernel_4519/2791515735.py:2: FutureWarning: AffineScalarFunc.__mod__() is deprecated. It will be removed in a future release.
  phi0_ = phi_ % (2*np.pi) - 2*np.pi

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[9], line 26
     23 # Draw single-parameter confidence limits to show that
     24 # they are smaller.
     25 for _n, _sigma in enumerate(sigmas_):
---> 26     kw = dict(c=_cs.collections[_n].get_edgecolor(), 
     27               alpha=0.5, zorder=-100)
     28     for _s in [1, -1]:
     29         ax.axhline(t0_.n + _s*_sigma*t0_.s, ls=":", **kw)

AttributeError: 'QuadContourSet' object has no attribute 'collections'

../_images/5498269cc0d28cab9decf0de66bf30a967780987197f7f4b27884e4404ce8ff7.png

Non-Gaussian Errors#

We now repeat the analysis, but with non-gaussian errors. Instead, we use the Gumbel distribution:

\[\begin{gather*} P_n(y_n) = \frac{1}{\sigma_n} e^{-z-e^{-z}}, \qquad z = \frac{y_n - f(x_n, \vect{a})}{\sigma_n}. \end{gather*}\]

If we consider the normalized errors \(\tilde{e}_n\), they are distributed as:

\[\begin{gather*} \tilde{e}_n = \frac{y_n - f(x_n, \bar{\vect{a}})}{\sigma_n}, \qquad P_n(\tilde{e}_n) = e^{-\tilde{e}_n-e^{-\tilde{e}_n}}. \end{gather*}\]

The log-likelihood will be a sum of

We can use this to generate the various simulated \(P_{\nu}(\chi^2)\) distributions for Gumbel distributed errors.

en = np.linspace(-3, 10, 100)
plt.plot(en, en + np.exp(-en))

[<matplotlib.lines.Line2D at 0x70568b342ad0>]

../_images/f4b8ee2a0c6258784d1b9ecf93b60683b6bc2d4c16d78ac6793d88441f3786d1.png

from scipy.interpolate import InterpolatedUnivariateSpline, UnivariateSpline

def get_cdf_ppf(samples=None):
    """Return interpolated cdf and ppf functions.

    Results
    -------
    cdf : function
        Cumulative distribution function `q=cdf(chi2_r)`.
    ppf : function
        Percent point function (inverse of `cdf`).
    """
    Ns = len(samples)
    ps = np.linspace(0, 1, Ns)
    x = sorted(samples)
    cdf = InterpolatedUnivariateSpline(x, ps, k=1, ext='const')
    ppf = InterpolatedUnivariateSpline(ps, x, k=1, ext='const')
    return cdf, ppf

def get_chi2_cdf_ppf(nu=1, random=rng.normal, Ns=10000):
    """Return `(cdf, pdf, samples)` for the chi2 distribution.
    
    Arguments
    ---------
    nu : int
        Degrees of freedom.
    random : function
        Random number generator for errors.
    Ns : int
        Number of samples.
    """
    en = random(size=(Ns, nu))
    chi2 = (en**2).sum(axis=-1)
    cdf, ppf = get_cdf_ppf(chi2)
    return cdf, ppf, chi2

../_images/32af36f5bb815181646f44b5341183ad7560495f850f334f52bf27ebf53cd6d7.png

On the left we show the CDF and PDF histogram for \(\chi^2_r\) generated from a sample of random numbers for a standard normal distribution, comparing with the analytic forms (dotted lines). On the right we use the same method to compute the CDF for the standard Gumbel distribution.

We now generate experimental data and proceed with an analysis:

# Randomly generated data...
# An "Experiment" with non-gaussian errors.
rng = np.random.default_rng(seed=2)
ydata = rng.gumbel(loc=y, scale=sigmas)

fig, ax = plt.subplots(figsize=(10, 5))
ax.errorbar(t, ydata, yerr=sigmas, fmt="C0.", ecolor="C1", label="data")
ax.plot(ts, f(ts, *a_exact), "-C2", label="exact")
ax.set(xlabel="$t$", ylabel="$f(t)$")
ax.legend();

../_images/a8486202929f20dcb7ae4b9176e696414b47e35f9cd68b6d7dd9cf60916fef4a.png

Chi Square Fit#

res = least_squares(fun=partial(get_residuals, ydata=ydata),
                    x0=a_exact, jac='cs')
p = Params(*res.x)
C = np.linalg.inv(res.jac.T @ res.jac)
r = get_residuals(p)
nu = len(r) - len(p)
chi2_r = np.sum(abs(r)**2) / nu
Q_wrong = 1 - sp.stats.chi2.cdf(chi2_r*nu, df=nu)

cdf_gumbel, pdf_gumbel, chi2s_gumbel = get_chi2_cdf_ppf(nu=nu, random=rng.gumbel)
Q = 1 - cdf_gumbel(chi2_r*nu)

Latex(r", \qquad ".join([
    rf"$\chi^2_r = {chi2_r:.2g}",
    rf"Q = {Q:.2g}",
    rf"Q_{{\mathrm{{wrong}}}} = {Q_wrong:.2g}$"]))

\[\chi^2_r = 3.4, \qquad Q = 0.13, \qquad Q_{\mathrm{wrong}} = 0.00064\]

With the non-gaussian errors, estimating the \(Q\) value with the chi squared distribution gives a very wrong interpretation about the quality of the fit. Instead, we must use the corresponding CDF for our Gumbel-distributed errors. To obtain the confidence region with confidence level \(p\), we must find the value of \(\chi^2_p\) where:

\[\begin{gather*} \int_0^{\chi^2_p}P(\chi^2)\d{\chi^2} = p. \end{gather*}\]

This is just the inverse CDF. We can check our method (slowly) by actually fitting the data repeatedly to generate the distribution of \(\chi^2\):

../_images/499e0c0cca6266333383a9d1e5477c41ef21b1d8261535fe357693ab3dfa1ec6.png

fig, ax = plt.subplots()
random = rng.gumbel
for Ns in [500, 1000]:
    chi2s_data = np.array([nu*get_chi2_r(random=random) for n in range(Ns)])
    cdf_data, ppf_data = get_cdf_ppf(chi2s_data)
    cdf_gumbel, pdf_gumbel, chi2s_gumbel = get_chi2_cdf_ppf(nu=nu, random=random)
    ax.plot(_chi2_r, cdf_data(nu*_chi2_r), '-', label=f"data {Ns}")
ax.plot(_chi2_r, cdf_gumbel(nu*_chi2_r), ':', label="gumbel")
ax.legend()

<matplotlib.legend.Legend at 0x70568aed0690>

../_images/7cb815896a4d92129d25fb4273f16b32c9b71244c9cb1c6072ba8e6d97b9e2bf.png

../_images/0be479a61c9a051733220afa7243ba5db0f8c8b10a02fc61d429b7bfb02582f3.png

References#

Samples, samples, everywhere…: A nice overview of different MCMC software accessible with python.

Model Fitting E.g. 1: Cosine

Contents

Model Fitting E.g. 1: Cosine#

Linear Gaussian Approximation#

Chi Square Fit#

Parameter Estimates and Uncertainties#

Single Parameter Distributions#

Pairwise Distributions#

Change of Variables#

Non-Gaussian Errors#

Chi Square Fit#

References#