--- execution: timeout: 300 jupytext: notebook_metadata_filter: all text_representation: extension: .md format_name: myst format_version: 0.13 jupytext_version: 1.13.1 kernelspec: display_name: Python 3 (phys-581-2021) language: python name: phys-581-2021 language_info: codemirror_mode: name: ipython version: 3 file_extension: .py mimetype: text/x-python name: python nbconvert_exporter: python pygments_lexer: ipython3 version: 3.9.7 --- ```{code-cell} ipython3 :tags: [hide-cell] import mmf_setup mmf_setup.nbinit() import logging logging.getLogger("matplotlib").setLevel(logging.CRITICAL) %matplotlib inline import numpy as np, matplotlib.pyplot as plt ``` # Model Fitting E.g. 1b: MCMC We continue our example of fitting a cosine, but now in the case where we have large errors, so that the model is certainly not linear. \begin{gather*} y_n = f(t_n, \vect{a}) + e_n, \qquad f(t, \vect{a}) = A\cos(\omega t + \phi) + c, \qquad \vect{a} = \begin{pmatrix} \omega\\ c\\ A\\ \phi \end{pmatrix}. \end{gather*} In this case, the posterior will not be gaussian, and to properly characterize it, we will need to use [Markov chain Monte Carlo] (MCMC). We will use the [emcee] package, but many others exists. See the following for a summary: * [Samples, samples, everywhere...](http://mattpitkin.github.io/samplers-demo/pages/samplers-samplers-everywhere/): A nice overview of different MCMC software accessible with python. Recall that, from the Bayesian perspective, our goal is to compute the posterior probability distribution \begin{gather*} p(\vect{a}|\vect{y}) = \frac{\mathcal{L}(\vect{a}|\vect{y})p(\vect{a})}{p(\vect{y})}, \qquad \mathcal{L}(\vect{a}|\vect{y}) = p(\vect{y}|\vect{a}) = p_e\bigr(\vect{y} - f(\vect{x}, \vect{a})\bigr), \end{gather*} where $p(\vect{a})$ is the **prior**, $\mathcal{L}(\vect{a}|\vect{y}) = p(\vect{y}|\vect{a})$ is the **likelihood**, and $p(\vect{a}|\vect{y})$ is our desired **posterior**. We will still use independent errors: \begin{gather*} p_n(e_n) = \frac{p_e(e_n/\sigma_n)}{\sigma_n} = \frac{N(e_n/\sigma_n)}{\sigma_n} \end{gather*} but will not assume these to always be gaussian. All that [emcee] needs is a function `log_prob(p)` which returns the log of the posterior: \begin{gather*} \overbrace{\ln p(\vect{a}|\vect{y})}^{\mathtt{log\_prob}} = \overbrace{\ln p(\vect{a})}^{\mathtt{log\_prior}} + \overbrace{\ln \mathcal{L}(\vect{a}|\vect{y})}^{\mathtt{log\_liklihood}} - \ln p(\vect{y}). \end{gather*} The last normalization constant need not be included. The MCMC method will then generate a sample which approximates the correct distribution. For our curve-fitting model: \begin{align*} p_e(\vect{e}) &= \prod_{n=1}^{N} \frac{1}{\sigma_n}N\left(\frac{e_n}{\sigma_n}\right),\\ \ln p(\vect{a}|\vect{y}) &= \ln p(\vect{a}) + \ln p_e\bigl(\vect{y} - f(\vect{x}, \vect{a})\bigr) + \text{const}\\ &= \ln p(\vect{a}) + \sum_{n=1}^{N} \overbrace{\ln N\left(\frac{\bigl(y_n - f(x_n, \vect{a})\bigr)}{\sigma_n}\right)} ^{-\rho(e_n/\sigma_n)} - \sum_{n=1}^{N}\ln\sigma_n + \text{const}. \end{align*} For gaussian errors, $\rho(x) = -\ln N(x) = x^2/\sqrt{2\pi}$. ```{code-cell} ipython3 from IPython.display import Latex from collections import namedtuple from uncertainties import correlated_values, unumpy as unp from scipy.optimize import least_squares from scipy.stats import chi2 import emcee Nt = 12 #Nt = 100 t_max = 10.0 # Exact parameter values Params = namedtuple("Params", ["w", "c", "A", "phi"]) a_exact = Params(w=2 * np.pi / 5, c=2.1, A=3.4, phi=5.6) w, c, A, phi = a_exact labels = Params(*map("${}$".format, [r"\omega", "c", "A", r"\phi"])) def f(t, w, c, A, phi, np=np): """Model function.""" return c + A * np.cos(w * t + phi) # Exact data and experimental errors t = np.linspace(0, t_max, Nt) y = f(t, *a_exact) sigmas = 0.5 * np.ones(Nt) # Randomly generated data... An "Experiment" rng = np.random.default_rng(seed=2) ydata = rng.normal(loc=y, scale=sigmas) def rho(e): return e**2/np.sqrt(2*np.pi) def log_liklihood(a, xdata, ydata, sigmas): es = (ydata - f(xdata, *a))/sigmas return - rho(es).sum() - np.log(sigmas).sum() def log_prior(a): """Uniform prior.""" return 1 def log_prob(a, *v, **kw): return log_prior(a) + log_liklihood(a, *v, **kw) def get_residuals(p, xdata=t, ydata=ydata, sigmas=sigmas): """Return residuals for least_squares.""" return (ydata - f(xdata, *p))/sigmas args = (t, ydata, sigmas) ts = np.linspace(0, t_max) # Many points for a smooth curve. fig, ax = plt.subplots(figsize=(10, 5)) ax.errorbar(t, ydata, yerr=sigmas, fmt="C0.", ecolor="C1", label="data") ax.plot(ts, f(ts, *a_exact), "-C2", label="exact") ax.set(xlabel="$t$", ylabel="$f(t)$") ax.legend(); ``` Now we do the fits. We start with the standard least-square fit, then use that as a seed for the [emcee] code. ```{code-cell} ipython3 res = least_squares(fun=get_residuals, x0=a_exact, jac='cs', args=args) p = Params(*res.x) C = np.linalg.inv(res.jac.T @ res.jac) r = get_residuals(p) nu = len(r) - len(p) chi2_r = np.sum(abs(r)**2) / nu Q = 1 - chi2.cdf(chi2_r*nu, df=nu) Latex(rf"$\chi^2_r = {chi2_r:.2g}, \qquad Q = {Q:.2g}$") nwalkers = 32 sampler = emcee.EnsembleSampler(nwalkers=nwalkers, ndim=len(p), log_prob_fn=log_prob, args=args) # Generate initial set of walkers using a gaussian with the least_sqares # estimates for the mean (best fit values) and covariance matrix p0 = rng.multivariate_normal(mean=p, cov=C, size=nwalkers) state = sampler.run_mcmc(p0, 100) # Burn-in period of 100 steps sampler.reset() %time state = sampler.run_mcmc(p0, 10000) # Actual run with 10000 walkers ``` ```{code-cell} ipython3 import corner %load_ext autoreload %autoreload import phys_581_2021.plotting from phys_581_2021.plotting import corner_plot flat_samples = sampler.get_chain(discard=100, thin=15, flat=True) fig = corner.corner(flat_samples, labels=labels) axes = np.array(fig.axes).reshape((4,4)) corner_plot(p, C, axes=axes); ``` Now we repeat the exercize, but with much larger errors - well beyond the regime where the model is linear. ```{code-cell} ipython3 sigmas = 1.5 * np.ones(Nt) # Randomly generated data... An "Experiment" rng = np.random.default_rng(seed=2) ydata = rng.normal(loc=y, scale=sigmas) args = (t, ydata, sigmas) res = least_squares(fun=get_residuals, x0=a_exact, jac='cs', args=args) p = Params(*res.x) C = np.linalg.inv(res.jac.T @ res.jac) r = get_residuals(p) nu = len(r) - len(p) chi2_r = np.sum(abs(r)**2) / nu Q = 1 - chi2.cdf(chi2_r*nu, df=nu) Latex(rf"$\chi^2_r = {chi2_r:.2g}, \qquad Q = {Q:.2g}$") nwalkers = 100 sampler = emcee.EnsembleSampler(nwalkers=nwalkers, ndim=len(p), log_prob_fn=log_prob, args=args) p0 = rng.multivariate_normal(mean=p, cov=C, size=nwalkers) %time state = sampler.run_mcmc(p0, 10100) # Actual run with 10000 walkers flat_samples = sampler.get_chain(discard=2500, thin=15, flat=True) fig = corner.corner(flat_samples, labels=labels) axes = np.array(fig.axes).reshape((4,4)) corner_plot(p, C, axes=axes); ``` We see some interesting features here. In particular, additional regions are appearing which we did not consider before. Inspecting a bit more closely, we see that one such region is related to our main region by $A \rightarrow -A$ and $\phi \rightarrow \phi \pm \pi$. This is completely expected since our model has many discrete degeneracies. We could cure this by introducing a prior that limits the range of the parameter. Many other issues, such as determining how many walkers to use, how long a burn-in period is required, etc., should also be studied. See the [emcee] documentation for details. ```{code-cell} ipython3 fig, axs = plt.subplots(4, figsize=(10, 7), sharex=True) samples = sampler.get_chain() for i in range(len(p)): ax = axs[i] ax.plot(samples[:, :, i], "k", alpha=0.3) ax.set_xlim(0, len(samples)) ax.set_ylabel(labels[i]) ax.yaxis.set_label_coords(-0.1, 0.5) axs[-1].set_xlabel("step number"); ``` [covariance matrix]: [uncertainties]: [`collections.namedtuple`]: [confidence region]: [nuisance parameter]: [least squares]: [algebra of random varables]: [principal componant analysis]: [reduced chi-square statistic]: [multivariate normal distribution]: [Cholesky decomposition]: [cumulative distribution function]: [tail distribution]: [one-sided $p$-value]: [test statistic]: [automatic differentiation]: [Gumbel distribution]: [Markov chain Monte Carlo]: [emcee]: ## References ```{code-cell} ipython3 ```