Bayesian inference of the Light Curve PSD

In this tutorial, we will demonstrate how you can use MCMC with pgmuvi to get a posterior distribution of the PSD of a light curve. We will use the same type of data as in the basic tutorial.

Why Bayesian inference?

When we use timeseries (this tutorial assumes we’re talking about an astronomical lightcurve, but it could be any timeseries) we may want to understand the underlying process that is generating the timeseries, such as the period(s) of variability, or we may want to be able to predict future behaviour. However, neither of these things are particularly useful without a good understanting of the uncertainty on them - if we want to predict future behaviour, we need to know how certain we are about our prediction!

One way to quantify the uncertainty on a quantity of interest is to use Bayesian inference. Bayesian inference allows us to infer the posterior distribution of a quantity of interest, given some data and some prior information. For example, if we want to infer the period of variability of a light curve, we can use Bayesian inference to infer the posterior distribution of the period, given the light curve and some prior information about the period. We can then use this posterior distribution to quantify the uncertainty on the period. In pgmuvi this translates to inferring the posterior distribution of the mixture components of the PSD of a light curve. We can then use this posterior distribution to quantify the uncertainty on the parameters of the mixture model.

Why MCMC?

To do Bayesian inference, we need to be able to quantify the distribution of the parameters of the model. In this case, we want to quantify the distribution of the mixture components of the PSD. We could do this analytically, but this is not always possible. In this case, we need to use numerical methods to quantify the distribution of the parameters. There are many of these, but one popular approach is to use Markov Chain Monte Carlo (MCMC). MCMC allows us to sample from a distribution by creating a Markov chain, which is a sequence of (nearly) uncorrelated random samples drawn from the target distribution. In this case, we want to sample from the posterior distribution of the mixture components of the PSD, using the values and gradients of the posterior probability at each step to produce optimised steps (so-called Hamiltonian Monte Carlo or HMC). We can then use these samples to quantify the distribution of the mixture components of the PSD, or to make predictions about the light curve (e.g. to predict future behaviour) accounting for the uncertainty in the mixture components of the PSD.

This tutorial

This tutorial will cover the following topics:

  • How to set up a model for Bayesian inference of the PSD of a light curve

  • How to run MCMC

  • How to visualize the results

It unfortunately cannot tell you everything there is to know about fitting timeseries data with MCMC using pgmuvi, but it aims to give you a good starting point for your own projects.

Some imports

Before we do anything, we need to make sure that pgmuvi imports correctly, and if it doesn’t we need to install it. We also import some other packages that we will need later.

[1]:
try: #This won't work right now - instead clone the repository and `pip install -e .`
    import pgmuvi
except ImportError:
    %pip install git+https://github.com/ICSM/pgmuvi.git
    import pgmuvi
[2]:
#torch.multiprocessing.get_all_sharing_strategies()

Creating the data

Now that we have imported pgmuvi, we can create some data to fit. We will use the same type of data as in the basic tutorial, but we will use a different random seed to get different data. This data is drawn from a sine wave with a randomly-chosen period between 30 and 300 days, and a Gaussian noise component with a standard deviation of 0.1 times the absolute value of the flux. The times are randomly chosen to cover between 3 and 10 periods of the sine wave, with 40 points in total.

[3]:
import torch
torch.multiprocessing.set_sharing_strategy('file_system')
import gpytorch
import numpy as np
""" Let's generate some synthetic data from a perturbed sine curve
    but on the same time sampling as the real data"""

P = np.random.uniform(30, 300)#137. #Days!
print("True period: ",P," days")
n_data = 40
jd_min = 2450000
n_periods = np.random.uniform(3,10)
jd_max = jd_min + P*(n_periods)
print("Simulating for ",n_periods," periods")

#train_mag =
#train_mag = train_mag + 0.1*torch.randn_like(train_mag)
#train_mag_err = 0.1*train_mag

period_guess = P*(np.random.uniform()+0.5)#147 #this number is in the same units as our original input.

#generate data from a simple case - superimpose two sine curves and add noise
timestamps_1d = torch.sort(torch.Tensor(np.random.uniform(jd_min, jd_max, size=n_data)))[0]#generate random x data here
fluxes_1d = torch.sin(timestamps_1d*(2*np.pi/P))#generate random y data here
fluxes_1d += 0.1*torch.randn_like(fluxes_1d)
flux_err_1d = 0.1*fluxes_1d.abs()
print("Generated data with ",n_data," points")
print("Period guess: ",period_guess," days")
print("Period guess: ",period_guess/P," periods")
print(timestamps_1d)
print(fluxes_1d)
print(flux_err_1d)
True period:  85.90981750732666  days
Simulating for  5.429473686481526  periods
Generated data with  40  points
Period guess:  72.13351400129208  days
Period guess:  0.8396422678367383  periods
tensor([2450008.7500, 2450009.0000, 2450016.0000, 2450017.0000, 2450021.2500,
        2450022.0000, 2450036.7500, 2450052.0000, 2450052.2500, 2450057.5000,
        2450081.0000, 2450092.2500, 2450096.0000, 2450108.0000, 2450119.0000,
        2450139.2500, 2450141.7500, 2450143.0000, 2450151.5000, 2450181.5000,
        2450187.0000, 2450189.5000, 2450210.0000, 2450273.0000, 2450312.0000,
        2450316.0000, 2450316.5000, 2450320.7500, 2450324.7500, 2450332.5000,
        2450343.7500, 2450358.7500, 2450384.5000, 2450394.0000, 2450403.5000,
        2450417.0000, 2450422.5000, 2450427.7500, 2450435.5000, 2450455.5000])
tensor([ 0.5816,  0.6123,  0.0909,  0.1241, -0.0474, -0.1691, -1.0325, -0.6349,
        -0.6861, -0.5253,  0.9151,  0.6993,  0.4829, -0.2919, -0.8768, -0.3999,
        -0.3155, -0.5659,  0.3321,  0.6142,  0.3667,  0.0015, -1.1482,  0.4097,
        -0.5810, -0.2861, -0.3967,  0.1772,  0.3555,  0.8483,  0.9226,  0.1858,
        -1.1437, -0.7637, -0.0636,  0.4618,  0.9705,  0.9687,  0.9063, -0.4795])
tensor([0.0582, 0.0612, 0.0091, 0.0124, 0.0047, 0.0169, 0.1033, 0.0635, 0.0686,
        0.0525, 0.0915, 0.0699, 0.0483, 0.0292, 0.0877, 0.0400, 0.0315, 0.0566,
        0.0332, 0.0614, 0.0367, 0.0001, 0.1148, 0.0410, 0.0581, 0.0286, 0.0397,
        0.0177, 0.0356, 0.0848, 0.0923, 0.0186, 0.1144, 0.0764, 0.0064, 0.0462,
        0.0970, 0.0969, 0.0906, 0.0479])
/home/docs/checkouts/readthedocs.org/user_builds/pgmuvi/envs/latest/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm

Getting the Lightcurve object

[4]:
from pgmuvi.lightcurve import Lightcurve

lightcurve_1d = Lightcurve(timestamps_1d, fluxes_1d, yerr = flux_err_1d, xtransform='minmax')

Our Model

Now we can create our model. This is very similar to the previous tutorial, but with one small complication. When we didn’t use MCMC, we wanted to learn additional diagonal noise to account for the intrinsic scatter of the data even if we had uncertainties on the data. However, when we use MCMC, we need to be careful about how we define our likelihood. If we attempt to learn this additional noise, gpytorch will inject NaNs along the diagonal of the covariance matrix, which will cause the MCMC sampler to fail. However, if you don’t have uncertainty information, you can still learn this additional noise.

[5]:
# This won't work! Learning the additional noise results in NaNs in the covariance matrix during MCMC
# lightcurve_1d.set_model(model='1D', likelihood='learn', num_mixtures=1)

Instead we use the following:

[6]:
lightcurve_1d.set_model(model='1D', num_mixtures=1)

It’s perhaps not as useful as for optimisation, but we can still choose to set initial values for the hyperparameters. This will help the sampler to converge more quickly, which can take a significant fraction of the total runtime. However, in principle it should be possible to compensate for a bad initialisation by setting a longer warm-up (also known as burn-in) period.

[7]:
lightcurve_1d.print_parameters()
print(period_guess)
guess = {
         'sci_kernel.mixture_means': torch.Tensor([1/period_guess]),}
lightcurve_1d.set_hypers(guess)
lightcurve_1d.print_parameters()
mean_module.constant: 0.0
covar_module.mixture_weights: tensor([0.6295])
covar_module.mixture_means: tensor([[[1.6870]]])
covar_module.mixture_scales: tensor([[[0.0020]]])
72.13351400129208
mean_module.constant: 0.0
covar_module.mixture_weights: tensor([0.6295])
covar_module.mixture_means: tensor([[[0.0139]]])
covar_module.mixture_scales: tensor([[[0.0020]]])

Doing MCMC

Now we’re ready to actually do some MCMC. By default, pgmuvi will use pyro’s implementation of the No U-Turn Sampler (or NUTS), but any pyro MCMC sampler will work. You can change this with the sampler keyword argument. The most important arguments to worry about are num_samples and warmup_steps. These control the number of samples to draw from the posterior distribution, and the number of samples to discard as burn-in. As a result, the runtime of the sampler is roughly proportional to the sum of these two numbers. The burn-in steps are used to find a good region of the parameter space, tune the step size and mass matrix of the sampler to make it more efficient, and bring the chain into equilibrium. As a result, you should always check the trace plot of the chain to make sure that the burn-in period is sufficient.

The post-warmup samples are then used to estimate the posterior distribution of the parameters. The advantage of NUTS is that it produces samples that are approximately independent of each other, so you can use all of them to estimate the posterior distribution. This means that num_samples can be much smaller than with other samplers. For example, if all you care about is the mean and standard deviation of the marginal posteriors, you can get a good estimate with only a few hundred samples.

In principle, you could also use the num_chains argument to run multiple chains, however this is still in development.

[8]:
lightcurve_1d.mcmc(num_samples=100, #0,
                   warmup_steps=100,
                   num_chains=1)
Sample: 100%|██████████| 200/200 [03:51,  1.16s/it, step size=4.82e-01, acc. prob=0.923]
/home/docs/checkouts/readthedocs.org/user_builds/pgmuvi/envs/latest/lib/python3.11/site-packages/arviz/data/io_pyro.py:157: UserWarning: Could not get vectorized trace, log_likelihood group will be omitted. Check your model vectorization or set log_likelihood=False
  warnings.warn(

Interpreting the results

First we want to make a quick summary of the results. We can do this by plotting the marginal posterior distributions of each parameter, and by printing out the mean and standard deviation of each parameter. The mean and standard deviation are useful because they give us a sense of the “best fit” parameters, and the uncertainty on those parameters - or more formally, the credible interval of the parameters and the point estimate.

For these purposes, pgmuvi provides a number of helpful methods. The first of these is Lightcurve.summary(), which produces a summary of the results. This includes the median and MAD (you can switch it to the mean and standard deviation if you prefer) of the marginal posterior distributions of each parameter, as well as the 68.3% equal-tailed interval (or highest-density interval). These can be used as approximations of the point estimate and credible interval of the parameters. It will also produce a few diagnostic numbers, including the effective sample size (ESS) of each parameter, which is a measure of how well the chain has converged, and the Monte Carlo Standard Error (MCSE), which is a measure of the sampling error of the mean of each parameter. It also produces the Gelman-Rubin statistic (\(\hat{R}\)), which is a measure of how well the chains have mixed (NB: this is not currently calculated because at present only a single chain is computed, while the arviz implementation of \(\hat{R}\) needs more than one chain).

[9]:
lightcurve_1d.summary()
arviz - WARNING - Shape validation failed: input_shape: (1, 100), minimum_shape: (chains=2, draws=4)
[9]:
median mad eti_15.85% eti_84.15% mcse_median ess_median ess_tail r_hat
covar_module.mixture_weights_prior[0] 0.50 0.22 0.27 1.29 0.05 144.10 60.03 NaN
mean_module.mean_prior -0.03 0.02 -0.05 0.00 0.00 129.20 52.45 NaN
raw_periods[0, 0, 0] 85.73 1.46 83.73 88.08 0.26 102.09 38.10 NaN
raw_frequencies[0, 0, 0] 0.01 0.00 0.01 0.01 0.00 102.09 38.10 NaN
raw_period_scales[0, 0, 0] 388.37 120.70 251.37 654.35 24.46 115.19 71.29 NaN
raw_frequency_scales[0, 0, 0] 0.00 0.00 0.00 0.00 0.00 115.19 71.29 NaN

We also want to take a look at how the MCMC sampler explored the parameter space. This is often referred to as the “trace” of the MCMC sampler.

[10]:
lightcurve_1d.plot_trace()
../_images/notebooks_pgmuvi_tutorial_mcmc_22_0.png

This plot shows the value of each parameter as a function of the step number of the chain in the right-hand columns of plots. This is useful for checking that the chain has converged, and that the burn-in period was sufficient. If the chain has not converged, you will see a clear trend in the parameter values as a function of step number. The left-hand column shows the marginal posterior distribution of each parameter, which is useful for seeing how complex the distribution is - is the simple summary reported by summary() actually useful, or do you need more complex information to communicate what is happening?

This is where the “corner” plot comes in. This plot shows the joint-marginal posterior distribution of each pair of parameters (i.e. the distribution after marginalising over all parameters except for that pair), as well as the marginal posterior distribution of each parameter. This is useful for seeing how the parameters are correlated with each other, or for seeing if there are any degeneracies between parameters. For example, if two parameters are strongly correlated, then you will see a clear trend in the joint-marginal posterior distribution, and the marginal posterior distribution of each parameter will be broad. However, you should bear in mind that some of the parameters shown by default in this plot are directly derived from others, for example the periods from the frequencies. This means that the joint-marginal posterior distribution of these parameters is not necessarily meaningful, and you should instead look at the joint-marginal posterior distribution of the parameters that they are derived from.

[11]:
lightcurve_1d.plot_corner()
../_images/notebooks_pgmuvi_tutorial_mcmc_24_0.png

Plotting the marginal posterior distributions is easy with plot_pair. pgmuvi often works best with kind='scatter', but you can also try out kind='kde' or kind='hexbin' for prettier plots. Scatter plots work well even when you have few posterior samples, but KDE and hexbin plots look much nicer when you have more samples. Hence, if you’re in a hurry you probably want to stick with scatter plots, but if you have time to spare you can try out the other options.

We also want to plot the light curve with samples from the posterior distribution of the model. This lets us see how well the model fits the data, and how well the model can predict the data. This is often referred to as the “posterior predictive distribution”, and is the distribution of the data given the model and the parameters.

[12]:
# plot goes here
lightcurve_1d.plot(mcmc_samples=True)
torch.Size([10000, 1])
torch.Size([100, 10000, 1])
../_images/notebooks_pgmuvi_tutorial_mcmc_27_1.png
[12]:
../_images/notebooks_pgmuvi_tutorial_mcmc_27_2.png

You will hopefully see that the model produces mean functions which reproduce the data very well. This is great!

We might also want to look at the posterior distribution of the PSD. This is useful to understand how much the data constrain the PSD, and hence how much we can trust the results. The same information is effectively contained in the posterior distribution of the model parameters (as reported in the summary or displayed in the trace and the corner plot), but it is often easier to understand the PSD directly.

[13]:
# plot goes here
lightcurve_1d.plot_psd(mcmc_samples=True, log=(True, True))
tensor([2.3392e-05], grad_fn=<MulBackward0>) 0.25 8.0
torch.Size([341903])
../_images/notebooks_pgmuvi_tutorial_mcmc_30_1.png
[13]:
(<Figure size 800x600 with 1 Axes>, <Axes: >)

And that’s it! You now know how to use pgmuvi to fit a model to a light curve using MCMC, and how to interpret the results.