Background¶
This page provides background on the mathematical and scientific concepts underlying pgmuvi.
It is intended as a companion to the tutorials and how-to guides, and assumes familiarity
with basic statistics and time-series analysis but does not require deep knowledge of
Gaussian processes or Bayesian inference.
Gaussian Processes¶
A Gaussian process (GP) is a probability distribution over functions. Any finite collection of function values drawn from a GP follows a multivariate Gaussian distribution. A GP is fully specified by:
a mean function \(m(\mathbf{x})\), which describes the expected value of the function at any input \(\mathbf{x}\), and
a covariance (kernel) function \(k(\mathbf{x}, \mathbf{x}')\), which encodes how strongly two function values at inputs \(\mathbf{x}\) and \(\mathbf{x}'\) are correlated.
In pgmuvi, the inputs \(\mathbf{x}\) are observation times (and, optionally,
wavelengths for multiband data), and the function values are source fluxes or magnitudes.
The GP prior is updated with observed data to produce a posterior distribution, which
provides both predictions and rigorous uncertainty estimates.
Further reading:
Rasmussen & Williams (2006), Gaussian Processes for Machine Learning (free PDF)
GPyTorch documentation (the GP library that
pgmuviis built on)
Spectral Mixture Kernels¶
The choice of kernel function determines what kind of variability patterns the GP can
represent. pgmuvi uses Spectral Mixture Kernels (SMK; Wilson & Adams 2013),
which are a highly flexible family of kernels capable of capturing quasi-periodic
signals, coloured noise, and multiple simultaneous periodicities.
A Spectral Mixture Kernel represents the power spectral density (PSD) of the GP as a mixture of Gaussians in the frequency domain:
where \(Q\) is the number of mixture components (num_mixtures), and each
component has weight \(w_q\), centre frequency \(\mu_q\) (which corresponds
to a period \(1/\mu_q\)), and bandwidth \(\sigma_q\) (which controls how
quickly coherence is lost over time).
This parameterisation means that pgmuvi directly infers the shape of the PSD
rather than assuming a rigid functional form (e.g., a pure sinusoid or a power law).
Reference:
Wilson, A. G. & Adams, R. P. (2013), Gaussian Process Kernels for Pattern Discovery and Extrapolation, ICML. arXiv:1302.4245
Power Spectral Density¶
The power spectral density (PSD) describes how the variance of a time series is distributed across frequencies. A peak in the PSD at frequency \(\nu\) indicates quasi-periodic variability with period \(P = 1/\nu\). Broad, low-frequency power is characteristic of red noise (stochastic variability that is correlated on long timescales), while narrow peaks indicate coherent oscillations.
Because pgmuvi infers the PSD non-parametrically via a spectral mixture kernel,
it can simultaneously represent:
one or more quasi-periodic signals,
aperiodic stochastic variability (red noise),
white noise (through the GP likelihood),
long-term trends (through the mean function).
Multiwavelength Variability Inference¶
Many astrophysical sources are observed simultaneously at multiple wavelengths or in
multiple photometric bands. pgmuvi supports 2D Gaussian processes where the
inputs \(\mathbf{x} = (t, \lambda)\) include both time and wavelength. Two
families of 2D kernel are available:
The default
model="2D"uses a non-separable 2D spectral-mixture kernel defined jointly over time and wavelength, allowing the GP to learn their joint covariance structure.The separable model family (
"2DSeparable","2DAchromatic","2DWavelengthDependent", etc.) uses a product of a temporal kernel and a wavelength kernel, which is a natural assumption when the variability structure can be factored across the two dimensions.
Both families allow the GP to learn:
the temporal variability structure (periodicities, noise),
how variability amplitude and coherence vary with wavelength.
This is particularly useful when different bands are observed at different cadences or have different noise levels — the GP naturally handles irregular and asynchronous sampling.
Bayesian Parameter Inference¶
pgmuvi currently supports one mode of parameter estimation:
Optimisation (MAP estimation): The hyperparameters are optimised to maximise the (marginal) log likelihood. This is fast and suitable for exploratory analysis or when the posterior is well-behaved. Use
fit().
Note
MCMC sampling is planned but not yet available. Full posterior samples via
Hamiltonian Monte Carlo will be accessible through
mcmc() in a future release. In the current
version, calling mcmc() raises NotImplementedError.
Initialisation with Lomb–Scargle¶
GP optimisation can be sensitive to the starting point. pgmuvi uses the
Lomb–Scargle periodogram (and its multiband extension) to initialise the
spectral mixture kernel frequencies before fitting. This substantially improves
convergence and reduces the risk of settling in a poor local optimum.
In the current API, this Lomb–Scargle-based initialisation is performed
internally by fit() (when
use_mls_init=True, which is the default). fit_LS()
is a public helper that runs the same periodogram and returns the detected
peaks, but kernel initialisation is performed by fit(), not fit_LS()
alone.