Background
==========
This page provides background on the mathematical and scientific concepts underlying ``pgmuvi``.
It is intended as a companion to the tutorials and how-to guides, and assumes familiarity
with basic statistics and time-series analysis but does not require deep knowledge of
Gaussian processes or Bayesian inference.
.. contents:: On this page
:local:
:depth: 2
Gaussian Processes
------------------
A Gaussian process (GP) is a probability distribution over functions.
Any finite collection of function values drawn from a GP follows a multivariate
Gaussian distribution. A GP is fully specified by:
* a **mean function** :math:`m(\mathbf{x})`, which describes the expected value of the
function at any input :math:`\mathbf{x}`, and
* a **covariance (kernel) function** :math:`k(\mathbf{x}, \mathbf{x}')`, which encodes
how strongly two function values at inputs :math:`\mathbf{x}` and :math:`\mathbf{x}'`
are correlated.
In ``pgmuvi``, the inputs :math:`\mathbf{x}` are observation times (and, optionally,
wavelengths for multiband data), and the function values are source fluxes or magnitudes.
The GP prior is updated with observed data to produce a posterior distribution, which
provides both predictions and rigorous uncertainty estimates.
Further reading:
* Rasmussen & Williams (2006), *Gaussian Processes for Machine Learning*
(`free PDF `_)
* `GPyTorch documentation `_ (the GP library that
``pgmuvi`` is built on)
Spectral Mixture Kernels
------------------------
The choice of kernel function determines what kind of variability patterns the GP can
represent. ``pgmuvi`` uses **Spectral Mixture Kernels** (SMK; Wilson & Adams 2013),
which are a highly flexible family of kernels capable of capturing quasi-periodic
signals, coloured noise, and multiple simultaneous periodicities.
A Spectral Mixture Kernel represents the power spectral density (PSD) of the GP as
a mixture of Gaussians in the frequency domain:
.. math::
S(\omega) = \sum_{q=1}^{Q} w_q \,
\mathcal{N}(\omega \,|\, \mu_q, \sigma_q^2)
where :math:`Q` is the number of mixture components (``num_mixtures``), and each
component has weight :math:`w_q`, centre frequency :math:`\mu_q` (which corresponds
to a period :math:`1/\mu_q`), and bandwidth :math:`\sigma_q` (which controls how
quickly coherence is lost over time).
This parameterisation means that ``pgmuvi`` directly infers the *shape* of the PSD
rather than assuming a rigid functional form (e.g., a pure sinusoid or a power law).
Reference:
* Wilson, A. G. & Adams, R. P. (2013), *Gaussian Process Kernels for Pattern
Discovery and Extrapolation*, ICML. `arXiv:1302.4245 `_
Power Spectral Density
----------------------
The **power spectral density** (PSD) describes how the variance of a time series is
distributed across frequencies. A peak in the PSD at frequency :math:`\nu` indicates
quasi-periodic variability with period :math:`P = 1/\nu`. Broad, low-frequency power
is characteristic of red noise (stochastic variability that is correlated on long
timescales), while narrow peaks indicate coherent oscillations.
Because ``pgmuvi`` infers the PSD non-parametrically via a spectral mixture kernel,
it can simultaneously represent:
* one or more quasi-periodic signals,
* aperiodic stochastic variability (red noise),
* white noise (through the GP likelihood),
* long-term trends (through the mean function).
Multiwavelength Variability Inference
--------------------------------------
Many astrophysical sources are observed simultaneously at multiple wavelengths or in
multiple photometric bands. ``pgmuvi`` supports **2D Gaussian processes** where the
inputs :math:`\mathbf{x} = (t, \lambda)` include both time and wavelength. Two
families of 2D kernel are available:
* The default ``model="2D"`` uses a **non-separable 2D spectral-mixture kernel**
defined jointly over time and wavelength, allowing the GP to learn their joint
covariance structure.
* The **separable model family** (``"2DSeparable"``, ``"2DAchromatic"``,
``"2DWavelengthDependent"``, etc.) uses a product of a temporal kernel and a
wavelength kernel, which is a natural assumption when the variability structure
can be factored across the two dimensions.
Both families allow the GP to learn:
* the temporal variability structure (periodicities, noise),
* how variability amplitude and coherence vary with wavelength.
This is particularly useful when different bands are observed at different cadences or
have different noise levels — the GP naturally handles irregular and asynchronous
sampling.
Bayesian Parameter Inference
-----------------------------
``pgmuvi`` currently supports one mode of parameter estimation:
1. **Optimisation (MAP estimation):** The hyperparameters are optimised to maximise
the (marginal) log likelihood. This is fast and suitable for exploratory analysis
or when the posterior is well-behaved. Use :meth:`~pgmuvi.lightcurve.Lightcurve.fit`.
.. note::
**MCMC sampling is planned but not yet available.** Full posterior samples via
Hamiltonian Monte Carlo will be accessible through
:meth:`~pgmuvi.lightcurve.Lightcurve.mcmc` in a future release. In the current
version, calling ``mcmc()`` raises ``NotImplementedError``.
Initialisation with Lomb–Scargle
---------------------------------
GP optimisation can be sensitive to the starting point. ``pgmuvi`` uses the
**Lomb–Scargle periodogram** (and its multiband extension) to initialise the
spectral mixture kernel frequencies before fitting. This substantially improves
convergence and reduces the risk of settling in a poor local optimum.
In the current API, this Lomb–Scargle-based initialisation is performed
internally by :meth:`~pgmuvi.lightcurve.Lightcurve.fit` (when
``use_mls_init=True``, which is the default). :meth:`~pgmuvi.lightcurve.Lightcurve.fit_LS`
is a public helper that runs the same periodogram and returns the detected
peaks, but kernel initialisation is performed by ``fit()``, not ``fit_LS()``
alone.