Loading and Preparing Data

This guide shows how to load observational data into pgmuvi and prepare it for fitting.

Overview

pgmuvi expects data as three parallel arrays:

  • times — observation epochs (any consistent time unit, e.g., days, MJD).

  • fluxes — flux or magnitude measurements.

  • errors — 1-σ uncertainties on the measurements.

All three arrays must have the same length. For multiband data, each array has one row per observation across all bands (see Multiwavelength (2D) Analysis).
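Before constructing a light curve it can help to confirm that the three arrays really are parallel. The following is a minimal sketch using only NumPy; check_parallel_arrays is a hypothetical helper written for this guide, not a pgmuvi function, and the positivity check on the uncertainties is an assumption about what counts as sensible input.

```python
import numpy as np


def check_parallel_arrays(times, fluxes, errors):
    """Return the three inputs as float arrays after basic sanity checks.

    Hypothetical helper for illustration; it is not part of pgmuvi.
    """
    times = np.asarray(times, dtype=float)
    fluxes = np.asarray(fluxes, dtype=float)
    errors = np.asarray(errors, dtype=float)

    # All three arrays must describe the same N observations.
    if not (len(times) == len(fluxes) == len(errors)):
        raise ValueError(
            f"length mismatch: {len(times)} times, "
            f"{len(fluxes)} fluxes, {len(errors)} errors"
        )
    # NaN or inf entries usually indicate missing or bad measurements.
    for name, arr in (("times", times), ("fluxes", fluxes), ("errors", errors)):
        if not np.all(np.isfinite(arr)):
            raise ValueError(f"{name} contains NaN or infinite values")
    # Assumption: 1-sigma uncertainties should be strictly positive.
    if np.any(errors <= 0):
        raise ValueError("errors must be strictly positive")
    return times, fluxes, errors


times, fluxes, errors = check_parallel_arrays(
    [58000.0, 58001.5, 58003.2],
    [1.02, 0.97, 1.10],
    [0.05, 0.04, 0.06],
)
```

The returned arrays can then be handed to the constructor described in the next section.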

Creating a Lightcurve

Pass the arrays directly to the constructor:

import pgmuvi
import numpy as np

times  = np.array([...])   # shape (N,)
fluxes = np.array([...])   # shape (N,)
errors = np.array([...])   # shape (N,)

lc = pgmuvi.lightcurve.Lightcurve(times, fluxes, errors)

The data are stored internally as PyTorch tensors. You can retrieve them as NumPy arrays with calls such as lc.xdata.cpu().numpy().

Loading from a File

From a CSV file

from_csv() reads a CSV file directly. Column names are matched case-insensitively using common aliases, so in most cases no extra arguments are required:

import pgmuvi

lc = pgmuvi.lightcurve.Lightcurve.from_csv("my_lightcurve.csv")

For multiband CSV files that include a numeric wavelength column, pass the column name explicitly or let the method auto-detect it:

# Explicit wavelength column
lc = pgmuvi.lightcurve.Lightcurve.from_csv(
    "multiband.csv", wavelcol="wavelength_um"
)

# Or specify time and wavelength together
lc = pgmuvi.lightcurve.Lightcurve.from_csv(
    "multiband.csv", xcol=["mjd", "wavelength_um"]
)

If the CSV contains a string band-identifier column (e.g. band or filter, with values like "V" or "R"), from_csv() can store it in the band attribute for labelling purposes. For 2-D (multiband) lightcurves this happens automatically. For 1-D lightcurves, band is populated only when the band-ID column contains exactly one distinct non-empty label (matching the 1-D constructor contract); if multiple distinct labels are present, band is left unset and a warning is emitted. Note that these string labels are for human readability only: the GP model requires a numeric wavelength in column 1 of xdata (see Multiwavelength (2D) Analysis).
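If a file carries only string band labels, one way to obtain the numeric wavelength column is to look each label up in a table of effective wavelengths. The sketch below uses NumPy alone and does not call pgmuvi. The BAND_WAVELENGTH_UM dictionary is an assumption made for the example, using the same illustrative V, R and I values (in microns) that appear elsewhere in this guide.

```python
import numpy as np

# Illustrative effective wavelengths in microns; substitute the values
# appropriate for your own filters.
BAND_WAVELENGTH_UM = {"V": 0.55, "R": 0.64, "I": 0.80}

times = np.array([58000.0, 58000.1, 58001.0, 58001.1])
bands = np.array(["V", "I", "V", "R"])  # string labels, one per observation

# Look up a numeric wavelength for every observation; an unknown label
# raises KeyError rather than being silently dropped.
wavelengths = np.array([BAND_WAVELENGTH_UM[b] for b in bands])

# Column 0 is time, column 1 is the numeric wavelength.
xdata = np.column_stack([times, wavelengths])
```

The resulting xdata has shape (N, 2), which is the layout the multiband model expects.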

From an Astropy-compatible format

from_table() builds a light curve from an astropy.table.Table instance or from a file in any format that Astropy can read (FITS, VOTable, many ASCII dialects):

import pgmuvi

lc = pgmuvi.lightcurve.Lightcurve.from_table("my_lightcurve.vot")

Example from an in-memory table:

from astropy.table import Table
import pgmuvi

t = Table.read("my_lightcurve.fits")
lc = pgmuvi.lightcurve.Lightcurve.from_table(t)

From raw arrays

For any other format, read the data manually and pass arrays directly:

import numpy as np
import pgmuvi

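# Assumes three numeric columns (time, flux, error) and no header row;
# pass skiprows=1 to np.loadtxt if the file starts with a header line.
data = np.loadtxt("my_lightcurve.csv", delimiter=",")
lc = pgmuvi.lightcurve.Lightcurve(data[:, 0], data[:, 1], data[:, 2])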
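When the file has a header row, selecting columns by name is less error-prone than selecting them by position. The following sketch uses np.genfromtxt with names=True. The column names (mjd, flux, flux_err) and the in-memory text standing in for a file are assumptions made for the example.

```python
import io

import numpy as np

# Stand-in for a file on disk; the column names are made up for the example.
text = io.StringIO(
    "mjd,flux,flux_err\n"
    "58000.0,1.02,0.05\n"
    "58001.5,0.97,0.04\n"
    "58003.2,1.10,0.06\n"
)

# names=True reads the first row as field names, giving a structured array
# whose columns can be selected by name instead of by position.
data = np.genfromtxt(text, delimiter=",", names=True)

times = data["mjd"]
fluxes = data["flux"]
errors = data["flux_err"]
```

The three arrays can then be passed to the Lightcurve constructor in the same way as in the positional example.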

Adding More Observations

Merging a new band into an existing multiband lightcurve

merge() appends a new band to an existing 2-D light curve. The calling object must already be 2-D; the light curve being merged in may be 1-D, in which case it is promoted automatically when a wavelength is supplied. If that 1-D input has no band attribute set, you must also pass band= explicitly (otherwise a ValueError is raised):

# lc2d is an existing 2-D lightcurve; lc_new is a new single-band lc
merged = lc2d.merge(lc_new, wavelength=0.80, band="I")   # 0.80 μm, band "I"

You can also merge directly from a CSV path:

merged = lc2d.merge("new_band.csv", wavelength=0.80, band="I")

Combining multiple lightcurves into one multiband object

concat() is a class method that builds a 2-D light curve from a list of single-band (or already-multiband) objects. Every 1-D input must carry both a band label (set at construction time via band=, or populated by from_csv()) and a scalar wavelength value (lc.wavelength, lc.wave, or lc.lambda_); concat() raises a ValueError if either is missing:

lc_V.band = "V";  lc_V.wavelength = 0.55
lc_R.band = "R";  lc_R.wavelength = 0.64
lc_I.band = "I";  lc_I.wavelength = 0.80
combined = pgmuvi.lightcurve.Lightcurve.concat([lc_V, lc_R, lc_I])

Both merge() and concat() accept on_conflict="skip", which drops duplicate bands and emits a UserWarning instead of raising an error.

Concatenating arrays before construction

For simple cases where band information is not needed, concatenate the NumPy arrays before constructing the Lightcurve:

import numpy as np
import pgmuvi

all_times  = np.concatenate([times,  new_times])
all_fluxes = np.concatenate([fluxes, new_fluxes])
all_errors = np.concatenate([errors, new_errors])

lc = pgmuvi.lightcurve.Lightcurve(all_times, all_fluxes, all_errors)

Note

For 2D / multiband data, xdata must have shape (N, 2) with column 0 being time and column 1 being a numeric wavelength. See Multiwavelength (2D) Analysis.
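When the bands are held as separate sets of arrays, the (N, 2) layout can be assembled by pairing each epoch with its band's wavelength and stacking the bands row-wise. This is a sketch using NumPy only. The observations structure, its keys, and the numbers in it are invented for the example and are not a pgmuvi data format.

```python
import numpy as np

# One entry per band: wavelength in microns plus that band's parallel arrays.
observations = [
    {"wav": 0.55, "t": [1.0, 2.0, 3.0], "f": [1.0, 1.1, 0.9], "e": [0.05, 0.05, 0.05]},
    {"wav": 0.80, "t": [1.5, 2.5], "f": [2.0, 2.2], "e": [0.10, 0.10]},
]

# Pair every epoch with its band's wavelength, then stack all bands row-wise.
xdata = np.vstack(
    [np.column_stack([o["t"], np.full(len(o["t"]), o["wav"])]) for o in observations]
)
ydata = np.concatenate([o["f"] for o in observations])
yerr = np.concatenate([o["e"] for o in observations])

print(xdata.shape, ydata.shape, yerr.shape)  # (5, 2) (5,) (5,)
```

Rows stay in band order here, so row i of xdata, ydata and yerr always refers to the same observation.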

Data Transformations

GP optimisation can be sensitive to the scale of the input data. pgmuvi provides built-in transformations to rescale the time and flux axes:

Transform         Description
'minmax'          Rescale to [0, 1] using min and max.
'zscore'          Standardise to zero mean, unit variance.
'robust_score'    Standardise using median and MAD (median absolute deviation; robust to outliers).

Apply a transformation at construction time via the xtransform and ytransform keyword arguments:

lc = pgmuvi.lightcurve.Lightcurve(
    times, fluxes, errors,
    xtransform="minmax",
    ytransform="zscore",
)

The GP is trained in the transformed space, but all results and plots are automatically inverse-transformed back to the original units.
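To make the arithmetic behind the three transformations concrete, the sketch below implements each one in NumPy together with its inverse. These functions are written for illustration and are not pgmuvi's implementation, which may differ in detail (for example in how the standard deviation or the MAD is normalised).

```python
import numpy as np


def minmax(x):
    """Rescale to [0, 1]; returns the scaled data and an inverse function."""
    lo, hi = np.min(x), np.max(x)
    return (x - lo) / (hi - lo), lambda z: z * (hi - lo) + lo


def zscore(x):
    """Standardise to zero mean and unit variance."""
    mu, sigma = np.mean(x), np.std(x)
    return (x - mu) / sigma, lambda z: z * sigma + mu


def robust_score(x):
    """Standardise using the median and the median absolute deviation."""
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return (x - med) / mad, lambda z: z * mad + med


times = np.array([58000.0, 58010.0, 58025.0, 58100.0])
scaled, inverse = minmax(times)
restored = inverse(scaled)  # recovers the original epochs
```

Each function returns its inverse alongside the scaled data, mirroring the way results are mapped back to the original units after fitting.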

Working with Magnitudes

Native magnitude support is planned for a future release but is not currently available. If your data are in magnitudes, convert them to (relative) flux before constructing the Lightcurve. A common choice is:

\[f \propto 10^{-0.4\,m}\]

In code:

import numpy as np
import pgmuvi

# mags and mag_errors are your input magnitudes and uncertainties
fluxes = 10 ** (-0.4 * mags)
errors = fluxes * np.log(10) * 0.4 * mag_errors

lc = pgmuvi.lightcurve.Lightcurve(times, fluxes, errors)

Only relative variations matter for most pgmuvi analyses, so the overall flux normalisation is arbitrary.
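Because the normalisation is arbitrary, it is often convenient to reference the conversion to a chosen magnitude so that the fluxes come out of order unity. The following sketch wraps the conversion and the first-order error propagation in one function. mags_to_relative_flux and its ref_mag parameter are hypothetical names introduced here, and defaulting to the median magnitude is an assumption rather than a pgmuvi convention.

```python
import numpy as np


def mags_to_relative_flux(mags, mag_errors, ref_mag=None):
    """Convert magnitudes to relative fluxes with propagated 1-sigma errors.

    Hypothetical helper, not part of pgmuvi. ``ref_mag`` sets the magnitude
    that maps to a flux of 1; it defaults to the median magnitude.
    """
    mags = np.asarray(mags, dtype=float)
    mag_errors = np.asarray(mag_errors, dtype=float)
    if ref_mag is None:
        ref_mag = np.median(mags)
    # f = 10^(-0.4 (m - m_ref)); brighter (smaller m) gives larger flux.
    fluxes = 10.0 ** (-0.4 * (mags - ref_mag))
    # First-order propagation: |df/dm| = 0.4 ln(10) f.
    errors = 0.4 * np.log(10.0) * fluxes * mag_errors
    return fluxes, errors


fluxes, errors = mags_to_relative_flux([12.0, 12.5, 14.5], [0.02, 0.02, 0.05])
```

The linear propagation is a good approximation only for small magnitude errors; for uncertainties of several tenths of a magnitude the flux error distribution becomes noticeably asymmetric.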

Checking Data Quality

Before fitting, assess whether the observations are sufficient to detect the variability timescales you are interested in:

lc.assess_sampling_quality()

See Data Preprocessing for more detail on sampling quality metrics and filtering.
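For a quick manual look at the sampling, a few cadence statistics can be computed directly from the observation epochs. The sketch below is a hypothetical helper written for this guide. It is unrelated to whatever metrics assess_sampling_quality() reports and is shown only to indicate the kind of quantities worth inspecting.

```python
import numpy as np


def basic_sampling_summary(times):
    """Return simple cadence statistics for a set of observation epochs.

    Hypothetical helper for illustration; not part of pgmuvi.
    """
    t = np.sort(np.asarray(times, dtype=float))
    gaps = np.diff(t)
    return {
        "n_obs": t.size,
        "baseline": t[-1] - t[0],       # total time span covered
        "median_gap": np.median(gaps),  # typical spacing between epochs
        "max_gap": gaps.max(),          # longest stretch with no data
    }


summary = basic_sampling_summary([0.0, 1.0, 2.0, 4.0, 10.0])
```

As a rough rule of thumb, variability on timescales much longer than the baseline or much shorter than the typical gap is poorly constrained by the data.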

Exporting Data

The loaded data can be exported to an Astropy table or written to a VOTable file:

table = lc.to_table()
lc.write_votable("lightcurve_output.xml")