Loading and Preparing Data

This guide shows how to load observational data into pgmuvi and prepare it for fitting.

Overview

pgmuvi expects data as three parallel arrays:

times — observation epochs (any consistent time unit, e.g., days, MJD).
fluxes — flux or magnitude measurements.
errors — 1-σ uncertainties on the measurements.

All three arrays must have the same length. For multiband data, each array has one row per observation across all bands (see Multiwavelength (2D) Analysis).
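A quick sanity check before constructing a light curve can catch mismatched or corrupted inputs early. The following is a minimal sketch using only NumPy; the helper name check_lightcurve_arrays is hypothetical and is not part of pgmuvi, and the specific checks (finite values, strictly positive uncertainties) are assumptions about what sensible input looks like rather than documented pgmuvi requirements.

```python
import numpy as np


def check_lightcurve_arrays(times, fluxes, errors):
    """Hypothetical helper (not part of pgmuvi): basic input sanity checks."""
    times = np.atleast_1d(np.asarray(times, dtype=float))
    fluxes = np.atleast_1d(np.asarray(fluxes, dtype=float))
    errors = np.atleast_1d(np.asarray(errors, dtype=float))

    # The three arrays must describe the same N observations.
    lengths = {times.shape[0], fluxes.shape[0], errors.shape[0]}
    if len(lengths) != 1:
        raise ValueError(
            f"length mismatch: times={times.shape[0]}, "
            f"fluxes={fluxes.shape[0]}, errors={errors.shape[0]}"
        )

    # Assumption: NaN or infinite entries should be removed before fitting.
    for name, arr in (("times", times), ("fluxes", fluxes), ("errors", errors)):
        if not np.all(np.isfinite(arr)):
            raise ValueError(f"{name} contains NaN or infinite values")

    # Assumption: 1-sigma uncertainties should be strictly positive.
    if np.any(errors <= 0):
        raise ValueError("errors must be strictly positive")

    return times, fluxes, errors


times, fluxes, errors = check_lightcurve_arrays(
    [0.0, 1.5, 3.2], [1.0, 1.2, 0.9], [0.1, 0.1, 0.2]
)
```

The returned arrays can then be passed to the constructor described in the next section.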

Creating a Lightcurve

Pass the arrays directly to the constructor:

import pgmuvi
import numpy as np

times  = np.array([...])   # shape (N,)
fluxes = np.array([...])   # shape (N,)
errors = np.array([...])   # shape (N,)

lc = pgmuvi.lightcurve.Lightcurve(times, fluxes, errors)

The data are stored internally as PyTorch tensors. To get a NumPy array back, move the tensor to the CPU and convert it, e.g. lc.xdata.cpu().numpy() for the input (time) axis.

Loading from a File

From a CSV file

from_csv() reads a CSV file directly. Column names are matched case-insensitively using common aliases, so in most cases no extra arguments are required:

import pgmuvi

lc = pgmuvi.lightcurve.Lightcurve.from_csv("my_lightcurve.csv")

For multiband CSV files that include a numeric wavelength column, pass the column name explicitly or let the method auto-detect it:

# Explicit wavelength column
lc = pgmuvi.lightcurve.Lightcurve.from_csv(
    "multiband.csv", wavelcol="wavelength_um"
)

# Or specify time and wavelength together
lc = pgmuvi.lightcurve.Lightcurve.from_csv(
    "multiband.csv", xcol=["mjd", "wavelength_um"]
)

If the CSV contains a string band-identifier column (e.g. band or filter, with values like "V" or "R"), that column can be stored in band for labelling purposes. For 2-D (multiband) lightcurves this happens automatically. For 1-D lightcurves, band is populated only when the band-ID column contains exactly one distinct non-empty label (matching the 1-D constructor contract); if several distinct labels are present, band is left unset and a warning is emitted. These string labels are for human readability only: the GP model requires a numeric wavelength in column 1 of xdata (see Multiwavelength (2D) Analysis).
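To see in advance which of these cases a file falls into, you can count the distinct non-empty band labels yourself. This is a minimal sketch using only the standard library; the helper distinct_band_labels and the column names flux and flux_err are hypothetical, and the logic only mirrors the rule described above rather than reproducing pgmuvi's own detection code.

```python
import csv
import io

# Illustrative file contents; in practice, open the CSV file instead.
csv_text = """mjd,flux,flux_err,band
59000.1,1.02,0.05,V
59001.3,0.98,0.05,V
59002.2,1.10,0.06,R
"""


def distinct_band_labels(handle, band_column="band"):
    """Hypothetical helper: sorted distinct non-empty labels in band_column."""
    reader = csv.DictReader(handle)
    labels = set()
    for row in reader:
        # Missing or blank cells are ignored, as in the rule described above.
        label = (row.get(band_column) or "").strip()
        if label:
            labels.add(label)
    return sorted(labels)


labels = distinct_band_labels(io.StringIO(csv_text))
single_band = len(labels) == 1
```

Here labels contains two entries, so a 1-D load of this file would leave band unset according to the rule above.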

From an Astropy-compatible format

from_table() builds a light curve from an astropy.table.Table instance or any file format that Astropy can read (FITS, VOTable, many ASCII dialects):

import pgmuvi

lc = pgmuvi.lightcurve.Lightcurve.from_table("my_lightcurve.vot")

Example from an in-memory table:

from astropy.table import Table
import pgmuvi

t = Table.read("my_lightcurve.fits")
lc = pgmuvi.lightcurve.Lightcurve.from_table(t)

From raw arrays

For any other format, read the data manually and pass arrays directly:

import numpy as np
import pgmuvi

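# np.loadtxt expects purely numeric rows (columns: time, flux, error);
# pass skiprows=1 if the file starts with a header line.
data = np.loadtxt("my_lightcurve.csv", delimiter=",")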
lc = pgmuvi.lightcurve.Lightcurve(data[:, 0], data[:, 1], data[:, 2])
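When a text file has a header row, selecting columns by name is less error-prone than relying on their position. The sketch below uses np.genfromtxt with names=True; the column names time, flux and flux_err are placeholders for whatever your file actually contains, and the inline text stands in for a real file path.

```python
import io

import numpy as np

# Illustrative file contents; pass a file path to np.genfromtxt in practice.
csv_text = """time,flux,flux_err
0.0,1.00,0.05
1.5,1.20,0.05
3.2,0.90,0.10
"""

# names=True reads the first line as column names and returns a
# structured array whose fields can be accessed by name.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True)

times = data["time"]
fluxes = data["flux"]
errors = data["flux_err"]
```

The three arrays can then be passed to the Lightcurve constructor exactly as in the example above.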

Adding More Observations

Merging a new band into an existing multiband lightcurve

merge() appends a new band to an existing 2-D light curve. The calling object must already be 2-D. The light curve being merged in may be 1-D, in which case it is promoted to 2-D automatically when a wavelength is supplied. If a 1-D input has no band attribute set, you must also pass band= explicitly; otherwise a ValueError is raised:

# lc2d is an existing 2-D lightcurve; lc_new is a new single-band lc
merged = lc2d.merge(lc_new, wavelength=0.80, band="I")   # 0.80 μm, band "I"

You can also merge directly from a CSV path:

merged = lc2d.merge("new_band.csv", wavelength=0.80, band="I")

Combining multiple lightcurves into one multiband object

concat() is a class method that builds a 2-D light curve from a list of single-band (or already-multiband) objects. Every 1-D input must carry both band information (either set at construction time via band= or via from_csv()) and a scalar wavelength value (lc.wavelength, lc.wave, or lc.lambda_); concat() raises a ValueError if either is missing:

lc_V.band = "V";  lc_V.wavelength = 0.55
lc_R.band = "R";  lc_R.wavelength = 0.64
lc_I.band = "I";  lc_I.wavelength = 0.80
combined = pgmuvi.lightcurve.Lightcurve.concat([lc_V, lc_R, lc_I])

Both methods accept on_conflict="skip" to drop duplicate bands and emit a UserWarning rather than raising an error.

Concatenating arrays before construction

For simple cases where band information is not needed, concatenate the NumPy arrays before constructing the Lightcurve:

import numpy as np
import pgmuvi

all_times  = np.concatenate([times,  new_times])
all_fluxes = np.concatenate([fluxes, new_fluxes])
all_errors = np.concatenate([errors, new_errors])

lc = pgmuvi.lightcurve.Lightcurve(all_times, all_fluxes, all_errors)

Note

For 2D / multiband data, xdata must have shape (N, 2) with column 0 being time and column 1 being a numeric wavelength. See Multiwavelength (2D) Analysis.
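As an illustration of that layout, the following sketch assembles an (N, 2) array from two single-band time series using plain NumPy. The variable names and sample values are made up for the example; the wavelengths 0.55 and 0.80 μm match the V and I values used earlier in this guide.

```python
import numpy as np

# Observation times for two bands (illustrative values).
t_V = np.array([0.0, 1.0, 2.0])
t_I = np.array([0.5, 1.5])

# One numeric wavelength per band, in microns.
wl_V, wl_I = 0.55, 0.80

# Column 0: time of every observation across all bands.
times = np.concatenate([t_V, t_I])
# Column 1: the wavelength of the band each observation belongs to.
wavelengths = np.concatenate([np.full(t_V.shape, wl_V), np.full(t_I.shape, wl_I)])

xdata = np.column_stack([times, wavelengths])
```

The flux and error arrays must be concatenated in the same band order so that row i of xdata still lines up with element i of each.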

Data Transformations

GP optimisation can be sensitive to the scale of the input data. pgmuvi provides built-in transformations to rescale the time and flux axes:

Transform	Description
`'minmax'`	Rescale to [0, 1] using min and max.
`'zscore'`	Standardise to zero mean, unit variance.
`'robust_score'`	Standardise using median and MAD (median absolute deviation; robust to outliers).

Apply a transformation at construction time via the xtransform and ytransform keyword arguments:

lc = pgmuvi.lightcurve.Lightcurve(
    times, fluxes, errors,
    xtransform="minmax",
    ytransform="zscore",
)

The GP is trained in the transformed space, but all results and plots are automatically inverse-transformed back to the original units.
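To make the three options concrete, here is a NumPy sketch of what each rescaling does and how it is undone. It illustrates the general definitions only and is not pgmuvi's implementation; in particular, whether pgmuvi uses the population standard deviation or applies a normalisation constant to the MAD is not stated here, so treat those details as assumptions.

```python
import numpy as np


def minmax(x):
    """Rescale to [0, 1]; returns the scaled data plus offset and scale."""
    offset, scale = x.min(), x.max() - x.min()
    return (x - offset) / scale, offset, scale


def zscore(x):
    """Standardise to zero mean and unit variance."""
    offset, scale = x.mean(), x.std()
    return (x - offset) / scale, offset, scale


def robust_score(x):
    """Standardise using the median and the (unnormalised) MAD."""
    offset = np.median(x)
    scale = np.median(np.abs(x - offset))
    return (x - offset) / scale, offset, scale


def invert(z, offset, scale):
    """Map transformed values back to the original units."""
    return z * scale + offset


# One large outlier shows why the robust variant can be preferable.
fluxes = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

z_minmax, off_mm, sc_mm = minmax(fluxes)
z_zscore, off_z, sc_z = zscore(fluxes)
z_robust, off_r, sc_r = robust_score(fluxes)

recovered = invert(z_robust, off_r, sc_r)
```

With the outlier present, the min-max and z-score versions compress the four ordinary points into a narrow range, while the robust version keeps them spread around zero. Uncertainties are scaled by the same factor but not shifted by the offset.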

Working with Magnitudes

Native magnitude support is planned for a future release but is not currently available. If your data are in magnitudes, convert them to (relative) flux before constructing the Lightcurve. A common choice is:

\[f \propto 10^{-0.4\,m}\]

In code:

import numpy as np
import pgmuvi

# mags and mag_errors are your input magnitudes and uncertainties
fluxes = 10 ** (-0.4 * mags)
errors = fluxes * np.log(10) * 0.4 * mag_errors

lc = pgmuvi.lightcurve.Lightcurve(times, fluxes, errors)

Only relative variations matter for most pgmuvi analyses, so the overall flux normalisation is arbitrary.
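Because the normalisation is arbitrary, one convenient option is to measure magnitudes relative to a reference value so that the resulting fluxes are of order unity instead of very small numbers. The sketch below uses the median magnitude as the reference; that choice, and the sample values, are assumptions made for illustration.

```python
import numpy as np

# Illustrative magnitudes and their uncertainties.
mags = np.array([12.30, 12.45, 12.10])
mag_errors = np.array([0.02, 0.03, 0.02])

# Reference magnitude: the median, so a typical point maps to flux ~ 1.
m_ref = np.median(mags)

# Relative flux; brighter points (smaller magnitude) get larger flux.
fluxes = 10.0 ** (-0.4 * (mags - m_ref))

# First-order error propagation: sigma_f = f * 0.4 * ln(10) * sigma_m.
errors = fluxes * 0.4 * np.log(10) * mag_errors

# The fractional uncertainty does not depend on the chosen reference.
fractional_errors = errors / fluxes
```

Shifting m_ref multiplies fluxes and errors by the same constant, so the fractional uncertainties are unchanged.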

Checking Data Quality

Before fitting, assess whether the observations are sufficient to detect the variability timescales you are interested in:

lc.assess_sampling_quality()

See Data Preprocessing for more detail on sampling quality metrics and filtering.
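For a rough feel of what the sampling allows before running the full assessment, a few summary numbers can be computed directly from the observation times. This sketch is independent of assess_sampling_quality() and does not reproduce its metrics; the quantities chosen here (baseline, median spacing, largest gap) are simply common first-look diagnostics.

```python
import numpy as np

# Illustrative observation times (any consistent unit, e.g. days).
times = np.array([0.0, 1.0, 2.5, 3.0, 10.0, 11.0])

t = np.sort(times)
gaps = np.diff(t)  # spacing between consecutive observations

baseline = t[-1] - t[0]       # total time span covered
median_gap = np.median(gaps)  # typical spacing between observations
largest_gap = gaps.max()      # longest stretch without any data

print(f"baseline:    {baseline:.2f}")
print(f"median gap:  {median_gap:.2f}")
print(f"largest gap: {largest_gap:.2f}")
```

Variability on timescales much longer than the baseline, or much shorter than the typical spacing, is generally hard to constrain from such data.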

Exporting Data

The loaded data can be exported to an Astropy table or written to a VOTable file:

table = lc.to_table()
lc.write_votable("lightcurve_output.xml")