Loading and Preparing Data
This guide shows how to load observational data into pgmuvi and prepare it for
fitting.
Overview
pgmuvi expects data as three parallel arrays:
times — observation epochs (any consistent time unit, e.g., days, MJD).
fluxes — flux or magnitude measurements.
errors — 1-σ uncertainties on the measurements.
All three arrays must have the same length. For multiband data, each array has one row per observation across all bands (see Multiwavelength (2D) Analysis).
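Before constructing a light curve, it can help to confirm that the three arrays really do line up. The helper below is a hypothetical convenience and not part of pgmuvi; it assumes that uncertainties should be strictly positive and that NaN or infinite values are unwanted.

```python
import numpy as np


def check_parallel_arrays(times, fluxes, errors):
    """Return the inputs as float arrays after basic sanity checks.

    Hypothetical helper for illustration; not part of pgmuvi.
    """
    t = np.asarray(times, dtype=float)
    f = np.asarray(fluxes, dtype=float)
    e = np.asarray(errors, dtype=float)

    # len() compares the first axis, so this also works for (N, 2) times.
    if not (len(t) == len(f) == len(e)):
        raise ValueError(
            f"length mismatch: times={len(t)}, fluxes={len(f)}, errors={len(e)}"
        )
    if not (np.isfinite(t).all() and np.isfinite(f).all() and np.isfinite(e).all()):
        raise ValueError("inputs contain NaN or infinite values")
    if (e <= 0).any():
        raise ValueError("uncertainties must be strictly positive")
    return t, f, e


times, fluxes, errors = check_parallel_arrays(
    [0.0, 1.5, 3.2], [10.1, 9.8, 10.4], [0.2, 0.2, 0.3]
)
```

The returned arrays can then be passed to the constructor in the usual way.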
Creating a Lightcurve
Pass the arrays directly to the constructor:
import pgmuvi
import numpy as np
times = np.array([...]) # shape (N,)
fluxes = np.array([...]) # shape (N,)
errors = np.array([...]) # shape (N,)
lc = pgmuvi.lightcurve.Lightcurve(times, fluxes, errors)
The data are stored internally as PyTorch tensors. You can retrieve them as NumPy
arrays via lc.xdata.cpu().numpy(), etc.
Loading from a File
From a CSV file
from_csv() reads a CSV file directly.
Column names are matched case-insensitively using common aliases, so in most
cases no extra arguments are required:
import pgmuvi
lc = pgmuvi.lightcurve.Lightcurve.from_csv("my_lightcurve.csv")
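Case-insensitive alias matching can be pictured roughly as follows. This is a sketch of the general idea only, not pgmuvi's implementation: the alias table and the `match_columns` helper are hypothetical, and the aliases that from_csv() actually recognises may differ.

```python
# Hypothetical alias table: these names are illustrative only and are not
# taken from pgmuvi's source.
ALIASES = {
    "time": ("time", "t", "mjd", "jd"),
    "flux": ("flux", "f"),
    "error": ("error", "err", "flux_err"),
}


def match_columns(header):
    """Map each role to the first header entry matching one of its aliases."""
    # Index the header by lower-cased name so the lookup ignores case.
    lowered = {name.strip().lower(): name for name in header}
    matched = {}
    for role, candidates in ALIASES.items():
        for alias in candidates:
            if alias in lowered:
                matched[role] = lowered[alias]
                break
    return matched


columns = match_columns(["MJD", "Flux", "Flux_Err"])
```

If a file uses column names that no alias covers, the safest route is to name the columns explicitly rather than rely on detection.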
For multiband CSV files that include a numeric wavelength column, pass the column name explicitly or let the method auto-detect it:
# Explicit wavelength column
lc = pgmuvi.lightcurve.Lightcurve.from_csv(
    "multiband.csv", wavelcol="wavelength_um"
)
# Or specify time and wavelength together
lc = pgmuvi.lightcurve.Lightcurve.from_csv(
    "multiband.csv", xcol=["mjd", "wavelength_um"]
)
If the CSV contains a string band-identifier column (e.g. band or
filter with values like "V", "R"), that column may be automatically
stored in band for labelling purposes.
For 2-D (multiband) lightcurves this happens automatically. For 1-D
lightcurves, auto-population only occurs when the band-ID column contains
exactly one distinct non-empty label (matching the 1-D constructor contract); if
multiple distinct labels are present, band is left unset and a warning is
emitted.
Note that these string labels are for human readability only — the GP model
requires a numeric wavelength in column 1 of xdata (see
Multiwavelength (2D) Analysis).
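If a table only has string band labels, a numeric wavelength column can be derived with a lookup table before loading. The sketch below assumes a hand-written mapping in microns; the V, R and I values are the ones used elsewhere in this guide, and the variable names are illustrative.

```python
import numpy as np

# Effective wavelengths in microns. The mapping itself is an assumption made
# for this example; adapt it to the filters in your data.
BAND_WAVELENGTH_UM = {"V": 0.55, "R": 0.64, "I": 0.80}

times = np.array([59000.1, 59000.2, 59001.1, 59002.3])
bands = np.array(["V", "R", "V", "I"])

# Look up a numeric wavelength for every observation.
wavelengths = np.array([BAND_WAVELENGTH_UM[b] for b in bands])

# Column 0 is time, column 1 is wavelength, giving shape (N, 2).
xdata = np.column_stack([times, wavelengths])
```

An unknown label raises a `KeyError` here, which is usually preferable to silently assigning a wrong wavelength.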
From an Astropy-compatible format
from_table() builds a light curve from an
astropy.table.Table instance or any file format that Astropy can read
(FITS, VOTable, many ASCII dialects):
import pgmuvi
lc = pgmuvi.lightcurve.Lightcurve.from_table("my_lightcurve.vot")
Example from an in-memory table:
from astropy.table import Table
import pgmuvi
t = Table.read("my_lightcurve.fits")
lc = pgmuvi.lightcurve.Lightcurve.from_table(t)
From raw arrays
For any other format, read the data manually and pass arrays directly:
import numpy as np
import pgmuvi
data = np.loadtxt("my_lightcurve.csv", delimiter=",")
lc = pgmuvi.lightcurve.Lightcurve(data[:, 0], data[:, 1], data[:, 2])
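`np.loadtxt` as used above expects purely numeric rows, so a header line or a missing value will make it fail. One possible alternative, sketched here with an in-memory stand-in for the file, is `np.genfromtxt`, which reads the header and turns empty fields into NaN. The column names `time`, `flux` and `error` are assumptions for this example.

```python
import io

import numpy as np

# In-memory stand-in for a CSV file with a header row and one missing value.
csv_text = io.StringIO(
    "time,flux,error\n"
    "1.0,10.2,0.3\n"
    "2.0,,0.3\n"
    "3.5,9.7,0.4\n"
)

# names=True takes field names from the header; empty fields become NaN.
data = np.genfromtxt(csv_text, delimiter=",", names=True)

# Keep only rows where all three values are finite.
good = (
    np.isfinite(data["time"])
    & np.isfinite(data["flux"])
    & np.isfinite(data["error"])
)
times = data["time"][good]
fluxes = data["flux"][good]
errors = data["error"][good]
```

The three filtered arrays are parallel and can be passed straight to the constructor.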
Adding More Observations
Merging a new band into an existing multiband lightcurve
merge() appends a new band to an
existing 2-D light curve. The calling object must already be 2-D. The light
curve being merged in may be 1-D, in which case it is promoted automatically
when a wavelength is supplied. If that 1-D input has no band attribute set,
you must also pass band= explicitly; otherwise a ValueError is raised:
# lc2d is an existing 2-D lightcurve; lc_new is a new single-band lc
merged = lc2d.merge(lc_new, wavelength=0.80, band="I") # 0.80 μm, band "I"
You can also merge directly from a CSV path:
merged = lc2d.merge("new_band.csv", wavelength=0.80, band="I")
Combining multiple lightcurves into one multiband object
concat() is a class method that builds a
2-D light curve from a list of single-band (or already-multiband) objects.
Every 1-D input must carry both band information (either set at construction
time via band= or via from_csv()) and
a scalar wavelength value (lc.wavelength, lc.wave, or lc.lambda_);
concat() raises a ValueError if either is missing:
lc_V.band = "V"; lc_V.wavelength = 0.55
lc_R.band = "R"; lc_R.wavelength = 0.64
lc_I.band = "I"; lc_I.wavelength = 0.80
combined = pgmuvi.lightcurve.Lightcurve.concat([lc_V, lc_R, lc_I])
Both methods accept on_conflict="skip" to drop duplicate bands and emit a
UserWarning rather than raising an error.
Concatenating arrays before construction
For simple cases where band information is not needed, concatenate the NumPy
arrays before constructing the Lightcurve:
import numpy as np
import pgmuvi
all_times = np.concatenate([times, new_times])
all_fluxes = np.concatenate([fluxes, new_fluxes])
all_errors = np.concatenate([errors, new_errors])
lc = pgmuvi.lightcurve.Lightcurve(all_times, all_fluxes, all_errors)
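When the new observations interleave with the old ones, it may be convenient to put the combined arrays in chronological order. Whether pgmuvi requires sorted input is not stated here, so treat the sorting step as optional tidying. The sketch uses made-up values and applies a single permutation to all three arrays so that they stay parallel.

```python
import numpy as np

times = np.array([1.0, 2.0, 4.0])
fluxes = np.array([10.0, 10.5, 9.9])
errors = np.array([0.2, 0.2, 0.3])

# New observations that partly interleave with the existing ones.
new_times = np.array([3.0, 0.5])
new_fluxes = np.array([10.2, 9.8])
new_errors = np.array([0.25, 0.3])

all_times = np.concatenate([times, new_times])
all_fluxes = np.concatenate([fluxes, new_fluxes])
all_errors = np.concatenate([errors, new_errors])

# One permutation, applied to all three arrays, keeps them parallel.
order = np.argsort(all_times, kind="stable")
all_times = all_times[order]
all_fluxes = all_fluxes[order]
all_errors = all_errors[order]
```

Sorting each array independently would scramble the pairing between epochs, fluxes and uncertainties, which is why the index array from `np.argsort` is reused.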
Note
For 2D / multiband data, xdata must have shape (N, 2) with column 0
being time and column 1 being a numeric wavelength. See Multiwavelength (2D) Analysis.
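Starting from separate per-band arrays, the (N, 2) layout described in the note can be assembled by pairing every epoch with its band's wavelength and stacking the blocks. The values and the `per_band` dictionary below are illustrative assumptions.

```python
import numpy as np

# Per-band observations keyed by wavelength in microns (illustrative values).
per_band = {
    0.55: (
        np.array([1.0, 2.0, 3.0]),     # times
        np.array([10.0, 10.4, 9.9]),   # fluxes
        np.array([0.2, 0.2, 0.2]),     # errors
    ),
    0.80: (
        np.array([1.5, 2.5]),
        np.array([12.1, 12.6]),
        np.array([0.3, 0.3]),
    ),
}

x_blocks, y_blocks, e_blocks = [], [], []
for wavelength, (t, f, e) in per_band.items():
    # Pair every epoch in this band with the band's wavelength.
    x_blocks.append(np.column_stack([t, np.full_like(t, wavelength)]))
    y_blocks.append(f)
    e_blocks.append(e)

xdata = np.vstack(x_blocks)        # shape (N, 2): time, wavelength
ydata = np.concatenate(y_blocks)   # shape (N,)
yerr = np.concatenate(e_blocks)    # shape (N,)
```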
Data Transformations
GP optimisation can be sensitive to the scale of the input data. pgmuvi
provides built-in transformations to rescale the time and flux axes:
| Transform | Description |
|---|---|
| "minmax" | Rescale to [0, 1] using min and max. |
| "zscore" | Standardise to zero mean, unit variance. |
| | Standardise using median and MAD (median absolute deviation; robust to outliers). |
Apply a transformation at construction time via the xtransform and ytransform
keyword arguments:
lc = pgmuvi.lightcurve.Lightcurve(
    times, fluxes, errors,
    xtransform="minmax",
    ytransform="zscore",
)
The GP is trained in the transformed space, but all results and plots are automatically inverse-transformed back to the original units.
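To make the three rescalings and their inverses concrete, here is a plain NumPy sketch. It illustrates the arithmetic only and is not pgmuvi's implementation; the function names are hypothetical, and the MAD is left unscaled, which is an assumption.

```python
import numpy as np


def minmax(x):
    """Rescale to [0, 1]; also return a function that undoes the scaling."""
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo), lambda z: z * (hi - lo) + lo


def zscore(x):
    """Standardise to zero mean and unit variance."""
    mu, sigma = x.mean(), x.std()
    return (x - mu) / sigma, lambda z: z * sigma + mu


def robust(x):
    """Standardise with the median and the (unscaled) MAD."""
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return (x - med) / mad, lambda z: z * mad + med


fluxes = np.array([10.0, 10.4, 9.8, 10.1, 25.0])  # last point is an outlier

scaled_mm, undo_mm = minmax(fluxes)
scaled_z, undo_z = zscore(fluxes)
scaled_r, undo_r = robust(fluxes)
```

Each transform is a shift followed by a division, so uncertainties on the same axis are divided by the same scale factor but are not shifted. The outlier in this example dominates the min-max and z-score scales, whereas the median and MAD are barely affected by it.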
Working with Magnitudes
Native magnitude support is planned for a future release but is not currently
available. If your data are in magnitudes, convert them to (relative) flux
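before constructing the Lightcurve. A common
choice is:
$$F = 10^{-0.4\,m}, \qquad \sigma_F = 0.4 \ln(10)\, F\, \sigma_m$$
where $m$ is the magnitude and $\sigma_m$ its uncertainty.
In code: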
import numpy as np
import pgmuvi
# mags and mag_errors are your input magnitudes and uncertainties
fluxes = 10 ** (-0.4 * mags)
errors = fluxes * np.log(10) * 0.4 * mag_errors
lc = pgmuvi.lightcurve.Lightcurve(times, fluxes, errors)
Only relative variations matter for most pgmuvi analyses, so the overall
flux normalisation is arbitrary.
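Because the normalisation is arbitrary, one option is to divide by the median flux so that the values are of order unity. The sketch below uses made-up magnitudes; the choice of the median as the reference is an assumption, not a pgmuvi requirement.

```python
import numpy as np

# Illustrative input magnitudes and their uncertainties.
mags = np.array([12.00, 12.10, 11.95])
mag_errors = np.array([0.02, 0.03, 0.02])

# Convert to flux and propagate the uncertainties to first order.
fluxes = 10 ** (-0.4 * mags)
errors = fluxes * np.log(10) * 0.4 * mag_errors

# Divide by the median flux so values are of order unity. The fractional
# uncertainties are unchanged because fluxes and errors share the factor.
scale = np.median(fluxes)
rel_fluxes = fluxes / scale
rel_errors = errors / scale
```

The fractional uncertainty `rel_errors / rel_fluxes` equals `0.4 * ln(10) * mag_errors`, roughly 0.92 times the magnitude uncertainty, regardless of the normalisation chosen.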
Checking Data Quality
Before fitting, assess whether the observations are sufficient to detect the variability timescales you are interested in:
lc.assess_sampling_quality()
See Data Preprocessing for more detail on sampling quality metrics and filtering.
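For a quick feel of what the sampling can support, a few summary numbers can be computed by hand. The diagnostics and rules of thumb below are assumptions chosen for illustration; they are not the metrics that assess_sampling_quality() reports.

```python
import numpy as np

times = np.array([0.0, 1.0, 2.1, 3.0, 10.0, 11.2, 12.0])  # days

t = np.sort(times)
gaps = np.diff(t)

baseline = t[-1] - t[0]            # total time span
median_cadence = np.median(gaps)   # typical spacing between observations
largest_gap = gaps.max()           # longest stretch with no data

# Rough rules of thumb (assumptions, not pgmuvi's criteria): a period should
# fit into the baseline at least twice and span at least two typical samples.
longest_period = baseline / 2
shortest_period = 2 * median_cadence
```

Periods outside the range from `shortest_period` to `longest_period` are likely to be poorly constrained by these data, and a `largest_gap` that is comparable to the period of interest is a further warning sign.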
Exporting Data
The loaded data can be exported to an Astropy table or a VOTable file:
table = lc.to_table()
lc.write_votable("lightcurve_output.xml")