calpipe — calibration pipeline

calpipe orchestrates NenuFAR calibration. It wraps DP3 (DPPP) and several supporting tools (nenupy, losoto, AOFlagger), and is driven by a single TOML configuration file.

Basic usage

calpipe config.toml SB*.MS

Multiple MS paths can be given; calpipe dispatches each one as an independent job to the worker pool.
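The fan-out described above can be pictured with a small sketch. It is not calpipe's worker code: run_pipeline is a hypothetical stand-in for the per-MS job, and a local thread pool stands in for the distributed worker pool.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(ms_path):
    # Hypothetical stand-in for one calpipe job; the real job would run
    # every configured step on this MS.
    return f"done: {ms_path}"


def dispatch(ms_paths, max_concurrent=4):
    # One independent job per MS, with at most max_concurrent running at once.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(run_pipeline, ms_paths))


results = dispatch(["SB001.MS", "SB002.MS", "SB003.MS"])
```

Because the jobs share no state, a failure on one MS does not need to affect the others.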

How it works

A calpipe config lists the pipeline steps, in execution order, under the steps key, and has one TOML section per step that overrides the defaults for that task:

steps = ['restore_flags', 'build_sky_model', 'ddecal', 'subtract', 'apply_cal']

[ddecal]
cal.parmdb  = 'instrument.h5'
cal.sol_int = 20
cal.uv_min  = 20

[subtract]
type    = 'subtract'
col_out = 'CORRECTED_DATA'

[apply_cal]
col_in     = 'CORRECTED_DATA'
cal.parmdb = 'instrument.h5'

Every parameter has a default (see Global configuration and Task reference); you only need to override what differs from the defaults.
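One way to picture the override behaviour is a recursive merge of the user's section onto the defaults. The sketch below is an assumption about the mechanism, not calpipe's code; merge_settings is a hypothetical helper, and the default values are copied from the ddecal table in Task reference. A dotted TOML key such as cal.parmdb parses to a nested table, shown here as a nested dict.

```python
import copy


def merge_settings(defaults, overrides):
    """Return defaults recursively updated with overrides (inputs untouched)."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested table: merge key by key instead of replacing it whole.
            merged[key] = merge_settings(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


ddecal_defaults = {
    "col_in": "DATA",
    "cal": {"parmdb": "instrument_dde.h5", "sol_int": 20, "uv_min": 10},
}
# Equivalent to: [ddecal] / cal.parmdb = 'instrument.h5' / cal.uv_min = 20
user_section = {"cal": {"parmdb": "instrument.h5", "uv_min": 20}}

settings = merge_settings(ddecal_defaults, user_section)
```

Here cal.sol_int keeps its default of 20 because the user section does not mention it.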

A step name is also its task type. To run the same task type under a different name, set type explicitly:

steps = ['subtract_ateam', 'subtract_cyga']

[subtract_ateam]
type       = 'subtract'
directions = '!Main'

[subtract_cyga]
type       = 'subtract'
directions = 'CygA'
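The naming rule above amounts to a lookup with a fallback. A minimal sketch, assuming the parsed config is a plain dict (resolve_task_type is a hypothetical name, not a calpipe function):

```python
def resolve_task_type(step_name, config):
    """Task type of a step: the section's 'type' if set, else the step name."""
    section = config.get(step_name, {})
    return section.get("type", step_name)


config = {
    "steps": ["ddecal", "subtract_ateam", "subtract_cyga"],
    "subtract_ateam": {"type": "subtract", "directions": "!Main"},
    "subtract_cyga": {"type": "subtract", "directions": "CygA"},
}
task_types = [resolve_task_type(step, config) for step in config["steps"]]
```

The ddecal step has no section and no explicit type, so its name doubles as its task type.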

The sky model

Calibration works by comparing the observed visibilities against a model of the sky, and solving for the instrumental gains that reconcile the two. The ddecal, predict, subtract, and peel tasks all need such a model.

A sky model comes in two flavours:

  • intrinsic — the true flux of the sources on the sky;

  • apparent — the intrinsic model seen through the primary beam, i.e. what the instrument actually measures. DP3 calibrates against the apparent model.

The build_sky_model task turns an intrinsic model into the apparent one by applying the NenuFAR beam, and writes the result inside each MS (a .skymodel plus a DP3 .sourcedb) under the name given by app_sky_model_name. The later tasks then reference that name — so build_sky_model normally runs once, before ddecal.
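The relation between the two flavours can be illustrated with toy numbers. This is only a sketch: the gains below are invented, whereas the real NenuFAR beam depends on direction, frequency and time, and to_apparent is a hypothetical helper.

```python
def to_apparent(intrinsic, beam_gain):
    """Scale each intrinsic flux (Jy) by the beam power gain towards it."""
    return {name: flux * beam_gain[name] for name, flux in intrinsic.items()}


# Hypothetical values: a field source inside the main lobe and a bright
# A-team source far off-axis, where the beam suppresses most of its flux.
intrinsic = {"field_src": 12.0, "CygA": 10000.0}
beam_gain = {"field_src": 0.5, "CygA": 0.015625}

apparent = to_apparent(intrinsic, beam_gain)
```

Even strongly attenuated, the off-axis source can remain far brighter than the field sources, which is why A-team sources are modelled explicitly.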

Where the intrinsic model comes from is set by int_sky_model in the sky_model section:

  • a catalog, lcs165 or specfind, queried around the field within catalog_radius, keeping sources brighter than min_flux;

  • a .skymodel file you provide.

A-team sources (the handful of very bright off-axis sources: CasA, CygA, TauA, VirA) are added on top when add_ateam is true, using the intrinsic model int_ateam_sky_model ('lowres' for the built-in one, or a file) and filtered by ateam_min_elevation. Including them lets the solver account for — and later subtract — their contamination.

If you already have an apparent model, set app_sky_model_file instead and build_sky_model simply copies it (optionally one file per frequency), skipping the catalog/beam step entirely.

A typical catalog-based configuration:

steps = ['build_sky_model', 'ddecal', 'subtract', 'apply_cal']

[sky_model]
int_sky_model = 'lcs165'           # or a path to a .skymodel file

[build_sky_model]
catalog_radius      = 20           # deg around the field centre
min_flux            = 0.5          # Jy
add_ateam           = true
ateam_always_keep   = ['CasA', 'CygA']
ateam_min_elevation = 10

To reuse a ready-made apparent model instead of building one:

steps = ['ddecal', 'subtract', 'apply_cal']   # no build_sky_model step

[sky_model]
app_sky_model_file = '/path/to/apparent.skymodel'

See sky_model (global options) and build_sky_model (task options) for the full parameter tables.

With a data-handler

If [data_handler] is configured, obs_ids or glob patterns can be used instead of explicit MS paths:

[data_handler]
config_file = 'data_handler.toml'   # path to the data-handler config
data_level  = 'L2_BP'               # data level to calibrate

Then invoke:

calpipe config.toml "202312*_NT04"
calpipe config.toml "20231208_NT04:SW03"   # restrict to SW03

See Global configuration for the [data_handler] section.

Available tasks

Task                Description
build_sky_model     Build apparent sky model from catalog or intrinsic model
ddecal              Direction-dependent calibration with DP3 DDE-Cal
apply_cal           Apply gain solutions with DP3 ApplyCal
subtract            Subtract patches with DP3
predict             Predict model visibilities with DP3
restore_flags       Save / restore the FLAG column checkpoint
flagger             Pre/post-calibration flagging (AOFlagger, SSINS, bad baselines, …)
peel                Advanced iterative peeling of bright off-axis sources
multims_smooth_sol  Smooth gain solutions across multiple MSs

Detailed parameter tables for each task are in Task reference.

Typical workflows

Calibration-only (no sky model build)

steps = ['restore_flags', 'ddecal', 'subtract', 'apply_cal']

[ddecal]
cal.parmdb = 'instrument.h5'
cal.uv_min = 20
cal.sol_int = 30

[subtract]
type    = 'subtract'
col_out = 'CORRECTED_DATA'

[apply_cal]
col_in     = 'CORRECTED_DATA'
cal.parmdb = 'instrument.h5'

With sky model build (first run)

steps = ['build_sky_model', 'ddecal', 'subtract', 'apply_cal']

[build_sky_model]
min_flux_path       = 15
add_ateam           = true
ateam_always_keep   = ['CasA', 'CygA']
ateam_min_elevation = 10

Dry-run on a cluster

[worker]
nodes          = 'node[101-110]'
max_concurrent = 4
dry_run        = true
env_file       = '/home/user/.bashrc'
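The nodes value above uses a range expression. As a rough illustration of how such an expression could be expanded, the hypothetical helper expand_nodes below handles comma-separated names with at most one [start-end] range each. calpipe's own parser may accept more forms.

```python
import re


def expand_nodes(spec):
    """Expand e.g. 'head, node[101-110]' into a flat list of host names."""
    hosts = []
    for item in spec.split(","):
        item = item.strip()
        match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", item)
        if match is None:
            if item:
                hosts.append(item)  # plain host name, no range
            continue
        prefix, start, end, suffix = match.groups()
        width = len(start)  # preserve zero padding such as n[01-03]
        for number in range(int(start), int(end) + 1):
            hosts.append(f"{prefix}{number:0{width}d}{suffix}")
    return hosts


nodes = expand_nodes("node[101-110]")
```

The range is inclusive at both ends, so node[101-110] yields ten hosts, node101 through node110.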

Global configuration

Global sections apply to the whole pipeline run, not to individual tasks. They can be omitted — defaults are loaded from default_settings.toml.

worker

Controls the distributed execution engine.

Parameter                 Default      Description
nodes                     'localhost'  Comma-separated node list or range expression (node[101-110] expands to 10 nodes)
max_concurrent            4            Maximum parallel jobs per node
env_file                  ''           Shell file sourced before each job (e.g. ~/.bashrc)
dry_run                   false        Print commands without executing them
debug                     false        Enable verbose worker logging
run_on_file_host          false        Route each job to the node that holds the MS file
run_on_file_host_pattern  ''           Regex with a capture group that extracts the hostname from the MS path (requires run_on_file_host = true)
numthreads                0            DP3 thread count per job (0 = DP3 default)

Example

[worker]
nodes          = 'node[101-110]'
max_concurrent = 4
env_file       = '/home/user/.bashrc'
run_on_file_host = true
run_on_file_host_pattern = '/net/([^/]+)/'
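To check a run_on_file_host_pattern before a run, it can help to try it on a sample path. The sketch below assumes the pattern is searched anywhere in the path and that the first capture group is the hostname. The MS path is made up, and file_host is a hypothetical helper.

```python
import re


def file_host(ms_path, pattern):
    """Return the host captured by the pattern's first group, or None."""
    match = re.search(pattern, ms_path)
    return match.group(1) if match else None


pattern = "/net/([^/]+)/"  # value from the example above
host = file_host("/net/node103/data/SB001.MS", pattern)  # hypothetical path
```

A path that does not match the pattern yields None, in which case a job would need some fallback placement.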

sky_model

Defines the sky model files shared across tasks.

Parameter            Default          Description
int_sky_model        'lcs165'         Intrinsic sky model: a filename, or one of lcs165 / specfind to fetch from a catalog service
int_ateam_sky_model  'lowres'         Intrinsic A-team sky model: a filename, or 'lowres' for the built-in low-resolution model
app_sky_model_name   'app_sky_model'  Base name (without extension) of the apparent sky model written inside each MS
app_sky_model_file   ''               If set, skip the catalog fetch and copy this file as the apparent sky model. Supports {MSIN} token and per-frequency dict

Using a pre-built apparent sky model

[sky_model]
app_sky_model_file = '/path/to/apparent.skymodel'

Per-frequency sky model (dict form)

[sky_model.app_sky_model_file]
# keys are frequency in MHz; the first key >= the MS centre frequency is used
150 = '/models/app_150mhz.skymodel'
185 = '/models/app_185mhz.skymodel'
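The selection rule in the comment above can be sketched as follows. Two points are assumptions: that "first" means the lowest key in ascending numeric order, and that nothing is selected when the MS lies above every key (the documentation does not say what calpipe does then). pick_model is a hypothetical helper.

```python
def pick_model(models_by_mhz, centre_freq_mhz):
    """Return the file of the first key >= centre_freq_mhz, or None."""
    # TOML keys are strings, so compare them numerically, lowest first.
    for key in sorted(models_by_mhz, key=float):
        if float(key) >= centre_freq_mhz:
            return models_by_mhz[key]
    return None  # assumption: behaviour above the highest key is undocumented


models = {
    "150": "/models/app_150mhz.skymodel",
    "185": "/models/app_185mhz.skymodel",
}
chosen = pick_model(models, 160.0)
```

With these keys, an MS centred at 160 MHz gets the 185 MHz model, and anything at or below 150 MHz gets the 150 MHz model.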

data_handler

Enables obs_id–based input resolution via a data_handler.toml file. When this section is present and config_file is non-empty, the positional arguments to calpipe are treated as obs_id patterns rather than MS paths.

Parameter    Default  Description
config_file  ''       Path to the data-handler TOML config; leave empty to use explicit MS paths
data_level   'L2'     Data level to resolve (must be defined in [data_level_path])

Example

[data_handler]
config_file = 'data_handler.toml'
data_level  = 'L2_BP'

Then invoke:

calpipe calibration.toml "202312*_NT04"
calpipe calibration.toml "20231208_NT04:SW03,SW04"

The obs_id pattern may include a spectral window filter after a colon (obs_id_pattern:SW_pattern).
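The pattern syntax can be illustrated with a short sketch. It assumes shell-style globbing (here via fnmatch) and a comma-separated list of spectral windows after the colon, as in the second command above. The list of available observations is invented, and split_pattern and select are hypothetical helpers.

```python
from fnmatch import fnmatchcase


def split_pattern(argument):
    """Split 'obs_id_pattern:SW_pattern' into (obs_pattern, [sw_patterns])."""
    obs_pattern, _, sw_part = argument.partition(":")
    # No colon (or nothing after it) means every spectral window.
    sw_patterns = [sw for sw in sw_part.split(",") if sw] or ["*"]
    return obs_pattern, sw_patterns


def select(argument, available):
    """Filter (obs_id, sw) pairs against one command-line argument."""
    obs_pattern, sw_patterns = split_pattern(argument)
    return [
        (obs, sw)
        for obs, sw in available
        if fnmatchcase(obs, obs_pattern)
        and any(fnmatchcase(sw, pat) for pat in sw_patterns)
    ]


available = [  # hypothetical contents of a data-handler
    ("20231208_NT04", "SW01"),
    ("20231208_NT04", "SW03"),
    ("20231209_NT04", "SW04"),
    ("20240101_NT04", "SW03"),
]
all_december = select("202312*_NT04", available)
one_night = select("20231208_NT04:SW03,SW04", available)
```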


Task reference

Each task corresponds to a TOML section in the config file. All parameters have defaults; only override what you need.

A step uses its own name as the task type unless you set type explicitly:

[my_subtract]
type    = 'subtract'    # task type is 'subtract', not 'my_subtract'
col_out = 'CORRECTED_DATA'

build_sky_model

Builds an apparent sky model from an intrinsic catalog or file and writes it into each MS as <MS>/sky_model/<app_sky_model_name>.skymodel, together with a DP3 .sourcedb of the same base name. If add_ateam is true, A-team patches are appended.

Parameter            Default           Description
catalog_radius       20                Search radius in degrees around the MS phase centre
min_flux             0.5               Minimum apparent point-source flux (Jy)
min_flux_path        15                Minimum apparent patch flux (Jy)
add_ateam            true              Append A-team patches (CasA, CygA, TauA, VirA)
ateam_always_keep    ['CasA', 'CygA']  Keep these A-team patches even if below min_flux_path
ateam_remove         []                Never add these A-team sources
ateam_min_elevation  10                Skip A-team patches below this elevation (degrees)

Example

[build_sky_model]
min_flux_path      = 20
add_ateam          = true
ateam_always_keep  = ['CasA']
ateam_min_elevation = 15
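How the A-team options interact can be sketched as a filter. The order of the checks is an assumption, in particular that the elevation cut also applies to sources listed in ateam_always_keep. The fluxes and elevations are invented, and select_ateam is a hypothetical helper.

```python
def select_ateam(sources, min_flux_path=15, ateam_always_keep=("CasA", "CygA"),
                 ateam_remove=(), ateam_min_elevation=10):
    """Return the names of the A-team patches that would be appended."""
    kept = []
    for name, info in sources.items():
        if name in ateam_remove:
            continue  # never added
        if info["elevation"] < ateam_min_elevation:
            continue  # assumption: the cut applies to always-keep sources too
        if info["flux"] >= min_flux_path or name in ateam_always_keep:
            kept.append(name)
    return kept


# Hypothetical apparent patch fluxes (Jy) and elevations (degrees).
sources = {
    "CasA": {"flux": 5.0, "elevation": 40.0},   # faint, but always kept
    "CygA": {"flux": 50.0, "elevation": 5.0},   # too low on the sky
    "TauA": {"flux": 20.0, "elevation": 30.0},  # bright enough
    "VirA": {"flux": 3.0, "elevation": 50.0},   # too faint
}
selected = select_ateam(sources)
```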

ddecal

Runs DP3 DDECal to compute direction-dependent gain solutions. Optionally averages the data before calibrating, smooths solutions afterwards, and produces diagnostic plots.

Data and directions

Parameter   Default  Description
col_in      'DATA'   Input visibility column
directions  'all'    Patch names to calibrate on, or 'all' for every patch in the sky model
avg.time    1        Time averaging factor before calibration
avg.freq    1        Frequency averaging factor before calibration

Solver

Parameter                    Default              Description
cal.parmdb                   'instrument_dde.h5'  Output H5Parm filename (relative to each MS)
cal.sol_int                  20                   Solution interval in time slots
cal.mode                     'diagonal'           Calibration mode: 'diagonal' or 'fulljones'
cal.uv_min                   10                   Minimum baseline length in wavelengths
cal.smoothnessconstraint     4e6                  Spectral smoothness constraint kernel size in Hz (0 = disabled)
cal.solveralgorithm          'directionsolve'     DP3 solver algorithm (see DP3 docs)
cal.solutions_per_direction  {}                   Sub-solutions per interval per direction, e.g. cal.solutions_per_direction.CasA = 3
cal.extra                    {}                   Additional DP3 DDECal parameters passed verbatim, e.g. cal.extra.maxiter = 50

Solution smoothing

Parameter                 Default  Description
do_smooth_sol             true     Smooth solutions in time and frequency after calibration
smooth_sol.time_min       15       Gaussian FWHM in minutes (non-Main directions)
smooth_sol.freq_mhz       1        Gaussian FWHM in MHz (non-Main directions)
smooth_sol.main_time_min  20       Gaussian FWHM in minutes for the Main direction
smooth_sol.main_freq_mhz  4        Gaussian FWHM in MHz for the Main direction
smooth_sol.clip_nsigma    4        Clip solutions more than this many sigma above the median

Diagnostic plots

Parameter  Default  Description
plot_sol   true     Write amplitude/phase plots to <MS>/plots_<parmdb_name>/

Example

[ddecal]
col_in      = 'DATA'
directions  = 'all'
cal.parmdb  = 'instrument.h5'
cal.sol_int = 30
cal.uv_min  = 20
cal.smoothnessconstraint = 2e6
cal.extra.maxiter = 50
smooth_sol.time_min = 10
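To give a feel for what "passed verbatim" means for cal.extra, the sketch below turns a [ddecal] section into a DP3 command line. The DP3 keys used (msin, msout, ddecal.h5parm, ddecal.solint, ddecal.uvlambdamin and so on) are real DP3 parameters, but the mapping from calpipe names to those keys, the sourcedb location, and the helper ddecal_args are assumptions made for illustration. calpipe's actual command may differ.

```python
def ddecal_args(ms, section, sourcedb):
    """Build a DP3 argument list from a parsed [ddecal] section (assumed mapping)."""
    cal = section.get("cal", {})
    args = {
        "msin": ms,
        "msin.datacolumn": section.get("col_in", "DATA"),
        "msout": ".",  # update the input MS in place
        "steps": "[ddecal]",
        "ddecal.sourcedb": sourcedb,
        "ddecal.h5parm": f"{ms}/{cal.get('parmdb', 'instrument_dde.h5')}",
        "ddecal.solint": cal.get("sol_int", 20),
        "ddecal.mode": cal.get("mode", "diagonal"),
        "ddecal.uvlambdamin": cal.get("uv_min", 10),
        "ddecal.smoothnessconstraint": cal.get("smoothnessconstraint", 4e6),
    }
    # cal.extra entries are forwarded unchanged under the ddecal. prefix.
    for key, value in cal.get("extra", {}).items():
        args[f"ddecal.{key}"] = value
    return ["DP3"] + [f"{key}={value}" for key, value in args.items()]


section = {  # the example above, as a parsed TOML table
    "col_in": "DATA",
    "cal": {"parmdb": "instrument.h5", "sol_int": 30, "uv_min": 20,
            "smoothnessconstraint": 2e6, "extra": {"maxiter": 50}},
}
cmd = ddecal_args("SB001.MS", section,
                  "SB001.MS/sky_model/app_sky_model.sourcedb")  # assumed path
```

Setting dry_run = true in [worker] is the reliable way to see the commands calpipe really issues.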

apply_cal

Applies gain solutions from an H5Parm using DP3 ApplyCal.

Parameter   Default              Description
col_in      'DATA'               Input column
col_out     'CORRECTED_DATA'     Output column
direction   'Main'               Which direction’s solutions to apply (for multi-direction H5Parm)
cal.parmdb  'instrument_dde.h5'  H5Parm file to read
cal.mode    'diagonal'           Solution type to apply: 'diagonal' or 'fulljones'

Example

[apply_cal]
col_in     = 'CORRECTED_DATA'
col_out    = 'CORRECTED_DATA'
direction  = 'Main'
cal.parmdb = 'instrument.h5'

subtract

Subtracts model visibilities for selected sky-model directions using DP3.

Parameter   Default              Description
col_in      'DATA'               Input column
col_out     'CORRECTED_DATA'     Output column
directions  '!Main'              Directions to subtract. Use 'all' for everything, a list ['CygA', 'CasA'], or '!Main' for all except Main
cal.parmdb  'instrument_dde.h5'  H5Parm with gain solutions for each direction
cal.mode    'diagonal'           Solution type: 'diagonal' or 'fulljones'

Example

[subtract_ateam]
type       = 'subtract'
col_in     = 'DATA'
col_out    = 'CORRECTED_DATA'
directions = '!Main'
cal.parmdb = 'instrument.h5'
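The forms accepted by directions can be summarised in a few lines. This is a sketch of the documented semantics, not calpipe's implementation. select_directions is a hypothetical helper, and silently ignoring names that are not in the sky model is an assumption.

```python
def select_directions(spec, patches):
    """Resolve a directions value against the patch names of the sky model."""
    if isinstance(spec, (list, tuple)):
        wanted = set(spec)                        # explicit list of patches
    elif spec == "all":
        return list(patches)                      # every patch
    elif spec.startswith("!"):
        return [p for p in patches if p != spec[1:]]  # everything but one
    else:
        wanted = {spec}                           # a single patch name
    # Keep sky-model order; unknown names are dropped (assumption).
    return [p for p in patches if p in wanted]


patches = ["Main", "CasA", "CygA"]
off_axis = select_directions("!Main", patches)
```

With the default '!Main', a subtract step removes every modelled direction except the target field.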

predict

Predicts model visibilities into a data column using DP3.

Parameter   Default     Description
col_out     'DATA'      Output column
directions  'Main'      Directions to predict
cal.parmdb  ''          H5Parm for gain correction during prediction (empty = no correction)
cal.mode    'diagonal'  Solution type

restore_flags

Checkpoints the FLAG column. On the first call, the current flags are saved to flag_name; on later calls, when that file already exists, the saved flags are restored. Calibration therefore always starts from the same, repeatable flag state.

Parameter  Default             Description
flag_name  'pre_cal_flags.h5'  Path (relative to each MS) for the flag checkpoint file

Example

[restore_flags]
flag_name = 'flags_before_cal.h5'
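The save-then-restore behaviour can be sketched with a toy checkpoint. The real task works on the FLAG column of the MS and an HDF5 file. To stay within the standard library, this illustration uses a list of booleans and a JSON file, and restore_flags here is a hypothetical stand-in.

```python
import json
import os
import tempfile


def restore_flags(ms_dir, current_flags, flag_name="pre_cal_flags.json"):
    """First call: save current_flags. Later calls: return the saved flags."""
    checkpoint = os.path.join(ms_dir, flag_name)
    if not os.path.exists(checkpoint):
        with open(checkpoint, "w") as fh:
            json.dump(current_flags, fh)   # first run: create the checkpoint
        return current_flags
    with open(checkpoint) as fh:
        return json.load(fh)               # later runs: discard newer flags


with tempfile.TemporaryDirectory() as ms_dir:
    first = restore_flags(ms_dir, [False, True, False])
    # A later run arrives with extra flags added by a flagger or by DP3.
    second = restore_flags(ms_dir, [True, True, True])
```

The second call returns the original flags, which is why restore_flags usually sits first in steps.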

flagger

Runs one or more flagging algorithms in sequence. Each sub-flagger is independently enabled/disabled.

AOFlagger

Parameter           Default           Description
do_aoflagger        false             Run AOFlagger
aoflagger.strategy  'nenufar_1s1c'    AOFlagger strategy file (name or path)
aoflagger.data_col  'CORRECTED_DATA'  Column to flag

SSINS

Parameter       Default    Description
do_ssins        false      Run the SSINS flagger
ssins.seetings  'default'  SSINS settings file (note: parameter name has a typo inherited from the original code)

Bad baselines / stations

Parameter                      Default  Description
do_badbaselines                false    Detect and flag outlier baselines/stations via AOQuality
badbaselines.nsigma_stations   5        Flag stations whose amplitude deviates more than this many sigma
badbaselines.nsigma_baselines  8        Flag baselines whose amplitude deviates more than this many sigma

Manual baseline flagging

Parameter                          Default  Description
do_baselinesflag                   false    Flag specific baselines/stations
baselinesflag.baselines            ''       DP3 baseline string, e.g. 'CS001LBA&&;RS208LBA&&' (applied to all MSs)
baselinesflag.baselines_from_file  ''       Path to a file mapping obs_ids to baseline strings, in text or JSON format (see below)

baselinesflag.baselines_from_file accepts two formats, selected automatically from the file extension:

  • Text (.txt or any other extension): one <obs_id> <baselines> pair per line, where <baselines> is a DP3-formatted string such as MR003&&*;MR017&&*. Lines starting with # are ignored.

  • JSON (.json): the output of nenudata bad-stations or aostats find-bad-stations. Keys are obs_ids; values are lists of bare antenna names. The &&* suffix is appended automatically. Keys starting with _ (e.g. _meta) are ignored.

# JSON format — points at the file managed by nenudata bad-stations
baselinesflag.baselines_from_file = 'bad_stations.json'

See nenudata — bad stations for the full management workflow.
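Both formats reduce to the same obs_id-to-baseline-string mapping. The sketch below follows the rules listed above. Joining several antennas with ';' is an assumption taken from the text-format example, and load_baselines is a hypothetical helper, not calpipe's parser.

```python
import json
import os
import tempfile


def load_baselines(path):
    """Read a text or JSON baselines file into {obs_id: DP3 baseline string}."""
    mapping = {}
    if path.lower().endswith(".json"):
        with open(path) as fh:
            data = json.load(fh)
        for obs_id, antennas in data.items():
            if obs_id.startswith("_"):
                continue  # metadata keys such as _meta
            mapping[obs_id] = ";".join(f"{ant}&&*" for ant in antennas)
        return mapping
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # blank lines and comments
            parts = line.split(None, 1)
            if len(parts) == 2:
                mapping[parts[0]] = parts[1].strip()
    return mapping


with tempfile.TemporaryDirectory() as tmp:
    txt_path = os.path.join(tmp, "bad_baselines.txt")
    json_path = os.path.join(tmp, "bad_stations.json")
    with open(txt_path, "w") as fh:
        fh.write("# obs_id baselines\n20231208_NT04 MR003&&*;MR017&&*\n")
    with open(json_path, "w") as fh:
        json.dump({"_meta": {"note": "example"},
                   "20231208_NT04": ["MR003", "MR017"]}, fh)
    from_text = load_baselines(txt_path)
    from_json = load_baselines(json_path)
```

The two files in this example describe the same flags, so both calls return the same mapping.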

Frequency flagging

Parameter            Default   Description
do_flagfreq          false     Flag a fixed frequency range
flagfreq.fmhz_range  [0, 200]  [start_MHz, end_MHz] range to flag

Scan flagging

Parameter                    Default  Description
do_scans_flagging            false    Flag time scans with anomalously high residuals
scans_flagging.nsigma_scans  5        Sigma threshold above which a scan is flagged

Example

[flagger]
do_aoflagger           = true
aoflagger.strategy     = 'nenufar_1s1c'
aoflagger.data_col     = 'CORRECTED_DATA'
do_badbaselines        = true
badbaselines.nsigma_stations  = 5
badbaselines.nsigma_baselines = 8
do_baselinesflag       = true
baselinesflag.baselines_from_file = 'bad_baselines.txt'

peel

Iterative peeling of bright off-axis sources. The MS is first copied with a postfix, then each source is phase-shifted, calibrated, and subtracted in turn.

Parameter                          Default              Description
ms_postfix                         'PEEL'               Suffix appended to the MS name for the peeling copy
init.parmdb                        'instrument_dde.h5'  H5Parm with the initial DD calibration
init.mode                          'diagonal'           Solution type of the initial calibration
cal.sol_int_flux_per_slot_per_sec  75000                Scales the solution interval by flux × integration_time; set to 0 to use cal.sol_int directly
cal.sol_int_min                    2                    Minimum solution interval
cal.sol_int_max                    120                  Maximum solution interval
cal.mode                           'diagonal'           Calibration mode for peeling
cal.uv_min                         10                   Minimum baseline length in wavelengths
cal.extra                          {}                   Additional DP3 DDECal parameters
do_phase_shift                     true                 Phase-shift to each source before calibrating
phase_shift.time_avg               4                    Time averaging after phase shift
phase_shift.freq_avg               1                    Frequency averaging after phase shift
do_smooth_sol                      true                 Smooth solutions after each peel iteration
smooth_sol.time_min                15                   FWHM in minutes for solution smoothing
smooth_sol.freq_mhz                2                    FWHM in MHz for solution smoothing

multims_smooth_sol

Smooths gain solutions across multiple MSs jointly (e.g. across spectral windows or time chunks from the same night), then writes the result to a new H5Parm.

Parameter   Default                 Description
parmdb_in   'instrument_init.h5'    Input H5Parm (relative to each MS)
parmdb_out  'instrument_smooth.h5'  Output H5Parm (relative to each MS)
plot_dir    'smooth_sol'            Directory for diagnostic plots

Example

steps = ['ddecal', 'multims_smooth_sol', 'apply_cal']

[ddecal]
cal.parmdb    = 'instrument_init.h5'
do_smooth_sol = false   # skip per-MS smoothing; smooth jointly instead

[multims_smooth_sol]
parmdb_in  = 'instrument_init.h5'
parmdb_out = 'instrument_smooth.h5'

[apply_cal]
cal.parmdb = 'instrument_smooth.h5'