# calpipe: calibration pipeline

`calpipe` orchestrates NenuFAR calibration. It is a wrapper around DP3 (DPPP) and several supporting tools (nenupy, losoto, AOFlagger) driven by a single TOML configuration file.

## Basic usage

```bash
calpipe config.toml SB*.MS
```

Multiple MS paths can be given; `calpipe` dispatches each one as an independent job to the worker pool.

## How it works

A calpipe config lists the ordered pipeline steps in `steps` and has one TOML section per step that overrides the defaults for that task:

```toml
steps = ['restore_flags', 'build_sky_model', 'ddecal', 'subtract', 'apply_cal']

[ddecal]
cal.parmdb = 'instrument.h5'
cal.sol_int = 20
cal.uv_min = 20

[subtract]
type = 'subtract'
col_out = 'CORRECTED_DATA'
cal.parmdb = 'instrument.h5'   # same H5Parm that ddecal wrote (default is 'instrument_dde.h5')

[apply_cal]
col_in = 'CORRECTED_DATA'
cal.parmdb = 'instrument.h5'
```

Every parameter has a default (see [Global configuration](#global-configuration) and [Task reference](#task-reference)); you only need to override what differs from the defaults.

A step name is also its task type. To run the same task type under a different name, set `type` explicitly:

```toml
steps = ['subtract_ateam', 'subtract_cyga']

[subtract_ateam]
type = 'subtract'
directions = '!Main'

[subtract_cyga]
type = 'subtract'
directions = 'CygA'
```

(sky-model)=
## The sky model

Calibration works by comparing the observed visibilities against a **model of the sky**, and solving for the instrumental gains that reconcile the two. The `ddecal`, `predict`, `subtract`, and `peel` tasks all need such a model.

A sky model comes in two flavours:

- **intrinsic**: the true flux of the sources on the sky;
- **apparent**: the intrinsic model seen *through the primary beam*, i.e. what the instrument actually measures.

DP3 calibrates against the apparent model.
The **`build_sky_model`** task turns an intrinsic model into the apparent one by applying the NenuFAR beam, and writes the result inside each MS (a `.skymodel` plus a DP3 `.sourcedb`) under the name given by `app_sky_model_name`. The later tasks then reference that name, so `build_sky_model` normally runs once, before `ddecal`.

Where the intrinsic model comes from is set by `int_sky_model` in the [`sky_model`](#sky_model) section:

- a **catalog** (`lcs165` or `specfind`), queried around the field within `catalog_radius`, keeping sources brighter than `min_flux`;
- a **`.skymodel` file** you provide.

**A-team sources** (the handful of very bright off-axis sources: CasA, CygA, TauA, VirA) are added on top when `add_ateam` is true, using the intrinsic model `int_ateam_sky_model` (`'lowres'` for the built-in one, or a file) and filtered by `ateam_min_elevation`. Including them lets the solver account for their contamination and later subtract it.

If you already have an apparent model, set `app_sky_model_file` instead and `build_sky_model` simply copies it (optionally one file per frequency), skipping the catalog/beam step entirely.

A typical catalog-based configuration:

```toml
steps = ['build_sky_model', 'ddecal', 'subtract', 'apply_cal']

[sky_model]
int_sky_model = 'lcs165'   # or a path to a .skymodel file

[build_sky_model]
catalog_radius = 20        # deg around the field centre
min_flux = 0.5             # Jy
add_ateam = true
ateam_always_keep = ['CasA', 'CygA']
ateam_min_elevation = 10
```

To reuse a ready-made apparent model instead of building one:

```toml
steps = ['ddecal', 'subtract', 'apply_cal']   # no build_sky_model step

[sky_model]
app_sky_model_file = '/path/to/apparent.skymodel'
```

See [`sky_model`](#sky_model) (global options) and [`build_sky_model`](#build_sky_model) (task options) for the full parameter tables.

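
The choice between these sources can be summarised as a small decision function. The following is a minimal Python sketch, not calpipe's actual code: the name `sky_model_source` and its return labels are hypothetical, and the only thing it assumes is the precedence described above (a non-empty `app_sky_model_file` wins; otherwise `int_sky_model` is either a catalog name or a file path).

```python
# Hypothetical helper for illustration; calpipe's internals may differ.
CATALOGS = ("lcs165", "specfind")


def sky_model_source(sky_model_cfg):
    """Return a label describing where the apparent model would come from."""
    # A ready-made apparent model (a path or a per-frequency dict) is copied as-is.
    if sky_model_cfg.get("app_sky_model_file"):
        return "copy apparent file"
    # Otherwise use the intrinsic model; 'lcs165' is the documented default.
    int_model = sky_model_cfg.get("int_sky_model", "lcs165")
    if int_model in CATALOGS:
        return "query catalog, then apply beam"
    return "read intrinsic file, then apply beam"


print(sky_model_source({"int_sky_model": "specfind"}))
print(sky_model_source({"app_sky_model_file": "/path/to/apparent.skymodel"}))
```

The argument stands for the parsed `[sky_model]` section; an empty dict corresponds to running with the defaults.
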
## With a data-handler

If `[data_handler]` is configured, obs_ids or glob patterns can be used instead of explicit MS paths:

```toml
[data_handler]
config_file = 'data_handler.toml'   # path to the data-handler config
data_level = 'L2_BP'                # data level to calibrate
```

```bash
calpipe config.toml "202312*_NT04"
calpipe config.toml "20231208_NT04:SW03"   # restrict to SW03
```

See [Global configuration](#global-configuration) for the `[data_handler]` section.

## Available tasks

| Task | Description |
|---|---|
| `build_sky_model` | Build apparent sky model from catalog or intrinsic model |
| `ddecal` | Direction-dependent calibration with DP3 DDE-Cal |
| `apply_cal` | Apply gain solutions with DP3 ApplyCal |
| `subtract` | Subtract patches with DP3 |
| `predict` | Predict model visibilities with DP3 |
| `restore_flags` | Save / restore the FLAG column checkpoint |
| `flagger` | Pre/post-calibration flagging (AOFlagger, SSINS, bad baselines, …) |
| `peel` | Advanced iterative peeling of bright off-axis sources |
| `multims_smooth_sol` | Smooth gain solutions across multiple MSs |

Detailed parameter tables for each task are in [Task reference](#task-reference).

## Typical workflows

### Calibration-only (no sky model build)

```toml
steps = ['restore_flags', 'ddecal', 'subtract', 'apply_cal']

[ddecal]
cal.parmdb = 'instrument.h5'
cal.uv_min = 20
cal.sol_int = 30

[subtract]
type = 'subtract'
col_out = 'CORRECTED_DATA'
cal.parmdb = 'instrument.h5'   # same H5Parm that ddecal wrote (default is 'instrument_dde.h5')

[apply_cal]
col_in = 'CORRECTED_DATA'
cal.parmdb = 'instrument.h5'
```

### With sky model build (first run)

```toml
steps = ['build_sky_model', 'ddecal', 'subtract', 'apply_cal']

[build_sky_model]
min_flux_path = 15
add_ateam = true
ateam_always_keep = ['CasA', 'CygA']
ateam_min_elevation = 10
```

### Dry-run on a cluster

```toml
[worker]
nodes = 'node[101-110]'
max_concurrent = 4
dry_run = true
env_file = '/home/user/.bashrc'
```

---

(global-configuration)=
## Global configuration

Global sections apply to the whole pipeline run, not to individual tasks.
They can be omitted, in which case defaults are loaded from `default_settings.toml`.

### `worker`

Controls the distributed execution engine.

| Parameter | Default | Description |
|---|---|---|
| `nodes` | `'localhost'` | Comma-separated node list or range expression (`node[101-110]` expands to 10 nodes) |
| `max_concurrent` | `4` | Maximum parallel jobs per node |
| `env_file` | `''` | Shell file sourced before each job (e.g. `~/.bashrc`) |
| `dry_run` | `false` | Print commands without executing them |
| `debug` | `false` | Enable verbose worker logging |
| `run_on_file_host` | `false` | Route each job to the node that holds the MS file |
| `run_on_file_host_pattern` | `''` | Regex with a capture group that extracts the hostname from the MS path (requires `run_on_file_host = true`) |
| `numthreads` | `0` | DP3 thread count per job (0 = DP3 default) |

#### Example

```toml
[worker]
nodes = 'node[101-110]'
max_concurrent = 4
env_file = '/home/user/.bashrc'
run_on_file_host = true
run_on_file_host_pattern = '/net/([^/]+)/'
```

(sky_model)=
### `sky_model`

Defines the sky model files shared across tasks.

| Parameter | Default | Description |
|---|---|---|
| `int_sky_model` | `'lcs165'` | Intrinsic sky model: a filename, or one of `lcs165` / `specfind` to fetch from a catalog service |
| `int_ateam_sky_model` | `'lowres'` | Intrinsic A-team sky model: a filename, or `'lowres'` for the built-in low-resolution model |
| `app_sky_model_name` | `'app_sky_model'` | Base name (without extension) of the apparent sky model written inside each MS |
| `app_sky_model_file` | `''` | If set, skip the catalog fetch and copy this file as the apparent sky model. Supports the `{MSIN}` token and a per-frequency dict |

#### Using a pre-built apparent sky model

```toml
[sky_model]
app_sky_model_file = '/path/to/apparent.skymodel'
```

#### Per-frequency sky model (dict form)

```toml
[sky_model.app_sky_model_file]
# keys are frequency in MHz; the first key >= the MS centre frequency is used
150 = '/models/app_150mhz.skymodel'
185 = '/models/app_185mhz.skymodel'
```

### `data_handler`

Enables obs_id-based input resolution via a [data_handler.toml](../nenudata/config.md) file. When this section is present and `config_file` is non-empty, the positional arguments to `calpipe` are treated as obs_id patterns rather than MS paths.

| Parameter | Default | Description |
|---|---|---|
| `config_file` | `''` | Path to the data-handler TOML config; leave empty to use explicit MS paths |
| `data_level` | `'L2'` | Data level to resolve (must be defined in `[data_level_path]`) |

#### Example

```toml
[data_handler]
config_file = 'data_handler.toml'
data_level = 'L2_BP'
```

Then invoke:

```bash
calpipe calibration.toml "202312*_NT04"
calpipe calibration.toml "20231208_NT04:SW03,SW04"
```

The obs_id pattern may include a spectral window filter after a colon (`obs_id_pattern:SW_pattern`).

---

(task-reference)=
## Task reference

Each task corresponds to a TOML section in the config file. All parameters have defaults; only override what you need.

A step uses its own name as the task type unless you set `type` explicitly:

```toml
[my_subtract]
type = 'subtract'   # task type is 'subtract', not 'my_subtract'
col_out = 'CORRECTED_DATA'
```

(build_sky_model)=
### `build_sky_model`

Builds an apparent sky model from an intrinsic catalog or file and writes it into each MS under `/sky_model/.skymodel` and a SourceDB `.sourcedb`. If `add_ateam` is true, A-team patches are appended.

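
When `app_sky_model_file` is given in its per-frequency dict form, one file has to be chosen for each MS. The rule stated in the `sky_model` section (the first key >= the MS centre frequency is used) could look like the Python sketch below. `pick_model_for_frequency` is a hypothetical name; reading "first" as "lowest in ascending numeric order" and returning `None` when no key qualifies are both assumptions, since neither case is specified here.

```python
def pick_model_for_frequency(models, centre_freq_mhz):
    """Pick the file whose key is the first one >= the MS centre frequency.

    `models` maps frequency keys in MHz (strings, as TOML bare keys parse)
    to sky model paths.
    """
    # Assumption: keys are compared in ascending numeric order.
    for key in sorted(models, key=float):
        if float(key) >= centre_freq_mhz:
            return models[key]
    # Assumption: the documented rule does not cover this case.
    return None


models = {
    "150": "/models/app_150mhz.skymodel",
    "185": "/models/app_185mhz.skymodel",
}
print(pick_model_for_frequency(models, 160.0))
```
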
| Parameter | Default | Description |
|---|---|---|
| `catalog_radius` | `20` | Search radius in degrees around the MS phase centre |
| `min_flux` | `0.5` | Minimum apparent point-source flux (Jy) |
| `min_flux_path` | `15` | Minimum apparent patch flux (Jy) |
| `add_ateam` | `true` | Append A-team patches (CasA, CygA, TauA, VirA) |
| `ateam_always_keep` | `['CasA', 'CygA']` | Keep these A-team patches even if below `min_flux_path` |
| `ateam_remove` | `[]` | Never add these A-team sources |
| `ateam_min_elevation` | `10` | Skip A-team patches below this elevation (degrees) |

#### Example

```toml
[build_sky_model]
min_flux_path = 20
add_ateam = true
ateam_always_keep = ['CasA']
ateam_min_elevation = 15
```

### `ddecal`

Runs DP3 DDECal to compute direction-dependent gain solutions. Optionally averages the data before calibrating, smooths solutions afterwards, and produces diagnostic plots.

#### Data and directions

| Parameter | Default | Description |
|---|---|---|
| `col_in` | `'DATA'` | Input visibility column |
| `directions` | `'all'` | Patch names to calibrate on, or `'all'` for every patch in the sky model |
| `avg.time` | `1` | Time averaging factor before calibration |
| `avg.freq` | `1` | Frequency averaging factor before calibration |

#### Solver

| Parameter | Default | Description |
|---|---|---|
| `cal.parmdb` | `'instrument_dde.h5'` | Output H5Parm filename (relative to each MS) |
| `cal.sol_int` | `20` | Solution interval in time slots |
| `cal.mode` | `'diagonal'` | Calibration mode: `'diagonal'` or `'fulljones'` |
| `cal.uv_min` | `10` | Minimum baseline length in wavelengths |
| `cal.smoothnessconstraint` | `4e6` | Spectral smoothness constraint kernel size in Hz (0 = disabled) |
| `cal.solveralgorithm` | `'directionsolve'` | DP3 solver algorithm (see DP3 docs) |
| `cal.solutions_per_direction` | `{}` | Sub-solutions per interval per direction, e.g. `cal.solutions_per_direction.CasA = 3` |
| `cal.extra` | `{}` | Additional DP3 DDECal parameters passed verbatim, e.g. `cal.extra.maxiter = 50` |

#### Solution smoothing

| Parameter | Default | Description |
|---|---|---|
| `do_smooth_sol` | `true` | Smooth solutions in time and frequency after calibration |
| `smooth_sol.time_min` | `15` | Gaussian FWHM in minutes (non-Main directions) |
| `smooth_sol.freq_mhz` | `1` | Gaussian FWHM in MHz (non-Main directions) |
| `smooth_sol.main_time_min` | `20` | Gaussian FWHM in minutes for the `Main` direction |
| `smooth_sol.main_freq_mhz` | `4` | Gaussian FWHM in MHz for the `Main` direction |
| `smooth_sol.clip_nsigma` | `4` | Clip solutions more than this many sigma above the median |

#### Diagnostic plots

| Parameter | Default | Description |
|---|---|---|
| `plot_sol` | `true` | Write amplitude/phase plots to `/plots_/` |

#### Example

```toml
[ddecal]
col_in = 'DATA'
directions = 'all'
cal.parmdb = 'instrument.h5'
cal.sol_int = 30
cal.uv_min = 20
cal.smoothnessconstraint = 2e6
cal.extra.maxiter = 50
smooth_sol.time_min = 10
```

### `apply_cal`

Applies gain solutions from an H5Parm using DP3 ApplyCal.

| Parameter | Default | Description |
|---|---|---|
| `col_in` | `'DATA'` | Input column |
| `col_out` | `'CORRECTED_DATA'` | Output column |
| `direction` | `'Main'` | Which direction's solutions to apply (for multi-direction H5Parm) |
| `cal.parmdb` | `'instrument_dde.h5'` | H5Parm file to read |
| `cal.mode` | `'diagonal'` | Solution type to apply: `'diagonal'` or `'fulljones'` |

#### Example

```toml
[apply_cal]
col_in = 'CORRECTED_DATA'
col_out = 'CORRECTED_DATA'
direction = 'Main'
cal.parmdb = 'instrument.h5'
```

### `subtract`

Subtracts model visibilities for selected sky-model directions using DP3.

| Parameter | Default | Description |
|---|---|---|
| `col_in` | `'DATA'` | Input column |
| `col_out` | `'CORRECTED_DATA'` | Output column |
| `directions` | `'!Main'` | Directions to subtract. Use `'all'` for everything, a list `['CygA', 'CasA']`, or `'!Main'` for all except `Main` |
| `cal.parmdb` | `'instrument_dde.h5'` | H5Parm with gain solutions for each direction |
| `cal.mode` | `'diagonal'` | Solution type: `'diagonal'` or `'fulljones'` |

#### Example

```toml
[subtract_ateam]
type = 'subtract'
col_in = 'DATA'
col_out = 'CORRECTED_DATA'
directions = '!Main'
cal.parmdb = 'instrument.h5'
```

### `predict`

Predicts model visibilities into a data column using DP3.

| Parameter | Default | Description |
|---|---|---|
| `col_out` | `'DATA'` | Output column |
| `directions` | `'Main'` | Directions to predict |
| `cal.parmdb` | `''` | H5Parm for gain correction during prediction (empty = no correction) |
| `cal.mode` | `'diagonal'` | Solution type |

### `restore_flags`

Checkpoints the FLAG column. On first call the current flags are saved to `flag_name`; on subsequent calls (when the file already exists) the saved flags are restored. This ensures calibration starts from a clean, repeatable flag state.

| Parameter | Default | Description |
|---|---|---|
| `flag_name` | `'pre_cal_flags.h5'` | Path (relative to each MS) for the flag checkpoint file |

#### Example

```toml
[restore_flags]
flag_name = 'flags_before_cal.h5'
```

### `flagger`

Runs one or more flagging algorithms in sequence. Each sub-flagger is independently enabled/disabled.

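
Since every sub-flagger has its own `do_*` switch, the set that will run can be read directly off the `[flagger]` section. Below is a minimal Python sketch, assuming the section has already been parsed into a dict. `enabled_sub_flaggers` is a hypothetical helper, and the order shown simply follows this reference, which may not be the order in which calpipe executes the sub-flaggers.

```python
# Sub-flagger names as documented in this section; each has a `do_<name>` switch.
SUB_FLAGGERS = [
    "aoflagger", "ssins", "badbaselines",
    "baselinesflag", "flagfreq", "scans_flagging",
]


def enabled_sub_flaggers(flagger_cfg):
    """List the sub-flaggers whose `do_<name>` switch is true."""
    # Every switch defaults to false, so a missing key means "disabled".
    return [name for name in SUB_FLAGGERS if flagger_cfg.get(f"do_{name}", False)]


cfg = {"do_aoflagger": True, "do_badbaselines": True, "do_flagfreq": False}
print(enabled_sub_flaggers(cfg))
```
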
#### AOFlagger

| Parameter | Default | Description |
|---|---|---|
| `do_aoflagger` | `false` | Run AOFlagger |
| `aoflagger.strategy` | `'nenufar_1s1c'` | AOFlagger strategy file (name or path) |
| `aoflagger.data_col` | `'CORRECTED_DATA'` | Column to flag |

#### SSINS

| Parameter | Default | Description |
|---|---|---|
| `do_ssins` | `false` | Run the SSINS flagger |
| `ssins.seetings` | `'default'` | SSINS settings file (note: parameter name has a typo inherited from the original code) |

#### Bad baselines / stations

| Parameter | Default | Description |
|---|---|---|
| `do_badbaselines` | `false` | Detect and flag outlier baselines/stations via AOQuality |
| `badbaselines.nsigma_stations` | `5` | Flag stations whose amplitude deviates more than this many sigma |
| `badbaselines.nsigma_baselines` | `8` | Flag baselines whose amplitude deviates more than this many sigma |

(manual-baseline-flagging)=
#### Manual baseline flagging

| Parameter | Default | Description |
|---|---|---|
| `do_baselinesflag` | `false` | Flag specific baselines/stations |
| `baselinesflag.baselines` | `''` | DP3 baseline string, e.g. `'CS001LBA&&;RS208LBA&&'` (applied to all MSs) |
| `baselinesflag.baselines_from_file` | `''` | Path to a file mapping obs_ids to baseline strings, in text or JSON format (see below) |

`baselines_from_file` accepts two formats, detected automatically by extension:

- **Text** (`.txt` or any other extension): one obs_id and baseline-string pair per line, where the baseline string is DP3-formatted, such as `MR003&&*;MR017&&*`. Lines starting with `#` are ignored.
- **JSON** (`.json`): the output of `nenudata bad-stations` or `aostats find-bad-stations`. Keys are obs_ids; values are lists of bare antenna names. The `&&*` suffix is appended automatically. Keys starting with `_` (e.g. `_meta`) are ignored.

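
The JSON conversion described above can be sketched in a few lines of Python. This is an illustration rather than calpipe's implementation: `baselines_from_json` is a hypothetical name, the sample content is made up, and joining the entries with `;` is an assumption based on the DP3 string shown for the text format.

```python
import json


def baselines_from_json(text):
    """Map each obs_id to a DP3 baseline string built from bare antenna names."""
    mapping = {}
    for obs_id, antennas in json.loads(text).items():
        # Keys starting with '_' (e.g. '_meta') carry no flagging information.
        if obs_id.startswith("_"):
            continue
        # Append the '&&*' suffix to each antenna; join with ';' (assumed).
        mapping[obs_id] = ";".join(f"{name}&&*" for name in antennas)
    return mapping


sample = '{"_meta": {"source": "example"}, "20231208_NT04": ["MR003", "MR017"]}'
print(baselines_from_json(sample))
```
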
```toml
[flagger]
# JSON format: points at the file managed by nenudata bad-stations
baselinesflag.baselines_from_file = 'bad_stations.json'
```

See [nenudata: bad stations](../nenudata/bad_stations.md) for the full management workflow.

#### Frequency flagging

| Parameter | Default | Description |
|---|---|---|
| `do_flagfreq` | `false` | Flag a fixed frequency range |
| `flagfreq.fmhz_range` | `[0, 200]` | `[start_MHz, end_MHz]` range to flag |

#### Scan flagging

| Parameter | Default | Description |
|---|---|---|
| `do_scans_flagging` | `false` | Flag time scans with anomalously high residuals |
| `scans_flagging.nsigma_scans` | `5` | Sigma threshold above which a scan is flagged |

#### Example

```toml
[flagger]
do_aoflagger = true
aoflagger.strategy = 'nenufar_1s1c'
aoflagger.data_col = 'CORRECTED_DATA'
do_badbaselines = true
badbaselines.nsigma_stations = 5
badbaselines.nsigma_baselines = 8
do_baselinesflag = true
baselinesflag.baselines_from_file = 'bad_baselines.txt'
```

### `peel`

Iterative peeling of bright off-axis sources. The MS is first copied with a postfix, then each source is phase-shifted, calibrated, and subtracted in turn.

| Parameter | Default | Description |
|---|---|---|
| `ms_postfix` | `'PEEL'` | Suffix appended to the MS name for the peeling copy |
| `init.parmdb` | `'instrument_dde.h5'` | H5Parm with the initial DD calibration |
| `init.mode` | `'diagonal'` | Solution type of the initial calibration |
| `cal.sol_int_flux_per_slot_per_sec` | `75000` | Scales the solution interval by `flux × integration_time`; set to 0 to use `cal.sol_int` directly |
| `cal.sol_int_min` | `2` | Minimum solution interval |
| `cal.sol_int_max` | `120` | Maximum solution interval |
| `cal.mode` | `'diagonal'` | Calibration mode for peeling |
| `cal.uv_min` | `10` | Minimum baseline length in wavelengths |
| `cal.extra` | `{}` | Additional DP3 DDECal parameters |
| `do_phase_shift` | `true` | Phase-shift to each source before calibrating |
| `phase_shift.time_avg` | `4` | Time averaging after phase shift |
| `phase_shift.freq_avg` | `1` | Frequency averaging after phase shift |
| `do_smooth_sol` | `true` | Smooth solutions after each peel iteration |
| `smooth_sol.time_min` | `15` | FWHM in minutes for solution smoothing |
| `smooth_sol.freq_mhz` | `2` | FWHM in MHz for solution smoothing |

### `multims_smooth_sol`

Smooths gain solutions across multiple MSs jointly (e.g. across spectral windows or time chunks from the same night), then writes the result to a new H5Parm.

| Parameter | Default | Description |
|---|---|---|
| `parmdb_in` | `'instrument_init.h5'` | Input H5Parm (relative to each MS) |
| `parmdb_out` | `'instrument_smooth.h5'` | Output H5Parm (relative to each MS) |
| `plot_dir` | `'smooth_sol'` | Directory for diagnostic plots |

#### Example

```toml
steps = ['ddecal', 'multims_smooth_sol', 'apply_cal']

[ddecal]
cal.parmdb = 'instrument_init.h5'
do_smooth_sol = false   # skip per-MS smoothing; smooth jointly instead

[multims_smooth_sol]
parmdb_in = 'instrument_init.h5'
parmdb_out = 'instrument_smooth.h5'

[apply_cal]
cal.parmdb = 'instrument_smooth.h5'
```