imgpipe — imaging pipeline

imgpipe runs WSClean on one or more Measurement Sets, driven by a TOML configuration file. It distributes the imaging jobs across nodes using the same [worker] pool as calpipe.

WSClean options are written directly in the [wsclean] section of the config file and map one-to-one to WSClean command-line flags. A few special values (see below) allow the output directory and some parameters to be derived automatically from the input MS.

Use cases

Image a list of MS files directly

imgpipe img.toml SW03_T001.MS SW03_T002.MS SW03_T003.MS

Runs one WSClean job per MS.

Image all MS for a set of observations via the data handler

imgpipe img.toml "202312*_NT04"

When [data_handler] is configured in img.toml, imgpipe resolves the obs_id pattern to MS paths using the data handler and then submits one WSClean job per MS.
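The obs_id argument looks like a shell-style glob. The sketch below assumes the data handler applies glob-style, case-sensitive matching, like Python's fnmatch. That is an assumption, since its actual matching rules are not documented here. select_obs_ids is a hypothetical helper, and the obs_id list is made up for illustration.

```python
from fnmatch import fnmatchcase


def select_obs_ids(known_obs_ids, pattern):
    """Return the obs_ids matching a glob pattern such as "202312*_NT04".

    Hypothetical helper: assumes glob-style, case-sensitive matching.
    """
    return sorted(o for o in known_obs_ids if fnmatchcase(o, pattern))


# Illustrative obs_ids only; real ones would come from the data handler.
obs_ids = ["20231201_NT04", "20231215_NT04", "20231201_NT05", "20240102_NT04"]
print(select_obs_ids(obs_ids, "202312*_NT04"))  # ['20231201_NT04', '20231215_NT04']
```

The examples quote the pattern on the command line. This stops the shell from expanding * against local files before imgpipe receives it.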

Combine all MS into a single image

imgpipe img.toml "202312*_NT04" --combine

All MSs sharing the same spectral window and obs_id are passed to WSClean in a single call (multi-MS imaging). WSClean then grids their visibilities jointly into one image.

Combine across obs_ids too

imgpipe img.toml "202312*_NT04" --combine --combine_obs_ids

Groups all MSs from all matching obs_ids into one WSClean call per spectral window. Useful for deep integrations.
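The three grouping modes above can be summarised as a minimal sketch. Its input is hypothetical: (obs_id, sw, ms_path) tuples with made-up paths. It also assumes that combine_obs_ids only takes effect together with combine, as in the example command. This illustrates the grouping rules and is not imgpipe's actual code.

```python
from collections import defaultdict


def group_jobs(ms_entries, combine=False, combine_obs_ids=False):
    """Group (obs_id, sw, ms_path) tuples into lists of MS paths, one list per WSClean call.

    Hypothetical sketch:
    - no flags:                  one job per MS
    - combine:                   one job per (sw, obs_id)
    - combine + combine_obs_ids: one job per sw
    """
    if not combine:
        return [[path] for _, _, path in ms_entries]
    groups = defaultdict(list)
    for obs_id, sw, path in ms_entries:
        key = sw if combine_obs_ids else (sw, obs_id)
        groups[key].append(path)
    return list(groups.values())


entries = [
    ("20231201_NT04", "SW03", "/data/20231201_NT04/SW03_T001.MS"),
    ("20231201_NT04", "SW03", "/data/20231201_NT04/SW03_T002.MS"),
    ("20231215_NT04", "SW03", "/data/20231215_NT04/SW03_T001.MS"),
    ("20231215_NT04", "SW04", "/data/20231215_NT04/SW04_T001.MS"),
]
print(len(group_jobs(entries)))                # 4 jobs
print(len(group_jobs(entries, combine=True)))  # 3 jobs
print(len(group_jobs(entries, True, True)))    # 2 jobs
```

Each inner list holds the MS arguments of one WSClean call. WSClean accepts several MS paths at the end of its command line.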

Configuration file

[worker]
nodes = 'nancep5'         # comma-separated list of hosts, or 'localhost'
max_concurrent = 4        # max simultaneous WSClean jobs per node
env_file = '~/.bashrc'   # sourced before each job (to activate software env)
dry_run = false

# Optional: resolve obs_ids -> MS paths via a data handler
[data_handler]
config_file = 'data_handler.toml'
data_level = 'L2'

[wsclean]
name = 'img'                         # output image name prefix (required)
out-dir = '$ms_basename$/images'     # output directory (see path tokens below)
pol = 'I'
size = '1024 1024'
scale = '1arcmin'
weight = 'briggs 0'
data-column = 'CORRECTED_DATA'
niter = 1000
auto-threshold = 3.0
channels-out = 'all'                 # special value: one output channel per MS channel

WSClean option mapping

Every key in [wsclean] except name and out-dir is passed to WSClean as a command-line flag, after the special channels-out values below have been resolved:

  • String or numeric values become -key value

  • Boolean true becomes -key (a flag with no value)

  • The name key sets the -name argument, prefixed with the output directory given by out-dir
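The mapping rules above can be expressed as a minimal sketch. wsclean_args is a hypothetical helper, and the sketch makes three assumptions. String values are split on whitespace, so size = '1024 1024' becomes -size 1024 1024. A boolean false omits the flag. channels-out has already been resolved to a number.

```python
import os


def wsclean_args(options, out_dir):
    """Turn a [wsclean] table into a WSClean argument list (hypothetical sketch).

    - name    -> -name <out_dir>/<name>
    - out-dir -> not passed on; only used to prefix -name
    - True    -> bare flag (-key); False is assumed to mean "omit the flag"
    - other   -> -key followed by the value, split on whitespace
    """
    args = ["wsclean", "-name", os.path.join(out_dir, options["name"])]
    for key, value in options.items():
        if key in ("name", "out-dir"):
            continue
        if value is True:
            args.append(f"-{key}")
        elif value is False:
            continue
        else:
            args.append(f"-{key}")
            args.extend(str(value).split())
    return args


opts = {"name": "img", "out-dir": "$ms_basename$/images", "pol": "I",
        "size": "1024 1024", "niter": 1000, "multiscale": True}
argv = wsclean_args(opts, "SW03_T001.MS/images") + ["SW03_T001.MS"]
print(" ".join(argv))
# wsclean -name SW03_T001.MS/images/img -pol I -size 1024 1024 -niter 1000 -multiscale SW03_T001.MS
```

The MS path or paths are appended last, as WSClean expects.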

Special values for channels-out

Value          Result
'all'          Set to the total number of channels in the MS
'every N'      Set to total_channels // N (integer division)
Any integer    Passed through as-is
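The table above can be sketched as a small resolver. resolve_channels_out is hypothetical, and it takes the MS channel count as a plain integer. How imgpipe reads that count from the MS (for example from the NUM_CHAN column of its SPECTRAL_WINDOW table) is not shown. Because // rounds down, 'every 5' on a 64-channel MS gives 12 output channels, not 12.8.

```python
def resolve_channels_out(value, total_channels):
    """Resolve a channels-out setting to an integer (hypothetical sketch of the table above)."""
    if isinstance(value, int):
        return value                       # any integer: passed through as-is
    text = str(value).strip()
    if text.isdigit():
        return int(text)                   # integer written as a string, e.g. '16'
    if text == "all":
        return total_channels              # one output channel per MS channel
    if text.startswith("every "):
        n = int(text.split()[1])
        return total_channels // n         # integer division: rounds down
    raise ValueError(f"unsupported channels-out value: {value!r}")


print(resolve_channels_out("all", 64))      # 64
print(resolve_channels_out("every 4", 64))  # 16
print(resolve_channels_out("every 5", 64))  # 12
print(resolve_channels_out(8, 64))          # 8
```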

Output directory tokens

The out-dir value supports several placeholder tokens that are expanded at runtime:

Token            Expands to
$ms_in$          Full path of the input MS
$ms_basename$    Basename of the input MS (its directory name, without the leading path)
$name$           Value of the name key in [wsclean]
$obs_id$         The obs_id (only when using the data handler)
$sw$             The spectral window (only when using the data handler)

Example:

out-dir = '/data/images/$obs_id$/$sw$/$name$'
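The token expansion can be sketched as plain string substitution. expand_out_dir is a hypothetical helper. Raising an error when $obs_id$ or $sw$ is used without the data handler is an assumption about the behaviour, not documented behaviour. The MS path is made up.

```python
import os


def expand_out_dir(template, ms_in, name, obs_id=None, sw=None):
    """Expand $...$ tokens in an out-dir template (hypothetical sketch)."""
    tokens = {
        "$ms_in$": ms_in,
        # An MS is a directory; drop any trailing slash before taking the basename.
        "$ms_basename$": os.path.basename(ms_in.rstrip("/")),
        "$name$": name,
        "$obs_id$": obs_id,
        "$sw$": sw,
    }
    for token, value in tokens.items():
        if token in template:
            if value is None:
                raise ValueError(f"{token} is only available when using the data handler")
            template = template.replace(token, value)
    return template


ms = "/net/node001/data/SW03_T001.MS"
print(expand_out_dir("/data/images/$obs_id$/$sw$/$name$", ms, "img",
                     obs_id="20231201_NT04", sw="SW03"))  # /data/images/20231201_NT04/SW03/img
print(expand_out_dir("$ms_basename$/images", ms + "/", "img"))  # SW03_T001.MS/images
```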

run_on_file_host — co-locate imaging with data

When the data are spread over distributed storage (each MS held by one node), WSClean runs fastest on the node that holds that MS, since the visibilities are then read from local disk rather than over the network. Enable this in [worker]:

[worker]
run_on_file_host = true
run_on_file_host_pattern = '\/net/(node\d{3})'

imgpipe extracts the hostname from the MS path using the regex pattern and submits each job to that host.
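Host extraction can be sketched with Python's re module. file_host is a hypothetical helper. It assumes the host name is the first capture group of run_on_file_host_pattern. Returning None when the path does not match is also an assumption, since what imgpipe does in that case is not documented here. TOML single-quoted strings keep backslashes literally, so the configured pattern corresponds to the raw string below. Its leading \/ is just an escaped slash and matches the same paths as '/net/(node\d{3})'.

```python
import re


def file_host(ms_path, pattern):
    """Return the host captured by the first group of pattern, or None (hypothetical sketch)."""
    match = re.search(pattern, ms_path)
    return match.group(1) if match else None


PATTERN = r"\/net/(node\d{3})"  # same text as run_on_file_host_pattern in the config
print(file_host("/net/node012/data/SW03_T001.MS", PATTERN))  # node012
print(file_host("/home/user/SW03_T001.MS", PATTERN))         # None
```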

Reference

imgpipe

Imaging pipeline

CONFIG_FILE: Configuration file
MS_INS_OR_OBS_IDS: Measurement Sets to process, or obs_ids if [data_handler] is set in CONFIG_FILE

Usage

imgpipe [OPTIONS] CONFIG_FILE MS_INS_OR_OBS_IDS...

Options

--version

Show the version and exit.

-c, --combine

Combine all MS to produce one image (for each SW/OBS_ID when applicable).

-o, --combine_obs_ids

Also combine across OBS_IDS (when applicable).

-n, --nodes_mpi <nodes_mpi>

Nodes to use for distributed imaging

Arguments

CONFIG_FILE

Required argument

MS_INS_OR_OBS_IDS

Required argument(s)