# Quality statistics — quality-collect, quality-combine

These commands wrap [AOQuality](https://pypi.org/project/aoquality/) to compute
and aggregate visibility statistics across a set of MeasurementSets (MSs).

The workflow has two stages:

1. **`quality-collect`** — compute per-MS statistics and write them into each MS
   directory (in parallel, one job per MS).
2. **`quality-combine`** — merge each observation's per-MS statistics into one
   `<obs_id>.qs` file in an output directory, ready to be
   plotted with `aostats plot-grid` or screened with `aostats find-bad-obs` /
   `aostats find-bad-stations`.

## quality-collect — compute per-MS statistics in parallel

```bash
nenudata quality-collect L2 "202312*_NT04:SW03"
```

Runs `aoquality collect -d DATA` on every MS matching the pattern (the column
can be changed with `--data-column`). Jobs are distributed across nodes through
the same worker pool as `l1_to_l2`.
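On a single node, the collect step boils down to one `aoquality collect` call per MS. The sketch below only prints those commands, similar in spirit to `--dry-run`. It assumes the MS path is passed as the last positional argument after `-d COLUMN`, and `print_collect_cmds` is a hypothetical helper, not part of `nenudata`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one `aoquality collect` command per MS.
# Assumption: the MS path follows the `-d COLUMN` option.
print_collect_cmds() {
    local column="$1"
    shift
    local ms
    for ms in "$@"; do
        printf 'aoquality collect -d %s %s\n' "$column" "$ms"
    done
}

# Example with made-up MS paths
print_collect_cmds DATA obs1/SW03.MS obs2/SW03.MS
```

Piping the printed lines to `bash` would run them one after another on the local machine, without the node routing or concurrency limit that `quality-collect` provides.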

```bash
# Collect on CORRECTED_DATA (e.g. after calibration) with 6 concurrent jobs
nenudata quality-collect L2_12C40S "202312*_CASA:SW03" \
    -d CORRECTED_DATA -m 6
```

```bash
# Dry-run: print commands without executing
nenudata quality-collect L2 "202312*_NT04:SW03" --dry-run
```

### Options

| Option | Default | Description |
|---|---|---|
| `--data-column` / `-d` | `DATA` | MS column passed to `aoquality collect` |
| `--max-concurrent` / `-m` | `1` | Max concurrent collect jobs per node |
| `--dry-run` | false | Print commands without executing |
| `--env-file` | `~/.bashrc` | Shell environment file sourced before each job |
| `--only-n2` | false | Restrict `OBS_IDS` resolution to N2 obs_ids |
| `--config` / `-c` | `data_handler.toml` | Data-handler config file |

### Node routing

For N1 obs_ids, each MS is processed on the node assigned to that obs_id and SW.
For N2 obs_ids, MSs are distributed round-robin across the N2 node pool, in the
same order that `l1_to_l2` uses to spread them.
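Round-robin here means that, for a pool of *N* nodes, the *i*-th MS goes to node *i* mod *N*. A minimal sketch with hypothetical node names follows; the real pool comes from the data-handler config and the MS ordering from `l1_to_l2`, neither of which this reproduces.

```bash
#!/usr/bin/env bash
# Hypothetical N2 node pool; the real one is defined in the data-handler config.
nodes=(node1 node2 node3)

# Print "<node> <ms>" pairs, cycling through the pool in order.
assign_round_robin() {
    local n=${#nodes[@]}
    local i=0 ms
    for ms in "$@"; do
        printf '%s %s\n' "${nodes[i % n]}" "$ms"
        i=$((i + 1))
    done
}

assign_round_robin a.MS b.MS c.MS d.MS
```

With three nodes and four MSs, the fourth MS wraps back to `node1`.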

---

## quality-combine — one combined `.qs` per observation

```bash
nenudata quality-combine L2 "202312*_NT04:SW03" quality_l2/SW03
```

For each obs_id matching the pattern, collects that observation's MS paths and
runs one `aoquality combine` call, writing `<obs_id>.qs` into the output
directory:

```
aoquality combine quality_l2/SW03/20231208_NT04.qs MS1 MS2 ...
aoquality combine quality_l2/SW03/20231210_NT04.qs MS1 MS2 ...
```

The output directory (`OUT_DIR`) is a mandatory positional argument and is
created automatically. One file per observation is what the downstream
`aostats` grids and per-observation screening rely on: each `.qs` file holds a
single observation, i.e. one grid row.
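The per-observation grouping can be sketched in plain bash. This is an illustration under a hypothetical layout in which each MS sits under a directory named after its obs_id (e.g. `<root>/20231208_NT04/SW03.MS`); `nenudata` actually resolves paths from the data-handler config. `print_combine_cmds` is a hypothetical helper that only prints commands.

```bash
#!/usr/bin/env bash
# Requires bash >= 4 (associative arrays).
# Hypothetical layout: the obs_id is the name of the MS's parent directory.
# MS paths are joined with spaces, so paths containing spaces are not supported.
print_combine_cmds() {
    local out_dir="$1"
    shift
    declare -A ms_by_obs=()
    local ms obs_id
    for ms in "$@"; do
        obs_id="$(basename "$(dirname "$ms")")"
        # Append this MS to its observation's list (space-separated)
        ms_by_obs["$obs_id"]+="${ms_by_obs[$obs_id]:+ }$ms"
    done
    # One combine command per obs_id, in sorted order
    for obs_id in $(printf '%s\n' "${!ms_by_obs[@]}" | sort); do
        printf 'aoquality combine %s/%s.qs %s\n' \
            "$out_dir" "$obs_id" "${ms_by_obs[$obs_id]}"
    done
}

print_combine_cmds quality_l2/SW03 \
    data/20231208_NT04/a.MS data/20231208_NT04/b.MS data/20231210_NT04/a.MS
```

The example prints two commands, one per obs_id, mirroring the shape of the `aoquality combine` calls listed above.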

```bash
# Separate directory per SW (one command per SW)
nenudata quality-combine L2 "202312*_NT04:SW03" quality/SW03
nenudata quality-combine L2 "202312*_NT04:SW04" quality/SW04
```
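For more than a couple of SWs, a small loop keeps one output directory per SW. This sketch only prints the `quality-combine` commands (pipe the output to `bash` to run them); `print_sw_combine_cmds` is a hypothetical helper and the SW list is just the two SWs used above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one quality-combine command per SW,
# writing each SW's .qs files into quality/<SW>.
print_sw_combine_cmds() {
    local pattern="$1"
    shift
    local sw
    for sw in "$@"; do
        # The obs_id pattern stays double-quoted so the shell does not glob it
        printf 'nenudata quality-combine L2 "%s:%s" quality/%s\n' \
            "$pattern" "$sw" "$sw"
    done
}

print_sw_combine_cmds "202312*_NT04" SW03 SW04
```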

### Options

| Option | Default | Description |
|---|---|---|
| `--only-n2` | false | Restrict `OBS_IDS` resolution to N2 obs_ids |
| `--config` / `-c` | `data_handler.toml` | Data-handler config file |

---

## Downstream analysis

The per-observation `.qs` files produced by `quality-combine` are consumed by
the `aostats` command-line tool (from the `aoquality` package).  Pass all of an
SW's files at once — each one contributes a single observation to the grid:

```bash
# Frequency × obs / antenna × obs / LST × obs heatmaps
aostats plot-grid quality/SW03/*.qs SNR -o plots/

# Flag observations with anomalous LST trends
aostats find-bad-obs quality/SW03/*.qs SNR

# Flag stations with outlier statistics (CASA calibrator data)
aostats find-bad-stations quality/SW03/*.qs SNR -o bad_stations.json
```

---

## Reference

```{eval-rst}
.. click:: nenucal.tools.nenudata:quality_collect
   :prog: nenudata quality-collect
   :nested: full
```

```{eval-rst}
.. click:: nenucal.tools.nenudata:quality_combine
   :prog: nenudata quality-combine
   :nested: full
```

## See also

- [pipeline](pipeline.md) — run DP3 to produce the L2 MSs that are collected here
- [transfer](transfer.md) — push L2 data to a remote site after quality screening