SDP pipelines overview

The SARAO Science Data Processor (SDP) runs a range of processing pipelines in addition to producing visibility data. These pipelines, their basic concept of operation, data products produced, and key design features are described below.

The primary aim of these pipelines is to produce Quality Assessment (QA) metrics to help internal and external stakeholders quickly assess the quality of a particular observation. However, we do hope that the products produced can become scientifically usable, and as such welcome user feedback on possible improvements to the pipelines that can increase this utility.

 

Running the pipeline yourself

Please note that since the MeerKAT SDP pipeline runs in real time on data streamed directly from the correlator, the processing cannot be redone on our hardware with, for example, different flagging or imaging settings.

The code repositories and detailed developer documentation are listed on the following page:

https://skaafrica.atlassian.net/wiki/spaces/ESDKB/pages/1333067787

 

At the moment, the only data products that are restaged from tape are the flags and visibilities. No image products are restored. Users should be prepared to image the data themselves.

History and Notes

The pipelines have been deployed incrementally over the years, so historic observations will not always carry the full set of pipeline products. In general, the calibration report should be available for observations carried out since March 2019.

Continuum imaging has been running regularly on compatible observations (containing bandpass and gain calibrators and at least 15 minutes on target) since November 2019.

Note: Flux calibration was only active from February 2020.

The spectral imager has been running on compatible observations (as for continuum but with at least 60 minutes on target) since January 2020.

Note: Primary beam correction with a simple spherical model has been active since mid-April 2020, and the spectral imager report has been produced since June 2020.

Workflow and Products

The basic workflow through the pipelines is shown here:

Data products are available at each stage of the pipeline process, allowing the user to choose the desired level without compromising the raw correlator data:

| Product Name | Description | Availability |
| --- | --- | --- |
| L0 Visibilities | The visibility data as received from the correlator, but after time averaging, conservative excision of high-threshold RFI (typically 1-2% of the data), and type conversion from INT32 to Float32. This is the lowest level product available from the archive. | Direct access using katdal with a secure link retrieved from the SARAO archive. Can also be exported into MSv2 format for use with CASA. |
| L1 Visibilities | The internal name for L0 Visibilities that have had calibration solutions from the calibration pipeline applied. | The generated calibration solutions are stored in the observation meta-data and can optionally be applied using katdal or when exporting to MSv2. |
| L1 Flags | A set of flags produced by both the ingest process (which performs the basic post-correlator steps to produce L0 visibilities) and the calibration pipeline. Flags are stored as an 8-bit value with each bit carrying a specific flag type (these are described below). | Included as part of the MVFv4* format and optionally exported in MSv2 (as an aggregate flag). The user is able to select the flag combination they would like to use in both of these cases. |

* MVFv4 is the internal format used to store MeerKAT visibility data. It consists of a meta-data file (.rdb - a Redis compatible binary format), and data objects storing visibilities and flags. More detail on this format is available here.
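As a rough illustration of what "applying calibration solutions" to an L0 visibility means, the sketch below uses the standard antenna-based gain model, in which the observed visibility on a baseline is the true visibility multiplied by the gain of one antenna and the complex conjugate of the gain of the other. This is an assumption made for illustration only: the function name `apply_gains` is hypothetical and this is not katdal's implementation.

```python
def apply_gains(observed_vis, gain_a, gain_b):
    """Correct one visibility on baseline (a, b) with antenna-based gains.

    Assumes the textbook model observed = gain_a * conj(gain_b) * true.
    """
    return observed_vis / (gain_a * gain_b.conjugate())


# Round trip: corrupt a known visibility, then correct it again.
true_vis = 1.0 + 2.0j
gain_a = 2.0 + 0.0j
gain_b = 0.5j
observed = gain_a * gain_b.conjugate() * true_vis
corrected = apply_gains(observed, gain_a, gain_b)
```

In practice the stored solutions (delay, bandpass, gain) would each contribute a factor to the per-antenna gain before this division is carried out.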

Calibration Pipeline

This pipeline is run for all wideband data products and, since August 2021, for all narrowband data products as well. It produces the following:

Calibration Solutions

Primarily intended for use in the Continuum and Spectral pipelines, these are likely to be generically applicable for end users, and are available along with the visibility data. The available products are listed in the table below.

The table lists the katpoint tag needed to trigger the generation of particular solutions, the resultant key in the meta-data, and the frequency range over which the solutions are calculated for each type (shown for L-band; other bands are listed separately in the next table). Prior to Feb 2020, 32K observations used a different frequency range for gain and delay calibrations than 1K and 4K observations. Since then, observations in all 3 wideband spectral modes use the same frequency range.

| Katpoint Tag | Solution Type | Meta-Data Key | Solution Freq. Range (MHz): 1K & 4K | Solution Freq. Range (MHz): 32K (prior to Feb 2020) |
| --- | --- | --- | --- | --- |
| 'delaycal' | Delay | cal_product_K | 1326 - 1367 | 973 - 1000 |
| 'bpcal' | Bandpass | cal_product_B{i} | Whole band | Whole band |
| 'gaincal' | Gain | cal_product_G | 1326 - 1367 | 973 - 1000 |
| 'polcal' | Cross hand delay | cal_product_KCROSS | 1326 - 1367 | 973 - 1000 |
| 'bfcal' | Delay | cal_product_K | 1326 - 1367 | 973 - 1000 |
| 'bfcal' | Bandpass | cal_product_B{i} | Whole band | Whole band |
| 'bfcal' | Gain | cal_product_G | 1326 - 1367 | 973 - 1000 |
| 'bfcal' (if noise diode firing enabled) | Cross hand delay | cal_product_KCROSS_DIODE | 1326 - 1367 | 973 - 1000 |
| 'bfcal' (if noise diode firing enabled) | Cross hand bp | cal_product_BCROSS_DIODE{i} | Whole band | Whole band |

 

The frequency intervals used for gain and delay calibrations in UHF, L and S band are listed in the table below. When observing in S-band, users select from five possible sub-bands of 875 MHz bandwidth within the full 1.75 - 3.50 GHz band. There are three possible windows used for gain and delay calibration, and the one applied depends on the selected sub-band.

| UHF-band | L-band | S-band |
| --- | --- | --- |
| 842 - 869 MHz | 1326 - 1367 MHz | 2010 - 2096 MHz, 2517 - 2603 MHz, or 2876 - 2962 MHz |

 

Since August 2021, calibration solutions and reports are also produced for narrowband observations. As the narrowband frequency interval is not fixed, it is not possible to select a default window for producing the gain calibration solutions. Instead, the pipeline selects a suitable frequency window when the observation first starts: it picks a contiguous, unflagged region of frequency space with a width of ~41 MHz that is processed by a single calibration server. (All 32K mode observations are split into 4 frequency bands and processed on 4 separate calibration servers in parallel.) If all the unflagged regions of the narrowband observation in question, i.e. regions not flagged by the static mask, are narrower than the required 41 MHz, then the largest available unflagged region is selected. The chosen gain calibrator interval for all narrowband and wideband products is listed in the calibration report, at the top of the table listing the flux calibration of the observation. An example is shown below.
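The narrowband window selection described above can be sketched as follows. This is a minimal illustration, not the pipeline's code: the function names are hypothetical, the static mask is assumed to be a per-channel list of booleans (True meaning flagged), the first sufficiently wide run is taken (the text does not say which run is preferred), and the constraint that the window must fall within a single calibration server's share of the band is ignored.

```python
import math


def unflagged_runs(static_mask):
    """Return (start, stop) channel ranges of contiguous unflagged channels."""
    runs = []
    start = None
    for index, flagged in enumerate(static_mask):
        if not flagged and start is None:
            start = index
        elif flagged and start is not None:
            runs.append((start, index))
            start = None
    if start is not None:
        runs.append((start, len(static_mask)))
    return runs


def select_gain_window(static_mask, channel_width_mhz, required_mhz=41.0):
    """Pick a ~41 MHz unflagged window, else the largest unflagged run."""
    needed = math.ceil(required_mhz / channel_width_mhz)
    runs = unflagged_runs(static_mask)
    if not runs:
        return None
    for start, stop in runs:
        if stop - start >= needed:
            return (start, start + needed)
    # No run is wide enough: fall back to the widest one available.
    return max(runs, key=lambda run: run[1] - run[0])


# 100 channels of 1 MHz: unflagged runs at channels 10-29 and 35-99.
wide_mask = [True] * 10 + [False] * 20 + [True] * 5 + [False] * 65
wide_window = select_gain_window(wide_mask, channel_width_mhz=1.0)

# 60 channels of 1 MHz: no run reaches 41 MHz, so the widest is used.
narrow_mask = [False] * 20 + [True] * 5 + [False] * 30 + [True] * 5
narrow_window = select_gain_window(narrow_mask, channel_width_mhz=1.0)
```

The returned pair is a half-open channel range, so `(35, 76)` covers 41 channels.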

 

Reference Antenna

All calibration solutions are relative to a reference antenna, an antenna whose phase errors for all solutions are arbitrarily set to zero. The chosen reference antenna is clearly labeled in the calibration report and its phase solutions should be zero in all plots of the calibration solutions. In order to avoid selecting a malfunctioning antenna as the reference antenna, the pipeline uses data from the first scan on a calibrator target to assess the quality of data from each antenna. It creates a quality measurement using the following procedure:

  • Take the Fast Fourier Transform (FFT) of the scan averaged data on each baseline;

  • Measure the ratio of the height of the peak of the FFT to the rms noise at a region of the FFT spectrum located some distance from the peak. A peak-to-noise ratio is measured per baseline;

  • Take the median of these peak-to-noise ratios for all baselines to each antenna to form a per antenna measurement;

  • Use the antenna with the highest median value of peak-to-noise ratios as the reference antenna.

 This reference antenna selection method may be updated in the future.  

The reference antenna is selected using the first scan on a calibrator target once the array has started to observe. The reference antenna is never updated during the course of an observation (within a given capture block). However, at the start of a new observation (a new capture block) the pipeline will assess whether the currently selected reference antenna has become excessively flagged, as might happen if an antenna becomes faulty during the course of an observation. If the flag fraction on the reference antenna exceeds 80%, then a new reference antenna will be selected using the same method outlined above.
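The check made at the start of a new capture block amounts to a simple threshold test, sketched below. The function name and the representation of the flags as a flat list of booleans are assumptions made for illustration.

```python
def reference_needs_reselection(reference_flags, threshold=0.8):
    """Return True if the flagged fraction on the reference antenna exceeds
    the threshold. reference_flags is a non-empty list of booleans."""
    flagged = sum(1 for flag in reference_flags if flag)
    return flagged / len(reference_flags) > threshold


mostly_flagged = [True] * 9 + [False]      # 90% flagged
borderline = [True] * 8 + [False] * 2      # exactly 80% flagged
```

Because the text says the fraction must exceed 80%, a reference antenna that is exactly 80% flagged is kept in this sketch.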

Flags

A first-pass set of flags is produced by the ingest process, but this has relatively conservative tuning and only looks at a single correlation interval at a time. The calibration pipeline builds on this with a 2D flagger that inspects a full calibration interval. As mentioned above, the various stages of flagging are assigned individual bits in the flag byte, allowing the end user to choose specific flags for their reduction.

The currently used flag bits are as follows:

| Bit | Flag Type | Description |
| --- | --- | --- |
| 1 | reserved | Not used. |
| 2 | static | Static mask for the band edges as well as short baselines (<1000m) for some broad hotspots. A typical example is shown below for L-band from 856 to 1712 MHz: |
| 4 | cam | Flags provided by the Control and Monitoring system, typically used to indicate antenna slewing as well as flag out entire antennas due to failures or events such as wind-stow. |
| 8 | data_lost | Either missing from the correlator data stream or temporarily unavailable from the archive. |
| 16 | ingest_rfi | Produced by the ingest process during 1D flagging. |
| 32 | predicted_rfi | Not used. |
| 64 | cal_rfi | Flags produced by the calibration pipeline itself. Essentially a tweaked AOflagger written from scratch in Numba that runs along time and frequency. |
| 128 | post_proc | If katdal is used to apply calibration solutions any invalid gains left after interpolation obtain this flag. This mostly affects band edges (no BP extrapolation here) and selfcal gains on the calibrator. |
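To illustrate how the flag byte can be interpreted, the sketch below decodes an 8-bit flag value into the flag types listed above and tests it against a user-chosen combination. The bit values and names come from the table; the helper names `decode_flags` and `is_flagged` are hypothetical and are not part of katdal.

```python
FLAG_BITS = {
    "reserved": 1,
    "static": 2,
    "cam": 4,
    "data_lost": 8,
    "ingest_rfi": 16,
    "predicted_rfi": 32,
    "cal_rfi": 64,
    "post_proc": 128,
}


def decode_flags(value):
    """List the flag types that are set in an 8-bit flag value."""
    return [name for name, bit in FLAG_BITS.items() if value & bit]


def is_flagged(value, selected):
    """True if any of the selected flag types is set in the flag value."""
    mask = 0
    for name in selected:
        mask |= FLAG_BITS[name]
    return bool(value & mask)


sample = 2 + 16 + 64  # static, ingest_rfi and cal_rfi are set
names = decode_flags(sample)
```

A reduction that only trusts the RFI flaggers could, for example, test each sample with `is_flagged(value, ["ingest_rfi", "cal_rfi"])` and ignore the static mask.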

Calibration Report

One of the most important outputs of the calibration pipeline is the detailed Calibration Report, which forms the key part of the post observation quality assessment.

The report is too detailed to do justice to here, but it broadly includes an observation summary, high-level metrics such as antenna SNR, a flagging summary, and a range of plots showing the various calibration solutions. A few examples are shown below:

 

The report is available for all wideband observations from the SARAO archive. Since August 2021 it is also available for narrowband observations. An example of a typical report is available here.

Note: Occasionally the chosen reference antenna will be dropped from an array during an observation, which can lead to obviously wrong calibration solutions in the report even though the underlying data is still of good quality.

Continuum Pipeline

The first of the offline pipelines, the continuum imager is run when more than 15 minutes of target data is accumulated for a single schedule block. It has two main purposes: to produce a best-effort continuum image for QA purposes, and to produce self-cal solutions and continuum subtraction components for use in the spectral imager.

A detailed write-up of the pipeline is available here, but a short summary of the most important points follows:

  • Based on the Obit package authored by Bill Cotton

  • Each target in an observation is imaged individually (as long as the 15 minutes on target is met)

  • Baseline dependent averaging is used, reducing data volume by a factor of around 4. A 1% amplitude loss at 1 degree from the phase center is used as the binning criterion.

  • Wide-field effects are dealt with using a faceted approach (~ 140 facets used in the inner 1 degree for L-band). Roughly the same number over a wider field of view is used for UHF.

  • Wide-band effects are handled by imaging sub-bands independently - typically around 10.

  • Additional facets are placed on bright sources outside the primary beam (selected from SUMSS).

  • Two rounds of self-calibration are used, the first cleaning down to around 1 mJy, with the second to a depth that matches the sensitivity of a single channel in a 32k observation (about 100 μJy).

  • In the final step the facets are combined in each sub-band, with these then averaged to produce a final image.

  • At present the continuum image for a typical 8-hour observation takes around 10 hours to produce. There are plans to improve this performance since, at present, continuum imaging is limited to a single compute node.

  • The full resolution FITS file as well as PNG thumbnails are available from the archive.

  • Since April 2021 a Continuum Image quality report is produced, which extracts sources from a primary-beam-corrected version of the pipeline image and compares the fluxes and positions of these with matching sources in the SUMSS and NVSS catalogues.
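The baseline-dependent averaging criterion mentioned in the list above (a 1% amplitude loss at 1 degree from the phase centre) can be illustrated with the textbook time-smearing estimate. This is a sketch under stated assumptions, not Obit's actual algorithm: it takes the worst-case fringe rate for a source at angular offset θ on a baseline of length b as ω_E (b/λ) θ, and models the amplitude loss from averaging over a time τ as a sinc function of π times the fringe rate times τ. The function names and the example wavelength are hypothetical.

```python
import math

EARTH_ROTATION_RAD_PER_S = 7.2921159e-5


def sinc_argument_for_loss(loss):
    """Solve sin(x)/x = 1 - loss for x in (0, pi) by bisection."""
    target = 1.0 - loss
    low, high = 1e-9, math.pi
    for _ in range(100):
        mid = 0.5 * (low + high)
        if math.sin(mid) / mid > target:
            low = mid   # attenuation still below the allowed loss
        else:
            high = mid
    return 0.5 * (low + high)


def max_averaging_time(baseline_m, wavelength_m, offset_deg=1.0, loss=0.01):
    """Longest averaging time (s) that keeps the amplitude loss below `loss`."""
    fringe_rate_hz = (EARTH_ROTATION_RAD_PER_S
                      * (baseline_m / wavelength_m)
                      * math.radians(offset_deg))
    return sinc_argument_for_loss(loss) / (math.pi * fringe_rate_hz)


long_baseline_s = max_averaging_time(8000.0, 0.21)
short_baseline_s = max_averaging_time(1000.0, 0.21)
```

Under this model the allowed averaging time scales inversely with baseline length, which is why short baselines can be averaged much more heavily than long ones and the overall data volume drops.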

An example UHF continuum thumbnail image produced automatically by the pipeline is shown below (click here for full-resolution):

 

Spectral Pipeline

The spectral imager may sometimes be turned off due to logistical issues. Users that are depending on the spectral cubes produced by SDP should indicate clearly in the comments for their schedule blocks that the spectral imager should be activated.

The spectral pipeline is the terminal point of the current SDP pipelines and produces a single-channel FITS image for each channel in the observation up to 32k. It is run for compatible observations, with appropriate calibrators, when more than 45 minutes has been observed for a single target.

The main design goal was imaging speed, and for a typical 8-hour observation we can produce 32k images in around 5 hours.

A detailed write-up of the pipeline is available, but once again a short summary is provided:

  • The imager has been built from scratch, with a strong focus on memory usage, GPU support and overall speed, rather than outright imaging precision.

  • Gridding and de-gridding use a hybrid of W-stacking and W-projection.

  • No explicit self-calibration is performed; rather, the solutions from the continuum pipeline are used.

  • A typical observation will see 3 major cycles before converging.

  • Continuum subtraction is done in the visibility domain using visibilities produced by direct evaluation of the RIME from the clean components produced by the continuum pipeline.

  • Primary beam correction is performed using a simple spherical model.

  • Currently no merging of the channels is performed post-imaging, with each image available from the archive as an individual FITS file. We await feedback from the community with regard to merge options.

  • A spectral-imaging report is produced and available in the archive. This shows the comparison of imaged to theoretical noise across the band, and highlights channels with strong sources in them.
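The visibility-domain continuum subtraction mentioned in the list above can be sketched for point-like clean components using a scalar (unpolarised) form of the RIME. The sign convention of the phase term, the neglect of primary beam and spectral effects, and the function names are all assumptions made for illustration. This is not the spectral imager's code.

```python
import cmath
import math


def predict_visibility(u, v, w, components):
    """Directly evaluate model visibilities for point components.

    components is a list of (flux, l, m) tuples, with l and m the direction
    cosines relative to the phase centre and u, v, w in wavelengths.
    """
    total = 0j
    for flux, l, m in components:
        n = math.sqrt(1.0 - l * l - m * m)
        phase = -2.0 * math.pi * (u * l + v * m + w * (n - 1.0))
        total += flux * cmath.exp(1j * phase)
    return total


def subtract_continuum(visibilities, uvw, components):
    """Subtract the predicted continuum model from each visibility."""
    return [observed - predict_visibility(u, v, w, components)
            for observed, (u, v, w) in zip(visibilities, uvw)]


# A 2 Jy component at the phase centre contributes 2 Jy on every baseline.
centre_model = [(2.0, 0.0, 0.0)]
residual = subtract_continuum([2.0 + 0.1j], [(100.0, -50.0, 3.0)], centre_model)

# An offset component produces a phase that depends on the baseline.
offset_vis = predict_visibility(250.0, 0.0, 0.0, [(1.0, 0.001, 0.0)])
```

Direct evaluation costs one complex exponential per component per visibility, which avoids gridding errors at the price of extra computation.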