Data formats

1 MeerKAT Visibility Format
2 Conversion to Measurement Set (MS) format
3 Using katdal to directly access data objects in the archive

MeerKAT Visibility Format

MeerKAT data in the archive are stored in a dedicated format known as the MeerKAT Visibility Format (MVF). MeerKAT science data are stored in MVF version 4. Early commissioning and science data from MeerKAT-16 are in MVF version 3, but very few users will ever encounter these files.

Access is facilitated by the katdal package. This package seamlessly detects any earlier data format and provides a standard interface to the visibilities, sensor data and metadata. katdal also has various fixes built in to account for errors found retrospectively in the data (e.g., timestamp and frequency errors). For this reason, always use the latest version of katdal.

Conversion to Measurement Set (MS) format

Most external data reduction pipelines for MeerKAT make use of MS files. Conversion can be requested on the archive interface.

The Measurement Set format does not support continuous scans, so katdal must be used to access data from such non-standard observations.

On occasion, one may want to download a small subset of data for quick checks, without waiting on the SARAO archive data transfer queue. In this case, it is possible to obtain the direct link to the rdb file (see article on archive), and run mvftoms on a local machine across the network.

# katdal can be installed using pip
pip install katdal

# Get the link to the rdb file from the archive and substitute it for <katdal link>.
# Various selection options are available, including binning in time or channels.
mvftoms.py --target J1939-6342 --flags '' --dumptime 60 -o newms.ms <katdal link>
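
When converting several targets or trying different averaging settings, it can help to assemble the mvftoms.py command line programmatically. The following is a minimal sketch under stated assumptions: the helper name build_mvftoms_command is hypothetical, the link is a placeholder, and the --chanbin option for channel averaging should be confirmed with `mvftoms.py --help` for your katdal version. The sketch only builds and prints the command; it does not run it.

```python
import shlex


def build_mvftoms_command(link, target, output_ms, dumptime=None, chanbin=None, flags=None):
    """Assemble an mvftoms.py command line as a list of arguments (hypothetical helper)."""
    cmd = ["mvftoms.py", "--target", target]
    if flags is not None:
        # An empty string, as in the example above, is passed through unchanged
        cmd += ["--flags", flags]
    if dumptime is not None:
        # Output time averaging interval in seconds
        cmd += ["--dumptime", str(dumptime)]
    if chanbin is not None:
        # Assumed option name for channel averaging; check `mvftoms.py --help`
        cmd += ["--chanbin", str(chanbin)]
    cmd += ["-o", output_ms, link]
    return cmd


# Placeholder link; replace with the rdb link (including token) from the archive
command = build_mvftoms_command(
    "<katdal link>", "J1939-6342", "newms.ms", dumptime=60, flags=""
)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) once the placeholder has been replaced with a real link.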

Using katdal to directly access data objects in the archive

The OBIT-based SDP pipeline (and OBIT itself) uses the native data format, so no conversion is necessary. On occasion, users may need to write their own data access and reduction scripts, e.g. for HI intensity mapping. Please have a look at the documentation on data chunking to understand how the data are physically stored and accessed, and how to optimise retrieval speeds.

Below is an example of how to access and plot visibilities directly from the archive. First copy the rdb link (including the token) from the archive and paste it into your code. This example uses a delay calibration observation.

import katdal
import matplotlib.pyplot as plt
import numpy as np
from astropy.time import Time
from matplotlib.dates import DateFormatter

#get the link to the rdb file from the archive
link = 'https://archive-gw-1.kat.ac.za/1639440394/1639440394_sdp_l0.full.rdb?token=eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzI1NiJ9.eyJpc3MiOiJrYXQtYXJjaGl2ZS5rYXQuYWMuemEiLCJhdWQiOiJhcmNoaXZlLWd3LTEua2F0LmFjLnphIiwiaWF0IjoxNjM5NTczNDQ0LCJwcmVmaXgiOlsiMTYzOTQ0MDM5NCJdLCJleHAiOjE2NDAxNzgyNDQsInN1YiI6InNoYXJtaWxhQHNhcmFvLmFjLnphIiwic2NvcGVzIjpbInJlYWQiXX0.H2yEdZY8BEJDAgMR2XyBf3r6IcHQ2E2hCmhaUmKJgqchy_okOxxNr5XLpHFTpNSv9iitvFQX40B1_ioLr1YvIQ'
data = katdal.open(link)
print(data)

===============================================================================
Name: https://archive-gw-1.kat.ac.za/1639440394/1639440394_sdp_l0.full.rdb?token=eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzI1NiJ9.eyJpc3MiOiJrYXQtYXJjaGl2ZS5rYXQuYWMuemEiLCJhdWQiOiJhcmNoaXZlLWd3LTEua2F0LmFjLnphIiwiaWF0IjoxNjM5NTczNDQ0LCJwcmVmaXgiOlsiMTYzOTQ0MDM5NCJdLCJleHAiOjE2NDAxNzgyNDQsInN1YiI6InNoYXJtaWxhQHNhcmFvLmFjLnphIiwic2NvcGVzIjpbInJlYWQiXX0.H2yEdZY8BEJDAgMR2XyBf3r6IcHQ2E2hCmhaUmKJgqchy_okOxxNr5XLpHFTpNSv9iitvFQX40B1_ioLr1YvIQ | 1639440394-sdp-l0 (version 4.0)
===============================================================================
Observer: Operator  Experiment ID: 20211213-0027
Description: 'Delaycal'
Observed from 2021-12-14 02:06:46 SAST to 2021-12-14 02:12:13.127 SAST
Dump rate / period: 0.99961 Hz / 1.000 s
Subarrays: 1
  ID  Antennas                            Inputs  Corrprods
   0  m001,m002,m003,m004,m005,m006,m007,m008,m009,m010,m011,m013,m015,m017,m018,m019,m022,m025,m026,m027,m028,m029,m030,m032,m033,m034,m035,m037,m038,m039,m040,m041,m042,m043,m044,m045,m046,m047,m048,m049,m050,m051,m052,m053,m054,m055,m056,m057,m058,m059,m060,m061,m062,m063  108      5940
Spectral Windows: 1
  ID Band Product  CentreFreq(MHz)  Bandwidth(MHz)  Channels  ChannelWidth(kHz)
   0 UHF  c544M1k     816.000         544.000           1024       531.250
-------------------------------------------------------------------------------
Data selected according to the following criteria:
  ants=['m001', 'm002', 'm003', 'm004', 'm005', 'm006', 'm007', 'm008', 'm009', 'm010', 'm011', 'm013', 'm015', 'm017', 'm018', 'm019', 'm022', 'm025', 'm026', 'm027', 'm028', 'm029', 'm030', 'm032', 'm033', 'm034', 'm035', 'm037', 'm038', 'm039', 'm040', 'm041', 'm042', 'm043', 'm044', 'm045', 'm046', 'm047', 'm048', 'm049', 'm050', 'm051', 'm052', 'm053', 'm054', 'm055', 'm056', 'm057', 'm058', 'm059', 'm060', 'm061', 'm062', 'm063']
  spw=0
  subarray=0
-------------------------------------------------------------------------------
Shape: (327 dumps, 1024 channels, 5940 correlation products) => Size: 15.912 GB
Antennas: m001,m002,m003,m004,m005,m006,m007,m008,m009,m010,m011,m013,m015,m017,m018,m019,m022,m025,m026,m027,m028,m029,m030,m032,m033,m034,m035,m037,m038,m039,m040,m041,m042,m043,m044,m045,m046,m047,m048,m049,m050,m051,m052,m053,m054,m055,m056,m057,m058,m059,m060,m061,m062,m063  Inputs: 108  Autocorr: yes  Crosscorr: yes
Channels: 1024 (index 0 - 1023,  544.000 MHz - 1087.469 MHz), each 531.250 kHz wide
Targets: 1 selected out of 1 in catalogue
  ID  Name        Type      RA(J2000)     DEC(J2000)  Tags                       Dumps  ModelFlux(Jy)
   0  J0408-6545  radec      4:08:20.38  -65:45:09.1  bfcal single_accumulation    327      29.68
Scans: 6 selected out of 6 total       Compscans: 2 selected out of 2 total
  Date        Timerange(UTC)       ScanState  CompScanLabel  Dumps  Target
  14-Dec-2021/00:06:46 - 00:06:57    0:slew     0:un_corrected     12    0:J0408-6545
              00:06:58 - 00:09:04    1:track    0:un_corrected    127    0:J0408-6545
              00:09:05 - 00:10:58    2:stop     0:un_corrected    114    0:J0408-6545
              00:10:59 - 00:11:00    3:stop     1:corrected      2    0:J0408-6545
              00:11:01 - 00:11:10    4:slew     1:corrected     10    0:J0408-6545
              00:11:11 - 00:12:12    5:track    1:corrected     62    0:J0408-6545
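
The numbers in this summary can be cross-checked with a little arithmetic, which is a useful sanity check before downloading anything. The sketch below assumes that every antenna pair (including autocorrelations) contributes four polarisation products and that each visibility occupies 8 bytes (a complex64 value). Both assumptions reproduce the figures printed above, but they are inferred from the summary rather than taken from the katdal documentation.

```python
# Values read from the summary above
n_ants = 54
n_dumps = 327
n_chans = 1024
centre_freq_mhz = 816.0
bandwidth_mhz = 544.0

# Antenna pairs including autocorrelations, times 4 polarisation products (assumption)
n_pairs = n_ants * (n_ants + 1) // 2
n_corrprods = 4 * n_pairs
# Two inputs (H and V) per antenna
n_inputs = 2 * n_ants

# Assumed storage of 8 bytes per visibility (complex64)
bytes_per_vis = 8
size_gb = n_dumps * n_chans * n_corrprods * bytes_per_vis / 1e9

# Channel width and the centre frequencies of the first and last channels
chan_width_mhz = bandwidth_mhz / n_chans
first_chan_mhz = centre_freq_mhz - bandwidth_mhz / 2
last_chan_mhz = first_chan_mhz + (n_chans - 1) * chan_width_mhz

print(f"Inputs: {n_inputs}, corrprods: {n_corrprods}, size: {size_gb:.3f} GB")
print(f"Channels: {first_chan_mhz:.3f} MHz - {last_chan_mhz:.3f} MHz, "
      f"each {chan_width_mhz * 1e3:.3f} kHz wide")
```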

Next, select a single baseline to examine its visibilities and generate a time-averaged amplitude spectrum.

data.select(ants='m001,m063', corrprods='cross', pol='H')
# Data is ordered by time, frequency, correlation product

amp = np.abs(data.vis[:])
spectrum = np.mean(amp, axis=0)
power = 10*np.log10(spectrum)

freqs = data.freqs/1e6
fig = plt.figure(figsize=[20,10])
plt.plot(freqs, power)
plt.xlim([544, 1088])
plt.xlabel('Frequency (MHz)')
plt.ylabel('Mean Power (dB)')

Example plot generated in ipython from data accessed directly from the archive.
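
To see what the averaging step does to the array shapes without needing archive access, the same calculation can be run on synthetic data. This is a minimal sketch: the random array merely stands in for data.vis[:] after a single correlation product has been selected, the frequency axis is rebuilt from the channel values in the summary above, and the values are not real visibilities.

```python
import numpy as np

rng = np.random.default_rng(0)
n_dumps, n_chans = 327, 1024

# Synthetic stand-in for data.vis[:] with shape (time, frequency, correlation product)
vis = (rng.normal(size=(n_dumps, n_chans, 1))
       + 1j * rng.normal(size=(n_dumps, n_chans, 1))).astype(np.complex64)

amp = np.abs(vis)                # amplitude per dump and channel
spectrum = np.mean(amp, axis=0)  # average over time -> shape (n_chans, 1)
power = 10 * np.log10(spectrum)  # convert to dB
power_1d = power[:, 0]           # drop the single correlation-product axis

# Channel centre frequencies in MHz (first channel at 544 MHz, 531.25 kHz spacing)
freqs = 544.0 + 0.53125 * np.arange(n_chans)

print(amp.shape, spectrum.shape, power_1d.shape, freqs.shape)
```

Averaging over axis 0 collapses the time axis only, so the result still carries one value per channel and can be plotted directly against the frequency axis.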

# Create a dynamic spectrum of a preselected single antenna/baseline
# correlation product
corrprod = 0
scan_start = Time(min(data.timestamps), format='unix')
scan_end = Time(max(data.timestamps), format='unix')
fig1 = plt.figure(figsize=[20,10])
# Phase in degrees, to match the colour bar label
phase = np.angle(data.vis[:,:,corrprod], deg=True)

ax1 = plt.subplot(111)
plt.title(data.corr_products[corrprod][0] + ',' + data.corr_products[corrprod][1] +
          ' ' + scan_start.iso + ' to ' + scan_end.iso)
cax = ax1.imshow(phase, origin='lower', cmap='rainbow')
cbar = fig1.colorbar(cax)
cbar.set_label('Phase (deg)')
ax1.axis('tight')
ax1.set_ylabel('Integration number')
ax1.set_xlabel('Channel number')

A waterfall plot showing the phases between m001 and m063 before and after delay calibration.
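
A phase waterfall is a useful check of delay calibration because an uncorrected delay τ between two antennas appears as a phase that varies linearly with frequency, φ(ν) = 2πντ, wrapping repeatedly across the band, whereas the phase is flat once the delay has been removed. The sketch below illustrates this on synthetic data only. The 10 ns delay is an arbitrary illustrative value, not a measurement from this observation, and the sign convention is an assumption.

```python
import numpy as np

n_chans = 1024
# Channel centre frequencies in Hz, matching the UHF band layout in the summary above
freqs_hz = 544e6 + 531.25e3 * np.arange(n_chans)
delay_s = 10e-9  # illustrative delay, not taken from real data

# A pure delay produces a linear phase ramp across frequency
vis = np.exp(2j * np.pi * freqs_hz * delay_s)

# Unwrap the phase and fit a straight line; the slope gives the delay
phase = np.unwrap(np.angle(vis))
slope, _ = np.polyfit(freqs_hz - freqs_hz.mean(), phase, 1)
estimated_delay_s = slope / (2 * np.pi)

# Removing the fitted delay flattens the phase
corrected = vis * np.exp(-2j * np.pi * freqs_hz * estimated_delay_s)
residual_phase_deg = np.angle(corrected, deg=True)

print(f"Estimated delay: {estimated_delay_s * 1e9:.3f} ns")
print(f"Largest residual phase: {np.max(np.abs(residual_phase_deg)):.2e} deg")
```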