Data formats
MeerKAT Visibility Format
MeerKAT data in the archive are stored in a dedicated data format known as MeerKAT Visibility Format (MVF). MeerKAT science data are stored in MVF version 4. Early commissioning and science data from MeerKAT-16 are in MVF version 3, but very few users will ever encounter these files.
Access is provided by the katdal package. This package seamlessly detects any prior data format and provides a standard interface to visibilities, sensor data and metadata. katdal also has various built-in fixes for errors found retrospectively in the data (e.g., timestamp and frequency errors). For this reason, always use the latest version of katdal.
Conversion to Measurement Set (MS) format
Most external data reduction pipelines for MeerKAT make use of MS files. Conversion can be requested on the archive interface.
The Measurement Set format does not support continuous scans, so katdal must be used to access such non-standard observation data.
On occasion, one may want to download a small subset of data for quick checks without waiting on the SARAO archive data transfer queue. In this case, obtain the direct link to the rdb file (see the article on the archive) and run mvftoms.py on a local machine across the network.
# katdal can be installed using pip
pip install katdal
# Get the link to the rdb file from the archive: <katdal link>
# Various selection options are available, including binning in time or channels
mvftoms.py --target J1939-6342 --flags '' --dumptime 60 -o newms.ms <katdal link>
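When several targets need to be converted, the same command can be assembled from a script. The following is a minimal sketch that uses only the options shown in the command above; the helper name `mvftoms_command` and its arguments are hypothetical and not part of katdal.

```python
import shlex


def mvftoms_command(rdb_link, target, output_ms, dumptime=None):
    """Build an mvftoms.py argument list from the options used in the example above."""
    cmd = ["mvftoms.py", "--target", target, "--flags", ""]
    if dumptime is not None:
        # Time binning in seconds, as in the example command
        cmd += ["--dumptime", str(dumptime)]
    cmd += ["-o", output_ms, rdb_link]
    return cmd


# "<katdal link>" is a placeholder for the rdb link copied from the archive
cmd = mvftoms_command("<katdal link>", "J1939-6342", "newms.ms", dumptime=60)
print(shlex.join(cmd))
# To run the conversion, pass the list to subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with the token in the rdb link.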
Using katdal to directly access data objects in the archive
The OBIT-based SDP pipeline (and OBIT itself) uses the native data format, so no conversion is necessary. On occasion, users may need to write their own data access and reduction scripts, e.g. for HI intensity mapping. Please have a look at the documentation on data chunking to understand how the data are physically stored and accessed, and how to optimise retrieval speeds.
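One common way to keep memory use and transfer sizes manageable is to read the visibilities a block of dumps at a time instead of all at once. The sketch below accumulates a time-averaged amplitude spectrum in that way. It assumes an object that can be sliced along time and returns a complex array of shape (time, frequency, correlation product), as `data.vis` does in katdal. The function name and the default block size of 32 dumps are illustrative assumptions; the best block size depends on the chunking described in the documentation.

```python
import numpy as np


def mean_amplitude_spectrum(vis, time_chunk=32):
    """Mean visibility amplitude over time, read `time_chunk` dumps at a time.

    `vis` must have a `shape` attribute and support vis[t0:t1], returning a
    complex array of shape (time, frequency, correlation product).
    """
    n_time = vis.shape[0]
    if n_time == 0:
        raise ValueError("no dumps selected")
    total = None
    for start in range(0, n_time, time_chunk):
        # Only this block of dumps is fetched and held in memory
        block = np.abs(np.asarray(vis[start:start + time_chunk]))
        block_sum = block.sum(axis=0)
        total = block_sum if total is None else total + block_sum
    return total / n_time
```

With a katdal dataset opened as `data`, this would be called as `mean_amplitude_spectrum(data.vis)`.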
Below is an example of how to access and plot visibilities directly from the archive. First copy the rdb link, including its token, to the clipboard and paste it into your code. This example uses a delay calibration observation.
import katdal
import numpy as np
import matplotlib.pyplot as plt
from astropy.time import Time
from matplotlib.dates import DateFormatter
# Get the link to the rdb file (including the token) from the archive
link = 'https://archive-gw-1.kat.ac.za/1639440394/1639440394_sdp_l0.full.rdb?token=eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzI1NiJ9.eyJpc3MiOiJrYXQtYXJjaGl2ZS5rYXQuYWMuemEiLCJhdWQiOiJhcmNoaXZlLWd3LTEua2F0LmFjLnphIiwiaWF0IjoxNjM5NTczNDQ0LCJwcmVmaXgiOlsiMTYzOTQ0MDM5NCJdLCJleHAiOjE2NDAxNzgyNDQsInN1YiI6InNoYXJtaWxhQHNhcmFvLmFjLnphIiwic2NvcGVzIjpbInJlYWQiXX0.H2yEdZY8BEJDAgMR2XyBf3r6IcHQ2E2hCmhaUmKJgqchy_okOxxNr5XLpHFTpNSv9iitvFQX40B1_ioLr1YvIQ'
data = katdal.open(link)
print(data)
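`print(data)` gives a human-readable overview. For use inside a script, a few key numbers can be collected from the attributes used later in this article (`timestamps`, `freqs` and `corr_products`). This is a minimal sketch; the helper name `summarise` and the dictionary keys are hypothetical.

```python
from datetime import datetime, timezone


def summarise(data):
    """Return basic facts about the current selection of a katdal-like dataset."""
    # Timestamps are UTC seconds since the Unix epoch
    start = datetime.fromtimestamp(min(data.timestamps), tz=timezone.utc)
    end = datetime.fromtimestamp(max(data.timestamps), tz=timezone.utc)
    return {
        "dumps": len(data.timestamps),
        "channels": len(data.freqs),
        "corr_products": len(data.corr_products),
        # Frequencies are in Hz; report the range in MHz
        "freq_range_MHz": (min(data.freqs) / 1e6, max(data.freqs) / 1e6),
        "start_utc": start.isoformat(),
        "end_utc": end.isoformat(),
    }
```

Calling `summarise(data)` again after a `data.select(...)` shows how the selection changed the amount of data to be read.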
Next, select a single baseline to examine the visibilities and generate a time-averaged spectrum.
# Select the HH cross-correlation of the m001-m063 baseline
data.select(ants='m001, m063', corrprods='cross', pol='H')
# vis is ordered by (time, frequency, correlation product)
amp = np.abs(data.vis[:])
# Average over time, then convert to dB
spectrum = np.mean(amp, axis=0)
power = 10*np.log10(spectrum)
freqs = data.freqs/1e6  # Hz to MHz
fig = plt.figure(figsize=[20,10])
plt.plot(freqs, power)
plt.xlim([544, 1088])  # assumes a UHF-band observation (544 to 1088 MHz)
plt.xlabel('Frequency (MHz)')
plt.ylabel('Mean Power (dB)')
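The spectrum above averages every sample, including any that were flagged. katdal exposes the flags as `data.flags`, a boolean array with the same shape as `data.vis`. A flag-aware average could be sketched as follows; the function name is hypothetical, and treating every raised flag as bad data is an assumption.

```python
import numpy as np


def flagged_mean_spectrum_db(vis, flags):
    """Mean amplitude per channel in dB, ignoring samples where `flags` is True."""
    amp = np.ma.masked_array(np.abs(vis), mask=flags)
    # Channels that are flagged at all times remain masked in the result
    mean_amp = amp.mean(axis=0)
    return 10 * np.ma.log10(mean_amp)
```

For the selection above this would be called as `flagged_mean_spectrum_db(data.vis[:], data.flags[:])`. Matplotlib leaves gaps in the plotted line where the result is masked.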
# Create a dynamic spectrum (phase versus time and channel) for one
# correlation product of the current selection
corrprod = 0
scan_start = Time(min(data.timestamps), format='unix')
scan_end = Time(max(data.timestamps), format='unix')
fig1 = plt.figure(figsize=[20,10])
# deg=True so that the values match the 'Phase (deg)' colour bar label
phase = np.angle(data.vis[:,:,corrprod], deg=True)
ax1 = plt.subplot(111)
plt.title(data.corr_products[corrprod][0] + ',' + data.corr_products[corrprod][1] +
          ' ' + scan_start.iso + ' to ' + scan_end.iso)
cax = ax1.imshow(phase, origin='lower', cmap='rainbow')
cbar = fig1.colorbar(cax)
cbar.set_label('Phase (deg)')
ax1.axis('tight')
ax1.set_ylabel('Integration number')
ax1.set_xlabel('Channel number')
plt.show()
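The example above hard-codes `corrprod = 0`, which is only correct when the selection contains a single correlation product. When more than one is selected, the index can be looked up by name from `data.corr_products`. The helper below is a sketch with a hypothetical name, and input labels such as 'm001h' are an assumption about the naming; check `data.corr_products` for the actual labels in your dataset.

```python
def find_corrprod(corr_products, input_a, input_b):
    """Return the index of the correlation product of two inputs, in either order."""
    wanted = {input_a, input_b}
    for index, (a, b) in enumerate(corr_products):
        if {str(a), str(b)} == wanted:
            return index
    raise ValueError(f"correlation product {input_a},{input_b} is not in the selection")
```

For example, `corrprod = find_corrprod(data.corr_products, 'm001h', 'm063h')` would replace the hard-coded index.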