Data formats

MeerKAT Visibility Format

MeerKAT data in the archive are stored in a dedicated data format known as the MeerKAT Visibility Format (MVF). MeerKAT science data are stored in MVF version 4. Early commissioning and science data from MeerKAT-16 are in MVF version 3, but very few users will ever encounter these files.

Access is provided by the katdal package. This package automatically detects which version of the data format a file uses and provides a standard interface to visibilities, sensor data and metadata. katdal also has various fixes built in to account for errors found retrospectively in the data (e.g., timestamp and frequency errors), so it is advised to always use the latest version of katdal.

Conversion to Measurement Set (MS) format

Most external data reduction pipelines for MeerKAT make use of MS files. Conversion can be requested on the archive interface.

The MS format does not support continuous scans, so katdal must be used directly to access data from such non-standard observations.

On occasion, one may want to download a small subset of data for quick checks, without waiting on the SARAO archive data transfer queue. In this case, it is possible to obtain the direct link to the rdb file (see article on archive), and run mvftoms on a local machine across the network.


#katdal can be installed using pip
pip install katdal
#get the link to the rdb file from the archive
mvftoms.py <katdal link>
#you can use various selection options, including binning in time or channels
mvftoms.py --target J1939-6342 --flags '' --dumptime 60 -o <output MS> <katdal link>
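
When the conversion is scripted rather than typed by hand, it can help to assemble the command as an argument list so that the empty `--flags ''` value and the long tokenised link are passed through intact. The following is a minimal sketch only: it assumes the katdal install has put the script on the PATH as `mvftoms.py`, that `-o` names the output MS and the rdb link is the final positional argument, and it uses a hypothetical helper `build_mvftoms_command` and a hypothetical output name `quicklook.ms`.

```python
import shlex


def build_mvftoms_command(rdb_link, output_ms, target=None, flags=None, dumptime=None):
    """Assemble an mvftoms.py argument list (hypothetical helper, not part of katdal)."""
    cmd = ["mvftoms.py"]
    if target is not None:
        cmd += ["--target", target]
    if flags is not None:
        # An empty string is a valid value here and must be kept as its own argument.
        cmd += ["--flags", flags]
    if dumptime is not None:
        cmd += ["--dumptime", str(dumptime)]
    # Assumption: -o takes the output MS name; the rdb link comes last.
    cmd += ["-o", output_ms, rdb_link]
    return cmd


cmd = build_mvftoms_command(
    rdb_link="<katdal link>",   # placeholder: paste the rdb link with token here
    output_ms="quicklook.ms",   # hypothetical output name
    target="J1939-6342",
    flags="",
    dumptime=60,
)

# Show the command as it would be typed in a shell.
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would then start the conversion without any shell quoting issues.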

Using katdal to directly access data objects in the archive

The OBIT-based SDP pipeline (and OBIT itself) uses the native data format, so no conversion is necessary. On occasion, users may need to write their own data access and reduction scripts, e.g. for HI intensity mapping. Please have a look at the documentation on data chunking to understand how the data are physically stored and accessed, so that you can optimise retrieval speeds.

Below is an example of how to access and plot visibilities directly from the archive. First copy the rdb link with token to the clipboard and paste it into your code. We have elected to use a delay calibration observation for this example.

import katdal
import numpy as np
import matplotlib.pyplot as plt
from astropy.time import Time
from matplotlib.dates import DateFormatter

#get the link to the rdb file from the archive
link = ''
data = katdal.open(link)
print(data)


===============================================================================
Name: | 1639440394-sdp-l0 (version 4.0)
===============================================================================
Observer: Operator  Experiment ID: 20211213-0027
Description: 'Delaycal'
Observed from 2021-12-14 02:06:46 SAST to 2021-12-14 02:12:13.127 SAST
Dump rate / period: 0.99961 Hz / 1.000 s
Subarrays: 1
  ID  Antennas  Inputs  Corrprods
   0  m001,m002,m003,m004,m005,m006,m007,m008,m009,m010,m011,m013,m015,m017,m018,m019,m022,m025,m026,m027,m028,m029,m030,m032,m033,m034,m035,m037,m038,m039,m040,m041,m042,m043,m044,m045,m046,m047,m048,m049,m050,m051,m052,m053,m054,m055,m056,m057,m058,m059,m060,m061,m062,m063  108  5940
Spectral Windows: 1
  ID  Band  Product  CentreFreq(MHz)  Bandwidth(MHz)  Channels  ChannelWidth(kHz)
   0  UHF   c544M1k          816.000         544.000      1024            531.250
-------------------------------------------------------------------------------
Data selected according to the following criteria:
  ants=['m001', 'm002', 'm003', 'm004', 'm005', 'm006', 'm007', 'm008', 'm009', 'm010', 'm011', 'm013', 'm015', 'm017', 'm018', 'm019', 'm022', 'm025', 'm026', 'm027', 'm028', 'm029', 'm030', 'm032', 'm033', 'm034', 'm035', 'm037', 'm038', 'm039', 'm040', 'm041', 'm042', 'm043', 'm044', 'm045', 'm046', 'm047', 'm048', 'm049', 'm050', 'm051', 'm052', 'm053', 'm054', 'm055', 'm056', 'm057', 'm058', 'm059', 'm060', 'm061', 'm062', 'm063']
  spw=0
  subarray=0
-------------------------------------------------------------------------------
Shape: (327 dumps, 1024 channels, 5940 correlation products) => Size: 15.912 GB
Antennas: m001,m002,m003,m004,m005,m006,m007,m008,m009,m010,m011,m013,m015,m017,m018,m019,m022,m025,m026,m027,m028,m029,m030,m032,m033,m034,m035,m037,m038,m039,m040,m041,m042,m043,m044,m045,m046,m047,m048,m049,m050,m051,m052,m053,m054,m055,m056,m057,m058,m059,m060,m061,m062,m063  Inputs: 108  Autocorr: yes  Crosscorr: yes
Channels: 1024 (index 0 - 1023, 544.000 MHz - 1087.469 MHz), each 531.250 kHz wide
Targets: 1 selected out of 1 in catalogue
  ID  Name        Type   RA(J2000)   DEC(J2000)   Tags                       Dumps  ModelFlux(Jy)
   0  J0408-6545  radec  4:08:20.38  -65:45:09.1  bfcal single_accumulation    327          29.68
Scans: 6 selected out of 6 total  Compscans: 2 selected out of 2 total
  Date        Timerange(UTC)       ScanState  CompScanLabel   Dumps  Target
  14-Dec-2021/00:06:46 - 00:06:57  0:slew     0:un_corrected     12  0:J0408-6545
              00:06:58 - 00:09:04  1:track    0:un_corrected    127  0:J0408-6545
              00:09:05 - 00:10:58  2:stop     0:un_corrected    114  0:J0408-6545
              00:10:59 - 00:11:00  3:stop     1:corrected         2  0:J0408-6545
              00:11:01 - 00:11:10  4:slew     1:corrected        10  0:J0408-6545
              00:11:11 - 00:12:12  5:track    1:corrected        62  0:J0408-6545
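
The numbers in the summary above can be cross-checked with a little arithmetic, which is a useful habit before requesting a large download. The sketch below makes a few assumptions that are not stated in the summary itself: two inputs (H and V) per antenna, four polarisation products per baseline with autocorrelations included, 8 bytes per visibility (complex64), 1 GB taken as 1e9 bytes, and channel k centred at the centre frequency plus (k - N/2) channel widths.

```python
# Values copied from the dataset summary above.
n_ants = 54              # antennas listed for subarray 0
n_dumps = 327
n_chans = 1024
centre_freq_mhz = 816.0
bandwidth_mhz = 544.0

# Assumption: each antenna contributes two inputs (H and V polarisation).
n_inputs = 2 * n_ants

# Baselines including autocorrelations, times four polarisation products.
n_baselines = n_ants * (n_ants + 1) // 2
n_corrprods = 4 * n_baselines

# Assumption: 8 bytes per visibility (complex64) and 1 GB = 1e9 bytes.
size_gb = n_dumps * n_chans * n_corrprods * 8 / 1e9

# Assumption: channel k is centred at centre + (k - n_chans / 2) * width.
chan_width_mhz = bandwidth_mhz / n_chans
first_chan_mhz = centre_freq_mhz - (n_chans // 2) * chan_width_mhz
last_chan_mhz = first_chan_mhz + (n_chans - 1) * chan_width_mhz

print(n_inputs, n_corrprods)
print(f"{size_gb:.3f} GB")
print(first_chan_mhz, last_chan_mhz, chan_width_mhz * 1e3)
```

Under these assumptions the sketch reproduces the 108 inputs, 5940 correlation products, 15.912 GB and 531.25 kHz channel width reported by katdal. The last channel comes out at 1087.46875 MHz, which the summary shows rounded to 1087.469 MHz.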

Next, select a single baseline to examine the visibilities and generate a spectrum.

data.select(ants='m001,m063', corrprods='cross', pol='H')
#Data is ordered by time, frequency, correlation product
amp = np.abs(data.vis[:])
#average the amplitude over time (axis 0) to get one value per channel
spectrum = np.mean(amp, axis=0)
power = 10*np.log10(spectrum)
#data.freqs is in Hz; convert to MHz for plotting
freqs = data.freqs/1e6
fig = plt.figure(figsize=[20,10])
plt.plot(freqs, power)
plt.xlim([544, 1088])
plt.xlabel('Frequency (MHz)')
plt.ylabel('Mean Power (dB)')
Example plot generated in ipython from data accessed directly from the archive.
#Create dynamic spectrum of a preselected single antenna/baseline
#correlation product
corrprod = 0
scan_start = Time(min(data.timestamps), format='unix')
scan_end = Time(max(data.timestamps), format='unix')
fig1 = plt.figure(figsize=[20,10])
#np.angle returns radians by default; request degrees to match the colourbar label
phase = np.angle(data.vis[:,:,corrprod], deg=True)
ax1 = plt.subplot(111)
plt.title(data.corr_products[corrprod][0] + ',' + data.corr_products[corrprod][1] + ' ' + scan_start.iso + ' to ' + scan_end.iso)
cax = ax1.imshow(phase, origin='lower', cmap='rainbow')
cbar = fig1.colorbar(cax)
cbar.set_label('Phase (deg)')
ax1.axis('tight')
ax1.set_ylabel('Integration number')
ax1.set_xlabel('Channel number')
A waterfall plot showing the phases between M001 and M063 before and after delay calibration.
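
As a rough guide to reading such a waterfall plot: an uncorrected delay between two antennas appears as a phase that changes linearly with frequency and wraps every 360 degrees, so the uncorrected scans show stripes across the band. The sketch below illustrates the size of the effect using the channel width and first channel frequency from the dataset summary. The delay of 10 ns is a hypothetical value chosen for illustration, not one measured from this observation, and the sign of the phase is a convention assumed here.

```python
import math

CHANNEL_WIDTH_HZ = 531.250e3   # from the dataset summary
N_CHANNELS = 1024              # from the dataset summary
FIRST_CHANNEL_HZ = 544.000e6   # from the dataset summary
DELAY_S = 10e-9                # hypothetical uncorrected delay (assumption)


def wrapped_phase_deg(freq_hz, delay_s):
    """Phase 2*pi*freq*delay in degrees, wrapped into [-180, 180)."""
    phase = 360.0 * freq_hz * delay_s
    return (phase + 180.0) % 360.0 - 180.0


# Phase change from one channel to the next, in degrees.
step_deg = 360.0 * CHANNEL_WIDTH_HZ * DELAY_S

# Number of full 360-degree wraps across the whole band.
wraps_across_band = N_CHANNELS * CHANNEL_WIDTH_HZ * DELAY_S

# Wrapped phase in every channel, as it would appear along one row of the plot.
phases = [wrapped_phase_deg(FIRST_CHANNEL_HZ + k * CHANNEL_WIDTH_HZ, DELAY_S)
          for k in range(N_CHANNELS)]

print(f"{step_deg:.4f} deg per channel, {wraps_across_band:.2f} wraps across the band")
```

Once the delay has been solved for and applied, this slope is removed, and the phases in the scans labelled "corrected" would be expected to stay roughly flat across the band.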