Overview
We have used the Robert C. Byrd Green Bank Telescope (GBT) to collect a large volume of radio-frequency baseband data as part of National Science Foundation (NSF) Advanced Technology and Instrumentation (ATI) Grant #1910302 ("New Wide-band Digital Technology and Interference Excision for Radio Astronomy", PI: Ryan S. Lynch). The overarching goal of this project is to develop and rigorously test new techniques for identifying and removing radio-frequency interference (RFI) in astronomical data. The data were collected on a variety of astronomical sources, including pulsars, OH masers, extragalactic sources of HI, and flux calibration sources. Most data were collected with the GBT's 1.1-1.8 GHz L-Band receiver and the Versatile Green Bank Astronomical Spectrometer (VEGAS). These data represent the lowest-level data products that can be recorded by VEGAS and are subject to minimal pre-processing. This allows them to be further processed offline to produce a wide variety of scientific data products suitable for different science use cases.
Having access to these baseband data allows us to test a variety of approaches to RFI mitigation. These techniques have been implemented and applied offline, but the mitigated baseband data are identical to what could potentially be produced using real-time processing. We can then process the mitigated baseband data to produce higher-level scientific data products and compare the results to the same types of data products produced from the unmitigated baseband data. If the RFI mitigation techniques are successful, the scientific data products produced from the mitigated baseband data should be of higher quality than those produced from the unmitigated baseband data, while preserving the astrophysical parameters that are of scientific interest.
All of the baseband data that we have recorded (both the original unmitigated data and the mitigated data), along with corresponding data products, are being made publicly available here for use by other scientists. We encourage others to make use of these data to test their own RFI mitigation techniques. Questions and comments may be directed to Ryan Lynch.
Acknowledging Use of These Data
We ask that anyone who makes use of these data include the following acknowledgement.
This work makes use of data collected by the Green Bank Observatory's Robert C. Byrd Green Bank Telescope (GBT) as part of National Science Foundation Advanced Technology and Instrumentation Grant #1910302 (PI: Ryan S. Lynch). The Green Bank Observatory is a facility of the National Science Foundation operated under cooperative agreement by Associated Universities, Inc.
Data Format
The majority of the data were taken with the VEGAS backend in its baseband recording mode. The full bandwidth is sampled by two 8-bit analog-to-digital converters (one for each polarization), producing real-valued samples. The data are then channelized via a polyphase filterbank (PFB), producing a number of Nyquist-sampled, narrowband channels of 8-bit complex-valued samples for each polarization. These channels are tapered to have an effective bandwidth that is 95% of the sampled bandwidth in order to reduce leakage. These data are recorded on disk in blocks that are approximately 1 GB in size, each with a small accompanying header section containing metadata about that block. Each block overlaps with the succeeding block by a number of time samples specified by the OVERLAP parameter in the headers. The number of PFB channels, the bandwidth of each channel, and the number of time samples corresponding to one data block vary depending on the observational setup. A schematic representation of one example configuration is shown below.
Some early data were taken with the Green Bank Ultimate Pulsar Processing Instrument (GUPPI). These data have the same basic format. One key difference is that all VEGAS data are lower sideband, so the lowest PFB channel contains the highest radio frequency, whereas GUPPI data may be lower or upper sideband. Lower sideband data are indicated by a negative value for the OBSBW header parameter, whereas upper sideband data have a positive value.
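The sideband convention and block overlap described above can be illustrated with a short sketch. It assumes that the band center frequency and the number of PFB channels are available from the header (called obsfreq_mhz and nchan here; these names are illustrative, only OBSBW and OVERLAP are named in the text), that OVERLAP is counted in time samples per channel, and that a block has been loaded as an array with channel on the first axis and time on the second. None of this is confirmed by the text above, so treat it as a sketch rather than a description of the file format.

```python
import numpy as np

def channel_frequencies(obsfreq_mhz, obsbw_mhz, nchan):
    """Center frequency (MHz) of each PFB channel, in recorded order.

    obsbw_mhz follows the OBSBW sign convention: negative for lower
    sideband, positive for upper sideband.
    """
    chan_bw = obsbw_mhz / nchan  # negative for lower sideband
    first = obsfreq_mhz - obsbw_mhz / 2 + chan_bw / 2
    return first + chan_bw * np.arange(nchan)

def trim_overlap(block, overlap):
    """Drop the trailing time samples that are repeated in the next block.

    Assumes `block` has shape (channel, time, ...).
    """
    if overlap == 0:
        return block
    return block[:, :-overlap]

# Illustrative values only: an 800 MHz lower-sideband band centered at 1500 MHz.
freqs = channel_frequencies(1500.0, -800.0, 4)

# A toy block with 2 channels and 10 time samples, overlapping by 3 samples.
toy_block = np.arange(20).reshape(2, 10)
trimmed = trim_overlap(toy_block, 3)
```

With a negative bandwidth, the first channel in the array has the highest frequency, which matches the lower-sideband ordering described for VEGAS data.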
Processing the Data
We recommend using one of two packages for processing these baseband data.
BlimPy
The BlimPy package maintained by the Breakthrough Listen project provides an easy-to-use Python interface that can read headers and data blocks and return them as Python dictionaries and NumPy arrays, respectively.
DSPSR
The DSPSR package provides several tools for producing PSRFITS-format data from baseband data, and includes options for a number of pulsar-science use cases.
RFI Mitigation Strategies
RFI in most data was mitigated using one of two strategies, namely spectral kurtosis (SK) and cyclostationary signal processing (CSP).
Spectral Kurtosis
Astronomical sources that emit incoherently and that vary slowly in intensity produce observable voltages that closely follow a normal distribution. However, RFI is expected to produce voltages that do not follow a normal distribution. SK identifies samples that do not follow a normal distribution by measuring excess kurtosis, the degree of "tailedness" relative to a normal distribution. A complete description of our SK-based RFI mitigation algorithm can be found in Smith, Lynch, and Pisano (2022).
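As a rough illustration of the idea, the sketch below computes a commonly used SK estimator from M complex voltage samples in a single channel. It is not the algorithm from the paper cited above; the function name, the number of samples, and the test signals are all assumptions made for this example. For Gaussian noise the estimator is expected to be close to 1, while a constant-amplitude signal drives it toward 0.

```python
import numpy as np

def spectral_kurtosis(voltages):
    """SK estimate for one channel from M complex voltage samples."""
    power = np.abs(voltages) ** 2
    m = power.size
    s1 = power.sum()          # sum of power
    s2 = (power ** 2).sum()   # sum of squared power
    return (m + 1) / (m - 1) * (m * s2 / s1 ** 2 - 1)

rng = np.random.default_rng(0)
m = 4096

# Complex Gaussian noise, standing in for an RFI-free channel.
noise = (rng.normal(size=m) + 1j * rng.normal(size=m)) / np.sqrt(2)

# A constant-amplitude tone, standing in for narrowband RFI.
tone = np.exp(2j * np.pi * 0.1 * np.arange(m))

sk_noise = spectral_kurtosis(noise)  # expected to be close to 1
sk_tone = spectral_kurtosis(tone)    # expected to be close to 0
```

In a mitigation pipeline, samples would be flagged when the estimate departs from 1 by more than some chosen threshold; the choice of threshold and of M is left open here.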
Cyclostationary Signal Processing
The majority of astronomical sources produce radiation that can be modeled as a wide-sense stationary process, i.e. estimates of the statistical moments of the process producing the data are approximately constant in time. However, many sources of RFI are cyclostationary, meaning that estimates of the statistical moments of the process producing the data are periodic or close to periodic. This comes about because telecommunications signals encode information by modulating some property of the radio-frequency carrier signal, namely its phase, amplitude, and/or frequency. The only known astronomical sources to exhibit cyclostationarity are pulsars (technosignatures from an extraterrestrial civilization, if they exist, may also be cyclostationary). This makes cyclostationarity an attractive means by which to distinguish RFI from noise and astronomical sources of interest. A complete description of our CSP-based algorithm can be found in Lynch, Smith, and Harrison (in prep.) (note that the version posted here is a pre-print which may undergo further revisions).
Data Replacement
We replace baseband samples that show evidence of containing RFI using one of three approaches. Each has benefits and undesirable side-effects.
The first is to replace the data with zeros, effectively blanking the data. The benefit of this approach is its simplicity, but it does have a drawback. Some post-processing steps involve taking a Fourier transform of the detected power values corresponding to these raw voltages. Replacing with zeros is equivalent to multiplying the time series by a boxcar function, so the Fourier transform can contain artifacts in the form of a sinc-like ripple.
The second approach is to replace data with normally distributed random values whose mean and variance are approximately equal to what they would be if RFI were not present. This has the benefit of eliminating discontinuities in the baseband data. However, it can be difficult to obtain an accurate estimate of the mean and variance of data that do not contain RFI, especially when RFI is almost always present at certain frequencies.
The third approach is to duplicate previous baseband samples that were not identified as containing RFI. This eliminates the need to estimate the statistical properties of the data but, as with the previous approach, it requires identifying samples that are free of RFI.
In practice, most of our mitigated data uses random values or zeros as the replacement strategy.
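The three replacement approaches can be sketched for a single channel as follows. This is a simplified illustration, not the project's implementation: the function names are hypothetical, the noise statistics are estimated from the unflagged samples of the same array (which the text notes can be difficult in practice), the "previous" strategy copies the most recent clean samples before each flagged run, and requantizing to 8 bits is omitted.

```python
import numpy as np

def flagged_runs(flags):
    """Yield (start, stop) index pairs for each contiguous run of True flags."""
    padded = np.concatenate(([False], flags, [False]))
    edges = np.flatnonzero(np.diff(padded.astype(int)))
    return zip(edges[::2], edges[1::2])

def replace_flagged(data, flags, strategy, rng=None):
    """Return a copy of 1-D complex `data` with flagged samples replaced."""
    out = data.copy()
    clean_idx = np.flatnonzero(~flags)
    if strategy == "zeros":
        out[flags] = 0
    elif strategy == "noise":
        if rng is None:
            rng = np.random.default_rng()
        clean = data[clean_idx]
        mean = clean.mean()
        # Standard deviation per real/imaginary component.
        sigma = np.sqrt(np.mean(np.abs(clean - mean) ** 2) / 2)
        n_bad = int(flags.sum())
        draws = rng.normal(size=n_bad) + 1j * rng.normal(size=n_bad)
        out[flags] = mean + sigma * draws
    elif strategy == "previous":
        for start, stop in flagged_runs(flags):
            n = stop - start
            source = clean_idx[clean_idx < start][-n:]
            if source.size:
                # np.resize repeats the source if fewer than n clean samples exist.
                out[start:stop] = np.resize(data[source], n)
            else:
                out[start:stop] = 0  # no earlier clean samples to copy
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return out

# Toy data: Gaussian noise with a strong offset injected into samples 100-149.
rng = np.random.default_rng(1)
data = rng.normal(size=1000) + 1j * rng.normal(size=1000)
flags = np.zeros(1000, dtype=bool)
flags[100:150] = True
data[flags] += 50.0

zeroed = replace_flagged(data, flags, "zeros")
noisy = replace_flagged(data, flags, "noise", rng=rng)
copied = replace_flagged(data, flags, "previous")
```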
Data Reduction
We produce scientific data products from the unmitigated and mitigated baseband data. Most of the data products are relevant for pulsars. Specifically, we perform a "blind" search for the pulsar in the data using routines that are part of the PRESTO package. We also produce phase-folded data using the PSRCHIVE package.
Most of our baseband data were produced with coarse Nyquist-sampled channels using a PFB with relatively wide channel bandwidth (typically a few hundred kHz). These data lack the spectral resolution needed to analyze spectral lines. To produce spectral line data products we further channelize the data using a second PFB.
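To give a sense of what further channelizing a coarse channel involves, here is a minimal sketch of a generic critically sampled PFB applied to one complex time series. It is not the second PFB used for these data products: the function name, the number of taps, and the Hamming-windowed sinc prototype filter are all assumptions chosen for illustration.

```python
import numpy as np

def pfb_channelize(x, nchan, ntaps=4):
    """Split a complex time series into nchan fine channels.

    Returns an array of shape (spectra, nchan) in FFT bin order
    (bin 0 is the center of the input band).
    """
    # Prototype low-pass filter: windowed sinc spanning ntaps blocks.
    n = nchan * ntaps
    t = (np.arange(n) - n / 2 + 0.5) / nchan
    proto = (np.sinc(t) * np.hamming(n)).reshape(ntaps, nchan)

    nblocks = x.size // nchan
    blocks = x[: nblocks * nchan].reshape(nblocks, nchan)
    nspec = nblocks - ntaps + 1

    # Weighted sum of ntaps consecutive blocks, then an FFT across each sum.
    summed = np.zeros((nspec, nchan), dtype=complex)
    for k in range(ntaps):
        summed += blocks[k : k + nspec] * proto[k]
    return np.fft.fft(summed, axis=1)

# Toy input: a tone centered on fine channel 3 of 16.
nchan = 16
samples = np.arange(nchan * 64)
tone = np.exp(2j * np.pi * 3 * samples / nchan)

spectra = pfb_channelize(tone, nchan)
mean_power = np.mean(np.abs(spectra) ** 2, axis=0)
peak_channel = int(np.argmax(mean_power))
```

Averaging the power over many output spectra, as in the last lines, is the step that turns the fine-channel voltages into a spectrum suitable for spectral line work.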
A detailed comparison of the data products produced from unmitigated and mitigated data and the improvements in data quality resulting from the mitigation process will be published in an upcoming memo.
The Data
All baseband data and scientific data products are available for download via a direct download link. Some files are quite large. For bulk downloads, please contact Ryan Lynch, and we can discuss alternative ways of obtaining the data.