The Green Bank Telescope

How to put data on a tape.

There are tape drives on several hosts that everyone can use for storing large amounts of data. See this document for a list of tape drives, their capacities and locations.

When should you consider putting data on tape?

If you have large datasets that don't change or change infrequently, and that you don't need to access for months, you should consider moving them off the fileserver and onto tape. For example, a good candidate for tape storage is a set of giant binary datafiles of observations that you've finished analyzing, but which you want to keep around just in case. Another good example is a paper that you wrote a year or two ago, with all its figures.

It's good to move such things off the fileserver and onto tape because the fileserver has limited storage and is optimized for protecting frequently changing data against disk crashes. Data that sits on the fileserver for months without changing wastes those resources and can affect other users.
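
One rough way to look for candidates is to search for large files that have not been modified in a long time. The sketch below is only an illustration: the function name stale_files and the size and age thresholds are assumptions, not part of any site tooling, and a file's modification time says nothing about when it was last read.

```shell
# List regular files under DIR that are larger than MIN_KB kilobytes
# and were last modified more than MIN_DAYS days ago.
# Usage: stale_files DIR MIN_KB MIN_DAYS   (hypothetical helper)
stale_files() {
    dir=$1
    min_kb=$2
    min_days=$3
    # find counts -size in 512-byte blocks by default, so 1 KB = 2 blocks.
    find "$dir" -type f -size +"$((min_kb * 2))" -mtime +"$min_days" -print
}
```

For example, stale_files ~/scratch 102400 180 would list files in ~/scratch over roughly 100 MB that have been untouched for about six months.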

How to put data onto a tape.

First you'll need a suitable tape for the drive you are going to use. Try to choose a tape drive whose capacity matches the size of your data. There is no point putting a single 10 MB dataset on a 40 GB tape.

  1. Put the tape in tapehost's tape drive.

  2. Log on to tapehost; a remote login is fine.

  3. Choose which files you want to move onto the tape. In the following code, I'll move all the files in ~/scratch and ~/fchannel, plus my mailbox ~/mail.

  4. Use du to make sure you can fit all your data on one tape.
            tapehost$ cd ~
            tapehost$ du -skc scratch fchannel mail
            48      scratch
            320     fchannel
            10224   mail
            10592   total
    
    The output of du is in kilobytes; the -c option (supported by GNU du) adds the total line. Divide by 1024 to get MB. Here's an awk one-liner that does the conversion, rounding down to whole MB (cut and paste it to your shell to use it).
            tapehost$ du -skc scratch fchannel mail | awk '{printf("%d\t%s\n", $1/1024, $2)}'
            0       scratch
            0       fchannel
            9       mail
            10      total
            tapehost$
    
    Make sure the total is not larger than the tape!

  5. Use gtar to copy the files onto the tape. gtar is GNU tar, and tar stands for tape archive. The gtar options cpf mean:
    • c create a tape archive
    • p preserve file permissions; this takes effect when extracting, and tar records owner, permissions and timestamp in the archive in any case
    • f put the archive in file /dev/tape, i.e. the tape drive.
              tapehost$ cd ~
              tapehost$ gtar -cpf /dev/tape scratch fchannel mail
      

    Here /dev/tape stands for the actual tape device. Refer to the list of tape drives mentioned above to select the right device. The tape drives themselves are also labelled with the appropriate device name (e.g. /dev/rmt/0).

  6. Wait until gtar finishes and the light on the tape drive stops flashing. A 1 GB archive might take ten minutes or so. Then eject the tape.

  7. Important: label your tape!
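
The size check in step 4 can also be scripted so that the archive is only written when the data fits. This is a minimal sketch, not a site-provided tool: the function name fits_on_tape is made up for illustration, and the capacity in kilobytes is something you supply from the list of tape drives. It sums the du figures with awk rather than relying on a total line, and du reports disk usage, which is only an approximation of the final archive size.

```shell
# Succeed only if the given paths together use no more than CAPACITY_KB
# kilobytes of disk space, as reported by du.
# Usage: fits_on_tape CAPACITY_KB PATH...   (hypothetical helper)
fits_on_tape() {
    capacity_kb=$1
    shift
    # Add up the first column of du -sk (kilobytes) over all paths.
    total_kb=$(du -sk "$@" | awk '{sum += $1} END {print sum + 0}')
    echo "total: ${total_kb} KB (about $((total_kb / 1024)) MB), capacity: ${capacity_kb} KB"
    [ "$total_kb" -le "$capacity_kb" ]
}
```

Assuming a tape of roughly 40 GB, it could be combined with step 5 like this: cd ~ && fits_on_tape $((40 * 1024 * 1024)) scratch fchannel mail && gtar -cpf /dev/tape scratch fchannel mail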

How to get data off a tape.

  1. Put the tape in the drive and log on to tapehost.

  2. Make a temporary directory and cd into it. This is really important if you want to avoid the possibility of overwriting existing files.
            tapehost$ cd ~
            tapehost$ mkdir tmp
            tapehost$ cd tmp
    

  3. Make sure the tape is rewound by running
            tapehost$ /bin/mt -f /dev/tape rewind
    

  4. List the table of contents of the archive by running
            tapehost$ gtar -tvf /dev/tape
    
  5. Extract everything from the tape by running
            tapehost$ gtar -xvf /dev/tape
    
    or extract specific files by running
            tapehost$ gtar -xvf /dev/tape filename1 path/filename2
    
    where the filename arguments appear exactly as listed in the archive table of contents.

When the gtar command finishes, your files should be back on disk.
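
The precaution in step 2 (always extracting into a fresh directory) can be wrapped in a small helper. The following is a sketch under stated assumptions: restore_into is a hypothetical name, the TAR variable is an invented convenience for hosts where GNU tar is installed under a name other than gtar, and the tape is assumed to be rewound already.

```shell
# Extract an archive into a new directory so that no existing file can
# be overwritten. Refuses to run if the destination already exists.
# Usage: restore_into ARCHIVE DEST_DIR [MEMBER...]   (hypothetical helper)
restore_into() {
    archive=$1
    dest=$2
    shift 2
    if [ -e "$dest" ]; then
        echo "refusing to extract: $dest already exists" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    # -C makes tar change into DEST_DIR before extracting; any remaining
    # arguments name the archive members to extract (default: everything).
    "${TAR:-gtar}" -xpf "$archive" -C "$dest" "$@"
}
```

For example, restore_into /dev/tape ~/restored would unpack the whole tape under ~/restored, and restore_into /dev/tape ~/restored mail would unpack only the mail member.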

For help with any of the above please contact the helpdesk.


The National Radio Astronomy Observatory is a facility of the National Science Foundation operated under cooperative agreement by Associated Universities, Inc.

Last updated 24 April 2007 by Chris Clark