How to Archive Files and Directories Using tar in Linux - dummies

By Emmett Dulaney

In Linux systems, you can use the tar command to archive files to a device, such as a hard drive or tape. The tar program in Linux creates an archive file that can contain other directories and files and (optionally) compresses the archive for efficient storage. Then the archive is written to a specified device or another file. Many software packages are distributed in the form of a compressed tar file.

The command syntax of the tar program in Linux is as follows:

tar options destination source

Here, options is usually specified as a sequence of single letters, with each letter specifying what tar does; destination is the name of the backup device or archive file; and source is a list of file or directory names denoting the files to back up.

Backing up and restoring a single-volume archive in Linux

Suppose that you want to back up the contents of the /etc/X11 directory on a hard drive. Log in as root, and type the following command, where xxx represents your drive:

tar zcvf /dev/xxx /etc/X11

The tar program displays a list of filenames as each file is copied to the compressed tar archive. In this case, the options are zcvf, the destination is /dev/xxx (the drive), and the source is the /etc/X11 directory (which implies all its subdirectories and their contents). You can use a similar tar command to back up files to a tape by replacing the hard drive location with that of the tape device, such as /dev/st0 for a SCSI tape drive.

This table defines a few common tar options in Linux.

Common tar Options
Option  Does the Following
c       Creates a new archive.
f       Specifies the name of the archive file or device in the next field on the command line.
M       Specifies a multivolume archive.
t       Lists the contents of the archive.
v       Displays verbose messages.
x       Extracts files from the archive.
z       Compresses the tar archive by using gzip.

To view the contents of the tar archive that you create on the drive, type the following command (replacing xxx with the drive device):

tar ztf /dev/xxx

You see a list of filenames (each beginning with /etc/X11) indicating what’s in the backup. In this tar command, the t option lists the contents of the tar archive.
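To practice creating and listing an archive without touching a real device, you can point tar at an ordinary file. The following is a minimal sketch rather than a real backup: the scratch directory, the src/X11 tree, and the backup.tar.gz name are made up for illustration.

```shell
# Practice run: archive a scratch directory into a regular file, not a device.
work=$(mktemp -d)                         # hypothetical scratch area
mkdir -p "$work/src/X11"
echo "sample config" > "$work/src/X11/xorg.conf"

# c = create, z = gzip, f = the archive name follows.
# -C switches to $work first, so names are stored as src/X11/...
tar zcf "$work/backup.tar.gz" -C "$work" src

# t = list the contents of the archive without extracting anything.
listing=$(tar ztf "$work/backup.tar.gz")
printf '%s\n' "$listing"
```

Remove the scratch directory with rm -rf "$work" when you finish.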

To extract the files from a tar backup, follow these steps while logged in as root:

  1. Change the directory to /tmp by typing this command:
    cd /tmp

    This step is where you can practice extracting the files from the tar backup. For a real backup, change the directory to an appropriate location. (Typically, you type cd /.)

  2. Type the following command:
    tar zxvf /dev/xxx

    This tar command uses the x option to extract the files from the archive stored on the device (replace xxx with the drive).

Now if you check the contents of the /tmp directory, you see that the tar command has created an etc/X11 directory tree in /tmp and restored all the files from the tar archive to that directory. The tar command strips the leading / from the filenames in the archive and restores the files in the current directory. If you want to restore the /etc/X11 directory to its original location, use this command (substituting the device name for xxx):

tar zxvf /dev/xxx -C /

The -C option makes tar change to the specified directory (in this case, the root directory, /) before it extracts anything; the / at the end of the command is that directory, where you want to restore the backup files.
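The effect of -C is easy to try on scratch files. In this sketch, which is an illustration only, the restore directory stands in for / and every path under the temporary directory is hypothetical.

```shell
# Practice run: extract into a chosen directory with -C.
work=$(mktemp -d)                         # hypothetical scratch area
mkdir -p "$work/src/etc/X11" "$work/restore"
echo "sample config" > "$work/src/etc/X11/xorg.conf"

# Build a small compressed archive whose members are named etc/X11/...
tar zcf "$work/backup.tar.gz" -C "$work/src" etc

# x = extract; -C names the directory that tar changes to before restoring.
tar zxf "$work/backup.tar.gz" -C "$work/restore"

restored=$(cat "$work/restore/etc/X11/xorg.conf")
echo "$restored"
```

The files land under restore/etc/X11 no matter which directory you run the command from.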

In Linux systems, you can use the tar command to create, view, and restore an archive. You can store the archive in a file or in any device you specify with a device name.

Backing up and restoring a multivolume archive in Linux

Sometimes, the capacity of a single storage medium is less than the total storage space needed to store the archive. In this case, you can use the M option for a multivolume archive, meaning that the archive can span multiple tapes. Note, however, that you can’t create a compressed, multivolume archive, so you have to drop the z option. To back up the /etc/X11 directory as a multivolume archive, for example, type the following command (replacing xxx with the device you’re using):

tar cvfM /dev/xxx /etc/X11

The M tells tar to create a multivolume archive. The tar command prompts you for a second medium when the first one is filled. Take out the first medium and insert another when you see the following prompt:

Prepare volume #2 and hit return:

When you press Enter, the tar program continues with the second medium. For larger archives, the tar program continues to prompt for new media as needed.

To restore from this multivolume archive, type cd /tmp to change the directory to /tmp. (The /tmp directory is used for illustrative purposes, but you have to use a real directory when you restore files from an archive.) Then type the following command (replacing xxx with the device you’re using):

tar xvfM /dev/xxx

The tar program prompts you to insert the media as necessary.
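You can rehearse a multivolume backup with ordinary files standing in for media. This sketch assumes GNU tar and uses two features that aren't covered above: the -L option, which caps the size of each volume (in units of 1,024 bytes), and repeated -f options, which tar steps through in order instead of prompting. The volume names and sizes are hypothetical.

```shell
# Practice run: split an archive across two small "volumes" (GNU tar assumed).
work=$(mktemp -d)                         # hypothetical scratch area
mkdir -p "$work/src" "$work/restore"
head -c 30720 /dev/zero > "$work/src/data.bin"   # 30 KB, too big for one volume

# M = multivolume; -L 20 limits each volume to 20 x 1024 bytes.
# With two -f options, tar moves on to vol2.tar when vol1.tar is full.
tar -cM -L 20 -f "$work/vol1.tar" -f "$work/vol2.tar" -C "$work" src

# Restore by naming the volumes in the same order.
tar -xM -f "$work/vol1.tar" -f "$work/vol2.tar" -C "$work/restore"

restored_size=$(wc -c < "$work/restore/src/data.bin")
echo "$restored_size"
```

If the data needed more than the volumes listed, tar would go back to prompting for the next one, just as it does with removable media.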

Use the du -s command to determine the amount of storage you need for archiving a directory. Type du -s /etc to see the total size of the /etc directory in kilobytes, for example. Here’s typical output from that command:

35724 /etc

The resulting output shows that the /etc directory requires at least 35,724 kilobytes of storage space to back up.
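If you want the size as a number that a script can use, you can capture the first field of the du output. The following sketch runs on a scratch directory with a made-up 100 KB file; the -k option asks du for 1-kilobyte units explicitly rather than relying on the default.

```shell
# Practice run: capture a directory's size in kilobytes.
work=$(mktemp -d)                         # hypothetical scratch area
head -c 102400 /dev/urandom > "$work/sample.bin"   # 100 KB sample file

# -s prints a single total; -k reports it in 1-kilobyte units.
size_kb=$(du -sk "$work" | cut -f1)
echo "$size_kb KB needed to archive $work"
```

The figure reflects disk blocks in use, so it can differ slightly from the sum of the file sizes.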

Backing up on tapes for Linux systems

Although backing up on tapes is as simple as using the right device name in the tar command, you do have to know some nuances of the tape device to use it well. When you use tar to back up to the device named /dev/st0 (the first SCSI tape drive), the tape device automatically rewinds the tape when the tar program finishes copying the archive to the tape. The /dev/st0 device is called a rewinding tape device because it rewinds tapes by default.

If your tape can hold several gigabytes of data, you may want to write several tar archives — one after another — to the same tape. (Otherwise, much of the tape may be left empty.) If you plan to do so, your tape device can’t rewind the tape after the tar program finishes. To help you with scenarios like this one, several Linux tape devices are nonrewinding. The nonrewinding SCSI tape device is called /dev/nst0. Use this device name if you want to write one archive after another on a tape.

After each archive, the nonrewinding tape device writes an end-of-file (EOF) marker to separate one archive from the next. Use the mt command to control the tape; you can move from one marker to the next or rewind the tape. When you finish writing several archives to a tape using the /dev/nst0 device name, for example, you can force the tape to rewind with the following command:

mt -f /dev/nst0 rewind

After rewinding the tape, you can use the following command to extract files from the first archive to the current disk directory:

tar xvf /dev/nst0

After that, you must move past the EOF marker to the next archive. To do so, use the following mt command:

mt -f /dev/nst0 fsf 1

This command positions the tape at the beginning of the next archive. Now use the tar xvf command again to read this archive.

If you save multiple archives on a tape, you have to keep track of the archives yourself. The order of the archives can be hard to remember, so you may be better off simply saving one archive per tape.

Performing incremental backups in Linux

Suppose that you use tar to back up your system’s hard drive on a tape. Because creating a full backup can take quite some time, you don’t want to repeat this task every night. (Besides, only a small number of files may have changed during the day.) To locate the files that need backing up, you can use the find command to list all files that have changed in the past 24 hours:

find / -mtime -1 -type f -print

This command prints a list of files that have changed within the past day. The -mtime -1 option means that you want the files that were last modified less than one day ago. Now you can combine this find command with the tar command to back up only those files that have changed within the past day:

tar cvf /dev/st0 `find / -mtime -1 -type f -print`

When you place a command between back quotes (`), the shell executes that command and places its output at that point in the command line. The result is that the tar program saves only the changed files in the archive. This process gives you an incremental backup of only the files that have changed within the past day.
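Back quotes split the find output on spaces, so a filename that contains a space reaches tar as two separate names, and a very long list can exceed the shell's command-line limit. One possible alternative, assuming GNU find, GNU tar, and GNU touch, is to pass the names through a pipe with NUL separators. The directory and filenames in this sketch are hypothetical.

```shell
# Practice run: archive only files changed in the past day, safely.
work=$(mktemp -d)                         # hypothetical scratch area
mkdir -p "$work/src"
echo "new" > "$work/src/changed today.txt"    # the name contains a space on purpose
echo "old" > "$work/src/stale.txt"
touch -d '3 days ago' "$work/src/stale.txt"   # backdate the modification time

# find ends each name with a NUL byte (-print0); tar reads the list
# from standard input (-T -) and expects NUL separators (--null).
(cd "$work" && find src -mtime -1 -type f -print0 |
    tar --null -T - -cf "$work/incremental.tar")

listing=$(tar tf "$work/incremental.tar")
printf '%s\n' "$listing"
```

Only the file modified within the past 24 hours ends up in the archive, and its name survives intact.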

Performing automated backups in Linux

In Linux systems, you can use crontab to set up recurring jobs (called cron jobs). The Linux system performs these tasks at regular intervals. Backing up your system is a good use of the crontab facility. Suppose that your backup strategy is as follows:

  • Every Sunday at 1:15 a.m., your system backs up the entire hard drive on the tape.
  • Monday through Saturday, your system performs an incremental backup at 3:10 a.m. by saving only those files that have changed during the past 24 hours.

To set up this automated backup schedule, log in as root, and type the following lines in a file named backups (assuming that you’re using a SCSI tape drive):

15 1 * * 0 tar zcvf /dev/st0 /
10 3 * * 1-6 tar zcvf /dev/st0 `find / -mtime -1 -type f -print`

Next, submit this job schedule by using the following crontab command, which replaces any crontab that root already has:

crontab backups

Now you’re set for an automated backup. All you need to do is place a new tape in the tape drive every day. Remember also to give each tape an appropriate label.