By Doug Lowe

An XML document can have a DTD, which spells out exactly what elements can appear in an XML document and in what order the elements can appear. DTD stands for Document Type Definition, but that won’t be on the test.

A DTD for an XML document about movies, for example, may specify that each Movie element must have Title and Price subelements and an attribute named year. It can also specify that the root element must be named Movies and consist of any number of Movie elements.

The main purpose of the DTD is to spell out the structure of an XML document so that users of the document know how to interpret it. Another, equally important use of the DTD is to validate the document to make sure that it doesn’t have any structural errors. If you create a Movies XML document that has two titles for a movie, for example, you can use the DTD to detect the error.

You can store the DTD for an XML document in the same file as the XML data, but more often, you store the DTD in a separate file. That way, you can use a DTD to govern the format of several XML documents of the same type. To indicate the name of the file that contains the DTD, you add a <!DOCTYPE> declaration to the XML document. Here’s an example:

<!DOCTYPE Movies SYSTEM "movies.dtd">

Here the XML file is identified as a Movies document, whose DTD you can find in the file movies.dtd. Add this tag near the beginning of the movies.xml file, right after the <?xml> tag.
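
As a rough sketch, a movies.xml file with the declaration in place might look like the following. It assumes the structure described earlier (a Movies root, Movie elements with a year attribute, and Title and Price subelements); the prices are sample values made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Movies SYSTEM "movies.dtd">
<!-- Sample data for illustration; the prices are made up -->
<Movies>
  <Movie year="1946">
    <Title>It's a Wonderful Life</Title>
    <Price>14.95</Price>
  </Movie>
  <Movie year="1974">
    <Title>Young Frankenstein</Title>
    <Price>16.95</Price>
  </Movie>
</Movies>
```

Because the declaration uses SYSTEM with a relative file name, a parser would normally look for movies.dtd relative to the location of the XML file.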

This code shows a DTD file for the movies.xml file.

<?xml version="1.0" encoding="UTF-8"?>

<!ELEMENT Movies (Movie*)>

<!ELEMENT Movie (Title, Price)>

<!ATTLIST Movie year CDATA #REQUIRED>

<!ELEMENT Title (#PCDATA)>

<!ELEMENT Price (#PCDATA)>
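
For comparison, here is a sketch of the other option mentioned earlier: storing the DTD in the same file as the XML data. The declarations go between square brackets inside the <!DOCTYPE> declaration. The single Movie element is sample data, and its price is made up.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Movies [
  <!ELEMENT Movies (Movie*)>
  <!ELEMENT Movie (Title, Price)>
  <!ATTLIST Movie year CDATA #REQUIRED>
  <!ELEMENT Title (#PCDATA)>
  <!ELEMENT Price (#PCDATA)>
]>
<Movies>
  <Movie year="1946">
    <Title>It's a Wonderful Life</Title>
    <Price>14.95</Price>
  </Movie>
</Movies>
```

The trade-off is the one already noted: an embedded DTD travels with the document, but it can't be shared by several XML documents of the same type.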

Each of the ELEMENT tags in a DTD defines a type of element that can appear in the document and indicates what can appear as the content for that element type. The general form of the ELEMENT tag is this:

<!ELEMENT element (content)>

Use the rules listed here to express the content.

Specifying Element Content
Content               Description
element*              The specified element can occur 0 or more times.
element+              The specified element can occur 1 or more times.
element?              The specified element can occur 0 or 1 time.
element1|element2     Either element1 or element2 can appear.
element1, element2    element1 appears, followed by element2.
#PCDATA               Text data is allowed.
ANY                   Any content is allowed: text and any declared child elements.
EMPTY                 No content of any kind is allowed: no text and no child elements.
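
To see how these rules combine, consider the following sketch. It is a hypothetical variation on the Movie element, not part of movies.dtd: apart from Movie, Title, and Price, the element names are invented for illustration.

```xml
<!-- Hypothetical declarations; Actor, Rating, Rental, Review,
     Poster, and Notes are invented names -->

<!-- A Title, then one or more Actor elements, then an optional Rating,
     then either a Price or a Rental, then any number of Review elements -->
<!ELEMENT Movie (Title, Actor+, Rating?, (Price | Rental), Review*)>

<!-- Elements that contain only text -->
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Actor (#PCDATA)>
<!ELEMENT Rating (#PCDATA)>
<!ELEMENT Price (#PCDATA)>
<!ELEMENT Rental (#PCDATA)>
<!ELEMENT Review (#PCDATA)>

<!-- An element that must have no content, written as <Poster/> -->
<!ELEMENT Poster EMPTY>

<!-- An element with no restrictions on its content -->
<!ELEMENT Notes ANY>
```

Parentheses group a choice or a sequence so that it can be used as a single item inside a larger content model, which is what (Price | Rental) does here.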

The first ELEMENT tag in the DTD shown above, for example, says that a Movies element consists of zero or more Movie elements. The second ELEMENT tag says that a Movie element consists of a Title element followed by a Price element. The third and fourth ELEMENT tags say that the Title and Price elements consist of text data.
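
The fragment below sketches how those rules play out against actual data. The first Movie element matches the content model (Title, Price); the other two would be reported as errors by a validating parser. The prices and the second title are placeholders.

```xml
<Movies>
  <!-- Valid: one Title followed by one Price -->
  <Movie year="1946">
    <Title>It's a Wonderful Life</Title>
    <Price>14.95</Price>
  </Movie>

  <!-- Invalid: the content model allows exactly one Title -->
  <Movie year="1974">
    <Title>Young Frankenstein</Title>
    <Title>A Second Title</Title>
    <Price>16.95</Price>
  </Movie>

  <!-- Invalid: Price comes before Title, but the order is fixed -->
  <Movie year="1977">
    <Price>17.95</Price>
    <Title>Star Wars</Title>
  </Movie>
</Movies>
```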

If this notation looks vaguely familiar, that’s because it’s derived from regular expressions.

The ATTLIST tag declares an attribute that an element can have, giving the attribute’s name, type, and default. Its general form is this:

<!ATTLIST element attribute type default-value>

Here’s a breakdown of this tag:

  • element names the element whose tag the attribute can appear in.
  • attribute provides the name of the attribute.
  • type specifies what can appear as the attribute’s value. The type can be any of the items listed in the Attribute Types table.
  • default-value provides a default value and indicates whether the attribute is required or optional. default-value can be any of the items listed in the Attribute Defaults table.

Attribute Types
Type                  The Attribute Value …
CDATA                 Can be any character string.
(string1|string2…)    Can be one of the listed strings.
NMTOKEN               Must be a name token, which is a string made up of letters, digits, and a few punctuation characters (periods, hyphens, underscores, and colons), with no white space.
NMTOKENS              Must be one or more name tokens separated by white space.
ID                    Is a name that must be unique. It follows the rules for name tokens but can’t begin with a digit, period, or hyphen, and no two ID attributes in the document can have the same value.
IDREF                 Must be the same as an ID value used elsewhere in the document.
IDREFS                Is a list of IDREF values separated by white space.
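
Here is a sketch of several of these types in use. The id, sequelTo, and format attributes are invented for illustration and don’t appear in movies.dtd; the prices are sample values.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Movies [
  <!ELEMENT Movies (Movie*)>
  <!ELEMENT Movie (Title, Price)>
  <!ELEMENT Title (#PCDATA)>
  <!ELEMENT Price (#PCDATA)>
  <!ATTLIST Movie year CDATA #REQUIRED>
  <!-- Hypothetical attributes added for this sketch -->
  <!ATTLIST Movie id ID #REQUIRED>
  <!ATTLIST Movie sequelTo IDREF #IMPLIED>
  <!ATTLIST Movie format NMTOKEN #IMPLIED>
]>
<Movies>
  <Movie year="1972" id="m1" format="DVD">
    <Title>The Godfather</Title>
    <Price>12.95</Price>
  </Movie>
  <!-- sequelTo must match an id value that exists in this document -->
  <Movie year="1974" id="m2" sequelTo="m1">
    <Title>The Godfather Part II</Title>
    <Price>12.95</Price>
  </Movie>
</Movies>
```

A validating parser would reject this document if two Movie elements shared the same id or if sequelTo named an id that no element declares.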

Check out the attribute defaults here.

Attribute Defaults
Default           Optional or Required?
#REQUIRED         Required.
#IMPLIED          Optional.
"value"           Optional. This value is used if the attribute is omitted. The value must be enclosed in quotation marks.
#FIXED "value"    Optional. If included, however, it must be this value, and if omitted, this value is used by default.
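
The four kinds of default can be sketched side by side as follows. Only the year attribute comes from movies.dtd; director, format, and currency are hypothetical attributes made up for this example.

```xml
<!-- Required: every Movie element must supply a year -->
<!ATTLIST Movie year CDATA #REQUIRED>

<!-- Optional, with no default: director can simply be left out -->
<!ATTLIST Movie director CDATA #IMPLIED>

<!-- Optional, with a default: an omitted format is treated as "DVD" -->
<!ATTLIST Movie format CDATA "DVD">

<!-- Fixed: currency can be omitted, but if present it must be "USD" -->
<!ATTLIST Price currency CDATA #FIXED "USD">
```

Note that literal default values are always written in quotation marks, while the keywords #REQUIRED and #IMPLIED are not.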

Here’s the ATTLIST tag declaration from movies.dtd:

<!ATTLIST Movie year CDATA #REQUIRED>

This declaration indicates that the attribute goes with the Movie element, is named year, can be any character string, and is required.

Here’s an ATTLIST tag that specifies a list of possible values along with a default:

<!ATTLIST Movie genre (SciFi|Action|Comedy|Drama) "Comedy">

This form of the ATTLIST tag lets you create an attribute that’s similar to an enumeration, with a list of acceptable values.
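
As a sketch of how the genre attribute would behave in a document, assuming the DTD also declares the required year attribute and using made-up prices:

```xml
<Movies>
  <!-- genre is given explicitly and is one of the listed values -->
  <Movie year="1977" genre="SciFi">
    <Title>Star Wars</Title>
    <Price>17.95</Price>
  </Movie>

  <!-- genre is omitted, so a validating parser supplies Comedy -->
  <Movie year="1974">
    <Title>Young Frankenstein</Title>
    <Price>16.95</Price>
  </Movie>

  <!-- A value such as genre="Western" would be an error,
       because Western is not in the list -->
</Movies>
```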