Parsing XML Events with SAX

By Barry Burd

The word event conjures up all kinds of images. For a nonprogrammer, an event is just “something that happens.” If you’re used to dealing with windows and frames in Java, then you probably think of an event as an occurrence that wakes up a piece of code. For example, a user’s mouse click or keystroke wakes up the code that sets an option and displays an OK box. The click or keystroke itself is called an event because it happens independently of the running program. Only the user knows when he or she will press that button. And when the button gets pressed, some part of the Java program just wakes up and deals with the situation. This scenario is called event-driven programming.

Event-driven programming

SAX programs are event-driven. To see what that means, start with an everyday example. You get into bed for a good night’s sleep. You reach over to set your alarm clock and then settle in, close your eyes, and become unconscious for a number of hours. Then an important event happens: A certain time of day arrives. When the event takes place, the alarm clock goes into its “wake up” mode and makes an awful din to stir you out of your restful sleep.

Here’s another scenario. You’re a busy executive and you’ll be out for several hours, but you don’t want to miss any important business. Before leaving the office, you tell your assistant, “Call me if anything important comes up.” Issuing this order is akin to setting the alarm clock. You’re telling your assistant (your alarm clock) to wake you up if an event takes place. Making this request to your assistant (or to your alarm clock) is called registration. In either scenario, you’re registering yourself with a wakeup service. After you’ve registered, you can pursue your leisurely non-activity, ignoring all real business until some event happens. Then . . .

Ring, ring. Your cell phone is hollering at you. “Hello?”

“Hello. This is your assistant. I have the sales figures for the first quarter. They’re 1 million, 4 million, and 2 million.”

“Let’s see. That’s a total of 7 million,” you say. “I’ll note it on my PalmPilot. Thanks.” You hang up.

Several moments later, you get another call. “The president of Big Bucks, Inc., wants to close that deal. They’re talking 10 million dollars.”

“Hmm,” you respond. “That’ll bring our year-to-date revenue up to 17 megabucks. I’ll store that information in my spreadsheet application. Thanks for calling.”

Each of these interactions is known as a callback. Earlier in the day, when you registered your wish with the assistant, you requested a callback. Then, whenever an event takes place, the assistant makes a callback to notify you about the event. In Java programming terms, the assistant calls one of your many methods (one of your Java subprograms).

The essence of event-driven programming

Event-driven programming has three parts:

  • Registration: You register your wish to be notified whenever an event occurs. You register this wish with another piece of code — another object, usually something you’ve imported (such as a piece of code that’s part of somebody else’s API). This object then watches, from behind the scenes, for the occurrence of the event you specified.
  • Event occurrence: A specific event happens.
  • Callback: The other piece of code performs a callback. One of your methods gets called.
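
The three parts can be sketched in plain Java, using the executive-and-assistant story from earlier. This is only an illustration: the names NewsListener, Assistant, Executive, and CallbackDemo are invented for this sketch and are not part of any API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callback interface: the method the assistant calls when news arrives. */
interface NewsListener {
    void onNews(long amountInMillions);
}

/** Plays the assistant: remembers who registered and calls them back. */
final class Assistant {
    private final List<NewsListener> listeners = new ArrayList<>();

    /** Part 1, registration: "Call me if anything important comes up." */
    void register(NewsListener listener) {
        listeners.add(listener);
    }

    /** Parts 2 and 3: an event occurs, so every registered listener gets a callback. */
    void report(long amountInMillions) {
        for (NewsListener listener : listeners) {
            listener.onNews(amountInMillions);
        }
    }
}

/** Plays the executive: does nothing until the assistant calls. */
final class Executive implements NewsListener {
    private long totalInMillions;

    @Override
    public void onNews(long amountInMillions) {
        totalInMillions += amountInMillions;
    }

    long total() {
        return totalInMillions;
    }
}

final class CallbackDemo {
    /** Registers the executive, fires two events, and returns the running total. */
    static long run() {
        Assistant assistant = new Assistant();
        Executive executive = new Executive();
        assistant.register(executive); // registration
        assistant.report(7);           // first-quarter sales figures
        assistant.report(10);          // the Big Bucks, Inc. deal
        return executive.total();
    }

    public static void main(String[] args) {
        System.out.println("Year-to-date revenue, in millions: " + run());
    }
}
```

Notice that the Executive class never calls the Assistant after registering. All later communication flows the other way, through onNews.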

Two kinds of code

Distinguishing between active code and passive code is useful:

  • Active code has a main method. Active code, once it starts running, takes center stage. Active code contains the thread of execution that controls the whole ball game.
  • Passive code just sits there, waiting to be called. A passive Dice class does nothing until some other code calls Dice.roll().
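
Here is a small sketch of the distinction. The Dice class is passive and the DiceGame class is active. The body of roll() and the DiceGame class are assumptions made for illustration; the text names only Dice.roll().

```java
import java.util.Random;

/** Passive code: no main method. Nothing happens until someone calls roll(). */
final class Dice {
    private static final Random RANDOM = new Random();

    private Dice() {
        // Utility class; not meant to be instantiated.
    }

    /** Returns a value from 1 to 6, inclusive. */
    static int roll() {
        return RANDOM.nextInt(6) + 1;
    }
}

/** Active code: owns the main method and, with it, the thread of execution. */
final class DiceGame {
    public static void main(String[] args) {
        System.out.println("You rolled a " + Dice.roll());
    }
}
```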

Now, you may think that passive code is all you need for event-driven programming, but it’s not. For event handling, you also need the registration step. The passive code starts by getting registered with some other piece of code.

To firm up this notion of registration, think about an example from the on-screen world of mice, windows, and buttons. You create a window or frame with a button in it. You want your frame to respond to mouse clicks, so you issue a command such as button.addMouseListener(frame) (assuming that the frame implements the MouseListener interface).

This command registers your frame with the button. The command says, in effect, “Whenever a mouse event happens, call one of the frame’s mouse-handling methods.” Later, when the user clicks the mouse, the frame gets a callback. The computer calls the frame’s mouseClicked method.
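
The following sketch shows the same registration with Swing’s JButton and the standard MouseAdapter class. To keep it runnable without a display, it makes two assumptions that differ from the story above: the listener is a small hypothetical ClickCounter class rather than a frame, and the click is simulated by dispatching a hand-built MouseEvent instead of waiting for a real user.

```java
import java.awt.event.MouseAdapter;
import java.awt.event.MouseEvent;
import javax.swing.JButton;

/** Hypothetical listener: counts the mouseClicked callbacks it receives. */
final class ClickCounter extends MouseAdapter {
    private int clicks;

    @Override
    public void mouseClicked(MouseEvent event) {
        clicks++;
    }

    int clicks() {
        return clicks;
    }
}

final class MouseRegistrationDemo {
    /** Registers a listener, simulates one click, and returns the click count. */
    static int run() {
        JButton button = new JButton("OK");
        ClickCounter counter = new ClickCounter();

        // Registration: ask the button to call the counter on mouse events.
        button.addMouseListener(counter);

        // Event occurrence, simulated here. In a real program the user's
        // click makes the toolkit create and deliver this event.
        MouseEvent click = new MouseEvent(
                button, MouseEvent.MOUSE_CLICKED, System.currentTimeMillis(),
                0, 5, 5, 1, false);

        // Callback: the button passes the event to its registered listeners.
        button.dispatchEvent(click);
        return counter.clicks();
    }

    public static void main(String[] args) {
        System.out.println("Clicks received: " + run());
    }
}
```

In a real application you would build the components on the event dispatch thread and let the toolkit deliver the events. The sketch skips both steps to stay short.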

SAX events

Sure, SAX is event-driven, but this doesn’t mean that a SAX program waits for mouse clicks. Instead, SAX code follows the register-event-callback model described in the last several paragraphs. Every SAX program has two indispensable pieces of code:

  • A piece of code that you write, called the handler. (Your handler can extend a pre-written DefaultHandler class.)
    The handler is like the million-dollar executive in the earlier scenario.
  • A piece of code that you normally don’t write, called the parser. The parser plays a role like that of the executive’s assistant. The Java API has included a built-in parser since version 1.4. You create an instance of this parser, and then you register your handler with that parser instance. In effect, you tell the instance to call back your handler whenever an event takes place.

Anything having to do with XML is new, and is still in a state of flux. Because of this, the terminology is patched together in some peculiar ways. While developing SAX version 2, some techies had a make-up-new-names festival. What’s normally called a “parser” is embodied in a Java interface named XMLReader. There used to be an interface named org.xml.sax.Parser, but it got deprecated (which means that you should scrape it off the bottom of your shoe). To make things a bit more complicated, there’s still another parsing tool, javax.xml.parsers.SAXParser. You use this SAXParser to make yourself an XMLReader (by calling its getXMLReader method). With any luck, you’ll quickly become accustomed to this convoluted terminology. For now, remember that what is called a “parser” is usually an instance of XMLReader.
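
To make the naming chain concrete, here is one way the pieces might fit together: a SAXParserFactory makes a SAXParser, the SAXParser hands you an XMLReader, and you register your handler with that XMLReader. The calls are standard JAXP and SAX 2 methods. The class name ReaderSetupDemo and the choice to wrap checked exceptions are assumptions made for this sketch.

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class ReaderSetupDemo {
    /** Creates an XMLReader and registers the given handler with it. */
    static XMLReader newReader(DefaultHandler handler) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser saxParser = factory.newSAXParser();

            // The SAXParser is the tool that gives you the "real" parser.
            XMLReader reader = saxParser.getXMLReader();

            // Registration: the reader calls this handler's methods later.
            reader.setContentHandler(handler);
            return reader;
        } catch (Exception e) {
            // Wrapped only to keep the sketch short.
            throw new IllegalStateException("Could not create an XMLReader", e);
        }
    }

    public static void main(String[] args) {
        DefaultHandler handler = new DefaultHandler();
        XMLReader reader = newReader(handler);
        System.out.println("Handler registered: "
                + (reader.getContentHandler() == handler));
    }
}
```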

The register-and-callback scenario is what makes SAX event-driven. Now the funny thing is, a SAX event isn’t tangible. A SAX event won’t remind you of a keystroke or a button click. In SAX, the parser scans an XML document from top to bottom. Whenever the parser encounters something interesting, the parser fires off an event and calls the handler. Then, it’s up to the handler to do something about this interesting encounter.
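
A short sketch can make these intangible events visible. The handler below extends DefaultHandler and writes a line to a log each time the parser calls one of its methods. The class names EventLogger and SaxEventsDemo, the sample XML, and the log format are all invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: records each callback that the parser makes. */
final class EventLogger extends DefaultHandler {
    final List<String> log = new ArrayList<>();

    @Override
    public void startDocument() {
        log.add("startDocument");
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        log.add("startElement " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Skip whitespace-only text so that the log stays readable.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            log.add("characters " + text);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        log.add("endElement " + qName);
    }

    @Override
    public void endDocument() {
        log.add("endDocument");
    }
}

final class SaxEventsDemo {
    /** Parses the given XML text and returns the events in the order they fired. */
    static List<String> parse(String xml) {
        EventLogger handler = new EventLogger();
        try {
            XMLReader reader =
                    SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);                    // registration
            reader.parse(new InputSource(new StringReader(xml))); // events and callbacks
        } catch (Exception e) {
            // Wrapped only to keep the sketch short.
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler.log;
    }

    public static void main(String[] args) {
        for (String event : parse("<sales><quarter>7</quarter></sales>")) {
            System.out.println(event);
        }
    }
}
```

The log lists the events in document order, from startDocument at the top to endDocument at the bottom. A parser is allowed to deliver a long run of text in more than one characters call, so a real handler shouldn’t assume that each call carries a complete piece of text.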