How to Work with SQL Indexes - dummies

How to Work with SQL Indexes

By Allen G. Taylor

The SQL specification doesn’t address the topic of indexes, but that omission doesn’t mean that indexes are rare or even optional parts of a database system. Every SQL implementation supports indexes, but you’ll find no universal agreement on how to support them.

What’s an index?

Data appears in a table in the order in which you entered the information in SQL. That order may have nothing to do with the order in which you later want to process the data. Say that you want to process your CLIENT table in ClientName order. The computer must first sort the table in ClientName order. Sorting the data this way takes time. Larger tables take longer to sort.

What if you have a table with 100,000 rows? Or a table with a million rows? In some applications, such table sizes are not rare. The best sort algorithms would have to make some 20 million comparisons and millions of swaps to put the table in the desired order. Even if you’re using a very fast computer, you may not want to wait that long.

Indexes can be a great timesaver. An index is a subsidiary or support table that goes along with a data table. For every row in the data table, you have a corresponding row in the index table. The order of the rows in the index table is different.

ClientName Address1 Address2 City State
Butternut Animal Clinic 5 Butternut Lane Hudson NH
Amber Veterinary, Inc. 470 Kolvir Circle Amber MI
Vets R Us 2300 Geoffrey Road Suite 230 Anaheim CA
Doggie Doctor 32 Terry Terrace Nutley NJ
The Equestrian Center Veterinary 7890 Paddock Parkway Gallup NM
Dolphin Institute 1002 Marine Drive Key West FL
J. C. Campbell, Credit Vet 2500 Main Street Los Angeles CA
Wenger’s Worm Farm 15 Bait Boulevard Sedona AZ

Here the rows are not in alphabetical order by ClientName. In fact, they aren’t in any useful order at all. The rows are simply in the order in which somebody entered the data.

ClientName Pointer to Data Table
Amber Veterinary, Inc. 2
Butternut Animal Clinic 1
Doggie Doctor 4
Dolphin Institute 6
J. C. Campbell, Credit Vet 7
The Equestrian Center 5
Vets R Us 3
Wenger’s Worm Farm 8

The index contains the field that forms the basis of the index (in this case, ClientName) and a pointer into the data table. The pointer in each index row gives the row number of the corresponding row in the data table.

Why you should want an index

If you want to process a table in ClientName order, and you have an index arranged in ClientName order, you can perform your operation almost as fast as you could if the data table itself were already in ClientName order. You can work through the index, moving immediately to each index row’s corresponding data record by using the pointer in the index.

If you use an index, the table processing time is proportional to N, where N is the number of records in the table. Without an index, the processing time for the same operation is proportional to N lg N, where lg N is the logarithm of N to the base 2. For large tables, the difference is great. Some operations aren’t practical to perform without the help of indexes.

Suppose you have a table containing 1,000,000 records (N = 1,000,000), and processing each record takes one millisecond (one-thousandth of a second). If you have an index, processing the entire table takes only 1,000 seconds — less than 17 minutes.

Without an index, you need to go through the table approximately 1,000,000 x 20 times to achieve the same result. This process would take 20,000 seconds — more than five and a half hours. The difference between 17 minutes and five and a half hours is substantial.

How to maintain an index

After you create an index, you must maintain it. Fortunately, your DBMS maintains your indexes for you automatically, by updating them every time you update the corresponding data tables. This process takes extra time, but it’s worth it. When you create an index and your DBMS maintains it, the index is always available to speed up your data processing, no matter how many times you need to call on it.

The best time to create an index is at the same time you create its corresponding data table. Try to anticipate all the ways that you may want to access your data, and then create an index for each possibility.

Some DBMS products give you the capability to turn off index maintenance. You may want to do so in some real-time applications where updating indexes takes a great deal of time and you have precious little to spare. You may even elect to update the indexes as a separate operation during off-peak hours. As usual, “do what works for you” is the rule.

Don’t fall into the trap of creating an index for retrieval orders that you’re unlikely to use. Index maintenance is an extra operation that the computer must perform every time it modifies the index field or adds or deletes a data table row — and this operation affects performance. Create only those indexes that you expect to use as retrieval keys — and only for tables containing a large number of rows.

You may need to compile something such as a monthly or quarterly report that requires the data in an odd order that you don’t ordinarily need. Create an index just before running that periodic report, run the report, and then drop the index so the DBMS isn’t burdened with maintaining the index during the long period between reports.