By Allen G. Taylor

The SQL SELECT statement is your tool for digging up hidden information in a database. Even if you have a clear idea of what you want to retrieve, translating that idea into SQL can be a challenge. If your formulation is just a little off, you may end up with the wrong results. To reduce your chances of being misled, use the following ten principles.

Verify the database structure

If you retrieve data from a database and your results don’t seem reasonable, check the database design. Many poorly designed databases are in use, and if you’re working with one, fix the design before you try any other remedy. Remember — good design is a prerequisite of data integrity.

Try queries on a test database

Create a test database that has the same structure as your production database, but with only a few representative rows in the tables. Choose the data so that you know in advance what the results of your queries should be.

Run each test query on the test data and see whether the results match your expectations. If they don’t, you may need to reformulate your queries. If a query is properly formulated but you end up with bad results all the same, you may need to restructure your database.

Build several sets of test data and be sure to include odd cases, such as empty tables and extreme values at the very limit of allowable ranges. Try to think of unlikely scenarios and check for proper behavior when they occur. In the course of checking for unlikely cases, you may gain insight into problems that are more likely to happen.

Double-check queries that include joins

Joins are notoriously counterintuitive. If your query contains one, make sure that it’s doing what you expect before you add WHERE clauses or other complicating factors.

Triple-check queries with subselects

Queries with subselects take data from one table and, based on what is retrieved, take some data from another table. Therefore, by definition, such queries can really be hard to get right. Make sure the data that the inner SELECT retrieves is the data that the outer SELECT needs to produce the desired result. If you have two or more levels of subselects, you need to be even more careful.

Summarize data with GROUP BY

Say that you have a table (NATIONAL) that contains the name (Player), team (Team), and number of home runs hit (Homers) by every baseball player in the National League. You can retrieve the team homer total for all teams with a query like this:

SELECT Team, SUM (Homers)
 FROM NATIONAL
 GROUP BY Team ;

This query lists each team, followed by the total number of home runs hit by all that team’s players.

Watch GROUP BY clause restrictions

Suppose that you want a list of National League power hitters. Consider the following query:

SELECT Player, Team, Homers
 FROM NATIONAL
 WHERE Homers >= 20
 GROUP BY Team ;

In most implementations, this query returns an error. Generally, only columns used for grouping or columns used in a set function may appear in the select list. However, if you want to view this data, the following formulation works:

SELECT Player, Team, Homers
 FROM NATIONAL
 WHERE Homers >= 20
 GROUP BY Team, Player, Homers ;

Because all the columns you want to display appear in the GROUP BY clause, the query succeeds and delivers the desired results. This formulation sorts the resulting list first by Team, then by Player, and finally by Homers.

Use parentheses with AND, OR, and NOT

Sometimes when you mix AND and OR, SQL doesn’t process the expression in the order that you expect. Use parentheses in complex expressions to make sure that you get the desired results. Typing a few extra keystrokes is a small price to pay for better results.

Parentheses also help to ensure that the NOT keyword is applied to the term or expression that you want it to apply to.

Control retrieval privileges

Many people don’t use the security features available in their DBMS. They don’t want to bother with them because they think misuse and misappropriation of data are things that only happen to other people. Don’t wait to get burned. Establish and maintain security for all databases that have any value.

Back up your databases regularly

Understatement alert: Data is hard to retrieve after a power surge, a fire, an earthquake, or some other disaster destroys your hard drive. (Remember, sometimes computers just die for no good reason.) Make frequent backups and put the backup media in a safe place.

What constitutes a safe place depends on how critical your data is. It might be a fireproof safe in the same room as your computer. It might be in another building. It might be in the cloud. It might be in a concrete bunker under a mountain that has been hardened to withstand a nuclear attack. Decide what level of safety is appropriate for your data.

Handle error conditions gracefully

Whether you’re making ad hoc queries from a workstation or embedding queries in an application, occasionally SQL returns an error message rather than the desired results. At a workstation, you can decide what to do next, based on the message returned.

In an application, the situation is different. The application user probably doesn’t know what action is appropriate. Put extensive error handling into your applications to cover every conceivable error that may occur. Creating error-handling code takes a great deal of effort, but it’s better than having the user stare quizzically at a frozen screen.