How to Use Nested SQL Queries that Return Sets of Rows
To illustrate how a nested SQL query returns a set of rows, imagine that you work for a systems integrator of computer equipment. Your company, Zetec Corporation, assembles systems from components that you buy, and then it sells them to companies and government agencies. You keep track of your business with a relational database.
The database consists of many tables, but right now you’re concerned with only three of them: the PRODUCT table, the COMP_USED table, and the COMPONENT table. The PRODUCT table contains a list of all your standard products.
|Model||CHAR (6)||PRIMARY KEY|
The COMPONENT table lists components that go into your products.
|CompID||CHAR (6)||PRIMARY KEY|
The COMP_USED table tracks which components go into each product.
|Model||CHAR (6)||FOREIGN KEY (for PRODUCT)|
|CompID||CHAR (6)||FOREIGN KEY (for COMPONENT)|
A component may be used in multiple products, and a product can contain multiple components (a many-to-many relationship). This situation can cause integrity problems. To circumvent the problems, create the linking table COMP_USED to relate COMPONENT to PRODUCT. A component may appear in many rows in the COMP_USED table, but each of those rows will reference only one component (a one-to-many relationship).
Similarly, a product may appear in many rows in COMP_USED, but each row references only one product (another one-to-many relationship). Adding the linking table transforms a troublesome many-to-many relationship into two relatively simple one-to-many relationships. This process of reducing the complexity of relationships is one example of normalization.
Subqueries introduced by the keyword IN
One form of a nested query compares a single value with the set of values returned by a SELECT statement. It uses the IN predicate with the following syntax:
SELECT column_list FROM table WHERE expression IN (subquery) ;
The expression in the WHERE clause evaluates to a value. If that value is IN the list returned by the subquery, then the WHERE clause returns a True value. The specified columns from the table row being processed are added to the result table. The subquery may reference the same table referenced by the outer query, or it may reference a different table.
In the following example, the Zetec’s database is used to demonstrate this type of query. Assume that there’s a shortage of computer monitors in the computer industry, so that when you run out of monitors, you can no longer deliver products that include them. You want to know which products are affected. Glancing gratefully at your own monitor, enter the following query:
SELECT Model FROM COMP_USED WHERE CompID IN (SELECT CompID FROM COMPONENT WHERE CompType = ‘Monitor’) ;
SQL processes the innermost query first, so it processes the COMPONENT table, returning the value of CompID for every row where CompType is ‘Monitor’. The result is a list of the ID numbers of all monitors. The outer query then compares the value of CompID in every row in the COMP_USED table against the list.
If the comparison is successful, the value of the Model column for that row is added to the outer SELECT’s result table. The result is a list of all product models that include a monitor. The following example shows what happens when you run the query:
Model ----- CX3000 CX3010 CX3020 MB3030 MX3020 MX3030
You now know which products will soon be out of stock. It’s time to go to the sales force and tell them to slow down on promoting these products.
When you use this form of nested query, the subquery must specify a single column, and that column’s data type must match the data type of the argument preceding the IN keyword.
Hopefully, you remember the KISS principle. Keeping things simple is important when you’re dealing with software of any kind, but it is especially important when dealing with database software. Statements that include nested SELECTs can be difficult to get right.
One way to get them working the way they should is to run the inner SELECT all by itself first and then verify that the result you get is the result you expect. When you’re sure the inner SELECT is functioning properly, you can enclose it in the outer part of the statement and have a better chance that the whole thing will work as advertised.
Subqueries introduced by the keyword NOT IN
Just as you can introduce a subquery with the IN keyword, you can do the opposite and introduce it with the NOT IN keywords. In fact, now is a great time for Zetec management to make such a query. Zetec management found out what products not to sell. That is valuable information, but what Zetec management really wants to know is what products to sell.
Management wants to emphasize the sale of products that don’t contain monitors. A nested query featuring a subquery introduced by the NOT IN keywords provides the requested information:
SELECT Model FROM COMP_USED WHERE CompID NOT IN (SELECT CompID FROM COMPONENT WHERE CompType = 'Monitor')) ;
This query produces the following result:
Model ----- PX3040 PB3050 PX3040 PB3050
In the example, the number of rows does not create a problem because the result table is short. In the real world, such a result table may have hundreds or thousands of rows. To avoid confusion, it’s best to eliminate duplicates. You can do so easily by adding the DISTINCT keyword to the query. Only rows that are distinct from all previously retrieved rows are added to the result table:
SELECT DISTINCT Model FROM COMP_USED WHERE CompID NOT IN (SELECT CompID FROM COMPONENT WHERE CompType = 'Monitor')) ;
As expected, the result is as follows:
Model ----- PX3040 PB3050