How to Use the SQL UNION Operator

The UNION operator is the SQL implementation of relational algebra’s union operator. The UNION operator enables you to draw information from two or more tables that have the same structure. Same structure means

  • The tables must all have the same number of columns.

  • Corresponding columns must all have identical data types and lengths.

When these criteria are met, the tables are union-compatible: The union of the two tables returns all the rows that appear in either table and eliminates duplicates.

Suppose you create a baseball-statistics database. It contains two union-compatible tables named AMERICAN and NATIONAL. Both tables have three columns, and corresponding columns are all the same type. In fact, corresponding columns have identical column names (although this condition isn’t required for union compatibility).

NATIONAL lists the players’ names and the number of complete games pitched by National League pitchers. AMERICAN lists the same information about pitchers in the American League. The UNION of the two tables gives you a virtual result table containing all the rows in the first table plus all the rows in the second table. For this example, a few rows were put in each table to illustrate the operation:

SELECT * FROM NATIONAL ;
FirstName LastName CompleteGames
--------- -------- -------------
Sal   Maglie     11
Don   Newcombe    9
Sandy  Koufax     13
Don   Drysdale    12
SELECT * FROM AMERICAN ;
FirstName LastName CompleteGames
--------- -------- -------------
Whitey  Ford     12
Don   Larson     10
Bob   Turley     8
Allie  Reynolds    14
SELECT * FROM NATIONAL
UNION
SELECT * FROM AMERICAN ;
FirstName LastName CompleteGames
--------- -------- -------------
Allie  Reynolds    14
Bob   Turley     8
Don   Drysdale    12
Don   Larson     10
Don   Newcombe    9
Sal   Maglie     11
Sandy  Koufax     13
Whitey  Ford     12

The UNION DISTINCT operator functions identically to the UNION operator without the DISTINCT keyword. In both cases, duplicate rows are eliminated from the result set.

The asterisk (*) can be used as shorthand for all the columns in a table. This shortcut is fine most of the time, but it can get you into trouble when you use relational operators in embedded or module-language SQL.

If you add one or more new columns to one table and not to another, or you add different columns to the two tables, the two tables are no longer union-compatible — and your program will be invalid the next time it’s recompiled.

Even if the same new columns are added to both tables so they’re still union-compatible, your program is probably not prepared to deal with additional data. You should list the columns you want, rather than relying on the * shorthand. When you’re entering ad hoc SQL queries from the console, the asterisk works fine, because you can quickly display a table structure to verify union compatibility if queries aren’t successful.

The UNION ALL operation

The UNION operation usually eliminates any duplicate rows that result from its operation, which is the desired result most of the time. Sometimes, however, you may want to preserve duplicate rows. On those occasions, use UNION ALL.

Referring to the example, suppose that “Bullet” Bob Turley had been traded in midseason from the New York Yankees in the American League to the Brooklyn Dodgers in the National League. Now suppose that during the season, he pitched eight complete games for each team.

The ordinary UNION displayed in the example throws away one of the two lines containing Turley’s data. Although he seemed to pitch only 8 complete games in the season, he actually hurled a remarkable 16 complete games. The following query gives you the true facts:

SELECT * FROM NATIONAL
UNION ALL
SELECT * FROM AMERICAN ;

You can sometimes form a UNION of two tables even if they’re not union-compatible. If the columns you want in your result table are present and compatible in both tables, you can perform a UNION CORRESPONDING operation. Only the specified columns are considered — and they are the only columns displayed in the result table.

The CORRESPONDING operation

Baseball statisticians keep different statistics on pitchers than they keep on outfielders. In both cases, first names, last names, putouts, errors, and fielding percentages are recorded. Outfielders don’t have a won/lost record, a saves record, or a number of other stats that pertain to pitching. You can still perform a UNION that takes data from the OUTFIELDER table and from the PITCHER table to give you information about defensive skill:

SELECT *
 FROM OUTFIELDER
UNION CORRESPONDING
 (FirstName, LastName, Putouts, Errors, FieldPct)
SELECT *
 FROM PITCHER ;

The result table holds the first and last names of all the outfielders and pitchers, along with the putouts, errors, and fielding percentage of each player. As with the simple UNION, duplicates are eliminated. Thus, if a player spent some time in the outfield and also pitched in one or more games, the UNION CORRESPONDING operation loses some of his statistics. To avoid this problem, use UNION ALL CORRESPONDING.

Each column name in the list following the CORRESPONDING keyword must be a name that exists in both union-joined tables. If you omit this list of names, an implicit list of all names that appear in both tables is used. But this implicit list of names may change when new columns are added to the tables. Therefore you’re better off explicitly listing the column names than you omitting them.

  • Add a Comment
  • Print
  • Share
blog comments powered by Disqus
Advertisement

Inside Dummies.com

Dummies.com Sweepstakes

Win $500. Easy.