How to Use the SQL Union Join - dummies

How to Use the SQL Union Join

By Allen G. Taylor

SQL joins are extremely useful. Unlike the other kinds of SQL join, the union join makes no attempt to match a row from the left source table with any rows in the right source table. It creates a new virtual table that contains the SQL union of all the columns in both source tables. In the virtual result table, the columns that came from the left source table contain all the rows that were in the left source table. For those rows, the columns that came from the right source table all have the null value.

Similarly, the columns that came from the right source table contain all the rows that were in the right source table. For those rows, the columns that came from the left source table all have the null value. Thus, the table resulting from a union join contains all the columns of both source tables — and the number of rows it contains is the sum of the number of rows in the two source tables.

The result of a SQL union join by itself is not immediately useful in most cases; it produces a result table with many nulls in it. But you can get useful information from a union join when you use it in conjunction with the COALESCE expression. Look at an example.

Suppose that you work for a company that designs and builds experimental rockets. You have several projects in the works. You also have several design engineers who have skills in multiple areas. As a manager, you want to know which employees, having which skills, have worked on which projects. Currently, this data is scattered among the EMPLOYEE table, the PROJECTS table, and the SKILLS table.

The EMPLOYEE table carries data about employees, and EMPLOYEE.EmpID is its primary key. The PROJECTS table has a row for each project that an employee has worked on. PROJECTS.EmpID is a foreign key that references the EMPLOYEE table. The SKILLS table shows the expertise of each employee. SKILLS.EmpID is a foreign key that references the EMPLOYEE table.

The EMPLOYEE table has one row for each employee; the PROJECTS table and the SKILLS table have zero or more rows.

The following tables show example data.

EMPLOYEE Table
EmpID Name
1 Ferguson
2 Frost
3 Toyon
PROJECTS Table
ProjectName EmpID
X-63 Structure 1
X-64 Structure 1
X-63 Guidance 2
X-64 Guidance 2
X-63 Telemetry 3
X-64 Telemetry 3
SKILLS Table
Skill EmpID
Mechanical Design 1
Aerodynamic Loading 1
Analog Design 2
Gyroscope Design 2
Digital Design 3
R/F Design 3

From the tables, you can see that Ferguson has worked on X-63 and X-64 structure design and has expertise in mechanical design and aerodynamic loading.

Now suppose that, as a manager, you want to see all the information about all the employees. You decide to apply an equi-join to the EMPLOYEE, PROJECTS, and SKILLS tables:

SELECT *
  FROM EMPLOYEE E, PROJECTS P, SKILLS S
  WHERE E.EmpID = P.EmpID
   AND E.EmpID = S.EmpID ;

You can express this same operation as an inner join by using the following syntax:

Both formulations give the same result.

Result of SQL Inner Join
E.EmpID Name P.EmpID ProjectName S.EmpID Skill
1 Ferguson 1 X-63 Structure 1 Mechanical Design
1 Ferguson 1 X-63 Structure 1 Aerodynamic Loading
1 Ferguson 1 X-64 Structure 1 Mechanical Design
1 Ferguson 1 X-64 Structure 1 Aerodynamic Loading
2 Frost 2 X-63 Guidance 2 Analog Design
2 Frost 2 X-63 Guidance 2 Gyroscope Design
2 Frost 2 X-64 Guidance 2 Analog Design
2 Frost 2 X-64 Guidance 2 Gyroscope Design
3 Toyon 3 X-63 Telemetry 3 Digital Design
3 Toyon 3 X-63 Telemetry 3 R/F Design
3 Toyon 3 X-64 Telemetry 3 Digital Design
3 Toyon 3 X-64 Telemetry 3 R/F Design

This data arrangement is not particularly enlightening. The employee ID numbers appear three times, and the projects and skills are duplicated for each employee. Bottom line: The SQL inner joins are not well suited to answering this type of question. You can put the SQL union join to work here, along with some strategically chosen SELECT statements, to produce a more suitable result. You begin with the basic SQL union join:

Notice that the union join has no ON clause. It doesn’t filter the data, so an ON clause isn’t needed. This statement produces the result shown in the following table.

Result of SQL Union Join
E.EmpID Name P.EmpID ProjectName S.EmpID Skill
1 Ferguson NULL NULL NULL NULL
NULL NULL 1 X-63 Structure NULL NULL
NULL NULL 1 X-64 Structure NULL NULL
NULL NULL NULL NULL 1 Mechanical Design
NULL NULL NULL NULL 1 Aerodynamic Loading
2 Frost NULL NULL NULL NULL
NULL NULL 2 X-63 Guidance NULL NULL
NULL NULL 2 X-64 Guidance NULL NULL
NULL NULL NULL NULL 2 Analog Design
NULL NULL NULL NULL 2 Gyroscope Design
3 Toyon NULL NULL NULL NULL
NULL NULL 3 X-63 Telemetry NULL NULL
NULL NULL 3 X-64 Telemetry NULL NULL
NULL NULL NULL NULL 3 Digital Design
NULL NULL NULL NULL 3 R/F Design

Each table has been extended to the right or left with nulls, and those null-extended rows have been union joined. The order of the rows is arbitrary and depends on the implementation. Now you can massage the data to put it in a more useful form.

Notice that the table has three ID columns, two of which are null in any row. You can improve the display by coalescing the ID columns. The COALESCE expression takes on the value of the first non-null value in a list of values. In the present case, it takes on the value of the only non-null value in a column list:

SELECT COALESCE (E.EmpID, P.EmpID, S.EmpID) AS ID,
   E.Name, P.ProjectName, S.Skill
  FROM EMPLOYEE E UNION JOIN PROJECTS P
   UNION JOIN SKILLS S
  ORDER BY ID ;

The FROM clause is the same as in the previous example, but now the three EMP_ID columns are coalesced into a single column named ID. You’re also ordering the result by ID. The following table shows the result.

Result of SQL Union Join with COALESCE Expression
ID Name ProjectName Skill
1 Ferguson X-63 Structure NULL
1 Ferguson X-64 Structure NULL
1 Ferguson NULL Mechanical Design
1 Ferguson NULL Aerodynamic Loading
2 Frost X-63 Guidance NULL
2 Frost X-64 Guidance NULL
2 Frost NULL Analog Design
2 Frost NULL Gyroscope Design
3 Toyon X-63 Telemetry NULL
3 Toyon X-64 Telemetry NULL
3 Toyon NULL Digital Design
3 Toyon NULL R/F Design

Each row in this result has data about a project or a skill, but not both. When you read the result, you first must determine what type of information is in each row (project or skill). If the ProjectName column has a non-null value, the row names a project on which the employee has worked. If the Skill column is not null, the row names one of the employee’s skills.

You can make the result a little clearer by adding another COALESCE to the SELECT statement, as follows:

SELECT COALESCE (E.EmpID, P.EmpID, S.EmpID) AS ID,
   E.Name, COALESCE (P.Type, S.Type) AS Type,
   P.ProjectName, S.Skill
  FROM EMPLOYEE E
   UNION JOIN (SELECT "Project" AS Type, P.*
           FROM PROJECTS) P
   UNION JOIN (SELECT "Skill" AS Type, S.*
           FROM SKILLS) S
  ORDER BY ID, Type ;

In this union join, the PROJECTS table in the previous example is replaced with a nested SELECT that appends a column named P.Type with a constant value "Project" to the columns coming from the PROJECTS table. Similarly, the SKILLS table is replaced with a nested SELECT that appends a column named S.Type with a constant value "Skill" to the columns coming from the SKILLS table. In each row, P.Type is either null or "Project", and S.Type is either null or "Skill".

The outer SELECT list specifies a COALESCE of those two Type columns into a single column named Type. You then specify Type in the ORDER BY clause, which sorts the rows that all have the same ID in an order that puts all projects first, followed by all skills. The result is shown in the following table.

Refined Result of SQL Union Join with COALESCE Expressions
ID Name Type ProjectName Skill
1 Ferguson Project X-63 Structure NULL
1 Ferguson Project X-64 Structure NULL
1 Ferguson Skill NULL Mechanical Design
1 Ferguson Skill NULL Aerodynamic Loading
2 Frost Project X-63 Guidance NULL
2 Frost Project X-64 Guidance NULL
2 Frost Skill NULL Analog Design
2 Frost Skill NULL Gyroscope Design
3 Toyon Project X-63 Telemetry NULL
3 Toyon Project X-64 Telemetry NULL
3 Toyon Skill NULL Digital Design
3 Toyon Skill NULL R/F Design

The result table now presents a very readable account of the project experience and skill sets of all employees in the EMPLOYEE table.

Considering the number of SQL JOIN operations available, relating data from different tables shouldn’t be a problem, regardless of the tables’ structure. You can trust that if the raw data exists in your database, SQL has the means to get it out and display it in a meaningful form.