How to Use the SQL Union Join
SQL joins are extremely useful. Unlike the other kinds of SQL join, the union join makes no attempt to match a row from the left source table with any rows in the right source table. It creates a new virtual table that contains the SQL union of all the columns in both source tables. In the virtual result table, the columns that came from the left source table contain all the rows that were in the left source table. For those rows, the columns that came from the right source table all have the null value.
Similarly, the columns that came from the right source table contain all the rows that were in the right source table. For those rows, the columns that came from the left source table all have the null value. Thus, the table resulting from a union join contains all the columns of both source tables — and the number of rows it contains is the sum of the number of rows in the two source tables.
The result of a SQL union join by itself is not immediately useful in most cases; it produces a result table with many nulls in it. But you can get useful information from a union join when you use it in conjunction with the
COALESCE expression. Look at an example.
Suppose that you work for a company that designs and builds experimental rockets. You have several projects in the works. You also have several design engineers who have skills in multiple areas. As a manager, you want to know which employees, having which skills, have worked on which projects. Currently, this data is scattered among the EMPLOYEE table, the PROJECTS table, and the SKILLS table.
The EMPLOYEE table carries data about employees, and
EMPLOYEE.EmpID is its primary key. The PROJECTS table has a row for each project that an employee has worked on.
PROJECTS.EmpID is a foreign key that references the EMPLOYEE table. The SKILLS table shows the expertise of each employee.
SKILLS.EmpID is a foreign key that references the EMPLOYEE table.
The EMPLOYEE table has one row for each employee; the PROJECTS table and the SKILLS table have zero or more rows.
The following tables show example data.
From the tables, you can see that Ferguson has worked on X-63 and X-64 structure design and has expertise in mechanical design and aerodynamic loading.
Now suppose that, as a manager, you want to see all the information about all the employees. You decide to apply an equi-join to the EMPLOYEE, PROJECTS, and SKILLS tables:
SELECT * FROM EMPLOYEE E, PROJECTS P, SKILLS S WHERE E.EmpID = P.EmpID AND E.EmpID = S.EmpID ;
You can express this same operation as an inner join by using the following syntax:
Both formulations give the same result.
|1||Ferguson||1||X-63 Structure||1||Mechanical Design|
|1||Ferguson||1||X-63 Structure||1||Aerodynamic Loading|
|1||Ferguson||1||X-64 Structure||1||Mechanical Design|
|1||Ferguson||1||X-64 Structure||1||Aerodynamic Loading|
|2||Frost||2||X-63 Guidance||2||Analog Design|
|2||Frost||2||X-63 Guidance||2||Gyroscope Design|
|2||Frost||2||X-64 Guidance||2||Analog Design|
|2||Frost||2||X-64 Guidance||2||Gyroscope Design|
|3||Toyon||3||X-63 Telemetry||3||Digital Design|
|3||Toyon||3||X-63 Telemetry||3||R/F Design|
|3||Toyon||3||X-64 Telemetry||3||Digital Design|
|3||Toyon||3||X-64 Telemetry||3||R/F Design|
This data arrangement is not particularly enlightening. The employee ID numbers appear three times, and the projects and skills are duplicated for each employee. Bottom line: The SQL inner joins are not well suited to answering this type of question. You can put the SQL union join to work here, along with some strategically chosen
SELECT statements, to produce a more suitable result. You begin with the basic SQL union join:
Notice that the union join has no
ON clause. It doesn’t filter the data, so an
ON clause isn’t needed. This statement produces the result shown in the following table.
Each table has been extended to the right or left with nulls, and those null-extended rows have been union joined. The order of the rows is arbitrary and depends on the implementation. Now you can massage the data to put it in a more useful form.
Notice that the table has three ID columns, two of which are null in any row. You can improve the display by coalescing the ID columns. The
COALESCE expression takes on the value of the first non-null value in a list of values. In the present case, it takes on the value of the only non-null value in a column list:
SELECT COALESCE (E.EmpID, P.EmpID, S.EmpID) AS ID, E.Name, P.ProjectName, S.Skill FROM EMPLOYEE E UNION JOIN PROJECTS P UNION JOIN SKILLS S ORDER BY ID ;
FROM clause is the same as in the previous example, but now the three
EMP_ID columns are coalesced into a single column named
ID. You’re also ordering the result by
ID. The following table shows the result.
Each row in this result has data about a project or a skill, but not both. When you read the result, you first must determine what type of information is in each row (project or skill). If the
ProjectName column has a non-null value, the row names a project on which the employee has worked. If the
Skill column is not null, the row names one of the employee’s skills.
You can make the result a little clearer by adding another
COALESCE to the
SELECT statement, as follows:
SELECT COALESCE (E.EmpID, P.EmpID, S.EmpID) AS ID, E.Name, COALESCE (P.Type, S.Type) AS Type, P.ProjectName, S.Skill FROM EMPLOYEE E UNION JOIN (SELECT "Project" AS Type, P.* FROM PROJECTS) P UNION JOIN (SELECT "Skill" AS Type, S.* FROM SKILLS) S ORDER BY ID, Type ;
In this union join, the PROJECTS table in the previous example is replaced with a nested
SELECT that appends a column named
P.Type with a constant value
"Project" to the columns coming from the PROJECTS table. Similarly, the SKILLS table is replaced with a nested
SELECT that appends a column named
S.Type with a constant value
"Skill" to the columns coming from the SKILLS table. In each row,
P.Type is either null or
S.Type is either null or
SELECT list specifies a
COALESCE of those two
Type columns into a single column named
Type. You then specify
Type in the
ORDER BY clause, which sorts the rows that all have the same ID in an order that puts all projects first, followed by all skills. The result is shown in the following table.
The result table now presents a very readable account of the project experience and skill sets of all employees in the EMPLOYEE table.
Considering the number of SQL
JOIN operations available, relating data from different tables shouldn’t be a problem, regardless of the tables’ structure. You can trust that if the raw data exists in your database, SQL has the means to get it out and display it in a meaningful form.