SQL JOINS & SET OPERATORS
Combining data across tables and queries
PDF Slides
1. Short summary
Two big ways to combine data in SQL:
- JOINS – glue tables side by side (add columns).
- SET OPERATORS – stack result sets on top of each other (add rows).
Main join types (INNER, LEFT, RIGHT, FULL, ANTI, CROSS), how to join multiple tables, and how to use UNION, UNION ALL, EXCEPT and INTERSECT safely and correctly.
2. Big picture – where this fits
Two main families for combining data:
- Joins – combine data by columns (A ⟷ B), matching rows using keys.
- Set operators – combine data by rows (A ↑ B), treating each SELECT result as a set.
Typical tasks:
- Recombine data – get the “big picture” by merging related tables (customers + orders + addresses).
- Data enrichment – add extra info from lookup/reference tables (e.g., ZIP codes).
- Check existence / filtering – keep or remove rows based on whether they have matches elsewhere (e.g., customers with no orders – anti joins).
- Combine information from similar tables (Employees + Customers) or current vs archive (Orders + OrdersArchive).
- Delta detection / completeness checks using EXCEPT/INTERSECT + unions.
3. Key concepts & definitions
3.1 Types of JOINS
(“A” = left table, “B” = right table)
| Join type | What rows do you get? | Does table order matter? | Typical use |
|---|---|---|---|
| No join | Separate results from each table (two independent SELECTs). | N/A | Just view data from each table, no relationship. |
| INNER JOIN | Only rows that match in both tables (the overlapping part of the Venn diagram). | No – A ⋈ B = B ⋈ A. | When you only care about matching data (e.g., customers with orders). |
| LEFT JOIN | All rows from left table + matching rows from right; non-matches on right = NULL. | Yes – left side is “kept”. | Primary table + optional extra info; find “missing” matches via NULL. |
| RIGHT JOIN | All rows from right table + matching rows from left; non-matches on left = NULL. | Yes – right side is “kept”. | Same as LEFT, but with emphasis on right table (less common). |
| FULL JOIN | All rows from both tables; matches combined, non-matches filled with NULLs. | No – A FULL B = B FULL A. | Union of both sides with alignment where possible. |
| LEFT ANTI JOIN | Rows from left table with no match in right (using LEFT JOIN + WHERE B.key IS NULL). | Yes | Find “only in A” rows, e.g. customers without orders. |
| RIGHT ANTI JOIN | Rows from right table with no match in left. | Yes | Find “only in B” rows, e.g. orders without customers. |
| FULL ANTI JOIN | Rows that don’t match on either side (unmatched from A + unmatched from B). | No | See all inconsistencies between two tables. |
| CROSS JOIN | All combinations of rows from A and B (Cartesian product). | N/A | Rare, mostly for testing, generating grids, or careful analytics. |
Extra ideas from the diagrams:
- “Primary / main source of data” = the table from which you want all rows (LEFT / RIGHT).
- “Lookup table” for anti joins is used as a filter, not as a source of columns.
3.2 Set operators
They combine results of two SELECTs (each SELECT is like its own virtual table).
| Operator | Effect (set theory) | Duplicates? |
|---|---|---|
| UNION | All rows from both sets, duplicates removed (like mathematical union). | Removed. |
| UNION ALL | All rows from both sets, keeping duplicates. | Kept. |
| EXCEPT | Rows in first set that are not in second (“A minus B”). | Unique results. |
| INTERSECT | Only rows that appear in both sets (common rows). | Unique results. |
ORDER BY can only be used at the very end of the combined query, not after each SELECT.
3.3 Shared rules for SET operators (from 06_SETs.sql)
When using UNION/UNION ALL/EXCEPT/INTERSECT:
- The data types of corresponding columns must match (or be compatible).
- The order of columns in each SELECT must be the same.
- The column names in the final result come from the first SELECT.
- You must use the correct columns in matching positions; swapping first/last names etc. may produce nonsense.
- (Implied by SQL standard, though not literally written): the SELECTs must have the same number of columns.
The script includes example queries under “RULE: Data Types / Column Order / Column Aliases / Correct Columns” to illustrate these rules.
4. Syntax cheat-sheet
4.1 Core join patterns
Basic INNER JOIN
SELECT
A.col1,
B.col2
FROM A
INNER JOIN B
ON A.key = B.key;
- Returns only rows where
A.keyequalsB.key.
LEFT JOIN
SELECT
A.col1,
B.col2
FROM A
LEFT JOIN B
ON A.key = B.key;
- Keeps all rows from A, adds B’s columns when a match exists; otherwise B’s columns are NULL.
RIGHT JOIN
SELECT
A.col1,
B.col2
FROM A
RIGHT JOIN B
ON A.key = B.key;
- Keeps all rows from B.
- Alternative: flip the tables and use LEFT JOIN instead (see slides and script).
FULL JOIN
SELECT
A.col1,
B.col2
FROM A
FULL JOIN B
ON A.key = B.key;
- Returns everything from both tables, matching where possible.
4.2 Anti-join patterns (from slides + SQL)
LEFT ANTI JOIN – “rows in A with no match in B”
SELECT
A.*
FROM A
LEFT JOIN B
ON A.key = B.key
WHERE B.key IS NULL;
RIGHT ANTI JOIN
SELECT
B.*
FROM A
RIGHT JOIN B
ON A.key = B.key
WHERE A.key IS NULL;
Alternative to RIGHT ANTI JOIN using LEFT JOIN
SELECT
B.*
FROM B
LEFT JOIN A
ON A.key = B.key
WHERE A.key IS NULL;
FULL ANTI JOIN – unmatched on either side
SELECT
A.*,
B.*
FROM A
FULL JOIN B
ON A.key = B.key
WHERE A.key IS NULL
OR B.key IS NULL;
4.3 Cross join pattern
SELECT
*
FROM A
CROSS JOIN B;
- Returns all combinations of rows from A and B (A×B).
4.4 Joining multiple tables (4-table pattern)
From the multi-table example in 05_JOINs.sql:
SELECT
o.OrderID,
o.Sales,
c.FirstName AS CustomerFirstName,
c.LastName AS CustomerLastName,
p.Product AS ProductName,
p.Price,
e.FirstName AS EmployeeFirstName,
e.LastName AS EmployeeLastName
FROM Sales.Orders AS o
LEFT JOIN Sales.Customers AS c
ON o.CustomerID = c.CustomerID
LEFT JOIN Sales.Products AS p
ON o.ProductID = p.ProductID
LEFT JOIN Sales.Employees AS e
ON o.SalesPersonID = e.EmployeeID;
General pattern:
FROM fact_table ft
LEFT JOIN dim1 d1 ON ft.key1 = d1.key1
LEFT JOIN dim2 d2 ON ft.key2 = d2.key2
LEFT JOIN dim3 d3 ON ft.key3 = d3.key3
...
4.5 Set operator pattern
General template :
SELECT ...
FROM ...
-- JOIN / WHERE / GROUP BY etc.
<SET OPERATOR> -- UNION / UNION ALL / EXCEPT / INTERSECT
SELECT ...
FROM ...
-- JOIN / WHERE / GROUP BY etc.
ORDER BY ...; -- only here, at the very end
Important:
- You can use joins, WHERE, GROUP BY, etc. in each SELECT.
- You cannot put an
ORDER BYinside each part (unless you wrap each SELECT in parentheses / subqueries); by defaultORDER BYgoes once at the end.
5. Worked examples (with explanations)
5.1 Basic joins with Customers & Orders
No join
SELECT * FROM customers;
SELECT * FROM orders;
- Two completely separate result sets; nothing combined.
INNER JOIN – “customers with orders”
SELECT
c.id,
c.first_name,
o.order_id,
o.sales
FROM customers AS c
INNER JOIN orders AS o
ON c.id = o.customer_id;
- Only customers whose
idappears ascustomer_idinorders. - If a customer never ordered, they do not appear.
LEFT JOIN – “all customers, orders optional”
SELECT
c.id,
c.first_name,
o.order_id,
o.sales
FROM customers AS c
LEFT JOIN orders AS o
ON c.id = o.customer_id;
- All customers appear.
order_idandsalesare NULL if the customer has no orders.- Good for “data enrichment” or for later filtering to find “no orders” via
WHERE o.order_id IS NULL.
RIGHT JOIN – “all orders, customers optional”
SELECT
c.id,
c.first_name,
o.order_id,
o.customer_id,
o.sales
FROM customers AS c
RIGHT JOIN orders AS o
ON c.id = o.customer_id;
- Every order appears, even if
customer_iddoesn’t match a real customer. - If there’s no matching customer,
c.idandc.first_nameare NULL.
Alternative using LEFT JOIN (from script & slide “Alternative To RIGHT JOIN”):
SELECT
c.id,
c.first_name,
o.order_id,
o.sales
FROM orders AS o
LEFT JOIN customers AS c
ON c.id = o.customer_id;
- Same result but often easier to read because we mostly use LEFT JOINs.
FULL JOIN – “everything”
SELECT
c.id,
c.first_name,
o.order_id,
o.customer_id,
o.sales
FROM customers AS c
FULL JOIN orders AS o
ON c.id = o.customer_id;
-
Contains:
- matching customer+order rows,
- customers without orders (order columns NULL),
- orders without customers (customer columns NULL).
5.2 Anti-joins
LEFT ANTI JOIN – “customers with no orders”
SELECT *
FROM customers AS c
LEFT JOIN orders AS o
ON c.id = o.customer_id
WHERE o.customer_id IS NULL;
- LEFT JOIN keeps all customers.
WHERE o.customer_id IS NULLfilters down to rows where no order matched.- This matches the “Left Anti Join” diagram: we keep only unmatched rows from the left, using the right as a filter.
RIGHT ANTI JOIN – “orders without customers”
SELECT *
FROM customers AS c
RIGHT JOIN orders AS o
ON c.id = o.customer_id
WHERE c.id IS NULL;
Alternative using LEFT JOIN (slide “Alternative To RIGHT Anti JOIN”):
SELECT *
FROM orders AS o
LEFT JOIN customers AS c
ON c.id = o.customer_id
WHERE c.id IS NULL;
- Same logic: keep unmatched rows from the table you keep all rows from.
FULL ANTI JOIN – “all mismatches”
SELECT
c.id,
c.first_name,
o.order_id,
o.customer_id,
o.sales
FROM customers AS c
FULL JOIN orders AS o
ON c.id = o.customer_id
WHERE o.customer_id IS NULL
OR c.id IS NULL;
-
Collects both:
- customers without orders,
- orders without customers.
5.3 Alternative INNER JOIN via LEFT JOIN
SELECT *
FROM customers AS c
LEFT JOIN orders AS o
ON c.id = o.customer_id
WHERE o.customer_id IS NOT NULL;
- LEFT JOIN first, then filter out non-matches.
- Final effect is exactly like an INNER JOIN (only matching rows).
5.4 CROSS JOIN
SELECT *
FROM customers
CROSS JOIN orders;
- If you have 10 customers and 5 orders, you get 50 rows.
- Useful mainly when you want all combinations (e.g., scenario modelling).
5.5 Multi-table join (SalesDB example)
From the script:
USE SalesDB;
SELECT
o.OrderID,
o.Sales,
c.FirstName AS CustomerFirstName,
c.LastName AS CustomerLastName,
p.Product AS ProductName,
p.Price,
e.FirstName AS EmployeeFirstName,
e.LastName AS EmployeeLastName
FROM Sales.Orders AS o
LEFT JOIN Sales.Customers AS c
ON o.CustomerID = c.CustomerID
LEFT JOIN Sales.Products AS p
ON o.ProductID = p.ProductID
LEFT JOIN Sales.Employees AS e
ON o.SalesPersonID = e.EmployeeID;
What’s happening:
-
Sales.Ordersis the fact table (one row per order). -
Each LEFT JOIN enriches each order with:
- customer info,
- product info,
- employee (salesperson) info.
-
LEFT JOIN means you still see orders even if some dimension data is missing.
5.6 Set operators – Employees & Customers
Using Sales.Customers and Sales.Employees.
UNION – combine without duplicates
SELECT
FirstName,
LastName
FROM Sales.Customers
UNION
SELECT
FirstName,
LastName
FROM Sales.Employees;
- Returns each unique (FirstName, LastName) pair at most once.
- Think of it as “unique people either as customer or employee”.
UNION ALL – combine with duplicates
SELECT
FirstName,
LastName
FROM Sales.Customers
UNION ALL
SELECT
FirstName,
LastName
FROM Sales.Employees;
- Keeps duplicates – useful when counts matter (e.g., number of records).
EXCEPT – “employees who are not customers”
SELECT
FirstName,
LastName
FROM Sales.Employees
EXCEPT
SELECT
FirstName,
LastName
FROM Sales.Customers;
- Returns employees whose (FirstName, LastName) pair never appears in Customers.
INTERSECT – “employees who are also customers”
SELECT
FirstName,
LastName
FROM Sales.Employees
INTERSECT
SELECT
FirstName,
LastName
FROM Sales.Customers;
- Returns only names that are in both tables.
5.7 UNION over Orders & OrdersArchive
SELECT
'Orders' AS SourceTable,
OrderID,
ProductID,
CustomerID,
SalesPersonID,
OrderDate,
ShipDate,
OrderStatus,
ShipAddress,
BillAddress,
Quantity,
Sales,
CreationTime
FROM Sales.Orders
UNION
SELECT
'OrdersArchive' AS SourceTable,
OrderID,
ProductID,
CustomerID,
SalesPersonID,
OrderDate,
ShipDate,
OrderStatus,
ShipAddress,
BillAddress,
Quantity,
Sales,
CreationTime
FROM Sales.OrdersArchive
ORDER BY OrderID;
Key points:
- Uses UNION (duplicates removed) to merge current and archived orders.
- Adds a
SourceTablecolumn to show where each row came from. ORDER BYappears once at the end.
6. Common mistakes & tips
6.1 Joins
-
Forgetting the join condition (
ONclause)- This effectively becomes a CROSS JOIN and multiplies rows.
- Always check: “What keys am I matching on?”
-
Mixing up LEFT and RIGHT tables
- Remember: in
A LEFT JOIN B, A is the one that never gets dropped. - Slides emphasise “The order of tables is IMPORTANT” for LEFT/RIGHT and anti joins.
- Remember: in
-
Using RIGHT JOIN when a LEFT JOIN would do
- Often clearer to flip tables and use LEFT JOIN (slide “Alternative To RIGHT JOIN”).
-
Wrong anti-join condition
- For LEFT ANTI JOIN, the NULL check must be on the right table’s key column (
WHERE B.key IS NULL). - For FULL ANTI JOIN, you need
A.key IS NULL OR B.key IS NULL.
- For LEFT ANTI JOIN, the NULL check must be on the right table’s key column (
-
Putting filters in WHERE instead of ON (with OUTER joins)
- If you add
WHERE B.some_col = 'X'after a LEFT JOIN, you can unintentionally turn it into something like an INNER join (because rows with NULL B values are dropped). - Strategy: if the condition is about matching rows, put it in ON; if it’s logical filtering after the join, use WHERE.
- If you add
-
Ambiguous column names in SELECT/WHERE/ORDER BY
- When multiple tables have
idorFirstName, always qualify:c.FirstName,o.OrderID.
- When multiple tables have
6.2 Set operators
-
Mismatched column lists / types
- E.g., selecting
(FirstName, LastName, Country)on one side and(FirstName, LastName)on the other breaks the “data types and order” rule shown in the script.
- E.g., selecting
-
Wrong column order
- The
RULE: Column Orderexample showsLastName, CustomerIDvsEmployeeID, LastName– this flips meaning of columns. - Always match columns by position and meaning.
- The
-
Relying on column names from second SELECT
- Final column names come from the first SELECT (
CustomerID AS ID→ final column name isID).
- Final column names come from the first SELECT (
-
Using ORDER BY in each SELECT
- As slides note,
ORDER BYcan only be used once at the end of the combined result (unless you wrap subqueries).
- As slides note,
-
Expecting UNION and UNION ALL to behave the same
- UNION removes duplicates; UNION ALL doesn’t – this affects counts and performance.
7. Memory boosters (mnemonics & analogies)
7.1 Joins
- INNER JOIN – “inner circle only” → intersection of both circles.
- LEFT JOIN – “LEFT is Loved” → the left table is never dropped.
- RIGHT JOIN – same idea, but “Right = Retained”.
- FULL JOIN – “FULL = Everything” → all rows from both sides.
- Anti joins – think of anti-social rows: they appear only on one side with no friend on the other.
- Cross join – “grid of all pairs”, like a multiplication table: rows(A) × rows(B).
Visual hook (from diagrams):
-
Picture the Venn diagrams from slides:
- Inner = overlapping middle.
- Left = full left + overlap.
- Left anti = left side excluding overlap.
7.2 Set operators
-
UNION vs UNION ALL
- “UNION is Unique” (both start with U): removes duplicates.
- “UNION ALL keeps All”.
-
EXCEPT
- Think “Employees EXCEPT Customers” = who is only an employee.
-
INTERSECT
- The intersection of sets – common part.
Mini-story:
You have two guest lists: Employees and Customers.
- The UNION is everyone invited to the party (no duplicate invitations).
- UNION ALL is all invitations actually sent (even duplicates).
- EXCEPT (Employees EXCEPT Customers) are staff-only invites.
- INTERSECT are VIPs who are both employees and customers.
8. Active-recall questions & mini-exercises
Use these like flashcards. Cover the answer, try to recall, then check.
8.1 Concept questions
-
Q: In a
LEFT JOIN, which table’s rows are always kept? A: The left table (primary source of data). -
Q: What does an INNER JOIN return? A: Only rows that have matching keys in both tables.
-
Q: How do you implement a LEFT ANTI JOIN using customers and orders? A:
customers LEFT JOIN orders ON ... WHERE orders.customer_id IS NULL. -
Q: In a FULL JOIN, what types of rows appear? A: Matching rows, plus unmatched rows from each side with NULLs.
-
Q: Which set operator returns rows that are in the first result but not the second? A: EXCEPT.
-
Q: Who “owns” the column names in a UNION result? A: The first SELECT.
-
Q: Where should
ORDER BYappear in a UNION query? A: At the very end, after all SELECTs and set operators.
8.2 Join exercises (using Customers, Orders, Products, Employees)
Try to write the SQL without looking; then compare to the patterns.
-
Customers without orders
- Write a query that returns all columns from
customersfor those who never placed an order.
- Write a query that returns all columns from
-
Orders without customers
- Using the RIGHT ANTI JOIN pattern or its LEFT JOIN alternative, list all orders that don’t have a matching customer.
-
Orders with full details
- Recreate the 4-table query: Orders + Customers + Products + Employees, showing the columns from the script.
-
Inner join via LEFT join
- Rewrite the customers–orders INNER JOIN using LEFT JOIN + WHERE
IS NOT NULL.
- Rewrite the customers–orders INNER JOIN using LEFT JOIN + WHERE
-
Cross join
- Write a query that generates all combinations of customers and orders (just
SELECT *).
- Write a query that generates all combinations of customers and orders (just
8.3 Set operator exercises
-
Employees who are not customers
- Write the EXCEPT query using
Sales.EmployeesandSales.Customers.
- Write the EXCEPT query using
-
People who are both employees and customers
- Write the INTERSECT query.
-
Combine Customers and Employees into a single de-duplicated list of names
- Use UNION.
-
Combine Orders and OrdersArchive into one report
- Include a
SourceTablecolumn and order byOrderID, as in the script.
- Include a
-
Spot the bug
-
Given the rule examples:
SELECT CustomerID, LastName FROM Sales.Customers UNION SELECT FirstName, LastName FROM Sales.Employees;What’s wrong?
- (Answer: columns by position don’t match in type/meaning:
CustomerIDvsFirstName.)
- (Answer: columns by position don’t match in type/meaning:
-
9. Short recap – key takeaways (5–10 bullets)
- There are two main ways to combine data: JOINS (add columns) and SET OPERATORS (add rows).
- INNER/LEFT/RIGHT/FULL joins differ only in which unmatched rows they keep.
- Anti joins are implemented using OUTER JOIN +
WHERE <other side’s key> IS NULLand are great for “who doesn’t have X?” questions. - CROSS JOIN returns all combinations of rows (use with care).
- When joining multiple tables, start from a main/fact table and add others one by one with clear keys.
- For set operators, make sure number, order, and types of columns match, and remember that final column names come from the first SELECT.
- UNION removes duplicates, UNION ALL keeps them; EXCEPT and INTERSECT work like set difference and intersection.
ORDER BYin a set-operator query goes only at the end of the entire statement.- Joins are perfect for recombining/enriching data; set operators shine for combining similar datasets, delta detection, and completeness checks.