Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

SQL JOINS & SET OPERATORS

Combining data across tables and queries


PDF Slides


1. Short summary

Two big ways to combine data in SQL:

  1. JOINS – glue tables side by side (add columns).
  2. SET OPERATORS – stack result sets on top of each other (add rows).

Main join types (INNER, LEFT, RIGHT, FULL, ANTI, CROSS), how to join multiple tables, and how to use UNION, UNION ALL, EXCEPT and INTERSECT safely and correctly.


2. Big picture – where this fits

Two main families for combining data:

  • Joins – combine data by columns (A ⟷ B), matching rows using keys.
  • Set operators – combine data by rows (A ↑ B), treating each SELECT result as a set.

Typical tasks:

  • Recombine data – get the “big picture” by merging related tables (customers + orders + addresses).
  • Data enrichment – add extra info from lookup/reference tables (e.g., ZIP codes).
  • Check existence / filtering – keep or remove rows based on whether they have matches elsewhere (e.g., customers with no orders – anti joins).
  • Combine information from similar tables (Employees + Customers) or current vs archive (Orders + OrdersArchive).
  • Delta detection / completeness checks using EXCEPT/INTERSECT + unions.

3. Key concepts & definitions

3.1 Types of JOINS

(“A” = left table, “B” = right table)

Join typeWhat rows do you get?Does table order matter?Typical use
No joinSeparate results from each table (two independent SELECTs).N/AJust view data from each table, no relationship.
INNER JOINOnly rows that match in both tables (the overlapping part of the Venn diagram).No – A ⋈ B = B ⋈ A.When you only care about matching data (e.g., customers with orders).
LEFT JOINAll rows from left table + matching rows from right; non-matches on right = NULL.Yes – left side is “kept”.Primary table + optional extra info; find “missing” matches via NULL.
RIGHT JOINAll rows from right table + matching rows from left; non-matches on left = NULL.Yes – right side is “kept”.Same as LEFT, but with emphasis on right table (less common).
FULL JOINAll rows from both tables; matches combined, non-matches filled with NULLs.No – A FULL B = B FULL A.Union of both sides with alignment where possible.
LEFT ANTI JOINRows from left table with no match in right (using LEFT JOIN + WHERE B.key IS NULL).YesFind “only in A” rows, e.g. customers without orders.
RIGHT ANTI JOINRows from right table with no match in left.YesFind “only in B” rows, e.g. orders without customers.
FULL ANTI JOINRows that don’t match on either side (unmatched from A + unmatched from B).NoSee all inconsistencies between two tables.
CROSS JOINAll combinations of rows from A and B (Cartesian product).N/ARare, mostly for testing, generating grids, or careful analytics.

Extra ideas from the diagrams:

  • “Primary / main source of data” = the table from which you want all rows (LEFT / RIGHT).
  • “Lookup table” for anti joins is used as a filter, not as a source of columns.

3.2 Set operators

They combine results of two SELECTs (each SELECT is like its own virtual table).

OperatorEffect (set theory)Duplicates?
UNIONAll rows from both sets, duplicates removed (like mathematical union).Removed.
UNION ALLAll rows from both sets, keeping duplicates.Kept.
EXCEPTRows in first set that are not in second (“A minus B”).Unique results.
INTERSECTOnly rows that appear in both sets (common rows).Unique results.

ORDER BY can only be used at the very end of the combined query, not after each SELECT.


3.3 Shared rules for SET operators (from 06_SETs.sql)

When using UNION/UNION ALL/EXCEPT/INTERSECT:

  • The data types of corresponding columns must match (or be compatible).
  • The order of columns in each SELECT must be the same.
  • The column names in the final result come from the first SELECT.
  • You must use the correct columns in matching positions; swapping first/last names etc. may produce nonsense.
  • (Implied by SQL standard, though not literally written): the SELECTs must have the same number of columns.

The script includes example queries under “RULE: Data Types / Column Order / Column Aliases / Correct Columns” to illustrate these rules.


4. Syntax cheat-sheet

4.1 Core join patterns

Basic INNER JOIN

SELECT
    A.col1,
    B.col2
FROM A
INNER JOIN B
    ON A.key = B.key;
  • Returns only rows where A.key equals B.key.

LEFT JOIN

SELECT
    A.col1,
    B.col2
FROM A
LEFT JOIN B
    ON A.key = B.key;
  • Keeps all rows from A, adds B’s columns when a match exists; otherwise B’s columns are NULL.

RIGHT JOIN

SELECT
    A.col1,
    B.col2
FROM A
RIGHT JOIN B
    ON A.key = B.key;
  • Keeps all rows from B.
  • Alternative: flip the tables and use LEFT JOIN instead (see slides and script).

FULL JOIN

SELECT
    A.col1,
    B.col2
FROM A
FULL JOIN B
    ON A.key = B.key;
  • Returns everything from both tables, matching where possible.

4.2 Anti-join patterns (from slides + SQL)

LEFT ANTI JOIN – “rows in A with no match in B”

SELECT
    A.*
FROM A
LEFT JOIN B
    ON A.key = B.key
WHERE B.key IS NULL;

RIGHT ANTI JOIN

SELECT
    B.*
FROM A
RIGHT JOIN B
    ON A.key = B.key
WHERE A.key IS NULL;

Alternative to RIGHT ANTI JOIN using LEFT JOIN

SELECT
    B.*
FROM B
LEFT JOIN A
    ON A.key = B.key
WHERE A.key IS NULL;

FULL ANTI JOIN – unmatched on either side

SELECT
    A.*,
    B.*
FROM A
FULL JOIN B
    ON A.key = B.key
WHERE A.key IS NULL
   OR B.key IS NULL;

4.3 Cross join pattern

SELECT
    *
FROM A
CROSS JOIN B;
  • Returns all combinations of rows from A and B (A×B).

4.4 Joining multiple tables (4-table pattern)

From the multi-table example in 05_JOINs.sql:

SELECT
    o.OrderID,
    o.Sales,
    c.FirstName AS CustomerFirstName,
    c.LastName AS CustomerLastName,
    p.Product  AS ProductName,
    p.Price,
    e.FirstName AS EmployeeFirstName,
    e.LastName  AS EmployeeLastName
FROM Sales.Orders AS o
LEFT JOIN Sales.Customers AS c
    ON o.CustomerID = c.CustomerID
LEFT JOIN Sales.Products AS p
    ON o.ProductID = p.ProductID
LEFT JOIN Sales.Employees AS e
    ON o.SalesPersonID = e.EmployeeID;

General pattern:

FROM fact_table ft
LEFT JOIN dim1 d1 ON ft.key1 = d1.key1
LEFT JOIN dim2 d2 ON ft.key2 = d2.key2
LEFT JOIN dim3 d3 ON ft.key3 = d3.key3
...

4.5 Set operator pattern

General template :

SELECT ...
FROM ...
-- JOIN / WHERE / GROUP BY etc.
<SET OPERATOR>   -- UNION / UNION ALL / EXCEPT / INTERSECT
SELECT ...
FROM ...
-- JOIN / WHERE / GROUP BY etc.
ORDER BY ...;    -- only here, at the very end

Important:

  • You can use joins, WHERE, GROUP BY, etc. in each SELECT.
  • You cannot put an ORDER BY inside each part (unless you wrap each SELECT in parentheses / subqueries); by default ORDER BY goes once at the end.

5. Worked examples (with explanations)

5.1 Basic joins with Customers & Orders

No join

SELECT * FROM customers;
SELECT * FROM orders;
  • Two completely separate result sets; nothing combined.

INNER JOIN – “customers with orders”

SELECT
    c.id,
    c.first_name,
    o.order_id,
    o.sales
FROM customers AS c
INNER JOIN orders AS o
    ON c.id = o.customer_id;
  • Only customers whose id appears as customer_id in orders.
  • If a customer never ordered, they do not appear.

LEFT JOIN – “all customers, orders optional”

SELECT
    c.id,
    c.first_name,
    o.order_id,
    o.sales
FROM customers AS c
LEFT JOIN orders AS o
    ON c.id = o.customer_id;
  • All customers appear.
  • order_id and sales are NULL if the customer has no orders.
  • Good for “data enrichment” or for later filtering to find “no orders” via WHERE o.order_id IS NULL.

RIGHT JOIN – “all orders, customers optional”

SELECT
    c.id,
    c.first_name,
    o.order_id,
    o.customer_id,
    o.sales
FROM customers AS c 
RIGHT JOIN orders AS o 
    ON c.id = o.customer_id;
  • Every order appears, even if customer_id doesn’t match a real customer.
  • If there’s no matching customer, c.id and c.first_name are NULL.

Alternative using LEFT JOIN (from script & slide “Alternative To RIGHT JOIN”):

SELECT
    c.id,
    c.first_name,
    o.order_id,
    o.sales
FROM orders AS o 
LEFT JOIN customers AS c
    ON c.id = o.customer_id;
  • Same result but often easier to read because we mostly use LEFT JOINs.

FULL JOIN – “everything”

SELECT
    c.id,
    c.first_name,
    o.order_id,
    o.customer_id,
    o.sales
FROM customers AS c 
FULL JOIN orders AS o 
    ON c.id = o.customer_id;
  • Contains:

    • matching customer+order rows,
    • customers without orders (order columns NULL),
    • orders without customers (customer columns NULL).

5.2 Anti-joins

LEFT ANTI JOIN – “customers with no orders”

SELECT *
FROM customers AS c
LEFT JOIN orders AS o
    ON c.id = o.customer_id
WHERE o.customer_id IS NULL;
  • LEFT JOIN keeps all customers.
  • WHERE o.customer_id IS NULL filters down to rows where no order matched.
  • This matches the “Left Anti Join” diagram: we keep only unmatched rows from the left, using the right as a filter.

RIGHT ANTI JOIN – “orders without customers”

SELECT *
FROM customers AS c
RIGHT JOIN orders AS o
    ON c.id = o.customer_id
WHERE c.id IS NULL;

Alternative using LEFT JOIN (slide “Alternative To RIGHT Anti JOIN”):

SELECT *
FROM orders AS o 
LEFT JOIN customers AS c
    ON c.id = o.customer_id
WHERE c.id IS NULL;
  • Same logic: keep unmatched rows from the table you keep all rows from.

FULL ANTI JOIN – “all mismatches”

SELECT
    c.id,
    c.first_name,
    o.order_id,
    o.customer_id,
    o.sales
FROM customers AS c 
FULL JOIN orders AS o 
    ON c.id = o.customer_id
WHERE o.customer_id IS NULL
   OR c.id IS NULL;
  • Collects both:

    • customers without orders,
    • orders without customers.

5.3 Alternative INNER JOIN via LEFT JOIN

SELECT *
FROM customers AS c
LEFT JOIN orders AS o
    ON c.id = o.customer_id
WHERE o.customer_id IS NOT NULL;
  • LEFT JOIN first, then filter out non-matches.
  • Final effect is exactly like an INNER JOIN (only matching rows).

5.4 CROSS JOIN

SELECT *
FROM customers
CROSS JOIN orders;
  • If you have 10 customers and 5 orders, you get 50 rows.
  • Useful mainly when you want all combinations (e.g., scenario modelling).

5.5 Multi-table join (SalesDB example)

From the script:

USE SalesDB;

SELECT 
    o.OrderID,
    o.Sales,
    c.FirstName AS CustomerFirstName,
    c.LastName  AS CustomerLastName,
    p.Product   AS ProductName,
    p.Price,
    e.FirstName AS EmployeeFirstName,
    e.LastName  AS EmployeeLastName
FROM Sales.Orders AS o
LEFT JOIN Sales.Customers AS c
    ON o.CustomerID = c.CustomerID
LEFT JOIN Sales.Products AS p
    ON o.ProductID = p.ProductID
LEFT JOIN Sales.Employees AS e
    ON o.SalesPersonID = e.EmployeeID;

What’s happening:

  • Sales.Orders is the fact table (one row per order).

  • Each LEFT JOIN enriches each order with:

    • customer info,
    • product info,
    • employee (salesperson) info.
  • LEFT JOIN means you still see orders even if some dimension data is missing.


5.6 Set operators – Employees & Customers

Using Sales.Customers and Sales.Employees.

UNION – combine without duplicates

SELECT
    FirstName,
    LastName
FROM Sales.Customers
UNION
SELECT
    FirstName,
    LastName
FROM Sales.Employees;
  • Returns each unique (FirstName, LastName) pair at most once.
  • Think of it as “unique people either as customer or employee”.

UNION ALL – combine with duplicates

SELECT
    FirstName,
    LastName
FROM Sales.Customers
UNION ALL
SELECT
    FirstName,
    LastName
FROM Sales.Employees;
  • Keeps duplicates – useful when counts matter (e.g., number of records).

EXCEPT – “employees who are not customers”

SELECT
    FirstName,
    LastName
FROM Sales.Employees
EXCEPT
SELECT
    FirstName,
    LastName
FROM Sales.Customers;
  • Returns employees whose (FirstName, LastName) pair never appears in Customers.

INTERSECT – “employees who are also customers”

SELECT
    FirstName,
    LastName
FROM Sales.Employees
INTERSECT
SELECT
    FirstName,
    LastName
FROM Sales.Customers;
  • Returns only names that are in both tables.

5.7 UNION over Orders & OrdersArchive

SELECT
    'Orders' AS SourceTable,
    OrderID,
    ProductID,
    CustomerID,
    SalesPersonID,
    OrderDate,
    ShipDate,
    OrderStatus,
    ShipAddress,
    BillAddress,
    Quantity,
    Sales,
    CreationTime
FROM Sales.Orders
UNION
SELECT
    'OrdersArchive' AS SourceTable,
    OrderID,
    ProductID,
    CustomerID,
    SalesPersonID,
    OrderDate,
    ShipDate,
    OrderStatus,
    ShipAddress,
    BillAddress,
    Quantity,
    Sales,
    CreationTime
FROM Sales.OrdersArchive
ORDER BY OrderID;

Key points:

  • Uses UNION (duplicates removed) to merge current and archived orders.
  • Adds a SourceTable column to show where each row came from.
  • ORDER BY appears once at the end.

6. Common mistakes & tips

6.1 Joins

  1. Forgetting the join condition (ON clause)

    • This effectively becomes a CROSS JOIN and multiplies rows.
    • Always check: “What keys am I matching on?”
  2. Mixing up LEFT and RIGHT tables

    • Remember: in A LEFT JOIN B, A is the one that never gets dropped.
    • Slides emphasise “The order of tables is IMPORTANT” for LEFT/RIGHT and anti joins.
  3. Using RIGHT JOIN when a LEFT JOIN would do

    • Often clearer to flip tables and use LEFT JOIN (slide “Alternative To RIGHT JOIN”).
  4. Wrong anti-join condition

    • For LEFT ANTI JOIN, the NULL check must be on the right table’s key column (WHERE B.key IS NULL).
    • For FULL ANTI JOIN, you need A.key IS NULL OR B.key IS NULL.
  5. Putting filters in WHERE instead of ON (with OUTER joins)

    • If you add WHERE B.some_col = 'X' after a LEFT JOIN, you can unintentionally turn it into something like an INNER join (because rows with NULL B values are dropped).
    • Strategy: if the condition is about matching rows, put it in ON; if it’s logical filtering after the join, use WHERE.
  6. Ambiguous column names in SELECT/WHERE/ORDER BY

    • When multiple tables have id or FirstName, always qualify: c.FirstName, o.OrderID.

6.2 Set operators

  1. Mismatched column lists / types

    • E.g., selecting (FirstName, LastName, Country) on one side and (FirstName, LastName) on the other breaks the “data types and order” rule shown in the script.
  2. Wrong column order

    • The RULE: Column Order example shows LastName, CustomerID vs EmployeeID, LastName – this flips meaning of columns.
    • Always match columns by position and meaning.
  3. Relying on column names from second SELECT

    • Final column names come from the first SELECT (CustomerID AS ID → final column name is ID).
  4. Using ORDER BY in each SELECT

    • As slides note, ORDER BY can only be used once at the end of the combined result (unless you wrap subqueries).
  5. Expecting UNION and UNION ALL to behave the same

    • UNION removes duplicates; UNION ALL doesn’t – this affects counts and performance.

7. Memory boosters (mnemonics & analogies)

7.1 Joins

  • INNER JOIN – “inner circle only” → intersection of both circles.
  • LEFT JOIN – “LEFT is Loved” → the left table is never dropped.
  • RIGHT JOIN – same idea, but “Right = Retained”.
  • FULL JOIN – “FULL = Everything” → all rows from both sides.
  • Anti joins – think of anti-social rows: they appear only on one side with no friend on the other.
  • Cross join – “grid of all pairs”, like a multiplication table: rows(A) × rows(B).

Visual hook (from diagrams):

  • Picture the Venn diagrams from slides:

    • Inner = overlapping middle.
    • Left = full left + overlap.
    • Left anti = left side excluding overlap.

7.2 Set operators

  • UNION vs UNION ALL

    • “UNION is Unique” (both start with U): removes duplicates.
    • “UNION ALL keeps All”.
  • EXCEPT

    • Think “Employees EXCEPT Customers” = who is only an employee.
  • INTERSECT

    • The intersection of sets – common part.

Mini-story:

You have two guest lists: Employees and Customers.

  • The UNION is everyone invited to the party (no duplicate invitations).
  • UNION ALL is all invitations actually sent (even duplicates).
  • EXCEPT (Employees EXCEPT Customers) are staff-only invites.
  • INTERSECT are VIPs who are both employees and customers.

8. Active-recall questions & mini-exercises

Use these like flashcards. Cover the answer, try to recall, then check.

8.1 Concept questions

  1. Q: In a LEFT JOIN, which table’s rows are always kept? A: The left table (primary source of data).

  2. Q: What does an INNER JOIN return? A: Only rows that have matching keys in both tables.

  3. Q: How do you implement a LEFT ANTI JOIN using customers and orders? A: customers LEFT JOIN orders ON ... WHERE orders.customer_id IS NULL.

  4. Q: In a FULL JOIN, what types of rows appear? A: Matching rows, plus unmatched rows from each side with NULLs.

  5. Q: Which set operator returns rows that are in the first result but not the second? A: EXCEPT.

  6. Q: Who “owns” the column names in a UNION result? A: The first SELECT.

  7. Q: Where should ORDER BY appear in a UNION query? A: At the very end, after all SELECTs and set operators.


8.2 Join exercises (using Customers, Orders, Products, Employees)

Try to write the SQL without looking; then compare to the patterns.

  1. Customers without orders

    • Write a query that returns all columns from customers for those who never placed an order.
  2. Orders without customers

    • Using the RIGHT ANTI JOIN pattern or its LEFT JOIN alternative, list all orders that don’t have a matching customer.
  3. Orders with full details

    • Recreate the 4-table query: Orders + Customers + Products + Employees, showing the columns from the script.
  4. Inner join via LEFT join

    • Rewrite the customers–orders INNER JOIN using LEFT JOIN + WHERE IS NOT NULL.
  5. Cross join

    • Write a query that generates all combinations of customers and orders (just SELECT *).

8.3 Set operator exercises

  1. Employees who are not customers

    • Write the EXCEPT query using Sales.Employees and Sales.Customers.
  2. People who are both employees and customers

    • Write the INTERSECT query.
  3. Combine Customers and Employees into a single de-duplicated list of names

    • Use UNION.
  4. Combine Orders and OrdersArchive into one report

    • Include a SourceTable column and order by OrderID, as in the script.
  5. Spot the bug

    • Given the rule examples:

      SELECT CustomerID, LastName
      FROM Sales.Customers
      UNION
      SELECT FirstName, LastName
      FROM Sales.Employees;
      

      What’s wrong?

      • (Answer: columns by position don’t match in type/meaning: CustomerID vs FirstName.)

9. Short recap – key takeaways (5–10 bullets)

  • There are two main ways to combine data: JOINS (add columns) and SET OPERATORS (add rows).
  • INNER/LEFT/RIGHT/FULL joins differ only in which unmatched rows they keep.
  • Anti joins are implemented using OUTER JOIN + WHERE <other side’s key> IS NULL and are great for “who doesn’t have X?” questions.
  • CROSS JOIN returns all combinations of rows (use with care).
  • When joining multiple tables, start from a main/fact table and add others one by one with clear keys.
  • For set operators, make sure number, order, and types of columns match, and remember that final column names come from the first SELECT.
  • UNION removes duplicates, UNION ALL keeps them; EXCEPT and INTERSECT work like set difference and intersection.
  • ORDER BY in a set-operator query goes only at the end of the entire statement.
  • Joins are perfect for recombining/enriching data; set operators shine for combining similar datasets, delta detection, and completeness checks.