4. Aggregation Analytical Functions
PDF Slides
1. Title & Short Summary
Aggregate & Window (Analytical) Functions in SQL
This lesson shows how to:
- Summarize data with aggregate functions (
COUNT,SUM,AVG,MIN,MAX). - Use GROUP BY to turn detailed rows into summary rows.
- Use window (analytic) functions to compute aggregates, rankings, and comparisons without losing row-level detail.
2. Big Picture – Where This Fits
In SQL learning, this chapter sits in “data analysis & analytics”:
-
Earlier you learned basic SELECT–FROM–WHERE and maybe simple
GROUP BY. -
Here you go beyond simple aggregation:
- Aggregate functions + GROUP BY → classic reports (“sales per customer”).
- Window functions → advanced analytics: running totals, moving averages, top-N per group, month-over-month changes, percentiles, etc.
Think of it like this:
- GROUP BY → compress rows into summaries (you lose row detail).
- WINDOW → add extra calculated columns alongside the original rows. (Diagram on pages 6–8 of the slides illustrates this contrast. )
3. Key Concepts & Definitions
3.1 Core aggregates
| Concept | Meaning | Notes / Data types |
|---|---|---|
COUNT(expr) | Number of rows (or non-NULL values of expr) | Works for any type. COUNT(*) counts rows including NULLs; COUNT(col) ignores NULLs. |
SUM(expr) | Sum of numeric values | Numbers only. NULLs are ignored. |
AVG(expr) | Average of numeric values | Numbers only. NULLs ignored by default. Use COALESCE to treat NULLs as 0. |
MIN(expr) | Smallest value | Any comparable type. |
MAX(expr) | Largest value | Any comparable type. |
3.2 GROUP BY vs Window functions
| Feature | GROUP BY | Window functions (OVER) |
|---|---|---|
| Output row count | Decreases (rows collapsed per group) | Same as input (row-level) |
| Purpose | Simple aggregation reports | Aggregation + detailed rows |
| Typical use | “Total sales per product” | “Total sales per product on each row”, running totals, rankings |
| Syntax idea | SELECT group_col, SUM(x) FROM t GROUP BY group_col | SELECT x, SUM(x) OVER (PARTITION BY group_col) FROM t |
(See the side-by-side diagrams on pages 6–10. )
3.3 Window syntax pieces
From the slides, a generic window expression looks like:
<window_function>(expression)
OVER (
[PARTITION BY ...]
[ORDER BY ...]
[frame_clause]
)
Pieces:
- Window function – what calculation to do (e.g.
SUM,ROW_NUMBER,LAG). - Expression – argument(s) you pass, e.g.
Sales,NTILE(3),LEAD(Sales, 2, 0). - PARTITION BY – splits rows into windows (groups); calculations restart in each partition.
- ORDER BY – defines the order inside each partition (essential for running totals, rankings, LAG/LEAD).
- FRAME clause (
ROWS/RANGE ...) – defines the subset of rows in the partition the function can “see” (for running vs rolling totals).
3.4 Categories of window functions
Visual on pages 9, 15, 39, 41, 98 groups window functions into:
| Category | Functions | What they do |
|---|---|---|
| Window aggregates | COUNT, SUM, AVG, MIN, MAX | One summary value per row (over a window) |
| Ranking | ROW_NUMBER, RANK, DENSE_RANK, NTILE, CUME_DIST, PERCENT_RANK | Assign positions / buckets based on ordering |
| Value / analytic | LAG, LEAD, FIRST_VALUE, LAST_VALUE | Pull a value from another row in the window |
4. Syntax Cheat-Sheet
4.1 Plain aggregate functions
From 13_Aggregate_Functions.sql:
-- Count rows
SELECT COUNT(*) AS total_customers
FROM customers;
-- Total sales
SELECT SUM(sales) AS total_sales
FROM orders;
-- Average, max, min
SELECT AVG(sales) AS avg_sales FROM orders;
SELECT MAX(score) AS max_score FROM customers;
SELECT MIN(score) AS min_score FROM customers;
4.2 GROUP BY pattern
SELECT
customer_id,
COUNT(*) AS total_orders,
SUM(sales) AS total_sales,
AVG(sales) AS avg_sales,
MAX(sales) AS highest_sales,
MIN(sales) AS lowest_sales
FROM orders
GROUP BY customer_id;
This matches the classic “aggregations per group” example.
4.3 Basic window function patterns
Total across all rows (no partition):
SUM(Sales) OVER () AS Total_Sales
Used in several scripts and in slides to show “entire dataset” calculations.
Per group / partition:
SUM(Sales) OVER (PARTITION BY ProductID) AS Sales_By_Product
Multiple partition levels:
SUM(Sales) OVER () AS Total_Sales,
SUM(Sales) OVER (PARTITION BY ProductID) AS Sales_By_Product,
SUM(Sales) OVER (PARTITION BY ProductID, OrderStatus) AS Sales_By_Product_Status
(from Window Basics Task 5).
4.4 ORDER and FRAME – running vs rolling
Running total by status (default frame):
SUM(Sales) OVER (
PARTITION BY OrderStatus
ORDER BY OrderDate
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS Total_Sales
Rolling window (current + next 2 rows):
SUM(Sales) OVER (
PARTITION BY OrderStatus
ORDER BY OrderDate
ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING
) AS Total_Sales
(from Tasks 7–11 in Window Basics and the rolling/ running diagrams on pages 59–64).
4.5 Window aggregate samples
-- Count orders overall and per customer
SELECT
OrderID,
OrderDate,
CustomerID,
COUNT(*) OVER () AS TotalOrders,
COUNT(*) OVER (PARTITION BY CustomerID) AS OrdersByCustomers
FROM Sales.Orders;
-- Moving average of Sales per product
SELECT
OrderID,
ProductID,
OrderDate,
Sales,
AVG(Sales) OVER (PARTITION BY ProductID) AS AvgByProduct,
AVG(Sales) OVER (PARTITION BY ProductID ORDER BY OrderDate) AS MovingAvg
FROM Sales.Orders;
(from 15_Window_Aggregations.sql).
4.6 Ranking function patterns
-- Basic ranking three ways
SELECT
OrderID,
ProductID,
Sales,
ROW_NUMBER() OVER (ORDER BY Sales DESC) AS SalesRank_Row,
RANK() OVER (ORDER BY Sales DESC) AS SalesRank_Rank,
DENSE_RANK() OVER (ORDER BY Sales DESC) AS SalesRank_Dense
FROM Sales.Orders;
-- Top-1 sale per product (Top-N per group)
SELECT *
FROM (
SELECT
OrderID,
ProductID,
Sales,
ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY Sales DESC) AS RankByProduct
FROM Sales.Orders
) AS TopProductSales
WHERE RankByProduct = 1;
-- Buckets with NTILE
SELECT
OrderID,
Sales,
NTILE(3) OVER (ORDER BY Sales DESC) AS Buckets
FROM Sales.Orders;
(from 16_Window_Ranking.sql and ranking diagrams on pages 75–85).
4.7 Value function patterns (LEAD/LAG/FIRST/LAST)
-- Month-over-month change using LAG
SELECT
* ,
CurrentMonthSales - PreviousMonthSales AS MoM_Change,
ROUND(
CAST((CurrentMonthSales - PreviousMonthSales) AS FLOAT)
/ PreviousMonthSales * 100, 1
) AS MoM_Perc
FROM (
SELECT
MONTH(OrderDate) AS OrderMonth,
SUM(Sales) AS CurrentMonthSales,
LAG(SUM(Sales)) OVER (ORDER BY MONTH(OrderDate)) AS PreviousMonthSales
FROM Sales.Orders
GROUP BY MONTH(OrderDate)
) AS MonthlySales;
-- FIRST_VALUE / LAST_VALUE per product
SELECT
OrderID,
ProductID,
Sales,
FIRST_VALUE(Sales) OVER (PARTITION BY ProductID ORDER BY Sales) AS LowestSales,
LAST_VALUE(Sales) OVER (
PARTITION BY ProductID
ORDER BY Sales
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
) AS HighestSales,
Sales - FIRST_VALUE(Sales) OVER (PARTITION BY ProductID ORDER BY Sales) AS SalesDifference
FROM Sales.Orders;
(from 17_Window_Value_Functions.sql and the value-function diagrams on pages 99–104, 106–110).
5. Worked Examples with Explanations
5.1 Aggregates vs GROUP BY
Example 1 – basic aggregates
SELECT COUNT(*) AS total_customers
FROM customers;
- What it does: returns one row with the total count of customers.
- Pattern: “Apply aggregate to whole table → 1-row summary”.
Example 2 – grouped aggregates
SELECT
customer_id,
COUNT(*) AS total_orders,
SUM(sales) AS total_sales
FROM orders
GROUP BY customer_id;
- Each
customer_idbecomes one row. - Aggregates are computed within each group.
- This matches the “squeeze many rows per customer into 1 summary row” diagram on page 6.
5.2 GROUP BY vs Window SUM
From Window Basics Tasks 1–4:
-- Task 1: plain total
SELECT SUM(Sales) AS Total_Sales
FROM Sales.Orders;
-- Task 2: total per product using GROUP BY
SELECT
ProductID,
SUM(Sales) AS Total_Sales
FROM Sales.Orders
GROUP BY ProductID;
-- Task 3: total across all orders as a window (detail kept)
SELECT
OrderID,
OrderDate,
ProductID,
Sales,
SUM(Sales) OVER () AS Total_Sales
FROM Sales.Orders;
-- Task 4: per-product totals as a window
SELECT
OrderID,
OrderDate,
ProductID,
Sales,
SUM(Sales) OVER () AS Total_Sales,
SUM(Sales) OVER (PARTITION BY ProductID) AS Sales_By_Product
FROM Sales.Orders;
Key idea:
-
GROUP BYversion returns one row per product (no order details). -
Window version returns one row per order, plus columns showing:
- total over all orders
- total per product
This exactly matches the “Window Functions keep details” diagram on page 7.
5.3 Multi-level partitions
From Task 5:
SELECT
OrderID,
OrderDate,
ProductID,
OrderStatus,
Sales,
SUM(Sales) OVER () AS Total_Sales,
SUM(Sales) OVER (PARTITION BY ProductID) AS Sales_By_Product,
SUM(Sales) OVER (PARTITION BY ProductID, OrderStatus) AS Sales_By_Product_Status
FROM Sales.Orders;
SUM(Sales) OVER ()– full dataset total.PARTITION BY ProductID– total per product.PARTITION BY ProductID, OrderStatus– total per product + status pair. This corresponds to the slide that shows combined columns in PARTITION BY.
5.4 Frames: running & rolling totals
Compare with the running/rolling Total diagrams on pages 59–64.
Running total example:
SELECT
OrderID,
OrderDate,
ProductID,
OrderStatus,
Sales,
SUM(Sales) OVER (
PARTITION BY OrderStatus
ORDER BY OrderDate
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS Total_Sales
FROM Sales.Orders;
- For each order status, sums from first order in time up to current order.
Rolling total example:
SELECT
OrderID,
ProductID,
OrderDate,
Sales,
AVG(Sales) OVER (
PARTITION BY ProductID
ORDER BY OrderDate
ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING
) AS RollingAvg
FROM Sales.Orders;
- For each product, takes the average of current order + next order only (2-row moving window).
5.5 Window aggregates in analysis use cases
-
Counting & duplicate detection
SELECT * FROM ( SELECT *, COUNT(*) OVER(PARTITION BY OrderID) AS CheckDuplicates FROM Sales.OrdersArchive ) t WHERE CheckDuplicates > 1;COUNT(*) OVER(PARTITION BY OrderID)counts how many rows share the sameOrderID.- Filtering for
> 1returns only duplicated orders.
-
Percent of total
SELECT OrderID, ProductID, Sales, SUM(Sales) OVER () AS TotalSales, ROUND(CAST(Sales AS FLOAT) / SUM(Sales) OVER () * 100, 2) AS PercentageOfTotal FROM Sales.Orders;- Each row knows its own share of the global total.
-
Deviation from min / max
SELECT OrderID, OrderDate, ProductID, Sales, MAX(Sales) OVER () AS HighestSales, MIN(Sales) OVER () AS LowestSales, Sales - MIN(Sales) OVER () AS DeviationFromMin, MAX(Sales) OVER () - Sales AS DeviationFromMax FROM Sales.Orders;- Lets you compare each row to global extremes quickly.
5.6 Ranking examples
From 16_Window_Ranking.sql and ranking slides.
-
Top sale per product (Top-N per group)
SELECT * FROM ( SELECT OrderID, ProductID, Sales, ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY Sales DESC) AS RankByProduct FROM Sales.Orders ) AS TopProductSales WHERE RankByProduct = 1; -
Bottom 2 customers by total sales
SELECT * FROM ( SELECT CustomerID, SUM(Sales) AS TotalSales, ROW_NUMBER() OVER (ORDER BY SUM(Sales)) AS RankCustomers FROM Sales.Orders GROUP BY CustomerID ) AS BottomCustomerSales WHERE RankCustomers <= 2; -
Bucketization with NTILE
SELECT OrderID, Sales, Buckets, CASE WHEN Buckets = 1 THEN 'High' WHEN Buckets = 2 THEN 'Medium' WHEN Buckets = 3 THEN 'Low' END AS SalesSegmentations FROM ( SELECT OrderID, Sales, NTILE(3) OVER (ORDER BY Sales DESC) AS Buckets FROM Sales.Orders ) AS SalesBuckets;- Similar to NTILE bucket diagrams on pages 83–85: rows are split into roughly equal groups; higher buckets get bigger groups first.
-
CUME_DIST for top 40% of prices
SELECT Product, Price, DistRank, CONCAT(DistRank * 100, '%') AS DistRankPerc FROM ( SELECT Product, Price, CUME_DIST() OVER (ORDER BY Price DESC) AS DistRank FROM Sales.Products ) AS PriceDistribution WHERE DistRank <= 0.4;CUME_DIST= (# rows with value ≥ current when ordered DESC) / total rows, as in formulas on pages 89–91.
5.7 Value function examples
From 17_Window_Value_Functions.sql and LAG/LEAD diagrams on pages 102–110.
-
Month-over-month sales change
SELECT *, CurrentMonthSales - PreviousMonthSales AS MoM_Change, ROUND( CAST((CurrentMonthSales - PreviousMonthSales) AS FLOAT) / PreviousMonthSales * 100, 1 ) AS MoM_Perc FROM ( SELECT MONTH(OrderDate) AS OrderMonth, SUM(Sales) AS CurrentMonthSales, LAG(SUM(Sales)) OVER (ORDER BY MONTH(OrderDate)) AS PreviousMonthSales FROM Sales.Orders GROUP BY MONTH(OrderDate) ) AS MonthlySales;LAGpeeks at the previous month’s aggregated sales.
-
Customer loyalty – avg days between orders
SELECT CustomerID, AVG(DaysUntilNextOrder) AS AvgDays, RANK() OVER (ORDER BY COALESCE(AVG(DaysUntilNextOrder), 999999)) AS RankAvg FROM ( SELECT OrderID, CustomerID, OrderDate AS CurrentOrder, LEAD(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS NextOrder, DATEDIFF( day, OrderDate, LEAD(OrderDate) OVER (PARTITION BY CustomerID ORDER BY OrderDate) ) AS DaysUntilNextOrder FROM Sales.Orders ) AS CustomerOrdersWithNext GROUP BY CustomerID;- Shorter average gaps → more frequent, “loyal” customers.
-
First & last value per product
Already shown in syntax section; it matches the “timeline” diagrams where FIRST_VALUE picks the first sales value and LAST_VALUE the last one (with an explicit frame).
6. Common Mistakes & Tips
-
Using window functions in
WHEREorGROUP BY- Rule slide: “Window functions can only be used in SELECT or ORDER BY clauses.”
- The script shows an invalid example:
WHERE SUM(Sales) OVER (...) > 100. Instead, wrap the query and filter in an outer SELECT.
-
Nesting window functions
- Slides explicitly say: “Not allowed to nest window functions!” and show
AVG( SUM(Sales) OVER() ) OVER(...)as invalid.
- Slides explicitly say: “Not allowed to nest window functions!” and show
-
Confusing
COUNT(*)vsCOUNT(col)COUNT(*)/COUNT(1)counts rows including NULLs.COUNT(col)ignores rows wherecolis NULL (example table withGloves NULLonly counted byCOUNT(*)in the slides).
-
Forgetting that AVG ignores NULLs
- Slides and script show using
COALESCE(Score,0)to include NULLs in average.
- Slides and script show using
-
Misunderstanding frames
- Frame requires
ORDER BY. - “Lower” boundary must not be after the “higher” boundary.
- Default frame for ordered windows is effectively
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW(running from start to current).
- Frame requires
-
Assuming all window functions allow frames
-
Slides show a matrix:
- Aggregate and value functions: frame optional, and especially recommended for
LAST_VALUE. - Ranking functions: frame not allowed; they always consider the entire partition.
- Aggregate and value functions: frame optional, and especially recommended for
-
-
Expecting ROW_NUMBER to handle ties “nicely”
ROW_NUMBERalways gives unique numbers → ties are broken arbitrarily by the ORDER BY.- Use
RANK(with gaps) orDENSE_RANK(no gaps) when ties should share the same rank, as highlighted in the ranking diagrams.
-
Forgetting default offset / default value in LEAD/LAG
- Offset default is 1.
- Default value is NULL, but you can override it (slides show
LEAD(Sales, 2, 10)).
7. Memory Boosters (Mnemonics & Hooks)
-
“C-S-A-M-M” for aggregates
- Count, Sum, Average, Min, Max.
- Imagine squeezing an orange: you Count oranges, Sum the juice, find Average juice per orange, Min/Max juiciest and driest.
-
“P-O-F inside OVER” – Window clause order
- PARTITION BY → groups.
- ORDER BY → sequence.
- FRAME → slice of that sequence.
-
“RowNumber, Rank, DenseRank – 1, gap, no-gap”
ROW_NUMBER→ 1-2-3-4… always unique.RANK→ 1-2-2-4 (gap after tie).DENSE_RANK→ 1-2-2-3 (no gap).
-
Running vs Rolling totals: “Snowball vs Window”
- Running total = snowball rolling downhill, keeps growing (from first row to current).
- Rolling total = fixed window sliding along (e.g., last 3 months).
-
Value functions = “Time machine”
LAG= travel to previous row.LEAD= jump to future row.FIRST_VALUE= go to the start;LAST_VALUE= go to the end of the partition.
8. Active-Recall Questions & Mini-Exercises
Use these as flashcards. Try to answer without looking back.
8.1 Aggregates & GROUP BY
- What is the difference between
COUNT(*)andCOUNT(Sales)when someSalesvalues are NULL? - Write a query to show total sales per customer using
GROUP BY. - Why does
GROUP BYusually produce fewer rows than the original table?
8.2 Window basics & PARTITION / ORDER / FRAME
- Rewrite “total sales per product” using a window function instead of
GROUP BYso that each order row still appears. - What does
PARTITION BY ProductID, OrderStatusdo compared toPARTITION BY ProductIDalone? - What is the effect of
ORDER BYinsideOVERwhen computing a running total?
8.3 Window aggregates
- How would you compute a moving average of
Salesper product, ordered byOrderDate? - How can you use window functions to detect duplicate
OrderIDvalues? - How could you calculate, for each order, the percentage of total sales?
8.4 Ranking functions
- When should you choose
ROW_NUMBERvsRANKvsDENSE_RANK? Give a scenario for each. - How would you find the top 1 order per product using
ROW_NUMBER? - Explain what
NTILE(4) OVER (ORDER BY Sales DESC)does. - In the CUME_DIST price example, what does a value of
0.4mean?
8.5 Value functions
- How can
LAGbe used to calculate month-over-month sales change? - How would you compute the average days between orders per customer using
LEAD? - Why does
LAST_VALUEoften require an explicit frame (ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)?
9. Ultra-Short Recap (Key Takeaways)
- Aggregate functions (
COUNT,SUM,AVG,MIN,MAX) summarize multiple rows into one value. GROUP BYcollapses rows into summary rows; window functions keep all rows and just add extra columns.- A window function is
func(expr) OVER (PARTITION BY ... ORDER BY ... frame); remember P-O-F. - Use window aggregates for running totals, rolling averages, percent of total, min/max comparisons, and duplicate checks.
- Ranking functions (
ROW_NUMBER,RANK,DENSE_RANK,NTILE,CUME_DIST,PERCENT_RANK) let you do top-N, segmentation, and distribution analysis. - Value functions (
LAG,LEAD,FIRST_VALUE,LAST_VALUE) are perfect for time-based and before/after comparisons. - Window functions cannot be used in
WHEREand cannot be nested. Put them in a subquery if you need to filter by them. COUNT(*)vsCOUNT(col)and default AVG behaviour with NULLs are classic gotchas—always check how NULLs are treated.