You know how to get individual rows. But what if you need answers like “How many books did we sell?” or “What is the average book price?” That is where aggregation comes in.
In the previous article, you learned how to combine tables with JOINs. Now you will learn how to summarize data across many rows into a single answer.
Aggregate Functions
Aggregate functions take many rows and return a single value.
COUNT — How Many Rows?
SELECT COUNT(*) FROM books;
| COUNT(*) |
|----------|
| 7 |
There are 7 books in the table.
Count rows that match a condition:
SELECT COUNT(*) FROM books WHERE in_stock = 1;
| COUNT(*) |
|----------|
| 5 |
Count non-NULL values in a column:
SELECT COUNT(email) FROM customers;
| COUNT(email) |
|--------------|
| 3 |
COUNT(email) counts only rows where email is not NULL. COUNT(*) counts all rows, including those with NULL values.
SUM — Total of a Column
SELECT SUM(price) FROM books;
| SUM(price) |
|------------|
| 88.93 |
The total price of all books combined.
Sum with a condition:
SELECT SUM(quantity) FROM orders;
| SUM(quantity) |
|---------------|
| 7 |
AVG — Average Value
SELECT AVG(price) FROM books;
| AVG(price) |
|------------------|
| 12.7042857142857 |
You can round the result:
SELECT ROUND(AVG(price), 2) AS avg_price FROM books;
| avg_price |
|-----------|
| 12.70 |
MIN and MAX — Smallest and Largest
SELECT MIN(price) AS cheapest, MAX(price) AS most_expensive
FROM books;
| cheapest | most_expensive |
|----------|----------------|
| 8.99 | 15.99 |
MIN and MAX also work with text (alphabetical order) and dates.
SELECT MIN(order_date) AS first_order, MAX(order_date) AS last_order
FROM orders;
| first_order | last_order |
|-------------|------------|
| 2026-01-15 | 2026-03-05 |
GROUP BY — Aggregation per Group
GROUP BY splits rows into groups and applies the aggregate function to each group.
Count books per author:
SELECT author_id, COUNT(*) AS book_count
FROM books
GROUP BY author_id;
| author_id | book_count |
|-----------|------------|
| 1 | 2 |
| 2 | 1 |
| 3 | 2 |
| 4 | 1 |
| 5 | 1 |
Author 1 and author 3 each have 2 books. The others have 1.
Total revenue per customer:
SELECT
c.name AS customer,
SUM(b.price * o.quantity) AS total_spent
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id
INNER JOIN books b ON o.book_id = b.id
GROUP BY c.name
ORDER BY total_spent DESC;
| customer | total_spent |
|---------------|-------------|
| Alex Johnson | 45.96 |
| Sam Wilson | 27.98 |
| Jordan Smith | 13.99 |
Average book price per country:
SELECT
a.country,
ROUND(AVG(b.price), 2) AS avg_price
FROM books b
INNER JOIN authors a ON b.author_id = a.id
GROUP BY a.country;
| country | avg_price |
|---------|-----------|
| UK | 12.32 |
| USA | 13.32 |
Important rule: every column in SELECT must either be in GROUP BY or inside an aggregate function. You cannot select title if you are grouping by author_id — which title would it show?
HAVING — Filtering Groups
WHERE filters individual rows before grouping. HAVING filters groups after aggregation.
Find authors with more than one book:
SELECT author_id, COUNT(*) AS book_count
FROM books
GROUP BY author_id
HAVING COUNT(*) > 1;
| author_id | book_count |
|-----------|------------|
| 1 | 2 |
| 3 | 2 |
You cannot use WHERE here because COUNT(*) does not exist yet when WHERE runs. WHERE runs before GROUP BY. HAVING runs after.
The order of operations:
1. FROM — pick the table
2. WHERE — filter individual rows
3. GROUP BY — group the remaining rows
4. HAVING — filter the groups
5. SELECT — pick the columns to show
6. ORDER BY — sort the results
7. LIMIT — limit the number of results
Find customers who spent more than $20:
SELECT
c.name,
SUM(b.price * o.quantity) AS total_spent
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id
INNER JOIN books b ON o.book_id = b.id
GROUP BY c.name
HAVING SUM(b.price * o.quantity) > 20
ORDER BY total_spent DESC;
| name | total_spent |
|---------------|-------------|
| Alex Johnson | 45.96 |
| Sam Wilson | 27.98 |
Subqueries
A subquery is a query inside another query. It runs first, and its result is used by the outer query.
Subquery in WHERE
Find books that cost more than the average price:
SELECT title, price
FROM books
WHERE price > (SELECT AVG(price) FROM books);
| title | price |
|--------------------------|-------|
| To Kill a Mockingbird | 14.99 |
| The Catcher in the Rye | 13.99 |
| Tender Is the Night | 15.99 |
The inner query (SELECT AVG(price) FROM books) calculates the average price first. Then the outer query uses that value to filter.
Find customers who have placed orders:
SELECT name
FROM customers
WHERE id IN (SELECT DISTINCT customer_id FROM orders);
| name |
|----------------|
| Alex Johnson |
| Sam Wilson |
| Jordan Smith |
The IN operator checks if a value exists in the subquery result.
Subquery in FROM
You can use a subquery as a temporary table in the FROM clause.
Find the customer who spent the most:
SELECT customer, total_spent
FROM (
SELECT
c.name AS customer,
SUM(b.price * o.quantity) AS total_spent
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id
INNER JOIN books b ON o.book_id = b.id
GROUP BY c.name
) AS spending
ORDER BY total_spent DESC
LIMIT 1;
| customer | total_spent |
|--------------|-------------|
| Alex Johnson | 45.96 |
The inner query calculates spending per customer. The outer query sorts and picks the top one. The AS spending gives the subquery a name (an alias). This is required.
Correlated Subqueries
A correlated subquery references the outer query. It runs once for each row of the outer query.
Find books that cost more than the average price of books by the same author:
SELECT b1.title, b1.price, b1.author_id
FROM books b1
WHERE b1.price > (
SELECT AVG(b2.price)
FROM books b2
WHERE b2.author_id = b1.author_id
);
For each book in the outer query, the inner query calculates the average price of books by that same author. If the book costs more than its author’s average, it appears in the result.
Correlated subqueries can be slow on large tables because the inner query runs for every row.
EXISTS — Does the Subquery Return Anything?
EXISTS returns true if the subquery returns at least one row.
Find authors who have at least one book in stock:
SELECT a.name
FROM authors a
WHERE EXISTS (
SELECT 1
FROM books b
WHERE b.author_id = a.id AND b.in_stock = 1
);
| name |
|---------------------|
| F. Scott Fitzgerald |
| Harper Lee |
| George Orwell |
| J.D. Salinger |
EXISTS does not care about the subquery’s columns. It only checks if any rows exist. Using SELECT 1 is a convention that shows this.
Find customers who have NOT placed any orders:
SELECT name
FROM customers
WHERE NOT EXISTS (
SELECT 1
FROM orders
WHERE orders.customer_id = customers.id
);
| name |
|--------------|
| Taylor Brown |
Common Table Expressions (CTEs)
CTEs make complex queries easier to read. A CTE is like giving a name to a subquery and using it like a table.
WITH customer_spending AS (
SELECT
c.id AS customer_id,
c.name,
SUM(b.price * o.quantity) AS total_spent
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id
INNER JOIN books b ON o.book_id = b.id
GROUP BY c.id, c.name
)
SELECT name, total_spent
FROM customer_spending
WHERE total_spent > 20
ORDER BY total_spent DESC;
| name | total_spent |
|---------------|-------------|
| Alex Johnson | 45.96 |
| Sam Wilson | 27.98 |
The WITH ... AS part defines the CTE. Then you use it in the main query just like a regular table.
Multiple CTEs:
WITH
book_stats AS (
SELECT
author_id,
COUNT(*) AS book_count,
AVG(price) AS avg_price
FROM books
GROUP BY author_id
),
prolific_authors AS (
SELECT author_id
FROM book_stats
WHERE book_count > 1
)
SELECT a.name, bs.book_count, ROUND(bs.avg_price, 2) AS avg_price
FROM prolific_authors pa
INNER JOIN authors a ON pa.author_id = a.id
INNER JOIN book_stats bs ON pa.author_id = bs.author_id;
| name | book_count | avg_price |
|---------------------|------------|-----------|
| F. Scott Fitzgerald | 2 | 14.49 |
| George Orwell | 2 | 10.49 |
CTEs are much easier to read than nested subqueries. Use them whenever your query gets complex.
CASE WHEN — Conditional Logic
CASE WHEN is like an if-else statement inside SQL.
Categorize books by price:
SELECT
title,
price,
CASE
WHEN price < 10 THEN 'Budget'
WHEN price < 14 THEN 'Standard'
ELSE 'Premium'
END AS price_category
FROM books
ORDER BY price;
| title | price | price_category |
|--------------------------|-------|----------------|
| Animal Farm | 8.99 | Budget |
| Pride and Prejudice | 9.99 | Budget |
| 1984 | 11.99 | Standard |
| The Great Gatsby | 12.99 | Standard |
| The Catcher in the Rye | 13.99 | Standard |
| To Kill a Mockingbird | 14.99 | Premium |
| Tender Is the Night | 15.99 | Premium |
Use CASE WHEN with aggregation:
SELECT
CASE
WHEN price < 10 THEN 'Budget'
WHEN price < 14 THEN 'Standard'
ELSE 'Premium'
END AS price_category,
COUNT(*) AS book_count
FROM books
GROUP BY price_category;
| price_category | book_count |
|----------------|------------|
| Budget | 2 |
| Standard | 3 |
| Premium | 2 |
COALESCE — Handling NULL Values
COALESCE returns the first non-NULL value from a list.
SELECT
name,
COALESCE(email, 'No email provided') AS email
FROM customers;
| name | email |
|----------------|--------------------|
| Alex Johnson | alex@example.com |
| Sam Wilson | sam@example.com |
| Jordan Smith | jordan@example.com |
| Taylor Brown | No email provided |
Taylor Brown has no email (NULL), so COALESCE replaces it with the default text. This is much cleaner than showing NULL in your results.
COALESCE with multiple fallbacks:
SELECT COALESCE(phone, email, 'No contact info') AS contact
FROM customers;
It checks phone first. If NULL, it checks email. If both are NULL, it uses the default text.
Common Mistakes
1. Using WHERE instead of HAVING with aggregates
-- Wrong: WHERE cannot use aggregate functions
SELECT author_id, COUNT(*) AS book_count
FROM books
GROUP BY author_id
WHERE COUNT(*) > 1;
-- Correct: use HAVING for aggregate conditions
SELECT author_id, COUNT(*) AS book_count
FROM books
GROUP BY author_id
HAVING COUNT(*) > 1;
WHERE runs before GROUP BY, so it has no access to aggregated values.
2. Selecting non-grouped columns without aggregation
-- Wrong: which title should SQL show for each author_id group?
SELECT author_id, title, COUNT(*)
FROM books
GROUP BY author_id;
-- Correct: aggregate the title or include it in GROUP BY
SELECT author_id, COUNT(*) AS book_count
FROM books
GROUP BY author_id;
3. Making subqueries too complex
If your subquery has subqueries inside it, rewrite using CTEs. Nested subqueries are hard to read and debug.
-- Hard to read
SELECT * FROM (
SELECT * FROM (
SELECT author_id, COUNT(*) AS cnt FROM books GROUP BY author_id
) WHERE cnt > 1
);
-- Easier to read with a CTE
WITH book_counts AS (
SELECT author_id, COUNT(*) AS cnt
FROM books
GROUP BY author_id
)
SELECT * FROM book_counts WHERE cnt > 1;
What You Learned
In this article, you learned:
COUNT,SUM,AVG,MIN,MAXsummarize dataGROUP BYaggregates per groupHAVINGfilters groups (unlikeWHEREwhich filters rows)- Subqueries in
WHEREandFROMrun queries inside queries - Correlated subqueries reference the outer query
EXISTSchecks if a subquery returns any rows- CTEs (
WITH ... AS) make complex queries readable CASE WHENadds conditional logicCOALESCEhandles NULL values with defaults
Related Articles
- JOINs — Combining Data from Multiple Tables — previous article
- SQL Cheat Sheet — quick reference for all SQL commands
What’s Next?
In the next article, you will learn about window functions. They let you do calculations across rows without collapsing them with GROUP BY. Think running totals, rankings, and month-over-month comparisons.