You know how to get individual rows. But what if you need answers like “How many books did we sell?” or “What is the average book price?” That is where aggregation comes in.

In the previous article, you learned how to combine tables with JOINs. Now you will learn how to summarize data across many rows into a single answer.

Aggregate Functions

Aggregate functions take many rows and return a single value.

COUNT — How Many Rows?

SELECT COUNT(*) FROM books;
| COUNT(*) |
|----------|
| 7        |

There are 7 books in the table.

Count rows that match a condition:

SELECT COUNT(*) FROM books WHERE in_stock = 1;
| COUNT(*) |
|----------|
| 5        |

Count non-NULL values in a column:

SELECT COUNT(email) FROM customers;
| COUNT(email) |
|--------------|
| 3            |

COUNT(email) counts only rows where email is not NULL. COUNT(*) counts all rows, including those with NULL values.

SUM — Total of a Column

SELECT SUM(price) FROM books;
| SUM(price) |
|------------|
| 88.93      |

The total price of all books combined.

Sum with a condition:

SELECT SUM(quantity) FROM orders;
| SUM(quantity) |
|---------------|
| 7             |

AVG — Average Value

SELECT AVG(price) FROM books;
| AVG(price)       |
|------------------|
| 12.7042857142857 |

You can round the result:

SELECT ROUND(AVG(price), 2) AS avg_price FROM books;
| avg_price |
|-----------|
| 12.70     |

MIN and MAX — Smallest and Largest

SELECT MIN(price) AS cheapest, MAX(price) AS most_expensive
FROM books;
| cheapest | most_expensive |
|----------|----------------|
| 8.99     | 15.99          |

MIN and MAX also work with text (alphabetical order) and dates.

SELECT MIN(order_date) AS first_order, MAX(order_date) AS last_order
FROM orders;
| first_order | last_order |
|-------------|------------|
| 2026-01-15  | 2026-03-05 |

GROUP BY — Aggregation per Group

GROUP BY splits rows into groups and applies the aggregate function to each group.

Count books per author:

SELECT author_id, COUNT(*) AS book_count
FROM books
GROUP BY author_id;
| author_id | book_count |
|-----------|------------|
| 1         | 2          |
| 2         | 1          |
| 3         | 2          |
| 4         | 1          |
| 5         | 1          |

Author 1 and author 3 each have 2 books. The others have 1.

Total revenue per customer:

SELECT
    c.name AS customer,
    SUM(b.price * o.quantity) AS total_spent
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id
INNER JOIN books b ON o.book_id = b.id
GROUP BY c.name
ORDER BY total_spent DESC;
| customer      | total_spent |
|---------------|-------------|
| Alex Johnson  | 45.96       |
| Sam Wilson    | 27.98       |
| Jordan Smith  | 13.99       |

Average book price per country:

SELECT
    a.country,
    ROUND(AVG(b.price), 2) AS avg_price
FROM books b
INNER JOIN authors a ON b.author_id = a.id
GROUP BY a.country;
| country | avg_price |
|---------|-----------|
| UK      | 12.32     |
| USA     | 13.32     |

Important rule: every column in SELECT must either be in GROUP BY or inside an aggregate function. You cannot select title if you are grouping by author_id — which title would it show?

HAVING — Filtering Groups

WHERE filters individual rows before grouping. HAVING filters groups after aggregation.

Find authors with more than one book:

SELECT author_id, COUNT(*) AS book_count
FROM books
GROUP BY author_id
HAVING COUNT(*) > 1;
| author_id | book_count |
|-----------|------------|
| 1         | 2          |
| 3         | 2          |

You cannot use WHERE here because COUNT(*) does not exist yet when WHERE runs. WHERE runs before GROUP BY. HAVING runs after.

The order of operations:

1. FROM     — pick the table
2. WHERE    — filter individual rows
3. GROUP BY — group the remaining rows
4. HAVING   — filter the groups
5. SELECT   — pick the columns to show
6. ORDER BY — sort the results
7. LIMIT    — limit the number of results

Find customers who spent more than $20:

SELECT
    c.name,
    SUM(b.price * o.quantity) AS total_spent
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id
INNER JOIN books b ON o.book_id = b.id
GROUP BY c.name
HAVING SUM(b.price * o.quantity) > 20
ORDER BY total_spent DESC;
| name          | total_spent |
|---------------|-------------|
| Alex Johnson  | 45.96       |
| Sam Wilson    | 27.98       |

Subqueries

A subquery is a query inside another query. It runs first, and its result is used by the outer query.

Subquery in WHERE

Find books that cost more than the average price:

SELECT title, price
FROM books
WHERE price > (SELECT AVG(price) FROM books);
| title                    | price |
|--------------------------|-------|
| To Kill a Mockingbird    | 14.99 |
| The Catcher in the Rye   | 13.99 |
| Tender Is the Night      | 15.99 |

The inner query (SELECT AVG(price) FROM books) calculates the average price first. Then the outer query uses that value to filter.

Find customers who have placed orders:

SELECT name
FROM customers
WHERE id IN (SELECT DISTINCT customer_id FROM orders);
| name           |
|----------------|
| Alex Johnson   |
| Sam Wilson     |
| Jordan Smith   |

The IN operator checks if a value exists in the subquery result.

Subquery in FROM

You can use a subquery as a temporary table in the FROM clause.

Find the customer who spent the most:

SELECT customer, total_spent
FROM (
    SELECT
        c.name AS customer,
        SUM(b.price * o.quantity) AS total_spent
    FROM orders o
    INNER JOIN customers c ON o.customer_id = c.id
    INNER JOIN books b ON o.book_id = b.id
    GROUP BY c.name
) AS spending
ORDER BY total_spent DESC
LIMIT 1;
| customer     | total_spent |
|--------------|-------------|
| Alex Johnson | 45.96       |

The inner query calculates spending per customer. The outer query sorts and picks the top one. The AS spending gives the subquery a name (an alias). This is required.

Correlated Subqueries

A correlated subquery references the outer query. It runs once for each row of the outer query.

Find books that cost more than the average price of books by the same author:

SELECT b1.title, b1.price, b1.author_id
FROM books b1
WHERE b1.price > (
    SELECT AVG(b2.price)
    FROM books b2
    WHERE b2.author_id = b1.author_id
);

For each book in the outer query, the inner query calculates the average price of books by that same author. If the book costs more than its author’s average, it appears in the result.

Correlated subqueries can be slow on large tables because the inner query runs for every row.

EXISTS — Does the Subquery Return Anything?

EXISTS returns true if the subquery returns at least one row.

Find authors who have at least one book in stock:

SELECT a.name
FROM authors a
WHERE EXISTS (
    SELECT 1
    FROM books b
    WHERE b.author_id = a.id AND b.in_stock = 1
);
| name                |
|---------------------|
| F. Scott Fitzgerald |
| Harper Lee          |
| George Orwell       |
| J.D. Salinger       |

EXISTS does not care about the subquery’s columns. It only checks if any rows exist. Using SELECT 1 is a convention that shows this.

Find customers who have NOT placed any orders:

SELECT name
FROM customers
WHERE NOT EXISTS (
    SELECT 1
    FROM orders
    WHERE orders.customer_id = customers.id
);
| name         |
|--------------|
| Taylor Brown |

Common Table Expressions (CTEs)

CTEs make complex queries easier to read. A CTE is like giving a name to a subquery and using it like a table.

WITH customer_spending AS (
    SELECT
        c.id AS customer_id,
        c.name,
        SUM(b.price * o.quantity) AS total_spent
    FROM orders o
    INNER JOIN customers c ON o.customer_id = c.id
    INNER JOIN books b ON o.book_id = b.id
    GROUP BY c.id, c.name
)
SELECT name, total_spent
FROM customer_spending
WHERE total_spent > 20
ORDER BY total_spent DESC;
| name          | total_spent |
|---------------|-------------|
| Alex Johnson  | 45.96       |
| Sam Wilson    | 27.98       |

The WITH ... AS part defines the CTE. Then you use it in the main query just like a regular table.

Multiple CTEs:

WITH
    book_stats AS (
        SELECT
            author_id,
            COUNT(*) AS book_count,
            AVG(price) AS avg_price
        FROM books
        GROUP BY author_id
    ),
    prolific_authors AS (
        SELECT author_id
        FROM book_stats
        WHERE book_count > 1
    )
SELECT a.name, bs.book_count, ROUND(bs.avg_price, 2) AS avg_price
FROM prolific_authors pa
INNER JOIN authors a ON pa.author_id = a.id
INNER JOIN book_stats bs ON pa.author_id = bs.author_id;
| name                | book_count | avg_price |
|---------------------|------------|-----------|
| F. Scott Fitzgerald | 2          | 14.49     |
| George Orwell       | 2          | 10.49     |

CTEs are much easier to read than nested subqueries. Use them whenever your query gets complex.

CASE WHEN — Conditional Logic

CASE WHEN is like an if-else statement inside SQL.

Categorize books by price:

SELECT
    title,
    price,
    CASE
        WHEN price < 10 THEN 'Budget'
        WHEN price < 14 THEN 'Standard'
        ELSE 'Premium'
    END AS price_category
FROM books
ORDER BY price;
| title                    | price | price_category |
|--------------------------|-------|----------------|
| Animal Farm              | 8.99  | Budget         |
| Pride and Prejudice      | 9.99  | Budget         |
| 1984                     | 11.99 | Standard       |
| The Great Gatsby         | 12.99 | Standard       |
| The Catcher in the Rye   | 13.99 | Standard       |
| To Kill a Mockingbird    | 14.99 | Premium        |
| Tender Is the Night      | 15.99 | Premium        |

Use CASE WHEN with aggregation:

SELECT
    CASE
        WHEN price < 10 THEN 'Budget'
        WHEN price < 14 THEN 'Standard'
        ELSE 'Premium'
    END AS price_category,
    COUNT(*) AS book_count
FROM books
GROUP BY price_category;
| price_category | book_count |
|----------------|------------|
| Budget         | 2          |
| Standard       | 3          |
| Premium        | 2          |

COALESCE — Handling NULL Values

COALESCE returns the first non-NULL value from a list.

SELECT
    name,
    COALESCE(email, 'No email provided') AS email
FROM customers;
| name           | email              |
|----------------|--------------------|
| Alex Johnson   | alex@example.com   |
| Sam Wilson     | sam@example.com    |
| Jordan Smith   | jordan@example.com |
| Taylor Brown   | No email provided  |

Taylor Brown has no email (NULL), so COALESCE replaces it with the default text. This is much cleaner than showing NULL in your results.

COALESCE with multiple fallbacks:

SELECT COALESCE(phone, email, 'No contact info') AS contact
FROM customers;

It checks phone first. If NULL, it checks email. If both are NULL, it uses the default text.

Common Mistakes

1. Using WHERE instead of HAVING with aggregates

-- Wrong: WHERE cannot use aggregate functions
SELECT author_id, COUNT(*) AS book_count
FROM books
GROUP BY author_id
WHERE COUNT(*) > 1;

-- Correct: use HAVING for aggregate conditions
SELECT author_id, COUNT(*) AS book_count
FROM books
GROUP BY author_id
HAVING COUNT(*) > 1;

WHERE runs before GROUP BY, so it has no access to aggregated values.

2. Selecting non-grouped columns without aggregation

-- Wrong: which title should SQL show for each author_id group?
SELECT author_id, title, COUNT(*)
FROM books
GROUP BY author_id;

-- Correct: aggregate the title or include it in GROUP BY
SELECT author_id, COUNT(*) AS book_count
FROM books
GROUP BY author_id;

3. Making subqueries too complex

If your subquery has subqueries inside it, rewrite using CTEs. Nested subqueries are hard to read and debug.

-- Hard to read
SELECT * FROM (
    SELECT * FROM (
        SELECT author_id, COUNT(*) AS cnt FROM books GROUP BY author_id
    ) WHERE cnt > 1
);

-- Easier to read with a CTE
WITH book_counts AS (
    SELECT author_id, COUNT(*) AS cnt
    FROM books
    GROUP BY author_id
)
SELECT * FROM book_counts WHERE cnt > 1;

What You Learned

In this article, you learned:

  • COUNT, SUM, AVG, MIN, MAX summarize data
  • GROUP BY aggregates per group
  • HAVING filters groups (unlike WHERE which filters rows)
  • Subqueries in WHERE and FROM run queries inside queries
  • Correlated subqueries reference the outer query
  • EXISTS checks if a subquery returns any rows
  • CTEs (WITH ... AS) make complex queries readable
  • CASE WHEN adds conditional logic
  • COALESCE handles NULL values with defaults

What’s Next?

In the next article, you will learn about window functions. They let you do calculations across rows without collapsing them with GROUP BY. Think running totals, rankings, and month-over-month comparisons.