SQL SELECT Queries: From Basic to Advanced

SQL SELECT retrieves data from databases. This guide covers basic to advanced queries using a single Indian e-commerce dataset with realistic names, dates, and rupee amounts.

Dataset

We’ll use an online store with three tables: customers, orders, and products.

customers:

CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100),
    city VARCHAR(50),
    state VARCHAR(50),
    joined_date DATE
);

INSERT INTO customers VALUES
(1, 'Rajesh Kumar', 'rajesh.k@email.com', 'Mumbai', 'Maharashtra', '2023-01-15'),
(2, 'Priya Sharma', 'priya.s@email.com', 'Delhi', 'Delhi', '2023-02-20'),
(3, 'Amit Patel', 'amit.p@email.com', 'Ahmedabad', 'Gujarat', '2023-03-10'),
(4, 'Sunita Reddy', 'sunita.r@email.com', 'Hyderabad', 'Telangana', '2023-04-05'),
(5, 'Vikram Singh', 'vikram.s@email.com', 'Jaipur', 'Rajasthan', '2023-05-12'),
(6, 'Anjali Desai', 'anjali.d@email.com', 'Pune', 'Maharashtra', '2023-06-18'),
(7, 'Rahul Gupta', 'rahul.g@email.com', 'Bangalore', 'Karnataka', '2023-07-22'),
(8, 'Deepa Iyer', 'deepa.i@email.com', 'Chennai', 'Tamil Nadu', '2023-08-30'),
(9, 'Suresh Menon', 'suresh.m@email.com', 'Kochi', 'Kerala', '2023-09-14'),
(10, 'Kavita Joshi', 'kavita.j@email.com', 'Mumbai', 'Maharashtra', '2023-10-25');

products:

CREATE TABLE products (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(100),
    category VARCHAR(50),
    price DECIMAL(10, 2)
);

INSERT INTO products VALUES
(101, 'Samsung Galaxy M34', 'Electronics', 18999.00),
(102, 'Boat Airdopes 131', 'Electronics', 1299.00),
(103, 'Puma Running Shoes', 'Footwear', 2499.00),
(104, 'Levi''s Jeans', 'Clothing', 2999.00),
(105, 'Prestige Pressure Cooker', 'Home & Kitchen', 1899.00),
(106, 'Milton Water Bottle', 'Home & Kitchen', 349.00),
(107, 'Noise ColorFit Watch', 'Electronics', 2499.00),
(108, 'Nike Sports T-Shirt', 'Clothing', 1299.00),
(109, 'Woodland Casual Shoes', 'Footwear', 3499.00),
(110, 'Havells Table Fan', 'Home & Kitchen', 1699.00);

orders:

CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    product_id INT,
    quantity INT,
    order_date DATE,
    total_amount DECIMAL(10, 2),
    status VARCHAR(20),
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

INSERT INTO orders VALUES
(1001, 1, 101, 1, '2024-01-10', 18999.00, 'Delivered'),
(1002, 2, 103, 2, '2024-01-12', 4998.00, 'Delivered'),
(1003, 3, 105, 1, '2024-01-15', 1899.00, 'Delivered'),
(1004, 1, 102, 2, '2024-01-18', 2598.00, 'Delivered'),
(1005, 4, 104, 1, '2024-01-20', 2999.00, 'Shipped'),
(1006, 5, 107, 1, '2024-01-22', 2499.00, 'Delivered'),
(1007, 2, 106, 3, '2024-01-25', 1047.00, 'Delivered'),
(1008, 6, 101, 1, '2024-02-01', 18999.00, 'Delivered'),
(1009, 7, 109, 1, '2024-02-05', 3499.00, 'Shipped'),
(1010, 3, 108, 2, '2024-02-08', 2598.00, 'Delivered'),
(1011, 8, 110, 1, '2024-02-10', 1699.00, 'Delivered'),
(1012, 1, 107, 1, '2024-02-12', 2499.00, 'Cancelled'),
(1013, 9, 103, 1, '2024-02-15', 2499.00, 'Delivered'),
(1014, 10, 105, 2, '2024-02-18', 3798.00, 'Shipped'),
(1015, 4, 102, 1, '2024-02-20', 1299.00, 'Delivered');

Basic SELECT

All columns:

SELECT *
FROM customers;

How it works: SELECT * retrieves all columns from the customers table. The asterisk (*) is a wildcard meaning “all columns”. SQL reads the entire table and returns every row with all column values.

Result (10 rows):

customer_id | name          | email               | city       | state        | joined_date
------------|---------------|---------------------|------------|--------------|------------
1           | Rajesh Kumar  | rajesh.k@email.com  | Mumbai     | Maharashtra  | 2023-01-15
2           | Priya Sharma  | priya.s@email.com   | Delhi      | Delhi        | 2023-02-20
3           | Amit Patel    | amit.p@email.com    | Ahmedabad  | Gujarat      | 2023-03-10
...

Specific columns:

SELECT name, city, state
FROM customers;

How it works: Instead of *, we specify exact columns. SQL scans the customers table but only returns the three requested columns (name, city, state) for each row. This is more efficient than SELECT * when you don’t need all data.

Result:

name          | city       | state
--------------|------------|-------------
Rajesh Kumar  | Mumbai     | Maharashtra
Priya Sharma  | Delhi      | Delhi
Amit Patel    | Ahmedabad  | Gujarat
...

Limit results:

SELECT name, email
FROM customers
LIMIT 5;

How it works: LIMIT 5 tells SQL to stop after returning 5 rows. SQL processes the query normally but halts execution after the first 5 matching rows. Useful for testing queries or pagination.

Result (5 rows):

name          | email
--------------|--------------------
Rajesh Kumar  | rajesh.k@email.com
Priya Sharma  | priya.s@email.com
Amit Patel    | amit.p@email.com
Sunita Reddy  | sunita.r@email.com
Vikram Singh  | vikram.s@email.com

WHERE Clause

Exact match:

SELECT name, city
FROM customers
WHERE state = 'Maharashtra';

How it works: SQL scans each row in the customers table. For each row, it checks if the state column equals ‘Maharashtra’. Only rows where this condition is true are included in the result. The comparison is case-sensitive in most databases.

Selection process:

Read row 1: state = ‘Maharashtra’ ✓ (include Rajesh Kumar)
Read row 2: state = ‘Delhi’ ✗ (skip)
Read row 3: state = ‘Gujarat’ ✗ (skip) …continues through all rows

Result:

name          | city
--------------|--------
Rajesh Kumar  | Mumbai
Anjali Desai  | Pune
Kavita Joshi  | Mumbai

Comparison operators:

SELECT product_name, price
FROM products
WHERE price > 2000;

How it works: SQL evaluates price > 2000 for each product. The > operator performs numeric comparison. Rows where price is greater than 2000 are selected. Other operators: <, >=, <=, != or <>.

Result:

product_name         | price
---------------------|----------
Samsung Galaxy M34   | 18999.00
Puma Running Shoes   | 2499.00
Levi's Jeans         | 2999.00
Noise ColorFit Watch | 2499.00
Woodland Casual Shoes| 3499.00

Multiple conditions (AND):

SELECT order_id, total_amount, status
FROM orders
WHERE total_amount > 2000
  AND status = 'Delivered';

How it works: Both conditions must be true for a row to be selected. SQL evaluates each condition separately, then combines results with AND logic. If total_amount > 2000 is false OR status = 'Delivered' is false, the row is excluded.

Selection logic:

Order 1001: amount=18999 (>2000 ✓) AND status=‘Delivered’ ✓ → Include
Order 1003: amount=1899 (>2000 ✗) AND status=‘Delivered’ ✓ → Exclude
Order 1012: amount=2499 (>2000 ✓) AND status=‘Cancelled’ ✗ → Exclude

Result:

order_id | total_amount | status
---------|--------------|----------
1001     | 18999.00     | Delivered
1002     | 4998.00      | Delivered
1006     | 2499.00      | Delivered
1008     | 18999.00     | Delivered
1009     | 3499.00      | Delivered
1010     | 2598.00      | Delivered
1013     | 2499.00      | Delivered
1014     | 3798.00      | Delivered

Multiple conditions (OR):

SELECT name, state
FROM customers
WHERE state = 'Maharashtra'
   OR state = 'Karnataka';

How it works: With OR, only ONE condition needs to be true. SQL includes a row if either state = 'Maharashtra' OR state = 'Karnataka' is true. More permissive than AND.

Selection logic:

Rajesh Kumar: state=‘Maharashtra’ ✓ → Include (first condition matches)
Priya Sharma: state=‘Delhi’ ✗ AND state=‘Delhi’ ✗ → Exclude (neither matches)
Rahul Gupta: state=‘Karnataka’ ✓ → Include (second condition matches)

Result:

name          | state
--------------|-------------
Rajesh Kumar  | Maharashtra
Anjali Desai  | Maharashtra
Rahul Gupta   | Karnataka
Kavita Joshi  | Maharashtra

IN operator:

SELECT product_name, category, price
FROM products
WHERE category IN ('Electronics', 'Footwear');

How it works: IN is shorthand for multiple OR conditions. SQL checks if the category value exists in the provided list. Equivalent to: WHERE category = 'Electronics' OR category = 'Footwear'. Cleaner syntax when checking against multiple values.

Result:

product_name         | category    | price
---------------------|-------------|----------
Samsung Galaxy M34   | Electronics | 18999.00
Boat Airdopes 131    | Electronics | 1299.00
Puma Running Shoes   | Footwear    | 2499.00
Noise ColorFit Watch | Electronics | 2499.00
Woodland Casual Shoes| Footwear    | 3499.00

BETWEEN:

SELECT order_id, order_date, total_amount
FROM orders
WHERE order_date BETWEEN '2024-02-01' AND '2024-02-15';

How it works: BETWEEN includes both boundary values. It’s equivalent to WHERE order_date >= '2024-02-01' AND order_date <= '2024-02-15'. SQL selects rows where order_date falls within the inclusive range.

Selection process:

Order 1008: date=‘2024-02-01’ (>= start ✓, <= end ✓) → Include
Order 1007: date=‘2024-01-25’ (< start ✗) → Exclude
Order 1014: date=‘2024-02-18’ (> end ✗) → Exclude

Result:

order_id | order_date  | total_amount
---------|-------------|-------------
1008     | 2024-02-01  | 18999.00
1009     | 2024-02-05  | 3499.00
1010     | 2024-02-08  | 2598.00
1011     | 2024-02-10  | 1699.00
1012     | 2024-02-12  | 2499.00
1013     | 2024-02-15  | 2499.00

LIKE (pattern matching):

SELECT name, email
FROM customers
WHERE name LIKE 'R%';

How it works: LIKE performs pattern matching. The % wildcard matches zero or more characters. 'R%' means “starts with R followed by anything”. SQL checks each name against this pattern.

Pattern examples:

'R%' → starts with R (Rajesh, Rahul)
'%Kumar' → ends with Kumar
'%a%' → contains ‘a’ anywhere
'_ajesh' → second letter onwards is ‘ajesh’ (_ matches exactly one character)

Result:

name         | email
-------------|-------------------
Rajesh Kumar | rajesh.k@email.com
Rahul Gupta  | rahul.g@email.com

SELECT name
FROM customers
WHERE name LIKE '%Kumar%';

Result:

name
-------------
Rajesh Kumar

IS NULL / IS NOT NULL:

SELECT product_name, price
FROM products
WHERE price IS NOT NULL;

ORDER BY

Ascending order:

SELECT product_name, price
FROM products
ORDER BY price ASC;

Result:

product_name              | price
--------------------------|----------
Milton Water Bottle       | 349.00
Boat Airdopes 131         | 1299.00
Nike Sports T-Shirt       | 1299.00
Havells Table Fan         | 1699.00
Prestige Pressure Cooker  | 1899.00
Puma Running Shoes        | 2499.00
Noise ColorFit Watch      | 2499.00
Levi's Jeans              | 2999.00
Woodland Casual Shoes     | 3499.00
Samsung Galaxy M34        | 18999.00

Descending order:

SELECT name, joined_date
FROM customers
ORDER BY joined_date DESC;

Result:

name          | joined_date
--------------|------------
Kavita Joshi  | 2023-10-25
Suresh Menon  | 2023-09-14
Deepa Iyer    | 2023-08-30
Rahul Gupta   | 2023-07-22
...

Multiple columns:

SELECT name, state, city
FROM customers
ORDER BY state ASC, city ASC;

Result (sorted by state, then city):

name          | state        | city
--------------|--------------|----------
Priya Sharma  | Delhi        | Delhi
Amit Patel    | Gujarat      | Ahmedabad
Rahul Gupta   | Karnataka    | Bangalore
...

Aggregate Functions

COUNT:

SELECT COUNT(*) AS total_customers
FROM customers;

Result:

total_customers
---------------
10

SUM:

SELECT SUM(total_amount) AS revenue
FROM orders
WHERE status = 'Delivered';

Result:

revenue
----------
67933.00

AVG:

SELECT AVG(price) AS average_price
FROM products;

Result:

average_price
-------------
3794.10

MIN and MAX:

SELECT
    MIN(price) AS cheapest_product,
    MAX(price) AS costliest_product
FROM products;

Result:

cheapest_product | costliest_product
-----------------|------------------
349.00           | 18999.00

GROUP BY

Count orders per customer:

SELECT
    customer_id,
    COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
ORDER BY order_count DESC;

How it works: GROUP BY combines rows with the same customer_id into groups. For each group, COUNT(*) counts the number of rows. The result has one row per unique customer_id.

Grouping process:

SQL sorts/groups orders by customer_id
Customer 1: orders 1001, 1004, 1012 → COUNT = 3
Customer 2: orders 1002, 1007 → COUNT = 2
Customer 5: order 1006 → COUNT = 1 …and so on

Without GROUP BY, COUNT(*) would return total rows (15). With GROUP BY, it returns count per group.

Result:

customer_id | order_count
------------|------------
1           | 3
2           | 2
3           | 2
4           | 2
5           | 1
6           | 1
7           | 1
8           | 1
9           | 1
10          | 1

Total sales by category:

SELECT
    p.category,
    SUM(o.total_amount) AS category_revenue
FROM orders o
JOIN products p ON o.product_id = p.product_id
WHERE o.status = 'Delivered'
GROUP BY p.category
ORDER BY category_revenue DESC;

Result:

category       | category_revenue
---------------|------------------
Electronics    | 45094.00
Footwear       | 9996.00
Home & Kitchen | 7395.00
Clothing       | 5597.00

Average order value by state:

SELECT
    c.state,
    AVG(o.total_amount) AS avg_order_value
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
GROUP BY c.state
ORDER BY avg_order_value DESC;

Result:

state        | avg_order_value
-------------|----------------
Maharashtra  | 8235.67
Karnataka    | 3499.00
Telangana    | 2149.00
Gujarat      | 2165.33
...

HAVING

Filter grouped results (WHERE filters rows before grouping, HAVING filters after):

SELECT
    customer_id,
    COUNT(*) AS order_count,
    SUM(total_amount) AS total_spent
FROM orders
GROUP BY customer_id
HAVING SUM(total_amount) > 5000
ORDER BY total_spent DESC;

Result:

customer_id | order_count | total_spent
------------|-------------|------------
1           | 3           | 24096.00
2           | 2           | 6045.00

Multiple conditions:

SELECT
    c.state,
    COUNT(o.order_id) AS order_count,
    AVG(o.total_amount) AS avg_amount
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.status = 'Delivered'
GROUP BY c.state
HAVING COUNT(o.order_id) >= 2
   AND AVG(o.total_amount) > 2000;

Result:

state        | order_count | avg_amount
-------------|-------------|------------
Maharashtra  | 4           | 6036.25
Gujarat      | 2           | 2248.50

JOINS

INNER JOIN (matching records only):

SELECT
    c.name,
    o.order_id,
    o.order_date,
    o.total_amount
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
LIMIT 5;

How it works: INNER JOIN combines rows from two tables where the join condition matches. SQL takes each customer row, finds all matching orders (where customer_id equals), and combines them into result rows.

Join process:

Start with customers table (c)
For each customer, look in orders table (o)
Find orders where o.customer_id = c.customer_id
Combine matching rows: customer data + order data
Only include customers who have placed orders

Example: Rajesh Kumar (customer_id=1) matches orders 1001, 1004, 1012 → creates 3 result rows.

Result:

name          | order_id | order_date  | total_amount
--------------|----------|-------------|-------------
Rajesh Kumar  | 1001     | 2024-01-10  | 18999.00
Priya Sharma  | 1002     | 2024-01-12  | 4998.00
Amit Patel    | 1003     | 2024-01-15  | 1899.00
Rajesh Kumar  | 1004     | 2024-01-18  | 2598.00
Sunita Reddy  | 1005     | 2024-01-20  | 2999.00

LEFT JOIN (all customers, even without orders):

SELECT
    c.name,
    c.city,
    COUNT(o.order_id) AS order_count
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name, c.city
ORDER BY order_count DESC;

How it works: LEFT JOIN keeps all rows from the left table (customers) even if there’s no match in the right table (orders). When no match exists, order columns are NULL.

Difference from INNER JOIN:

INNER JOIN: Only customers with orders appear
LEFT JOIN: All customers appear; order_count is 0 for customers without orders

In our dataset, all 10 customers have placed orders, so LEFT JOIN and INNER JOIN return the same customers. But LEFT JOIN ensures no customer is excluded.

Result:

name          | city       | order_count
--------------|------------|------------
Rajesh Kumar  | Mumbai     | 3
Priya Sharma  | Delhi      | 2
Amit Patel    | Ahmedabad  | 2
Sunita Reddy  | Hyderabad  | 2
Vikram Singh  | Jaipur     | 1
Anjali Desai  | Pune       | 1
Rahul Gupta   | Bangalore  | 1
Deepa Iyer    | Chennai    | 1
Suresh Menon  | Kochi      | 1
Kavita Joshi  | Mumbai     | 1

Multiple JOINs:

SELECT
    c.name AS customer_name,
    p.product_name,
    o.quantity,
    o.total_amount,
    o.order_date
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
INNER JOIN products p ON o.product_id = p.product_id
WHERE o.order_date >= '2024-02-01'
ORDER BY o.order_date DESC
LIMIT 5;

How it works: Chain multiple JOINs to combine data from three or more tables. SQL processes JOINs left to right.

Execution steps:

Start with orders table (o)
JOIN customers: match o.customer_id = c.customer_id → get customer names
JOIN products: match o.product_id = p.product_id → get product details
Apply WHERE filter: keep only orders from Feb 2024 onwards
Sort by order_date descending
Return top 5 rows

Each order now has: order info + customer name + product name.

Result:

customer_name | product_name              | quantity | total_amount | order_date
--------------|---------------------------|----------|--------------|------------
Sunita Reddy  | Boat Airdopes 131         | 1        | 1299.00      | 2024-02-20
Kavita Joshi  | Prestige Pressure Cooker  | 2        | 3798.00      | 2024-02-18
Suresh Menon  | Puma Running Shoes        | 1        | 2499.00      | 2024-02-15
Rajesh Kumar  | Noise ColorFit Watch      | 1        | 2499.00      | 2024-02-12
Deepa Iyer    | Havells Table Fan         | 1        | 1699.00      | 2024-02-10

Subqueries

IN subquery:

SELECT name, city
FROM customers
WHERE customer_id IN (
    SELECT customer_id
    FROM orders
    WHERE total_amount > 3000
);

How it works: A subquery (inner query) runs first, returns a list of values, then the outer query uses those values. Think of it as a two-step process.

Execution order:

Inner query runs first:
- SELECT customer_id FROM orders WHERE total_amount > 3000
- Returns: (1, 2, 6, 7, 3, 10) — customers who placed high-value orders
Outer query uses result:
- SELECT name, city FROM customers WHERE customer_id IN (1, 2, 6, 7, 3, 10)
- Fetches customer details for those IDs

Equivalent to a JOIN but sometimes more readable for simple filters.

Result:

name          | city
--------------|----------
Rajesh Kumar  | Mumbai
Priya Sharma  | Delhi
Anjali Desai  | Pune
Rahul Gupta   | Bangalore
Amit Patel    | Ahmedabad
Kavita Joshi  | Mumbai

Scalar subquery:

SELECT
    product_name,
    price,
    (SELECT AVG(price) FROM products) AS avg_price,
    price - (SELECT AVG(price) FROM products) AS price_difference
FROM products
ORDER BY price_difference DESC;

EXISTS:

SELECT name, email
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
      AND o.status = 'Cancelled'
);

Returns customers with at least one cancelled order

Correlated subquery:

SELECT
    c.name,
    c.city,
    (SELECT COUNT(*)
     FROM orders o
     WHERE o.customer_id = c.customer_id) AS total_orders,
    (SELECT SUM(total_amount)
     FROM orders o
     WHERE o.customer_id = c.customer_id) AS total_spent
FROM customers c;

CASE Statements

Simple CASE:

SELECT
    product_name,
    price,
    CASE
        WHEN price < 1000 THEN 'Budget'
        WHEN price BETWEEN 1000 AND 5000 THEN 'Mid-Range'
        ELSE 'Premium'
    END AS price_category
FROM products
ORDER BY price;

Result:

product_name              | price     | price_category
--------------------------|-----------|----------------
Milton Water Bottle       | 349.00    | Budget
Boat Airdopes 131         | 1299.00   | Mid-Range
Nike Sports T-Shirt       | 1299.00   | Mid-Range
Havells Table Fan         | 1699.00   | Mid-Range
Prestige Pressure Cooker  | 1899.00   | Mid-Range
Puma Running Shoes        | 2499.00   | Mid-Range
Noise ColorFit Watch      | 2499.00   | Mid-Range
Levi's Jeans              | 2999.00   | Mid-Range
Woodland Casual Shoes     | 3499.00   | Mid-Range
Samsung Galaxy M34        | 18999.00  | Premium

With aggregation:

SELECT
    status,
    COUNT(*) AS order_count,
    SUM(total_amount) AS revenue,
    SUM(CASE
        WHEN total_amount > 3000 THEN 1
        ELSE 0
    END) AS high_value_orders
FROM orders
GROUP BY status;

DISTINCT

Unique values:

SELECT DISTINCT state
FROM customers
ORDER BY state;

Count unique values:

SELECT COUNT(DISTINCT customer_id) AS unique_customers
FROM orders;

Multiple columns:

SELECT DISTINCT state, city
FROM customers
ORDER BY state, city;

String Functions

CONCAT:

SELECT
    CONCAT(name, ' (', city, ')') AS customer_info,
    email
FROM customers;

Returns: “Rajesh Kumar (Mumbai)”

UPPER / LOWER:

SELECT
    UPPER(name) AS name_upper,
    LOWER(email) AS email_lower
FROM customers
LIMIT 3;

SUBSTRING:

SELECT
    name,
    SUBSTRING(email, 1, POSITION('@' IN email) - 1) AS username
FROM customers;

LENGTH:

SELECT
    product_name,
    LENGTH(product_name) AS name_length
FROM products
WHERE LENGTH(product_name) > 15;

Date Functions

YEAR, MONTH, DAY:

SELECT
    order_id,
    order_date,
    YEAR(order_date) AS order_year,
    MONTH(order_date) AS order_month,
    DAY(order_date) AS order_day
FROM orders
WHERE YEAR(order_date) = 2024;

DATE_FORMAT (MySQL):

SELECT
    order_id,
    DATE_FORMAT(order_date, '%d/%m/%Y') AS formatted_date
FROM orders;

Returns: 10/01/2024

DATEDIFF:

SELECT
    name,
    joined_date,
    DATEDIFF(CURDATE(), joined_date) AS days_since_joined
FROM customers
ORDER BY days_since_joined DESC;

Extract month name:

SELECT
    DATE_FORMAT(order_date, '%M %Y') AS month_year,
    COUNT(*) AS orders,
    SUM(total_amount) AS revenue
FROM orders
GROUP BY month_year
ORDER BY MIN(order_date);

Window Functions

ROW_NUMBER:

SELECT
    name,
    city,
    joined_date,
    ROW_NUMBER() OVER (ORDER BY joined_date) AS signup_sequence
FROM customers;

How it works: Window functions perform calculations across a set of rows related to the current row, WITHOUT collapsing them into groups (unlike GROUP BY).

ROW_NUMBER() assigns sequential numbers to rows based on the ORDER BY clause in the OVER() section.

Execution:

SQL sorts customers by joined_date
Assigns number 1 to first row (earliest date)
Number 2 to second row, and so on
Each row keeps all its original data + gets a row number

Key difference from GROUP BY: All 10 customer rows remain; we just add a calculated column.

Result:

name          | city       | joined_date | signup_sequence
--------------|------------|-------------|----------------
Rajesh Kumar  | Mumbai     | 2023-01-15  | 1
Priya Sharma  | Delhi      | 2023-02-20  | 2
Amit Patel    | Ahmedabad  | 2023-03-10  | 3
Sunita Reddy  | Hyderabad  | 2023-04-05  | 4
Vikram Singh  | Jaipur     | 2023-05-12  | 5
...

RANK by total spent:

SELECT
    c.name,
    SUM(o.total_amount) AS total_spent,
    RANK() OVER (ORDER BY SUM(o.total_amount) DESC) AS spending_rank
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;

PARTITION BY:

SELECT
    c.state,
    c.name,
    SUM(o.total_amount) AS total_spent,
    RANK() OVER (
        PARTITION BY c.state
        ORDER BY SUM(o.total_amount) DESC
    ) AS rank_in_state
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.state, c.name;

How it works: PARTITION BY divides rows into separate groups (partitions), then applies the window function independently within each partition. Think of it as “GROUP BY for window functions.”

Execution:

GROUP BY creates one row per customer with their total_spent
PARTITION BY state separates customers by their state
Within each state partition, RANK() orders customers by spending
Ranking resets for each new state (rank 1 appears multiple times)

Example partitions:

Maharashtra partition: Rajesh (24096) → rank 1, Anjali (18999) → rank 2, Kavita (3798) → rank 3
Delhi partition: Priya (6045) → rank 1
Gujarat partition: Amit (4497) → rank 1

Result (partial):

state        | name          | total_spent | rank_in_state
-------------|---------------|-------------|---------------
Maharashtra  | Rajesh Kumar  | 24096.00    | 1
Maharashtra  | Anjali Desai  | 18999.00    | 2
Maharashtra  | Kavita Joshi  | 3798.00     | 3
Delhi        | Priya Sharma  | 6045.00     | 1
Gujarat      | Amit Patel    | 4497.00     | 1
Karnataka    | Rahul Gupta   | 3499.00     | 1
...

Running total:

SELECT
    order_date,
    order_id,
    total_amount,
    SUM(total_amount) OVER (
        ORDER BY order_date, order_id
    ) AS running_total
FROM orders
ORDER BY order_date, order_id;

Moving average (last 3 orders):

SELECT
    order_date,
    order_id,
    total_amount,
    AVG(total_amount) OVER (
        ORDER BY order_date, order_id
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg
FROM orders
ORDER BY order_date, order_id;

Advanced Examples

Top 3 products by revenue:

SELECT
    p.product_name,
    p.category,
    COUNT(o.order_id) AS times_ordered,
    SUM(o.total_amount) AS revenue
FROM products p
INNER JOIN orders o ON p.product_id = o.product_id
WHERE o.status != 'Cancelled'
GROUP BY p.product_id, p.product_name, p.category
ORDER BY revenue DESC
LIMIT 3;

Customers who haven’t ordered in 2024:

SELECT
    c.name,
    c.city,
    MAX(o.order_date) AS last_order_date
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name, c.city
HAVING MAX(o.order_date) < '2024-01-01'
    OR MAX(o.order_date) IS NULL;

Month-over-month growth:

WITH monthly_sales AS (
    SELECT
        DATE_FORMAT(order_date, '%Y-%m') AS month,
        SUM(total_amount) AS revenue
    FROM orders
    WHERE status = 'Delivered'
    GROUP BY month
)
SELECT
    month,
    revenue,
    LAG(revenue) OVER (ORDER BY month) AS prev_month_revenue,
    revenue - LAG(revenue) OVER (ORDER BY month) AS growth,
    ROUND(
        ((revenue - LAG(revenue) OVER (ORDER BY month))
         / LAG(revenue) OVER (ORDER BY month)) * 100,
        2
    ) AS growth_pct
FROM monthly_sales
ORDER BY month;

Customer segmentation by purchase behavior:

SELECT
    c.customer_id,
    c.name,
    COUNT(o.order_id) AS order_count,
    SUM(o.total_amount) AS total_spent,
    AVG(o.total_amount) AS avg_order_value,
    CASE
        WHEN COUNT(o.order_id) >= 3 AND SUM(o.total_amount) > 10000 THEN 'VIP'
        WHEN COUNT(o.order_id) >= 2 AND SUM(o.total_amount) > 5000 THEN 'Regular'
        ELSE 'Occasional'
    END AS customer_segment
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name
ORDER BY total_spent DESC;

Result:

customer_id | name          | order_count | total_spent | avg_order_value | customer_segment
------------|---------------|-------------|-------------|-----------------|------------------
1           | Rajesh Kumar  | 3           | 24096.00    | 8032.00         | VIP
2           | Priya Sharma  | 2           | 6045.00     | 3022.50         | Regular
6           | Anjali Desai  | 1           | 18999.00    | 18999.00        | Occasional
10          | Kavita Joshi  | 1           | 3798.00     | 3798.00         | Occasional
4           | Sunita Reddy  | 2           | 4298.00     | 2149.00         | Occasional
3           | Amit Patel    | 2           | 4497.00     | 2248.50         | Occasional
7           | Rahul Gupta   | 1           | 3499.00     | 3499.00         | Occasional
5           | Vikram Singh  | 1           | 2499.00     | 2499.00         | Occasional
9           | Suresh Menon  | 1           | 2499.00     | 2499.00         | Occasional
8           | Deepa Iyer    | 1           | 1699.00     | 1699.00         | Occasional

Products never ordered:

SELECT
    p.product_id,
    p.product_name,
    p.category,
    p.price
FROM products p
LEFT JOIN orders o ON p.product_id = o.product_id
WHERE o.order_id IS NULL;

Best selling day of the week:

SELECT
    DAYNAME(order_date) AS day_of_week,
    COUNT(*) AS order_count,
    SUM(total_amount) AS revenue,
    AVG(total_amount) AS avg_order_value
FROM orders
WHERE status = 'Delivered'
GROUP BY day_of_week
ORDER BY revenue DESC;

Find duplicate emails (if any):

SELECT
    email,
    COUNT(*) AS email_count
FROM customers
GROUP BY email
HAVING COUNT(*) > 1;

Common Table Expressions (CTE)

Basic CTE:

WITH high_value_orders AS (
    SELECT
        customer_id,
        order_id,
        total_amount
    FROM orders
    WHERE total_amount > 3000
)
SELECT
    c.name,
    c.city,
    h.order_id,
    h.total_amount
FROM high_value_orders h
JOIN customers c ON h.customer_id = c.customer_id
ORDER BY h.total_amount DESC;

Multiple CTEs:

WITH customer_stats AS (
    SELECT
        customer_id,
        COUNT(*) AS order_count,
        SUM(total_amount) AS total_spent
    FROM orders
    GROUP BY customer_id
),
avg_spending AS (
    SELECT AVG(total_spent) AS avg_customer_spent
    FROM customer_stats
)
SELECT
    c.name,
    c.city,
    cs.order_count,
    cs.total_spent,
    ROUND(cs.total_spent - a.avg_customer_spent, 2) AS diff_from_avg
FROM customer_stats cs
JOIN customers c ON cs.customer_id = c.customer_id
CROSS JOIN avg_spending a
WHERE cs.total_spent > a.avg_customer_spent
ORDER BY cs.total_spent DESC;

Returns customers spending above average

Performance Tips

Use indexes on frequently queried columns:

CREATE INDEX idx_customer_state ON customers(state);
CREATE INDEX idx_order_date ON orders(order_date);
CREATE INDEX idx_order_status ON orders(status);

Select only needed columns instead of SELECT *

Use EXPLAIN to analyze queries:

EXPLAIN SELECT
    c.name,
    COUNT(o.order_id) AS order_count
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;

Filter early with WHERE before JOIN when possible

Use LIMIT for large result sets during development

Practice Queries

Try these on your own:

Find customers from Maharashtra who ordered Electronics
Calculate the total quantity sold for each product category
List orders with product names where total_amount > average order amount
Find the most popular product in each category
Get customers who ordered the same product more than once
Calculate each customer’s percentage contribution to total revenue
Find orders placed on weekends
List products that are priced above their category average

Summary

SELECT retrieves data with optional column selection
WHERE filters rows before grouping
GROUP BY aggregates data
HAVING filters grouped results
JOIN combines tables
Subqueries nest queries for complex logic
Window functions perform calculations across row sets
CTEs improve query readability

Start with basic SELECT and WHERE, then progress to joins and aggregations. Master these patterns and you’ll handle most real-world SQL tasks.