LeetCode SQL Interview Questions

Posted by

Table of Contents

The Blueprint for SQL Interview Mastery: Demystifying the LeetCode 77

When a tech interviewer pivots from talking about software architecture to opening up a shared coding environment and says, “Alright, let’s look at a quick database problem,” there’s a noticeable, uncomfortable shift in your chest. For years, the traditional engineering education has treated SQL as a secondary skill, something you could learn on the fly at work with an Object-Relational Mapper (ORM). However, companies have found that engineers who can’t write optimised, clean SQL end up building applications that tank production databases as traffic scales.

The LeetCode 77 SQL collection is widely considered the gold standard for breaking through this barrier. But go onto the discussion boards for these 77 problems and you’ll see a chaotic mess of hyper-optimized, single-line queries littered with unexplained variables and robotic syntax.

This guide aims to change that. We’re going to break down the core patterns hidden within LeetCode’s famous SQL pool, look at real-world interview scenarios, examine structural blueprints for the most notorious problems, and map out exactly how to think your way to an optimised solution.

Part 1: Why Most Engineers Fail SQL Interviews

Most candidates approach SQL with a procedural mindset. They see a problem with the database and try to think like an imperative programmer:

“First I need to loop through this row, then I need to check this condition and if it’s true I’ll add it to a temporary array.”

Database engines don’t think like that. SQL is a Declarative language. You don’t tell the engine how to find your data, you tell the engine exactly what the final output data layout should look like, and the database optimiser figures out the execution path.

If you miss this mental shift, in a live interview you make three fatal mistakes:

1. The WHERE vs. HAVING Confusion

This one is an instant ‘fail’ in mid-to-senior interviews.

  • WHERE filters individual raw rows, before any grouping or math aggregation happens.
  • HAVING filters the summary rows after the GROUP BY clause has already bucketed the data.
  • Using WHERE on an aggregate function like SUM() or COUNT() will crash your code before it even runs.

2. The Unintentional JOIN Explosion

When new users are faced with joining two tables, they immediately write INNER JOIN. If you are working with asymmetrical data, such as a list of users and their occasional purchases, an inner join will completely wipe out any user who has not bought something yet. If the question asks for a complete list of users and their total spend, your inner join just deleted half of your customer base from the output.

3. Coding Blindly Without ‘Dry Running’

You won’t always have a ‘Run Code’ button that spits out a green pass check in a real interview. You need to demonstrate that your query works by showing how individual rows flow through your clauses. If your own GROUP BY clause cannot track a mock row, your interviewer will see it right away.

Part 2: The Core Architectural Patterns of LeetCode 77

The LeetCode SQL problem bank seems to be a random assembly of data requests, but it is actually made up entirely of five foundational architectural patterns. If you learn to recognise these patterns, you can solve almost any problem in the set.

Pattern 1: Handling Asymmetric Connections (The Missing Info Class)

This pattern challenges you to join tables without losing data that does not have a match.

  • Core Problems:
    • #175: Combine Two Tables
    • #183: Customers Who Never Order
    • #1378: Replace Employee ID With The Unique Identifier
    • #1965: Employees With Missing Information

The Interview Scenario

Imagine your interviewer gives you an Employees table with everyone’s corporate name and an EmployeeAddress table with the physical home addresses. They want a report of every single employee with their address.

If you use an INNER JOIN, then any employee who has not yet submitted their address is totally wiped off the final report. To protect your core list you need to use an asymmetric LEFT JOIN or RIGHT JOIN.

SQL

SELECT 
    e.name,  
    a.unique_id
FROM Employees e 
LEFT OUTER JOIN EmployeeAddress a 
    ON e.id = a.id;

A LEFT JOIN tells the database engine to keep every row from the left table (Employees) the same. If there is no matching match in the right table (EmployeeAddress), then it gracefully fills those columns with NULL, instead of destroying the row.

Pattern 2: The Self-Join Hierarchy (The Organisational Mirror)

This pattern occurs when a single table contains a structural relationship with itself. This is common in organisational hierarchies or time-series comparisons.

  • Core Problems:
    • #181: Employees Earning More Than Their Managers
    • #197: Rising Temperature
    • #570: Managers with At Least 5 Direct Reports
    • #1731: The Number of Employees Which Report to Each Employee

The Interview Scenario

You are given a standard employee directory table. Each row has an id, a name, a salary, and a column called manager_id. The problem is to find employees who are getting a higher salary than their direct manager.

Beginners tend to get stuck looking for a second table. The trick here is to conceptually clone the table in your mind and join it to itself. You treat one occurrence of the table as the Worker block and the other occurrence as the Manager block.

SQL

SELECT 
    w.name AS Employee
FROM Employee AS w
INNER JOIN Employee m  
    ON w.manager_id = m.id 
WHERE w.salary > m.salary; 

Instead of simply matching w.manager_id = m.id, you force the database engine to compare a worker row with the corresponding manager row side by side, so that the salary comparison becomes a horizontal comparison.

Pattern 3: Top-N and Nth Query (The Leaderboard Class)

Ranking an element and finding a specific position in a list is one of the most important skills in data analysis.

  • Core Problems:
    • #176: Second Highest Salary
    • #177: Nth Highest Salary
    • #178: Rank Scores
    • #184: Department Highest Salary
    • #185: Department Top Three Salaries

The Interview Scenario

The interviewer asks you to find the second-highest salary from an employee payroll table. If there is a tie or just one salary in the database then the query should return NULL.

There are two main approaches to this. Simple second highest query through simple ordering and offsets. But for real scalability (such as getting the top three salaries per department) you need to use Window Functions.

Approach A: The Baseline Offset Method

SQL

SELECT (
    SELECT DISTINCT salary 
    FROM Employee 
    ORDER BY salary DESC 
    LIMIT 1 OFFSET 1
) AS SecondHighestSalary;

Why do we wrap it in an outer SELECT? If the table has one row only, the inner limit-offset query returns absolutely nothing (empty set). You wrap it in a secondary outer SELECT and you force the engine to return a clean structured NULL that meets the strict edge-case requirements of an enterprise application.

Approach B: The Modern Window Function Method

If you need to handle ties and find top ranks across multiple categories, window functions like DENSE_RANK() are extremely efficient.

SQL

WITH RankedSalaries AS (
    SELECT  
        departmentId,
        name, 
        salary,
        DENSE_RANK() OVER (PARTITION BY departmentId ORDER BY salary DESC) AS salary_rank
    FROM Employee
) 
SELECT 
    departmentId, 
    name, 
    salary 
FROM RankedSalaries 
WHERE salary_rank <= 3;

The OVER (PARTITION BY ... ORDER BY ...) clause is a flexible lens for grouping. It chunks the records into departmental buckets, sorts the salaries in each bucket from highest to lowest, and stamps a rank number right onto the row with no collapsing of the dataset.

Pattern 4: Conditional Transformations (The Pivot & Reshape Class)

Real production data is rarely organised in exactly the way the business teams want to read it. This pattern is about converting data from rows to columns.

  • Core Problems:
    • #627: Swap Sex of Employees
    • #1179: Reformat Department Table
    • #1393: Capital Gain/Loss
    • #1795: Rearrange Products Table
    • #1873: Calculate Special Bonus

The Interview Scenario

You have a ledger of stock transactions that tracks trades. Each row records the trade as either a ‘Buy’ or ‘Sell’, the name of the stock ticker and the cash value. You are going to have to take the absolute net capital gain or loss of every single ticker on the platform.

You can do a conditional CASE WHEN statement wrapped directly in a normal aggregation function instead of doing complex math loops.

SQL

SELECT 
    stock_name,
    SUM(CASE WHEN operation = 'Buy' THEN -price ELSE price END) AS capital_gain
FROM Stocks 
GROUP BY stock_name; 

This query transforms transactional events into simple positive and negative integers. If you bought a stock the cash left your account (negative value) if you sold it cash came back in (positive value). The SUM() function aggregates those conditional values for each ticker group, thus providing a clean net balance.

Pattern 5: Time-Series Comparisons & Tracking State Shifts

Many complex data tasks involve comparisons of a row with sequential tracking data, like looking for patterns across consecutive dates.

  • Core Problems:
    • #180: Consecutive Numbers
    • #550: Game Play Analysis IV
    • #1141: User Activity for the Past 30 Days I
    • #1204: Last Person to Fit in the Bus
    • #1661: Average Time of Process per Machine

The Interview Scenario

You are analysing a log table that logs daily login events. You are tasked with finding all numbers or logs that occur at least three times consecutively.

Use the window functions LEAD() and LAG() to look backward or forward across sequential rows to avoid performance intensive self-joins.

SQL

WITH SequencedLogs AS ( 
    SELECT 
        num, 
        LAG(num, 1) OVER (ORDER BY id) AS prev_num, 
        LEAD(num, 1) OVER (ORDER BY id) AS next_num 
    FROM Logs 
) 
SELECT DISTINCT num AS ConsecutiveNums 
FROM SequencedLogs 
WHERE num = prev_num AND num = next_num;

LAG(num, 1) gets the value from the row just above the current record. LEAD(num, 1) retrieves the value from the row just below. This approach collects the sequential data together onto a single horizontal line, and you can run a clean validation check.

Part 3: Deep Dives into High-Yield Interview Favourites

Let’s break down three of the most highly requested problems from the LeetCode 77 list. We will walk through the tactical steps to solve them and see the explicit reasoning behind the code.

1. LeetCode #196: Delete Duplicate Emails

  • Difficulty: Easy (Heavy demand in first technical interviews)
  • Goal: Delete duplicate emails from Person table, but keep the first occurrence with the lowest id.

The Failure of Intuition

Many engineers try to run a subquery like:

SQL

DELETE FROM Person WHERE id NOT IN (SELECT MIN(id) FROM Person GROUP BY email);

MySQL and other standard SQL databases will give a safety error: “You can’t specify target table ‘Person’ for update in FROM clause.” The database will not allow you to modify a table and read from it at the same time in a subquery.

The Right Blueprint

To safely work-around this limitation, perform an explicit self-join in your delete statement to compare the rows side-by-side.

SQL

DELETE p1 
FROM Person p1 
INNER JOIN Person p2 
    ON p1.email = p2.email 
WHERE p1.id > p2.id;

Step by Step Breakdown

  • The Target: DELETE p1 informs the database optimiser that rows will only be deleted from the p1 instance of the table.
  • The Connection: INNER JOIN Person p2 ON p1.email = p2.email matches up every single row with every other row with the same email address.
  • The Target Filter: WHERE p1.id > p2.id means that when two records have the exact same email, the one with the larger ID (the duplicate) is flagged for deletion, while the smaller (the original ID) is safely left intact.

2. LeetCode #1321: Restaurant Growth

  • Difficulty: Medium (Common for Data Engineering & Senior Analyst tracks)
  • Task: Calculate the rolling 7-day total customer spend and average for the same period, sorted by visit date, given a restaurant ledger.

Challenge

In this case you can’t just use a GROUP BY date because you need each individual date row to contain data from a previous 6 day history window.

The Right Blueprint

SQL

WITH DailyTotals AS ( 
    SELECT 
        visited_on, 
        SUM(amount) AS daily_amount 
    FROM Customer 
    GROUP BY visited_on 
), 
RollingWindows AS ( 
    SELECT 
        visited_on, 
        SUM(daily_amount) OVER(ORDER BY visited_on ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS amount, 
        ROUND(AVG(daily_amount) OVER(ORDER BY visited_on ROWS BETWEEN 6 PRECEDING AND CURRENT ROW), 2) AS average_amount, 
        ROW_NUMBER() OVER(ORDER BY visited_on) AS day_row_num 
    FROM DailyTotals 
) 
SELECT 
    visited_on, 
    amount, 
    average_amount 
FROM RollingWindows 
WHERE day_row_num >= 7; 

Breaking it down

  1. The Baseline CTE (DailyTotals): First, collapse the raw transactional customer table down to clean, single-day summaries so you have exactly one row per calendar date.
  2. The Rolling Window Engine: Use frame specification ROWS BETWEEN 6 PRECEDING AND CURRENT ROW. This makes the database engine scan the current date line, go back exactly six rows, and sum and average over that entire 7 day window.
  3. The Cold-Start Filter: Since the first week of operations doesn’t have 6 days of history behind it, we use ROW_NUMBER() to filter out records where day_row_num < 7.

3. LeetCode #608: Tree Node

  • Difficulty: Medium (Structural Logic Mapping)
  • The Issue: Classify each node in a hierarchical tree structure as a Root (the top node), Inner branch node, or Leaf (the last endpoints).

Structural Table Rules

You have a table with two columns id (current node) and p_id (immediate parent node).

  • If a node’s p_id is completely empty (NULL) it is the Root.
  • An Inner node is a node whose id appears in the parent column of other rows.
  • Leaf: It has a parent, but no other object ever uses its own ID as a parent.

The Right Blueprint

SQL

SELECT 
    id,
    CASE 
        WHEN p_id IS NULL THEN 'Root' 
        WHEN id IN (SELECT DISTINCT p_id FROM Tree WHERE p_id IS NOT NULL) THEN 'Inner' 
        ELSE 'Leaf' 
    END AS type 
FROM Tree;

Step by Step Breakdown

  • Evaluation Order: The conditions in the CASE statement are evaluated from the top to the bottom. Once a condition is met, it will not be evaluated anymore.
  • The Root Check: First, check that the node has no parent node. If p_id IS NULL then it is the ‘Root’.
  • Inner Node Check: Is the node id in a subquery list of all parents? Important optimisation tip: always filter out NULL values inside an IN subquery (WHERE p_id IS NOT NULL). In many SQL dialects, an IN check against a set that contains a NULL will cause the whole evaluation to fail and return unexpected empty results.
  • The Default Fallback: If a node fails both checks, it means it has a valid parent but has no children of its own. It is classified as a “Leaf” by the fallback ELSE catch.

Part 4: Technical Interview Cheat Sheet

Before you jump into your technical screening, review this core syntax checklist:

Algorithmic PatternKey SQL MechanicsWhen to Use It
Missing / Unmatched ElementsLEFT JOIN / IS NULLIdentifying customers who never ordered, or tracking down missing logs.
Rankings / LeaderboardsDENSE_RANK() OVER(...)Finding top earners, third-highest prices, or ranked scores with ties.
Consecutive / Rolling EventsLEAD() / LAG() / Window FramingDetecting consecutive item logins or running 7-day transaction averages.
Conditional PivotingSUM(CASE WHEN ... THEN ...)Splitting mixed transaction ledgers into separate balance columns.
Hierarchical QueriesSelf-Joins (Table1 A JOIN Table1 B)Comparing individual employee records directly against their managers.

Part 5: Frequently Asked Questions (FAQs)

Here are the top ten questions candidates face when tackling SQL problem sets.

1. Which SQL dialect to pick on LeetCode for interviews?

Stick with MySQL or PostgreSQL. They are industry standard and used by the vast majority of engineering teams worldwide. The basic declarative syntax is mostly the same, but PostgreSQL has just a bit more support for advanced window functions and complex analytics.

2. Is it bad to use Subqueries vs CTEs?

Standard subqueries have nothing wrong with them, but CTEs (Common Table Expressions) are much better for production code. CTEs let you define nice, named blocks of data at the top of your file, so your code is super readable and maintainable. Interviewers love CTEs because you can walk them through your logic easily step-by-step.

3. What’s the real difference between RANK(), DENSE_RANK(), and ROW_NUMBER()?

  • ROW_NUMBER() assigns a unique sequential integer to each row, regardless of ties (1, 2, 3, 4).
  • RANK() assigns the same rank to identical values, but leaves gaps in the sequence to account for ties (1, 2, 2, 4).
  • DENSE_RANK() assigns the same rank to identical values and does not leave gaps in the sequence (1, 2, 2, 3).

4. Why UNION ALL instead of UNION?

UNION performs an internal deduplication pass to eliminate identical rows, which costs additional processing time and memory, while UNION ALL combines the datasets exactly as they are without deduplication. If you know your data chunks are distinct, UNION ALL is much faster.

5. How do I troubleshoot a slow SQL query that times out?

First, go over your filtering points. Make sure to properly index columns used in JOIN conditions and WHERE clauses. Don’t use SELECT * wildcards in production; use explicit column names instead. Also, try to replace complex nested subqueries with optimised window functions or explicit joins.

6. Can you use aggregate functions like COUNT() or SUM() in a WHERE clause?

No. The WHERE clause filters individual rows as they are read from disc, before the database engine groups the rows together. If you want to filter data on an aggregate total, you must use a GROUP BY clause, then a HAVING clause.

7. What is a CROSS JOIN and when would I use it?

A CROSS JOIN produces a Cartesian product, where every row from your first table is combined with every row from your second table. This technique is very useful for producing exhaustive combinations, for instance a full matrix of product types by calendar months for inventory reporting, but can be dangerous on large tables because it can explode the data.

8. How does SQL deal with NULLs in mathematical operations?

In standard SQL, any mathematical operation directly involving a NULL value will result in a NULL (e.g. 5 + NULL = NULL). To avoid breaking your metrics, use fallback functions such as COALESCE(column, 0) or IFNULL() to replace missing entries with a safe fallback value before you run calculations.

9. Why do interviewers obsess over Window Functions?

Using window functions, you can calculate aggregate metrics (such as running totals, moving averages, or localised ranks) without losing the details of each individual row. Without window functions you would have to write complex self-joins and subqueries which are harder to maintain and an order of magnitude slower on large datasets.

10. How many SQL questions should I expect to be asked in a mid-level engineering interview?

Less is more. It is much better to learn 30 to 40 well selected problems in the 5 core patterns in this guide thoroughly than to skim through all 77 problems without any understanding of the underlying mechanics. You want to understand the structure, and write your queries out on paper or a blank whiteboard.

Your Action Plan for Today

Don’t try to memorise all 77 problems. Choose a SQL platform, choose a relational database option, and start with #175: Combine Two Tables. Think about how your data flows through your queries, remember the structural constraints of your clauses, and practice explaining your logic out loud. Once you learn to think declaratively, database problems are no longer a stressful hurdle, but an intuitive tool in your engineering toolkit.

Leave a Reply

Your email address will not be published. Required fields are marked *