How We Made Our SQL Query 80% Faster by Thinking Differently

SQL gets harder as datasets grow. A query that runs fine in development can slow to a crawl in production, and it is not always obvious why. This post walks through how we made one painfully slow query about 80% faster, and explains what caused the bottleneck in the first place.

The Use Case

We needed to query a table named user_events, which logs user activity such as logins, clicks, and page views. Our objective was straightforward:

For every user, retrieve their most recent event.

At first glance, this seems like a simple request. We wrote the kind of query many developers would reach for:

SELECT * 
FROM user_events 
WHERE (user_id, event_time) IN (
  SELECT user_id, MAX(event_time) 
  FROM user_events 
  GROUP BY user_id
);

This query produced the correct results, but it performed at a glacial pace in production.
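
To reproduce the behavior on a smaller scale, here is a minimal setup sketch for Postgres. Only the table name and the user_id and event_time columns come from this post. The id and event_type columns, the data types, and the row counts are assumptions made for illustration. The counts keep the same ratio of roughly 20 events per user as the production example discussed below.

```sql
-- Hypothetical schema: only user_id and event_time are taken from the post;
-- id, event_type, and all data types are assumptions for illustration.
CREATE TABLE user_events (
    id         bigserial   PRIMARY KEY,
    user_id    bigint      NOT NULL,
    event_type text        NOT NULL,
    event_time timestamptz NOT NULL
);

-- Fill the table with 1 million synthetic events spread over ~50,000 users.
INSERT INTO user_events (user_id, event_type, event_time)
SELECT
    (random() * 49999)::bigint + 1,                                       -- user_id between 1 and 50000
    (ARRAY['login', 'click', 'page_view'])[floor(random() * 3)::int + 1], -- random event type
    now() - random() * interval '365 days'                                -- some time in the past year
FROM generate_series(1, 1000000);

-- Refresh planner statistics so query plans reflect the new data.
ANALYZE user_events;
```

A table this small will not show production-scale timings, but it is enough to inspect query plans and compare alternatives.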

What’s Actually Happening Behind the Scenes?

To understand the inefficiencies, let’s dissect the query:

1. The Inner Query:

SELECT user_id, MAX(event_time) 
FROM user_events 
GROUP BY user_id;

This part of the query groups the entire table by user and finds the latest event_time for each one. On a large table, that is a lot of work:

If your table contains 100 million rows and 5 million users, Postgres has to read all 100 million rows, track a running maximum for each of the 5 million users, and produce an intermediate result of 5 million rows.
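
One way to check this on your own data is to ask Postgres for the execution plan of the inner query alone. This is a sketch rather than a record of what we ran. EXPLAIN with the ANALYZE and BUFFERS options is standard Postgres, but the plan you get depends on your version, indexes, statistics, and settings.

```sql
-- Note: ANALYZE actually executes the query, so expect this to take
-- as long as the aggregation itself on a large table.
EXPLAIN (ANALYZE, BUFFERS)
SELECT user_id, MAX(event_time)
FROM user_events
GROUP BY user_id;
```

In the output, look for a scan node that reads the whole table (for example a Seq Scan) feeding an aggregate node such as HashAggregate or GroupAggregate. The reported row counts show how many rows went in and how many groups came out.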

Yet, this is only part of the problem.

2. The Outer Query:

SELECT * 
FROM user_events 
WHERE (user_id, event_time) IN (...);

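Here, the performance issues become pronounced. Conceptually, for each of the 100 million rows in the main table, Postgres has to answer:

“Is this (user_id, event_time) tuple among the 5 million results of the inner query?”

In practice the planner usually runs this check as a join against the inner result (typically hashed) rather than a literal search through a list, but that still means a second pass over the whole table on top of the aggregation. This repeated work across a vast dataset is where the query’s efficiency plummets, and it explains the slow execution times we saw in production.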
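
To see how Postgres handles the membership check on your own system, you can run the full query under EXPLAIN. This is a sketch for inspection only. The node names mentioned afterwards are examples of what may appear, not a guaranteed plan.

```sql
-- Shows the actual plan, timings, and buffer usage for the full query.
-- ANALYZE runs the query for real, so try it on a test copy first.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM user_events
WHERE (user_id, event_time) IN (
  SELECT user_id, MAX(event_time)
  FROM user_events
  GROUP BY user_id
);
```

If the plan shows two separate scans of user_events, one under the aggregate and one for the outer query, combined through something like a Hash Semi Join or a hashed SubPlan, the table is being read twice. Comparing the actual time reported on each node shows which step dominates.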

Tech Optimizer