# Why you should definitely learn SQL Window Functions

Imagine having a tool that can automatically detect if you are using JPA and Hibernate properly. Hypersistence Optimizer is that tool!

# Introduction

I found this question on the Hibernate forum, and it’s a very good opportunity to show why mastering Windows Functions is a very important skill for any backend software developer.

# Domain Model

Let’s assume we have the following `entries` table in our database:

id c1 c2 c3 c4 c5
1 2000 a 1 x 0
2 2000 a 1 y 0
3 2000 a 1 z 0
4 2000 a 2 z 0
5 2000 a 2 x 0
6 2000 b 1 x 0
7 2000 b 1 y 0
8 2000 b 1 z 0
9 2000 b 2 z 0
10 2001 a 1 x 0
11 2001 a 1 y 0
12 2001 a 1 z 0
13 2001 a 2 z 0
14 2001 a 2 x 0
15 2001 a 2 y 0
16 2001 a 2 w 0
17 2001 a 3 y 0
18 2001 a 3 w 0
19 2001 b 1 x 0
20 2001 b 1 y 0
21 2001 b 2 x 0
22 2001 b 2 z 0

# The problem

As stated in the question, the user’s goal is to:

I want to update column c5 (to 1) of each group base on column c1, c2, c3 where c3 is maximum in same c1, c2 group.

Easy peasy!

As I already explained, SQL is a Magic Wand. Not only that SQL has been a driving force in RDBMS widespread adoption, but even NewSQL databases (Google Spanner, CockroachDB) or Data streaming frameworks, like Kafka, have adopted SQL.

# Window Functions to the rescue!

So, this is how you can solve this problem:

```int updateCount = entityManager.createNativeQuery(
"update entries set c5 = 1 " +
"where id in " +
"( " +
"    select id " +
"    from ( " +
"        select *, MAX (c3) OVER (PARTITION BY c1, c2) as max_c3 " +
"        from entries " +
"    ) t " +
"    where t.c3 = t.max_c3 " +
") ")
.executeUpdate();

assertEquals( 7, updateCount );
```

Because Window Functions allow you to aggregate values without breaking the returning table result set, we can easily find the matching identifiers which can be passed to the UPDATE statement.

The Execution Plan for the UPDATE statement above looks as follows:

```explain analyze
update entries set c5 = 1
where id in
(
select id
from (
select *, MAX (c3) OVER (PARTITION BY c1, c2) as max_c3
from entries
) t
where t.c3 = t.max_c3
)

Update on entries
(cost=15.27..23.30 rows=1 width=2134)
(actual time=0.154..0.154 rows=0 loops=1)
->  Nested Loop
(cost=15.27..23.30 rows=1 width=2134)
(actual time=0.094..0.104 rows=7 loops=1)
->  HashAggregate
(cost=15.12..15.13 rows=1 width=1084)
(actual time=0.083..0.085 rows=7 loops=1)
Group Key: t.id
->  Subquery Scan on t
(cost=12.85..15.12 rows=1 width=1084)
(actual time=0.063..0.080 rows=7 loops=1)
Filter: (t.c3 = t.max_c3)
Rows Removed by Filter: 15
->  WindowAgg
(cost=12.85..14.25 rows=70 width=1056)
(actual time=0.053..0.065 rows=22 loops=1)
->  Sort
(cost=12.85..13.02 rows=70 width=1052)
(actual time=0.044..0.045 rows=22 loops=1)
Sort Key: entries_1.c1, entries_1.c2
Sort Method: quicksort  Memory: 26kB
->  Seq Scan on entries entries_1
(cost=0.00..10.70 rows=70 width=1052)
(actual time=0.009..0.011 rows=22 loops=1)
->  Index Scan using entries_pkey on entries
(cost=0.14..8.16 rows=1 width=1054)
(actual time=0.002..0.002 rows=1 loops=7)
Index Cond: (id = t.id)
Planning time: 0.201 ms
Execution time: 0.230 ms
```

# What if you can’t use Window Functions?

Nowadays, all major DBs support Window Functions, MySQL 8.0 being one of the last major RDBMS to join the club. Oracle, PostgreSQL and SQL Server have been supporting Window Functions for quite some time now.

However, assuming that you are stuck with some old MySQL instance, you can still solve this problem using the following SQL query:

```int updateCount = entityManager.createNativeQuery(
"update entries set c5 = 1 " +
"where id in " +
"( " +
"    select e.id " +
"    from entries e  " +
"    inner join ( " +
"        select c1, c2, max(c3) as max_c3 " +
"        from entries " +
"        group by c1, c2 " +
"    ) t " +
"    on e.c1 = t.c1 and e.c2 = t.c2 and e.c3 = t.max_c3  " +
") " )
.executeUpdate();

assertEquals( 7, updateCount );
```

The Execution Plan for the UPDATE statement above looks as follows:

```explain analyze
update entries set c5 = 1
where id in
(
select e.id
from entries e
inner join (
select c1, c2, max(c3) as max_c3
from entries
group by c1, c2
) t
on e.c1 = t.c1 and e.c2 = t.c2 and e.c3 = t.max_c3
)

Update on entries
(cost=25.49..26.22 rows=1 width=1612)
(actual time=0.112..0.112 rows=0 loops=1)
->  Nested Loop
(cost=25.49..26.22 rows=1 width=1612)
(actual time=0.081..0.090 rows=7 loops=1)
->  HashAggregate
(cost=25.35..25.36 rows=1 width=562)
(actual time=0.074..0.075 rows=7 loops=1)
Group Key: e.id
->  Hash Join
(cost=13.85..25.35 rows=1 width=562)
(actual time=0.067..0.070 rows=7 loops=1)
Hash Cond: ((e.c1 = t.c1) AND
((e.c2)::text = (t.c2)::text) AND
(e.c3 = t.max_c3))
->  Seq Scan on entries e
(cost=0.00..10.70 rows=70 width=538)
(actual time=0.016..0.019 rows=22 loops=1)
->  Hash
(cost=12.62..12.62 rows=70 width=1072)
(actual time=0.029..0.029 rows=4 loops=1)
Buckets: 1024  Batches: 1  Memory Usage: 9kB
->  Subquery Scan on t
(cost=11.23..12.62 rows=70 width=1072)
(actual time=0.021..0.023 rows=4 loops=1)
->  HashAggregate
(cost=11.23..11.92 rows=70 width=524)
(actual time=0.017..0.018 rows=4 loops=1)
Group Key: entries_1.c1, entries_1.c2
->  Seq Scan on entries entries_1
(cost=0.00..10.70 rows=70 width=524)
(actual time=0.004..0.005 rows=22 loops=1)
->  Index Scan using entries_pkey on entries
(cost=0.14..0.85 rows=1 width=1054)
(actual time=0.001..0.002 rows=1 loops=7)
Index Cond: (id = e.id)
Planning time: 0.293 ms
Execution time: 0.219 ms
```

# Which one is better?

If you compare both Execution Plans, you can see that the Windows Function query yields a better cost than the other one.

So, not only that the query is much easier to read, but chances are that it’s going to be more efficient as well.

If you enjoyed this article, I bet you are going to love my SQL Master Class training, as well. I'll run a SQL workshop in Oslo on the 30th-31st of March 2019, so come join in.

# Conclusion

As I already explained, it’s time to break free from the SQL-92 mindset.

SQL has many features like Window Functions, Common Table Expressions, PIVOT, derived tables and set operations that you can use to find the right answer to your data processing questions. For more about new SQL features, check out Markus Winand’s Modern SQL website.

FREE EBOOK This site uses Akismet to reduce spam. Learn how your comment data is processed.