Geeky is Awesome: June 2021

Monday, June 28, 2021

SQL Group By with Limit in MySQL: Finding the top n values per group

Say you have the following database table called "tbl_test" which contains rows which can be grouped by a column.

id	group_name	value
1	a	0
2	a	8
3	a	9
4	a	3
5	b	5
6	b	6
7	b	4
8	c	2
9	c	9

If you want to find the maximum value in each group, that's easy using a Group By statement and a MAX function:

SELECT
	`tbl_test`.`group_name` AS `group_name`,
    MAX(`tbl_test`.`value`) AS `value`
FROM
	`tbl_test`
GROUP BY
	`tbl_test`.`group_name`
ORDER BY
	`tbl_test`.`group_name`
;

group_name	value
a	9
b	6
c	9

But what if you wanted to find the top 2 values in each group? In MySQL, this can be achieved using the aggregate function GROUP_CONCAT which takes all the values in a column in a group, sorts them, and concatenates them into a single string. The problem is that in MySQL v5 there is no way to limit the number of values used, so we'll simulate this using the SUBSTRING_INDEX function, which takes the first n values in a string containing delimited values. For example SUBSTRING_INDEX("ab,c,de", ",", 2) returns "ab,c".

SELECT
	`tbl_test`.`group_name` AS `group_name`,
    SUBSTRING_INDEX(
        GROUP_CONCAT(`tbl_test`.`value` ORDER BY `tbl_test`.`value` DESC SEPARATOR ","),
        ",", 2
    ) AS `values`
FROM
	`tbl_test`
GROUP BY
	`tbl_test`.`group_name`
;

group_name	value
a	9,8
b	6,5
c	9,2

This puts the values together in one column. But sometimes you'll want to have each value being in its own row. In that case you can use the same trick we saw in an earlier post about finding the median value in MySQL, which involves using session variables. I got this trick from this other blog.

SET @row_num := 0;
SET @prev_group := NULL;
SELECT
	`t`.`group_name` AS `group_name`,
    `t`.`value` AS `value`
FROM (
    SELECT
        @row_num := IF(@prev_group = `tbl_test`.`group_name`, @row_num + 1, 1) AS `row_num`,
        @prev_group := `tbl_test`.`group_name`,
        `tbl_test`.`group_name` AS `group_name`,
        `tbl_test`.`value` AS `value`
    FROM
        `tbl_test`
    HAVING
        `row_num` <= 2
    ORDER BY
        `tbl_test`.`group_name` ASC,
        `tbl_test`.`value` DESC
) AS `t`
;

The idea here is to sort by the grouping column and the value and then associate a row number with each row, where the row number restarts on the first row of each group. We then use a Having clause to filter out only the rows that have a row number that is two or less.

group_name	value
a	9
a	8
b	6
b	5
c	9
c	2