Monday, June 28, 2021

SQL Group By with Limit in MySQL: Finding the top n values per group

Say you have the following database table called "tbl_test" which contains rows which can be grouped by a column.
idgroup_namevalue
1a0
2a8
3a9
4a3
5b5
6b6
7b4
8c2
9c9
If you want to find the maximum value in each group, that's easy using a Group By statement and a MAX function:
SELECT
	`tbl_test`.`group_name` AS `group_name`,
    MAX(`tbl_test`.`value`) AS `value`
FROM
	`tbl_test`
GROUP BY
	`tbl_test`.`group_name`
ORDER BY
	`tbl_test`.`group_name`
;
group_namevalue
a9
b6
c9
But what if you wanted to find the top 2 values in each group? In MySQL, this can be achieved using the aggregate function GROUP_CONCAT which takes all the values in a column in a group, sorts them, and concatenates them into a single string. The problem is that in MySQL v5 there is no way to limit the number of values used, so we'll simulate this using the SUBSTRING_INDEX function, which takes the first n values in a string containing delimited values. For example SUBSTRING_INDEX("ab,c,de", ",", 2) returns "ab,c".
SELECT
	`tbl_test`.`group_name` AS `group_name`,
    SUBSTRING_INDEX(
        GROUP_CONCAT(`tbl_test`.`value` ORDER BY `tbl_test`.`value` DESC SEPARATOR ","),
        ",", 2
    ) AS `values`
FROM
	`tbl_test`
GROUP BY
	`tbl_test`.`group_name`
;
group_namevalue
a9,8
b6,5
c9,2
This puts the values together in one column. But sometimes you'll want to have each value being in its own row. In that case you can use the same trick we saw in an earlier post about finding the median value in MySQL, which involves using session variables. I got this trick from this other blog.
SET @row_num := 0;
SET @prev_group := NULL;
SELECT
	`t`.`group_name` AS `group_name`,
    `t`.`value` AS `value`
FROM (
    SELECT
        @row_num := IF(@prev_group = `tbl_test`.`group_name`, @row_num + 1, 1) AS `row_num`,
        @prev_group := `tbl_test`.`group_name`,
        `tbl_test`.`group_name` AS `group_name`,
        `tbl_test`.`value` AS `value`
    FROM
        `tbl_test`
    HAVING
        `row_num` <= 2
    ORDER BY
        `tbl_test`.`group_name` ASC,
        `tbl_test`.`value` DESC
) AS `t`
;
The idea here is to sort by the grouping column and the value and then associate a row number with each row, where the row number restarts on the first row of each group. We then use a Having clause to filter out only the rows that have a row number that is two or less.
group_namevalue
a9
a8
b6
b5
c9
c2