Say you have the following database table called "tbl_test" which contains rows which can be grouped by a column.
id | group_name | value |
1 | a | 0 |
2 | a | 8 |
3 | a | 9 |
4 | a | 3 |
5 | b | 5 |
6 | b | 6 |
7 | b | 4 |
8 | c | 2 |
9 | c | 9 |
If you want to find the maximum value in each group, that's easy using a Group By statement and a MAX function:
SELECT
`tbl_test`.`group_name` AS `group_name`,
MAX(`tbl_test`.`value`) AS `value`
FROM
`tbl_test`
GROUP BY
`tbl_test`.`group_name`
ORDER BY
`tbl_test`.`group_name`
;
But what if you wanted to find the top 2 values in each group? In MySQL, this can be achieved using the aggregate function
GROUP_CONCAT which takes all the values in a column in a group, sorts them, and concatenates them into a single string. The problem is that in MySQL v5 there is no way to limit the number of values used, so we'll simulate this using the SUBSTRING_INDEX function, which takes the first n values in a string containing delimited values. For example SUBSTRING_INDEX("ab,c,de", ",", 2) returns "ab,c".
SELECT
`tbl_test`.`group_name` AS `group_name`,
SUBSTRING_INDEX(
GROUP_CONCAT(`tbl_test`.`value` ORDER BY `tbl_test`.`value` DESC SEPARATOR ","),
",", 2
) AS `values`
FROM
`tbl_test`
GROUP BY
`tbl_test`.`group_name`
;
group_name | value |
a | 9,8 |
b | 6,5 |
c | 9,2 |
This puts the values together in one column. But sometimes you'll want to have each value being in its own row. In that case you can use the same trick we saw in an
earlier post about finding the median value in MySQL, which involves using session variables. I got this trick from
this other blog.
SET @row_num := 0;
SET @prev_group := NULL;
SELECT
`t`.`group_name` AS `group_name`,
`t`.`value` AS `value`
FROM (
SELECT
@row_num := IF(@prev_group = `tbl_test`.`group_name`, @row_num + 1, 1) AS `row_num`,
@prev_group := `tbl_test`.`group_name`,
`tbl_test`.`group_name` AS `group_name`,
`tbl_test`.`value` AS `value`
FROM
`tbl_test`
HAVING
`row_num` <= 2
ORDER BY
`tbl_test`.`group_name` ASC,
`tbl_test`.`value` DESC
) AS `t`
;
The idea here is to sort by the grouping column and the value and then associate a row number with each row, where the row number restarts on the first row of each group. We then use a Having clause to filter out only the rows that have a row number that is two or less.
group_name | value |
a | 9 |
a | 8 |
b | 6 |
b | 5 |
c | 9 |
c | 2 |