TimeStamp | IP | UserName |
---|---|---|
2005-10-30 10:45:03 | 172.16.254.10 | jlor |
2005-10-30 10:46:31 | 172.16.254.12 | kpar |
2005-10-31 09:14:13 | 172.16.254.14 | jlor |
2005-10-31 09:25:42 | 172.16.254.16 | kpar |
2005-10-31 12:41:14 | 172.16.254.19 | jlor |
2005-11-01 07:15:15 | 172.16.254.20 | kpar |
You are asked to make a report of the last time each user has accessed the system using SQL.
At first you try using GROUP BY but then realize that it's not so simple to include the IP field along with the TimeStamp and UserName. GROUP BY works when you're interested in aggregating every field that is not used to group the records. In other words, you can easily do this:
SELECT UserName, MAX(TimeStamp) FROM log GROUP BY UserName
USERNAME | MAX(TIMESTAMP) |
---|---|
jlor | October, 31 2005 12:41:14+0000 |
kpar | November, 01 2005 07:15:15+0000 |
But if you also want to show the corresponding IP address of the access with the latest time stamp, you'd have a problem using simple SQL. If you add the IP field in the SELECT statement, you'd end up with the first IP in the table that belongs to the corresponding user, rather than the IP of the latest time stamp.
SELECT UserName, IP, MAX(TimeStamp) FROM log GROUP BY UserName
USERNAME | IP | MAX(TIMESTAMP) |
---|---|---|
jlor | 172.16.254.10 | October, 31 2005 12:41:14+0000 |
kpar | 172.16.254.12 | November, 01 2005 07:15:15+0000 |
The way to do this is to simulate the GROUP BY statement using more expressiveness methods.
MS SQL Server
In MS SQL Server, this is achieved using the ROW_NUMBER function. This function gives a number for each row (1, 2, 3, ...) which can be used inside a SELECT statement. The cool thing about this function is that the numbering can be made to restart for every different value in a field. So if we used it on the UserName field we'd have the following:
SELECT UserName, IP, TimeStamp, ROW_NUMBER() OVER(PARTITION BY UserName ORDER BY TimeStamp DESC) FROM log
USERNAME | IP | TIMESTAMP | COLUMN_3 |
---|---|---|---|
jlor | 172.16.254.10 | October, 30 2005 10:45:03+0000 | 1 |
jlor | 172.16.254.14 | October, 31 2005 09:14:13+0000 | 2 |
jlor | 172.16.254.19 | October, 31 2005 12:41:14+0000 | 3 |
kpar | 172.16.254.12 | October, 30 2005 10:46:31+0000 | 1 |
kpar | 172.16.254.16 | October, 31 2005 09:25:42+0000 | 2 |
kpar | 172.16.254.20 | November, 01 2005 07:15:15+0000 | 3 |
It even orders the rows by user name and it lets you say how you want the rows of each user to be ordered so that you can say how you want the numbering. Using the SQL above, the row with the latest time stamp of each user has a 1 in the last column. This allows us to select it. It will have to be inside a nested query however in order to be used in a WHERE statement.
SELECT UserName, IP, TimeStamp FROM ( SELECT UserName, IP, TimeStamp, ROW_NUMBER() OVER(PARTITION BY UserName ORDER BY TimeStamp DESC) AS rank FROM log ) AS t WHERE rank = 1
USERNAME | IP | TIMESTAMP | COLUMN_3 |
---|---|---|---|
jlor | 172.16.254.19 | October, 31 2005 12:41:14+0000 | |
kpar | 172.16.254.20 | November, 01 2005 07:15:15+0000 |
Notice that you can even find when the second to last time an access was made by changing the 1 in the WHERE statement to a 2.
You can experiment with this in this SQL Fiddle.
MySQL
Unfortunately MySQL doesn't have a function as nifty as ROW_NUMBER so instead we'll have to simulate that using variables. In MySQL you can create variables using the SET statement and then update them within a SELECT statement so that they change for each row, like this:SET @row_number := 0; SELECT UserName, IP, TimeStamp, @row_number := @row_number + 1 FROM log
USERNAME | IP | TIMESTAMP | @ROW_NUMBER := @ROW_NUMBER + 1 |
---|---|---|---|
jlor | 172.16.254.10 | October, 30 2005 10:45:03+0000 | 1 |
kpar | 172.16.254.12 | October, 30 2005 10:46:31+0000 | 2 |
jlor | 172.16.254.14 | October, 31 2005 09:14:13+0000 | 3 |
kpar | 172.16.254.16 | October, 31 2005 09:25:42+0000 | 4 |
jlor | 172.16.254.19 | October, 31 2005 12:41:14+0000 | 5 |
kpar | 172.16.254.20 | November, 01 2005 07:15:15+0000 | 6 |
This is only half the story of course. We want the numbering to restart for every user and we also want this to happen after sorting the rows by user name. We also want the rows belonging to each user to be sorted by time stamp. A simple ORDER BY statement can handle the sorting part:
SET @row_number := 0; SELECT UserName, IP, TimeStamp, @row_number := @row_number + 1 FROM log ORDER BY UserName, TimeStamp DESC
USERNAME | IP | TIMESTAMP | @ROW_NUMBER := @ROW_NUMBER + 1 |
---|---|---|---|
jlor | 172.16.254.19 | October, 31 2005 12:41:14+0000 | 1 |
jlor | 172.16.254.14 | October, 31 2005 09:14:13+0000 | 2 |
jlor | 172.16.254.10 | October, 30 2005 10:45:03+0000 | 3 |
kpar | 172.16.254.20 | November, 01 2005 07:15:15+0000 | 4 |
kpar | 172.16.254.16 | October, 31 2005 09:25:42+0000 | 5 |
kpar | 172.16.254.12 | October, 30 2005 10:46:31+0000 | 6 |
The restarting of numbering is a little less simple. We have to keep track of what the previous value was using another variable and we have to also choose between setting row_number to 1 or to increment it by 1. Here is the code:
SET @row_number := 0; SET @prev_username := NULL; SELECT UserName, IP, TimeStamp, @row_number := CASE WHEN UserName = @prev_username THEN @row_number + 1 ELSE 1 END, @prev_username := UserName FROM log ORDER BY UserName, TimeStamp DESC
USERNAME | IP | TIMESTAMP | @ROW_NUMBER := CASE WHEN USERNAME = @PREV_USERNAME THEN @ROW_NUMBER + 1 ELSE 1 END | @PREV_USERNAME := USERNAME |
---|---|---|---|---|
jlor | 172.16.254.19 | October, 31 2005 12:41:14+0000 | 1 | jlor |
jlor | 172.16.254.14 | October, 31 2005 09:14:13+0000 | 2 | jlor |
jlor | 172.16.254.10 | October, 30 2005 10:45:03+0000 | 3 | jlor |
kpar | 172.16.254.20 | November, 01 2005 07:15:15+0000 | 1 | kpar |
kpar | 172.16.254.16 | October, 31 2005 09:25:42+0000 | 2 | kpar |
kpar | 172.16.254.12 | October, 30 2005 10:46:31+0000 | 3 | kpar |
The CASE statement selects a value to set row_number. If the current row's user name is the same as the previous one's then the value will be one more than row_number it currently is. Otherwise it is set to 1. After that variable is set, the prev_username variable is set to the current row's user name.
Finally we can now use this to select the latest access for each user.
SET @row_number := 0; SET @prev_username := NULL; SELECT UserName, IP, TimeStamp FROM ( SELECT UserName, IP, TimeStamp, @row_number := CASE WHEN UserName = @prev_username THEN @row_number + 1 ELSE 1 END AS rank, @prev_username := UserName FROM log ORDER BY UserName, TimeStamp DESC ) AS t WHERE rank = 1
USERNAME | IP | TIMESTAMP |
---|---|---|
jlor | 172.16.254.19 | October, 31 2005 12:41:14+0000 |
kpar | 172.16.254.20 | November, 01 2005 07:15:15+0000 |
Notice that you can even find when the second to last time an access was made by changing the 1 in the WHERE statement to a 2.
You can experiment with this in this SQL Fiddle.
No comments:
Post a Comment