UPDATE: I have included an alternate solution from one of my readers!
I have presented on T-SQL Window Functions at least 30 times over the past five years. I love the comments I get after the presentations. Without fail, at least one person will tell me how the session helped them figure out the solution to a problem they have been struggling with for months. Sometimes, a person will ask my advice about a particular query. At least for me, it’s difficult to just come up with a solution in a couple of minutes while I am trying to clear the area for the next presenter. At Cleveland SQL Saturday, a gentleman presented me with a puzzle that I really enjoyed solving. He gave me permission to share it with you.
The problem involves a call center. Each employee answers calls until time for the next employee to go on duty. Sometimes, an employee will come back later in the day. For example, employee 1 has two shifts on the first day.
Here is a script to create a table and populate it with some sample data:
CREATE TABLE calllist(empID int, CallDate date, CallTime time);
INSERT INTO calllist
VALUES (1,’1/1/2017′,’9:15′),(1,’1/1/2017′,’9:18′),(1,’1/1/2017′,’9:20′),(1,’1/1/2017′,’12:00′),(2,’1/1/2017′,’13:00′),(2,’1/1/2017′,’13:15′),(2,’1/1/2017′,’13:18′),(2,’1/1/2017′,’13:20′),(1,’1/1/2017′,’15:00′),(1,’1/1/2017′,’15:15′),(3,’1/1/2017′,’18:00′),(3,’1/1/2017′,’18:15′),(3,’1/1/2017′,’18:30′),(3,’1/2/2017′,’9:15′),(3,’1/2/2017′,’9:18′),(3,’1/2/2017′,’9:20′),(3,’1/2/2017′,’12:00′),(1,’1/2/2017′,’13:00′),(1,’1/2/2017′,’13:15′),(1,’1/2/2017′,’13:18′),(2,’1/2/2017′,’13:20′),(1,’1/2/2017′,’15:00′),(1,’1/2/2017′,’15:15′),(3,’1/2/2017′,’17:00′),(3,’1/2/2017′,’17:15′),(3,’1/2/2017′,’19:30′);
The required outcome will display each employee shift along with the first and last call:
To come up with a solution, I first thought about how to solve it with a traditional cursor. I won’t provide a cursor solution here, but it would involve looping through the rows in order and finding the first and last employee rows. When the last EmpID doesn’t match the next EmpID, then the previous row is the last call for the previous EmpID and the current row is the first call for the next EmpID.
NOTE: The solution to this problem assumes that there are at least two calls per shift.
In 2012, four T-SQL window functions became available that let you look at expressions from different rows: LAG, LEAD, FIRST_VALUE, and LAST_VALUE. For this puzzle, LAG and LEAD seemed to be the answer. In this case, I used the optional default parameter to replace any NULLs with -1.
SELECT *, LAG(empID,1,-1) OVER(PARTITION BY CallDate ORDER BY calltime) AS PrevEmpID, LEAD(empID,1,-1) OVER(PARTITION BY CallDate ORDER BY calltime) AS NextEmpID FROM calllist ORDER BY CallDate, CallTime;
Here are the partial results:
The next step is to compare the current EmpID with the previous and next values. When they don’t match, then the row is either the first call or the last call for that employee. To solve this in a step-by-step manner, the previous query is moved to a common table expression (CTE).
WITH CTE1 AS ( SELECT *, Lag(empID,1,-1) OVER(PARTITION BY CallDate ORDER BY calltime) AS PrevEmpID, LEAD(empID,1,-1) OVER(PARTITION BY CallDate ORDER BY calltime) AS NextEmpID FROM calllist ) SELECT empID, CallDate, CallTime, CASE WHEN EmpID PrevEmpID THEN 1 END AS FirstCall, CASE WHEN EmpID NextEmpID THEN 1 END AS LastCall FROM CTE1;
The next step is to eliminate the rows that are not a first or last call. Once again, the previous outer query is added to a CTE.
WITH CTE1 AS ( SELECT *, LAG(empID,1,-1) OVER(PARTITION BY CallDate ORDER BY calltime) AS PrevEmpID, LEAD(empID,1,-1) OVER(PARTITION BY CallDate ORDER BY calltime) AS NextEmpID FROM calllist ), CTE2 AS ( SELECT empID, CallDate, CallTime, CASE WHEN EmpID PrevEmpID THEN 1 END AS FirstCall, CASE WHEN EmpID NextEmpID THEN 1 END AS LastCall FROM CTE1) SELECT empID, CallDate, CallTime, FirstCall, LastCall FROM CTE2 WHERE FirstCall IS NOT NULL OR LastCall IS NOT NULL;
Now we can take advantage of the LEAD function again to grab the CallTime from the following row. Again, the previous outer query is a new CTE level.
WITH CTE1 AS ( SELECT *, LAG(empID,1,-1) OVER(PARTITION BY CallDate ORDER BY calltime) AS PrevEmpID, LEAD(empID,1,-1) OVER(PARTITION BY CallDate ORDER BY calltime) AS NextEmpID FROM calllist ), CTE2 AS ( SELECT empID, CallDate, CallTime, CASE WHEN EmpID PrevEmpID THEN 1 END AS FirstCall, CASE WHEN EmpID NextEmpID THEN 1 END AS LastCall FROM CTE1), CTE3 AS ( SELECT empID, CallDate, CallTime, FirstCall, LastCall FROM CTE2 WHERE FirstCall IS NOT NULL OR LastCall IS NOT NULL) SELECT EmpID, CallDate, CallTime, FirstCall, LEAD(CallTime) OVER(PARTITION BY CallDate ORDER BY CallTime) AS NextCallTime FROM CTE3;
The final step is to eliminate the rows where FirstCall is null.
WITH CTE1 AS ( SELECT *, LAG(empID,1,-1) OVER(PARTITION BY CallDate ORDER BY calltime) AS PrevEmpID, LEAD(empID,1,-1) OVER(PARTITION BY CallDate ORDER BY calltime) AS NextEmpID FROM calllist ), CTE2 AS ( SELECT empID, CallDate, CallTime, CASE WHEN EmpID PrevEmpID THEN 1 END AS FirstCall, CASE WHEN EmpID NextEmpID THEN 1 END AS LastCall FROM CTE1), CTE3 AS ( SELECT empID, CallDate, CallTime, FirstCall, LastCall FROM CTE2 WHERE FirstCall IS NOT NULL OR LastCall IS NOT NULL), CTE4 AS ( SELECT EmpID, CallDate, CallTime, FirstCall, LEAD(CallTime) OVER(PARTITION BY CallDate ORDER BY CallTime) AS NextCallTime FROM CTE3) SELECT EmpID, CallDate, CallTime AS FirstCallTime, NextCallTime AS LastCallTime FROM CTE4 WHERE FirstCall IS NOT NULL;
Here is the final result:
It’s possible to eliminate one CTE level by using LEAD to find the next call time for every row in step 1. Maybe there is even a better way to solve this query. Add a comment if you have figured out another solution.
To learn more about T-SQL window functions, check out my book or Pluralsight course!
After posting this, one of my readers, Uri Dimant, contacted me about another solution. His solution is a mix of CTE and nested subquery, so I have rearranged it to just use CTEs. There is no difference really, except that I think it is easier to understand with CTEs. This solution is very similar to the Islands technique:
WITH cte1 AS ( SELECT *,ROW_NUMBER () OVER (ORDER BY CallDate,CallTime) ordinal FROM calllist ), cte2 AS ( SELECT *, ordinal - ROW_NUMBER() OVER(PARTITION BY empid ORDER BY ordinal) AS grp FROM cte1 ) SELECT empid,CallDate,MIN(CallTime) FirstCallTime,MAX(CallTime) LastCallTime FROM cte2 GROUP BY CallDate,empid,grp ORDER BY MIN(ordinal) ;
This solution uses the difference in two ROW_NUMBERS to create groupings. Once you have the groups, you can just get the MIN and MAX. This solution is very straightforward. At least for me, I understand the solution that I came up with, but both are valid.
Couldn’t you just have gotten min and max group by day and emp?
Actually, no. Notice that employee 1 has two shifts on day one.