T-SQL Puzzler

UPDATE: I have included an alternate solution from one of my readers!

I have presented on T-SQL Window Functions at least 30 times over the past five years. I love the comments I get after the presentations. Without fail, at least one person will tell me how the session helped them figure out the solution to a problem they have been struggling with for months. Sometimes, a person will ask my advice about a particular query. At least for me, it’s difficult to just come up with a solution in a couple of minutes while I am trying to clear the area for the next presenter. At Cleveland SQL Saturday, a gentleman presented me with a puzzle that I really enjoyed solving. He gave me permission to share it with you.

The problem involves a call center. Each employee answers calls until time for the next employee to go on duty. Sometimes, an employee will come back later in the day. For example, employee 1 has two shifts on the first day.

Here is a script to create a table and populate it with some sample data:

CREATE TABLE calllist(empID int, CallDate date, CallTime time);

INSERT INTO calllist
VALUES (1,’1/1/2017′,’9:15′),(1,’1/1/2017′,’9:18′),(1,’1/1/2017′,’9:20′),(1,’1/1/2017′,’12:00′),(2,’1/1/2017′,’13:00′),(2,’1/1/2017′,’13:15′),(2,’1/1/2017′,’13:18′),(2,’1/1/2017′,’13:20′),(1,’1/1/2017′,’15:00′),(1,’1/1/2017′,’15:15′),(3,’1/1/2017′,’18:00′),(3,’1/1/2017′,’18:15′),(3,’1/1/2017′,’18:30′),(3,’1/2/2017′,’9:15′),(3,’1/2/2017′,’9:18′),(3,’1/2/2017′,’9:20′),(3,’1/2/2017′,’12:00′),(1,’1/2/2017′,’13:00′),(1,’1/2/2017′,’13:15′),(1,’1/2/2017′,’13:18′),(2,’1/2/2017′,’13:20′),(1,’1/2/2017′,’15:00′),(1,’1/2/2017′,’15:15′),(3,’1/2/2017′,’17:00′),(3,’1/2/2017′,’17:15′),(3,’1/2/2017′,’19:30′);

The required outcome will display each employee shift along with the first and last call:

ntsqp1

To come up with a solution, I first thought about how to solve it with a traditional cursor. I won’t provide a cursor solution here, but it would involve looping through the rows in order and finding the first and last employee rows. When the last EmpID doesn’t match the next EmpID, then the previous row is the last call for the previous EmpID and the current row is the first call for the next EmpID.

NOTE: The solution to this problem assumes that there are at least two calls per shift.

In 2012, four T-SQL window functions became available that let you look at expressions from different rows: LAG, LEAD, FIRST_VALUE, and LAST_VALUE. For this puzzle, LAG and LEAD seemed to be the answer.  In this case, I used the optional default parameter to replace any NULLs with -1.

SELECT *, 
    LAG(empID,1,-1) OVER(PARTITION BY CallDate ORDER BY calltime)   
    AS   PrevEmpID, 
    LEAD(empID,1,-1) OVER(PARTITION BY CallDate ORDER BY calltime)  AS NextEmpID
FROM calllist
ORDER BY CallDate, CallTime;

Here are the partial results:

ntsqp2

The next step is to compare the current EmpID with the previous and next values. When they don’t match, then the row is either the first call or the last call for that employee. To solve this in a step-by-step manner, the previous query is moved to a common table expression (CTE).

WITH CTE1 AS (
    SELECT *,
       Lag(empID,1,-1) OVER(PARTITION BY CallDate ORDER BY calltime) AS  
         PrevEmpID,
    LEAD(empID,1,-1) OVER(PARTITION BY CallDate ORDER BY calltime)  AS NextEmpID
FROM calllist
)
SELECT empID, CallDate, CallTime,
    CASE WHEN EmpID  PrevEmpID THEN 1 END AS FirstCall,
    CASE WHEN EmpID  NextEmpID THEN 1 END AS LastCall
FROM CTE1;

ntsqp3

The next step is to eliminate the rows that are not a first or last call. Once again, the previous outer query is added to a CTE.

WITH CTE1 AS (
    SELECT *,
      LAG(empID,1,-1) OVER(PARTITION BY CallDate ORDER BY calltime) AS     
         PrevEmpID,
      LEAD(empID,1,-1) OVER(PARTITION BY CallDate ORDER BY calltime) AS   
         NextEmpID
FROM calllist
),
CTE2 AS (
    SELECT empID, CallDate, CallTime,
       CASE WHEN EmpID  PrevEmpID THEN 1 END AS FirstCall,
       CASE WHEN EmpID  NextEmpID THEN 1 END AS LastCall
FROM CTE1)
SELECT empID, CallDate, CallTime, FirstCall, LastCall
FROM CTE2
WHERE FirstCall IS NOT NULL OR LastCall IS NOT NULL;

ntsqp4

Now we can take advantage of the LEAD function again to grab the CallTime from the following row. Again, the previous outer query is a new CTE level.

WITH CTE1 AS (
    SELECT *,
       LAG(empID,1,-1) OVER(PARTITION BY CallDate ORDER BY calltime) AS    
         PrevEmpID,
       LEAD(empID,1,-1) OVER(PARTITION BY CallDate ORDER BY calltime) AS 
         NextEmpID
FROM calllist
),
CTE2 AS (
    SELECT empID, CallDate, CallTime,
       CASE WHEN EmpID  PrevEmpID THEN 1 END AS FirstCall,
       CASE WHEN EmpID  NextEmpID THEN 1 END AS LastCall
FROM CTE1),
CTE3 AS (
    SELECT empID, CallDate, CallTime, FirstCall, LastCall
    FROM CTE2
    WHERE FirstCall IS NOT NULL OR LastCall IS NOT NULL)
SELECT EmpID, CallDate, CallTime, FirstCall,
    LEAD(CallTime) OVER(PARTITION BY CallDate ORDER BY CallTime) AS NextCallTime
FROM CTE3;

ntsqp5

The final step is to eliminate the rows where FirstCall is null.

WITH CTE1 AS (
    SELECT *,
       LAG(empID,1,-1) OVER(PARTITION BY CallDate ORDER BY calltime) AS
         PrevEmpID,
       LEAD(empID,1,-1) OVER(PARTITION BY CallDate ORDER BY calltime) AS   
         NextEmpID
FROM calllist
),
CTE2 AS (
    SELECT empID, CallDate, CallTime,
        CASE WHEN EmpID  PrevEmpID THEN 1 END AS FirstCall,
        CASE WHEN EmpID  NextEmpID THEN 1 END AS LastCall
FROM CTE1),
CTE3 AS (
    SELECT empID, CallDate, CallTime, FirstCall, LastCall
    FROM CTE2
    WHERE FirstCall IS NOT NULL OR LastCall IS NOT NULL),
CTE4 AS (
    SELECT EmpID, CallDate, CallTime, FirstCall,
    LEAD(CallTime) OVER(PARTITION BY CallDate ORDER BY CallTime) AS NextCallTime
FROM CTE3)
SELECT EmpID, CallDate, CallTime AS FirstCallTime, NextCallTime AS LastCallTime
FROM CTE4
WHERE FirstCall IS NOT NULL;

Here is the final result:

ntsp6

It’s possible to eliminate one CTE level by using LEAD to find the next call time for every row in step 1. Maybe there is even a better way to solve this query. Add a comment if you have figured out another solution.

To learn more about T-SQL window functions, check out my book or Pluralsight course!

After posting this, one of my readers, Uri Dimant, contacted me about another solution. His solution is a mix of CTE and nested subquery, so I have rearranged it to just use CTEs. There is no difference really, except that I think it is easier to understand with CTEs. This solution is very similar to the Islands technique:

WITH cte1 AS ( 
	SELECT *,ROW_NUMBER () OVER (ORDER BY CallDate,CallTime) ordinal 
	FROM calllist ), 	
cte2 AS (	
	SELECT *, ordinal - ROW_NUMBER() OVER(PARTITION BY empid ORDER BY ordinal) AS grp 
	FROM cte1 )
SELECT empid,CallDate,MIN(CallTime) FirstCallTime,MAX(CallTime) LastCallTime 
FROM cte2 
GROUP BY CallDate,empid,grp ORDER BY MIN(ordinal) ;

This solution uses the difference in two ROW_NUMBERS to create groupings. Once you have the groups, you can just get the MIN and MAX. This solution is very straightforward. At least for me, I understand the solution that I came up with, but both are valid.

About Kathi Kellenberger

I am a SQL Server professional working for Linchpin People. I love talking about SQL Server to anyone who will listen, just ask my two year old granddaughter. I love to write and teach. I am so humbled and honored to say that I made SQL Server MVP in 2013.
This entry was posted in T-SQL, T-SQL Window Functions. Bookmark the permalink.

2 Responses to T-SQL Puzzler

  1. Kevin Woodward says:

    Couldn’t you just have gotten min and max group by day and emp?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s