When the Performance of LAG Lags Behind

My favorite T-SQL windowing functions are LAG and LEAD. LAG and LEAD let you take a peek at another row and grab any column you need in your results. They are easy to use and perform great…except when they don’t.

There is one situation that kills the performance of LAG and LEAD, and that is by using a variable instead of a hard coded value for the offset. “What’s an offset?” you are probably asking.

By default, LAG lets you grab a value from the previous row, and LEAD lets you grab a value from the following row. That is an offset of 1 row. If you specify a value for the  optional offset parameter, you can access a row that is further away. Let’s take a look at an example:

SELECT CustomerID, SalesOrderID, OrderDate, LAG& (OrderDate)   OVER(PARTITION BY CustomerID ORDER BY OrderDate) AS   Prev1OrderDate,
LAG(OrderDate,2) OVER(PARTITION BY CustomerID ORDER BY OrderDate) AS Prev2OrderDate
FROM Sales.SalesOrderHeader
WHERE CustomerID IN (29825,30118);

lag1

If, for some reason, I needed to use a variable instead of a hard coded value for the offset the query will still work, but there will be a performance hit. In the previous example, I filtered to show a couple of customers with lots of orders. In the next example, I’ll remove the WHERE clause and compare the performance difference when a variable is used for the offset.

SET STATISTICS IO ON;
GO
SELECT CustomerID, SalesOrderID, OrderDate,
LAG(OrderDate, 1) OVER(PARTITION BY CustomerID ORDER BY OrderDate) AS Prev1OrderDate
FROM Sales.SalesOrderHeader;
DECLARE @offset INT = 1;
SELECT CustomerID, SalesOrderID, OrderDate,
LAG(OrderDate, @offset) OVER(PARTITION BY CustomerID ORDER BY OrderDate) AS Prev1OrderDate
FROM Sales.SalesOrderHeader;

To see the difference, let’s take a look at the IO used by each query:

lag3

The query with the hard coded offset used 689 logical reads while the query with the variable used 689 + 188791 logical reads. That’s a big difference!

So, what’s going on here? SQL Server creates the worktable in memory in the first query while it is created on disk in tempdb in the second. Under the covers, LAG and LEAD are actually modified FIRST_VALUE and LAST_VALUE functions. FIRST_VALUE and LAST_VALUE queries require framing. (To refresh your memory about framing see this post.)  When the database engine doesn’t know upfront how many rows will be in the frame, it creates the worktable in tempdb instead of in memory. When the worktable lives in memory, it’s almost magic, with no IO or locking.

Hopefully, using a variable as an offset is a pretty rare occurrence. When the default or a hard coded offset is used, the performance of LAG and LEAD rock!

 

 

 

 

 

About Kathi Kellenberger

I am the editor of the online journal Simple Talk. I love talking about SQL Server to anyone who will listen, just ask my five year old granddaughter. I love to write and teach. I am so humbled and honored to say I am a Data Platform MVP, a volunteer with LaunchCode and co-leader of PASS Women in Technology Virtual Group.
This entry was posted in T-SQL, T-SQL Window Functions. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s