T-SQL Tuesday 23 – Key Lookups

This post is brought to you by the letter “K”. Those of you who have been following my blog and tweets recently will know all about the SQLRelay event that is happening in the UK this week. Tonight is the second of a four-night series of events culminating on the 6th October in London, where none other than Itzik Ben-Gan will be headlining. There will also be a session by all of the UK User Group leaders, who are doing an A – Z of SQL Server, where I volunteered to take the stage for the letter – you guessed it – “K”.

So what do Key Lookups have to do with joins, which are the theme of this T-SQL Tuesday? Well, according to BOL, “Key Lookup is always accompanied by a Nested Loops operator”, and as we know this is the physical join that the optimiser chooses in order to perform the logical join written in the query.

Let’s run a query that involves a key lookup to see if this is true. Before we do, I’ll ask you to run this script, which will set up the environment; as usual we’ll be playing about with AdventureWorks. If you don’t have the AdventureWorks database you can download it from CodePlex.
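The setup script itself isn’t reproduced in this extract, but a minimal sketch of the kind of thing it needs to do would look something like the following. The table (Sales.SalesOrderHeader) and the index names are my assumptions, based on the CustomerID and DueDate attributes discussed later in the post:

```sql
USE AdventureWorks;
GO

-- 1) A narrow index on CustomerID only: any query that also wants DueDate
--    will have to fetch it from the base table via a Key Lookup
CREATE NONCLUSTERED INDEX IX_SOH_CustomerID
    ON Sales.SalesOrderHeader (CustomerID);
GO

-- 2) A covering index: CustomerID as the key, DueDate carried along as an
--    included column at the leaf level
CREATE NONCLUSTERED INDEX IX_SOH_CustomerID_Incl_DueDate
    ON Sales.SalesOrderHeader (CustomerID)
    INCLUDE (DueDate);
GO

-- 3) A composite index keyed on (CustomerID, DueDate): covers the query and
--    can also satisfy an ORDER BY on those columns
CREATE NONCLUSTERED INDEX IX_SOH_CustomerID_DueDate
    ON Sales.SalesOrderHeader (CustomerID, DueDate);
GO
```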

Now we have built our indices we can look to see how they affect our query plans.

The first query I am going to run will cause the optimiser to choose a key lookup, as the index being used does not cover all of the attributes being returned. Remember to turn on “Include Actual Execution Plan” before running the query.
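The query itself isn’t included in this extract; a sketch of the kind of query that produces this behaviour might look like the one below. The index hint is my assumption – because the setup sketch above creates all three indexes up front, forcing the narrow index is the simplest way to make the key lookup appear:

```sql
-- The narrow index supplies CustomerID, but DueDate is not in it, so every
-- row needs a Key Lookup against the clustered index (the base table)
SELECT  CustomerID, DueDate
FROM    Sales.SalesOrderHeader WITH (INDEX (IX_SOH_CustomerID))
ORDER BY CustomerID, DueDate;
```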

When you look at the query plan you can see that it has indeed used a Nested Loops operator. Note how expensive the Key Lookup operator was in this batch – it took up 85% of the cost of the batch!

[Image: actual execution plan showing the Nested Loops operator, with the Key Lookup accounting for 85% of the batch cost]

As you can see from the properties of the Key Lookup below, the base table was being accessed to return the DueDate attribute.

[Image: Key Lookup properties showing the base table being accessed to return the DueDate attribute]

If we had an index that contained both the CustomerID and DueDate attributes we could remove the need for this key lookup, which would in turn produce a more efficient query. Let’s test this by running a second query that uses the covering index.
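Again the original code isn’t reproduced here, but a sketch along these lines would do; the index hint and name are assumptions on my part, and the second query is run in the same batch as the first so the plan costs can be compared:

```sql
-- Query 2: the covering index carries DueDate at its leaf level, so the
-- whole query is satisfied from the index and no Key Lookup is needed
SELECT  CustomerID, DueDate
FROM    Sales.SalesOrderHeader WITH (INDEX (IX_SOH_CustomerID_Incl_DueDate))
ORDER BY CustomerID, DueDate;
```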

With the two queries run in the same batch we can see how much more efficient the second query is; the reason is that it no longer has to retrieve data from the base table. The index satisfies the needs of the query.

[Image: execution plans for both queries in the same batch, with the Key Lookup eliminated from the second query]

So that you can see just how much more efficient this is, take a look at the Messages tab, which records the amount of IO needed to return the result set. I’m sure you’ll agree that the difference is staggering. This is just a contrived example; imagine scaling this query out to 1 million rows and think of the overhead that would be saved.

[Image: Messages tab showing the STATISTICS IO output for both queries]

The attentive among you will no doubt be wondering why three indices were created at the beginning of the script. The reason is that there is still an expensive operator in our second query plan; if we change our index we can remove this bottleneck too.

[Image: execution plan showing the Sort operator that remains in the second query plan]

By creating a composite index on CustomerID and DueDate, in that order, we are able to satisfy the ORDER BY requirement, meaning that SQL Server does not have to perform the sort itself; with this index the query uses just 3% of the cost of all three queries combined. From this you can really see the benefit of thinking carefully about the indices that you implement. There are some great savings to be had by creating indices, but you must also think about the overhead that they can add to write performance.
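As a sketch, the third query looks just like the others but leans on the composite index instead (the hint and index name are, again, my assumptions):

```sql
-- Query 3: the composite index is keyed on (CustomerID, DueDate), so rows
-- come back already in the order the ORDER BY asks for - no Sort operator
SELECT  CustomerID, DueDate
FROM    Sales.SalesOrderHeader WITH (INDEX (IX_SOH_CustomerID_DueDate))
ORDER BY CustomerID, DueDate;
```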

One final word of warning: if you use SET STATISTICS IO ON as your main way of performance tuning, you will be interested to see that there was no difference in the IO consumed between the second and third queries, as the sort was performed in memory in this example:

[Image: STATISTICS IO output showing identical reads for the second and third queries]
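For reference, the pattern is simply to switch the statistics on before running the batch; the TIME option below is my addition rather than something from the original post, but it can help expose CPU cost that the IO figures hide:

```sql
-- STATISTICS IO only reports page reads against tables and indexes; a sort
-- performed in memory does no IO, so it never shows up in these numbers
SET STATISTICS IO ON;
SET STATISTICS TIME ON;  -- the in-memory sort still costs CPU, visible here
```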