Writing queries without data

Sometimes it’s not possible to have the data you need in order to create the most efficient queries possible. This may be because you don’t have a machine with enough storage capacity, it might even be on a laptop so there is also the security aspect. So how do you address the issue of writing queries with no data to work with?

Luckily there are a couple of ways that this problem can be overcome.

  1. Use a data generator.
    This sounds great in theory, however the data you will then be querying against will bare no resemblance to the data you will be querying against in production (hopefully a dev/uat/pre-production server first!). The reason for this is that the statistics will be completely different, the same problem occurs when you use obfuscated data. If only there were a way you could query against the statistics themselves, oh wait, there is!
  2. Export the production database schema and it’s statistics.
    You may not have access to do this in your live environment, so ask your friendly DBA to do it for you. If you don’t have any third party analysis tools it might be a good idea to do this once a month and keep copies so you can see how queries change over time.

 

In this example I will export the schema for all database objects of the AdventureWorks database with the exception of extended properties and foreign keys. Foreign keys may cause you an issue here so it might be best to create them as a separate step at the end. I do recommend that you do add the foreign keys in if you are going to be using this methodology as constraints will be checked during insert and update statements as well as giving the optimizer extra options in certain select scenarios.

Open up SSMS and right click on the database you wish to export the statistics for, select “Tasks” and then “Generate Scripts…”:

image

Depending on your setting this may launch a wizard with the following welcome page. If this is displayed simply click Next:

image

In this example I am choosing to export data from the AdventureWorks database and have checked the option to script all objects:

image

The key part of the whole script is changing the “Script statistics” option to be “Script Statistics and histograms” as you can see this produces a warning, but in a database the size of AdventureWorks it really took no extra time at all.

image

The next two screens are pretty self explanatory, select your destination be it a new query or a file. Then review the selections you have made thus far.

In my example I chose to create a new database called AdventureWorks_Nodata and ran the script in the context of this new database to create a clone on AdventureWorks with no data but up to date statistics.

Once this is done your server will think that each table actually contains data, if you run the Disk Usage Top Tables Report on your new database you will see that SQL Server believes there are records and these records take up physical space:

TopTables

Brilliant, we’re now in a position to start writing and running some queries! Well not quite, remember we didn’t actually export any data other than the metadata. This means that to do any testing using this database you will need to be looking at the estimated execution plans. Estimated execution plans do have their own issues of course, but that’s a blog post for another time.

Let’s now write a simple query and view the estimated plan it creates form both our AdventureWorks and new database, as you can see I have taken the liberty of pasting the properties together into one screenshot for you so you can see in one image that the statistics are indeed the same:

image

As you can see the estimated query plan is identical, and so it should because the statistics are identical!

If we look at the actual query plan for the same query we can see that the estimated number of records is the same, but unsurprisingly the number of rows returned is not. Now we know what the statistics are we could use various performance tuning tricks to tune this query – all without the need for data.