Posts tagged with sql-server

SQL Server query plan cache – what is it and why should you care?

What is a query plan?

SQL Server, like all databases, goes through a number of steps when it receives a command. Besides parsing and validating the command text and parameters, it looks at the database schema, statistics and indexes to come up with a plan to efficiently query or change your data.

You can view the plan SQL Server comes up with for a given query in SQL Management Studio by selecting Include Actual Execution Plan from the Query menu before running your query.

A query plan in SQL Management Studio
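If you prefer to stay in T-SQL, you can request the same information without the menu option. A minimal sketch (the query itself is just a placeholder example):

```sql
-- Ask SQL Server to return the actual execution plan as XML
-- alongside the results (equivalent to Include Actual Execution Plan)
SET STATISTICS XML ON;

SELECT name, type_desc
FROM sys.objects;   -- any example query will do

SET STATISTICS XML OFF;
```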

Show me the cache!

Query plans are cached so subsequent identical operations can reuse them for further performance gains. You can see the query plans in use on your server with the following SQL:

SELECT objtype, p.size_in_bytes, t.[text], usecounts
   FROM sys.dm_exec_cached_plans p
   OUTER APPLY sys.dm_exec_sql_text (p.plan_handle) t
   WHERE objtype IN ('Prepared', 'Adhoc')
   ORDER BY usecounts DESC

Hitting the cache

DBAs know the value of hitting the plan cache often, and this is one of the reasons they like stored procedures. You can, however, achieve the same thing with parameterized queries, provided the query text and the parameter definitions are identical, so the same statement can be executed over and over again with different parameter values.

If your ORM uses parameterized queries then it too can take advantage of this, but it is important to remember the query text and parameter definitions need to be identical for that to happen.
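As a sketch of what "identical" means here (the table and parameter names are made up), the two sp_executesql calls below share a single cached plan because the statement text and the parameter definition match exactly; only the value differs:

```sql
-- Both calls hit the same cache entry: same text, same '@id int' definition
EXEC sp_executesql N'SELECT * FROM Customers WHERE ID = @id',
                   N'@id int', @id = 1;
EXEC sp_executesql N'SELECT * FROM Customers WHERE ID = @id',
                   N'@id int', @id = 2;
```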

How this applies to ORMs

In .NET 3.5 SP1, neither LINQ to SQL nor Entity Framework set the length of variable-length parameters (varchar, nvarchar, text, ntext and varbinary), so SqlClient set it to the actual content length. This means the cache is often missed and instead populated with plans that differ only in their parameter lengths.

In .NET 4.0, variable-length parameters honour the defined length in both LINQ to SQL and Entity Framework where possible, falling back to the maximum length when the actual content doesn’t fit in the defined length.
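To illustrate the problem with a hypothetical table: these two calls differ only in the declared nvarchar length, which is exactly what the pre-4.0 behaviour produced, and so they end up as two separate entries in the plan cache:

```sql
-- Two cache entries for what is logically the same query,
-- because nvarchar(5) and nvarchar(9) are different definitions
EXEC sp_executesql N'SELECT * FROM Customers WHERE Surname = @p',
                   N'@p nvarchar(5)', @p = N'Smith';
EXEC sp_executesql N'SELECT * FROM Customers WHERE Surname = @p',
                   N'@p nvarchar(9)', @p = N'Wilkinson';
```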


DLookup for Excel

I had to do a couple of ad-hoc Excel jobs today and found that, whilst Excel has a VLookup function for spreadsheet ranges, it doesn’t have one for databases.

It’s been a while since I touched VBA and Access but the DLookup function was quite useful so here it is for Excel. Read the warnings below before you use it!

Public Function DLookup(expression As String, domain As String, criteria As String, Optional connectionString As String = "")

    Dim con As ADODB.Connection
    Set con = New ADODB.Connection
    ' IsMissing only works on Variant parameters, so a typed Optional
    ' String needs a default value and an empty-string check instead
    If connectionString = "" Then
        con.Open "Driver={SQL Server};Server=abc;Database=def;Trusted_Connection=Yes;"
    Else
        con.Open connectionString
    End If

    Dim rs As ADODB.Recordset
    Dim sql As String
    sql = "SELECT TOP 1 " & expression & " FROM " & domain & " WHERE " & criteria
    Set rs = con.Execute(sql)
    If Not rs.EOF Then DLookup = rs(0)
    rs.Close
    Set rs = Nothing

    con.Close
    Set con = Nothing

End Function

Copy and paste the code above into an Excel module and replace the “Driver=…” connection string with whatever you want your default connection to be.

Some examples of use:

To select the Name column from the Customers table where the numeric ID is held in column 2:

=DLookup("Name","Customers","ID=" & A2)

To return the Forename and Surname columns joined together with a space from the Contacts table in the crm database, using the postcode from column 5:

=DLookup("Forename + ' ' + Surname","crm..Contacts","PostCode='" & A5 & "'")

To return Description from a Products table held in another connection using a partial like match from column 1:

=DLookup("Description","Products","OrderCode LIKE '" & A1 & "%'", "DSN=SecondaryDSN")

I think you get the idea!

There are a few caveats you need to be aware of when using this function:
  • It only returns the first value of the first row.
  • DLookup is not efficient: it opens a new connection every time it is evaluated, which could put a lot of strain on your server.
  • DLookup concatenates its arguments straight into the SQL, so it is open-ended and vulnerable to SQL injection.

If that doesn’t scare you then give it a shot. As usual no warranty expressed or implied!


When SQL Server replication eats disk space

Part of my job involves revising our SQL Server architecture. My plan includes the addition of a read-only reporting SQL pair for non-critical inquiries and reports. This moves the heavy and unpredictable reporting load away from the primary SQL pair responsible for critical operations (shipping orders).

We utilized SQL Server’s publisher-subscriber replication on the required databases which, given their legacy nature, had some cross-database dependencies that were added without due consideration.

The reporting SQL pair needed to be rebuilt, so the subscriptions were removed while that happened. Within three days our primary servers were out of disk space and we had a problem that needed solving quickly.

Our first step was to remove the new publications. This released a small amount of space in the distribution database (now 100GB) and allowed the replicated databases’ logs to be truncated and shrunk. We now had a little breathing room.

One publication would not go away in a reasonable time (10+ hours), and even trying to remove individual articles resulted in a slow-running operation sat on PAGEIOLATCH_SH (which Microsoft describe as waiting on disk IO). A quick investigation revealed a complex query over internal system tables with no indexes, presumably a trade-off for fast insert performance.

In the meantime this slow-running operation would block the replication agents, which are also used by IBM’s DataMirror product to replicate data back to the iSeries (AS/400) in real time. This blocking would prevent us shipping orders and was therefore unacceptable.

The database in question is a home-grown data warehousing application rebuilt each night and, on reflection, was a terrible candidate to replicate using this method. Each night it generated over twice its own size in replication activity. Given our 72-hour replication retention and the size of the database, that alone was enough to wipe out the disk space on our server within three days.

Reinitialize All Subscriptions was the magic bullet: it ran in just a minute and allowed the publication to be deleted instantly afterwards. The replicated databases’ transaction logs were then truncated and shrunk, giving us ample breathing space on the storage front.
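For reference, the same reinitialisation can be scripted with sp_reinitsubscription, run at the publisher in the publication database. A sketch with a made-up publication name:

```sql
-- Mark every subscription to this publication for reinitialisation
EXEC sp_reinitsubscription
    @publication = N'MyPublication',  -- hypothetical publication name
    @article     = N'all',
    @subscriber  = N'all';
```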

The distribution database was still huge but heading into Replication Monitor’s Common Jobs and starting the Distribution clean up: distribution job was our final step.

The job might take all day but did not block any other replication activity :)


SQL Server replication blocking on cleanup job

For some time my primary workplace has been having a problem with SQL Server replication stalling for several minutes at a time, which is surprising given the 12GB of RAM and quad-processor hardware behind it.

Activity Monitor showed a number of processes on the distribution database used for SQL Server’s replication were blocked by another process which was in turn blocked by a SQL Agent – TSQL Job executing on the distribution database.

This process was running a job whose GUID did not match any of our jobs, and it appeared to be an internal job that regularly executes:

EXEC dbo.sp_MSdistribution_cleanup @min_distretention = 0, @max_distretention = 32760

This would sit on the PAGEIOLATCH_SH status for over 10 minutes at a time, most likely because it was operating on an MSrepl_commands table with 3.4 million records and an MSrepl_transactions table of 600k records.
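If you want to check whether you are in the same boat, the row counts come straight from the distribution database:

```sql
-- How much replication history is the cleanup job wading through?
USE distribution;

SELECT COUNT(*) AS command_rows     FROM dbo.MSrepl_commands;
SELECT COUNT(*) AS transaction_rows FROM dbo.MSrepl_transactions;
```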

The default max_distretention is 72 (hours) and specifies how long to keep the replication history. If a subscriber is down for longer than this period it will lose transactions.

Quite how this system obtained the value 32760 (3.7 years) is a bit of a mystery, and I’m assuming it actually uses a lower value, as there have been far more than 3.4 million updates in that period.

Setting this value to 84 hours (3.5 days) in Management Studio’s Replication > Distributor Properties > Transaction Retention soon dropped those large MS tables down to 406 and 300 records respectively, and resulted in sp_MSdistribution_cleanup executing in a couple of seconds.
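The same retention change can be scripted rather than clicked. To the best of my knowledge this is the equivalent call, run against the distributor:

```sql
-- Set the maximum transaction retention on the distribution database
-- to 84 hours (the Management Studio dialog does the same thing)
EXEC sp_changedistributiondb
    @database = N'distribution',
    @property = N'max_distretention',
    @value    = N'84';
```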

It’s a very specific scenario but perhaps this blog post will be of use to somebody else!