
Linq-to-sql articles

Optimizing Sum, Count, Min, Max and Average with LINQ  

LINQ is a great tool for C# programmers, letting you use familiar syntax with a variety of backend systems without having to learn another language or paradigm for many query operations.

Ensuring that the queries still perform well can be a bit of a chore, and one area where they fail quite badly is aggregate operations when you want more than one of them.

Multiple sequential queries (bad)

var count = db.Invoices.Count();
var total = db.Invoices.Sum(i => i.Paid);
var average = db.Invoices.Average(i => i.Paid);

This will issue three separate queries. There is nothing a LINQ provider can do to optimize that pattern, because they are three discrete statements.

Background

If we wanted these values by country, we could do this in LINQ:

var a = db.Invoices.GroupBy(i => i.Country)
          .Select(g => new { Country = g.Key,
                             Count = g.Count(),
                             Total = g.Sum(i => i.Paid),
                             Average = g.Average(i => i.Paid) });

Which gets us everything in a single statement broken down by country. In SQL this is:

SELECT Country, COUNT(*), SUM(Paid), AVG(Paid)
    FROM Invoices GROUP BY Country

Many data sources, including SQL, are happy to provide aggregate values without a GROUP BY, so how do we generate that from LINQ?

In the absence of a GroupBy overload that doesn’t take a key, we need to fake it. Because of the way many LINQ providers optimize out parts of the expression tree, we can:

Single optimized query (good)

Replacing the property in a GroupBy with a constant value gives us an optimized single query:

var a = db.Invoices.GroupBy(i => 1)
          .Select(g => new { Count = g.Count(),
                             Total = g.Sum(i => i.Paid),
                             Average = g.Average(i => i.Paid) });

Here are the providers I’ve tried:

  • LINQ to Objects (works, although the constant key selector is likely evaluated for every element)
  • LINQ to SQL (works, although it passes the constant 1 to SQL as a parameter)
  • Entity Framework 6 (works, although the generated query is a little obscure)
  • Elasticsearch (works, and the grouping is optimized out entirely)
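Here is a minimal LINQ to Objects sketch that shows the shape of the trick without a database. It uses a hypothetical in-memory array of (Country, Paid) tuples in place of db.Invoices.

It also shows one behavior worth checking with your own provider. In LINQ to Objects an empty source produces no groups at all, so the aggregate row comes back as null rather than as zeros.

```csharp
using System;
using System.Linq;

// Hypothetical invoices standing in for db.Invoices.
var invoices = new (string Country, decimal Paid)[]
{
    ("UK", 100m), ("US", 250m), ("UK", 50m),
};

// Every row gets the same constant key, so one Select computes all the aggregates.
var totals = invoices.GroupBy(i => 1)
    .Select(g => new { Count = g.Count(), Total = g.Sum(i => i.Paid), Average = g.Average(i => i.Paid) })
    .FirstOrDefault();

Console.WriteLine($"{totals.Count} {totals.Total} {totals.Average}");

// Caveat: an empty source yields no groups, so there is no row of zeros.
var none = Array.Empty<(string Country, decimal Paid)>()
    .GroupBy(i => 1)
    .Select(g => new { Count = g.Count() })
    .FirstOrDefault();

Console.WriteLine(none == null ? "no groups" : "unexpected group");
```

Against a real database the equivalent SQL may behave the same way, because grouping an empty table returns no rows. If you rely on this pattern, handle the null case explicitly.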

Count+Where optimizations

If we are performing counts with a predicate or against a Where clause, we can also optimize these. Suppose we have:

var high = db.Invoices.Count(i => i.Paid >= 1000);
var low = db.Invoices.Where(i => i.Paid < 1000).Count();
var sum = db.Invoices.Sum(i => i.Paid);

We can express these as a single query:

var a = db.Invoices.GroupBy(i => 1)
          .Select(g => new { High = g.Count(i => i.Paid >= 1000),
                             Low = g.Count(i => i.Paid < 1000),
                             Sum = g.Sum(i => i.Paid) });
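A quick way to check that the combined form is equivalent is to run both forms against the same data. This sketch uses LINQ to Objects and hypothetical anonymous-typed invoices. It therefore checks the results only, not the number of round trips a real provider would make.

```csharp
using System;
using System.Linq;

// Hypothetical invoices standing in for db.Invoices.
var invoices = new[] { new { Paid = 1500m }, new { Paid = 200m }, new { Paid = 1000m }, new { Paid = 999m } };

// Separate statements: against a database, each one would be its own query.
var high = invoices.Count(i => i.Paid >= 1000);
var low = invoices.Where(i => i.Paid < 1000).Count();
var sum = invoices.Sum(i => i.Paid);

// Combined: the constant key puts every row in one group, so one query yields all three.
var combined = invoices.GroupBy(i => 1)
    .Select(g => new
    {
        High = g.Count(i => i.Paid >= 1000),
        Low = g.Count(i => i.Paid < 1000),
        Sum = g.Sum(i => i.Paid)
    })
    .Single();

Console.WriteLine($"High={combined.High} Low={combined.Low} Sum={combined.Sum}");
```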

[)amien

Include for LINQ to SQL (and maybe other providers)  

It’s quite common, when you issue a query, to want to join some additional tables.

In LINQ this can be a big issue. Associations are exposed as properties, so it’s easy to end up issuing a query every time you access one. This is known as the SELECT N+1 problem, and tools like EF Profiler can help you find it.

An example

Consider the following C# code, which displays a list of blog posts along with each author’s name.

foreach(Post post in db.Posts)
  Console.WriteLine("{0} {1}", post.Title, post.Author.Name);

This code looks innocent enough and will issue a query like “SELECT * FROM [Posts]”. However, iterating over the posts triggers lazy-loading of each post’s Author property, and each one may well issue a query similar to “SELECT * FROM [Authors] WHERE [AuthorID] = 1”.

In the case of LINQ to SQL it’s not always an extra query. Before issuing a query to the database, it looks up the post’s AuthorID foreign key in its internal identity map (cache) to see whether that author is already in memory.
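The effect of an identity map on SELECT N+1 can be illustrated with a toy model. This is not LINQ to SQL’s implementation. The posts, the LoadAuthor function and the query counter are all hypothetical, and “issuing a query” just increments a counter.

With four posts written by two authors, the model counts three queries: the posts query plus one query per distinct author. Without the map it would count five.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: hypothetical posts as (Title, AuthorID) pairs; no database involved.
var posts = new (string Title, int AuthorID)[]
{
    ("First", 1), ("Second", 2), ("Third", 1), ("Fourth", 1),
};

int queries = 1;                                   // the initial "SELECT * FROM [Posts]"
var identityMap = new Dictionary<int, string>();   // AuthorID -> already-loaded author

string LoadAuthor(int authorId)
{
    // Only "query the database" when the author is not already in the identity map.
    if (!identityMap.TryGetValue(authorId, out var name))
    {
        queries++;
        name = $"Author {authorId}";
        identityMap[authorId] = name;
    }
    return name;
}

foreach (var post in posts)
    Console.WriteLine("{0} {1}", post.Title, LoadAuthor(post.AuthorID));

Console.WriteLine($"Queries issued: {queries}");
```

The cache reduces the number of queries, but it does not eliminate them. With many distinct authors you are still back to N+1, which is what LoadWith and eager loading address.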

LINQ to SQL’s LoadWith

Most object-relational mappers have a solution for this. Entity Framework’s ObjectQuery has an Include operator (which, alas, takes a string), and NHibernate has a fetch mechanism. LINQ to SQL has LoadWith, which is used like this:

var db = new MyDataContext();
var dlo = new DataLoadOptions(); // System.Data.Linq.DataLoadOptions
dlo.LoadWith<Post>(p => p.Blog);
db.LoadOptions = dlo; // must be set before the context returns any query results

This is a one-time setting for the lifetime of this data context instance, which can be inflexible. LoadWith also has at least one big bug with inheritance, where it issues multiple joins.

A flexible alternative

This got me thinking, and I came up with a useful extension method to provide Include-like facilities on demand in LINQ to SQL in .NET 4.0. It may also work with other LINQ providers, depending on what they support.

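using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class IncludeExtensions {
  public static IEnumerable<T> Include<T, TInclude>(this IQueryable<T> query, Expression<Func<T, TInclude>> sidecar) {
   var elementParameter = sidecar.Parameters.Single();
   var tupleType = typeof(Tuple<T, TInclude>);
   // Build p => new Tuple<T, TInclude>(p, <sidecar body>) so the provider loads both together
   var sidecarSelector = Expression.Lambda<Func<T, Tuple<T, TInclude>>>(
      Expression.New(tupleType.GetConstructor(new[] { typeof(T), typeof(TInclude) }),
         new Expression[] { elementParameter, sidecar.Body },
         tupleType.GetProperty("Item1"), tupleType.GetProperty("Item2")), elementParameter);
   // Execute the tuple projection, then hand back just the original elements in memory
   return query.Select(sidecarSelector).AsEnumerable().Select(t => t.Item1);
  }
}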

To use it, simply place it at the end of your query and specify the property you wish to eager-load, e.g.

var oneInclude = db.Posts.Where(p => p.Published).Include(p => p.Blog);
var multipleIncludes = db.Posts.Where(p => p.Published).Include(p => new { p.Blog, p.Template, p.Blog.Author });

This technique only works for to-one relationships, not to-many. It is also largely untested, so evaluate it properly before using it.

How it works

It is actually very simple. It projects into a Tuple that contains the original item and all the additional elements to load, and then returns just the original item from each tuple. It is a dynamic version of:

var query = db.Posts.Where(p => p.Published)
   .Select(p => new Tuple<Post, Blog>(p, p.Blog))
   .AsEnumerable()
   .Select(t => t.Item1);

This is why it has to return IEnumerable<T> and belong at the end of the query. The use of Tuple is why it is .NET 4.0 only, although that should be easy enough to change. Not all LINQ providers will necessarily register the extra elements with their identity map to prevent SELECT N+1 on lazy-loading, but LINQ to SQL does :)

[)amien

LINQ to SQL tips and tricks #3  

A few more interesting and lesser-known LINQ to SQL techniques.

Lazy loading with stored procedures

LINQ to SQL supports stored procedures for retrieving entities and for insert, update and delete operations. You can also use them to lazy-load navigation properties.

Let’s look at an example of a bi-directional relationship between a Post and a Comment. We have the two stored procedures shown below. We bring them into the DBML by dragging them from Server Explorer onto the LINQ to SQL designer surface, and we set the return type property of each to the expected entity (Post and Comment respectively).

CREATE PROCEDURE LoadPost (@PostID int) AS SELECT * FROM Posts WHERE ID = @PostID
CREATE PROCEDURE LoadComments(@PostID int) AS SELECT * FROM Comments WHERE Parent_Post_ID = @PostID

This generates two methods named LoadPost and LoadComments that we can use to retrieve entities programmatically:

var post = dataContext.LoadPost(1).First();
Console.WriteLine("{0}", post.Title);

Now, to replace LINQ to SQL’s lazy-loading query generation, we add methods with a specific signature to the data context subclass.

partial class DataClasses1DataContext {
    protected IEnumerable<Comment> LoadComments(Post post) {
        return this.LoadComments(post.ID);
    }

    protected Post LoadParentPost(Comment comment) {
        return this.LoadPost(comment.Parent_Post_ID).First();
    }
}

To get the method signatures right:

  1. Visibility can be anything (protected or private is recommended)
  2. Return type must be the type of the other side of the association (wrapped in IEnumerable<T> when that side can be many)
  3. Method name must start with the word “Load”
  4. Method name must then continue with the name of the navigation property you want to intercept
  5. Parameter type must be the type that has the named navigation property (step 4)

Storing and retrieving binary files

LINQ to SQL supports SQL Server’s varbinary type, but how to store something practical like a file in there isn’t so clear. Map the varbinary(max) column from your table into your entity. The entity will then expose the column as the special System.Data.Linq.Binary type, which is effectively a wrapper for a byte array but with better change tracking.

File to database

To store a file in the database, just read the bytes in and assign them to the property (Binary has an implicit conversion from a byte array), e.g.

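string readPath = @"c:\test.jpg";
var storedFile = new StoredFile();
storedFile.Binary = File.ReadAllBytes(readPath); // byte[] converts implicitly to Binary
storedFile.FileName = Path.GetFileName(readPath);
data.StoredFiles.InsertOnSubmit(storedFile);
data.SubmitChanges(); // nothing is written to the database until SubmitChanges is called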

I recommend storing the file name as well as the binary contents, for two reasons. Firstly, writing the file back to disk or streaming it to a browser requires you to know the file type (e.g. .jpg or image/jpeg). Secondly, nobody likes downloading a file called ‘download’ or ‘1’ :)

Database to file

Writing the file back to disk is just as easy, although you have to use the ToArray() method of System.Data.Linq.Binary to turn it back into a byte array.

string writePath = @"c:\temp";
var storedFile = data.StoredFiles.First();
// Path.GetFileName discards any directory components in the stored name
File.WriteAllBytes(Path.Combine(writePath, Path.GetFileName(storedFile.FileName)), storedFile.Binary.ToArray());

When writing to the file system based on data, always ensure that your filenames are sanitized! You don’t want users overwriting important files on your system.
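Here is a self-contained sketch of the round trip that uses only System.IO. A plain byte[] stands in for the mapped Binary property, and a temporary directory stands in for c:\temp. The file contents and the “../../” name are made up to illustrate sanitization.

Path.GetFileName strips directory components that use the current platform’s separators. It is only a first line of defense, not complete sanitization; for example, it does not reject reserved device names on Windows.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical working area standing in for c:\ and c:\temp.
var workDir = Path.Combine(Path.GetTempPath(), "binary-roundtrip-" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(workDir);
var readPath = Path.Combine(workDir, "test.jpg");
File.WriteAllBytes(readPath, new byte[] { 0xFF, 0xD8, 0xFF, 0xE0 }); // made-up file contents

// "File to database": keep the bytes and the original file name together.
byte[] storedBytes = File.ReadAllBytes(readPath);
string storedName = Path.GetFileName(readPath);

// Simulate a hostile name coming back from the database.
string untrustedName = "../../" + storedName;

// "Database to file": strip directory components before combining paths.
var outDir = Path.Combine(workDir, "out");
Directory.CreateDirectory(outDir);
var writePath = Path.Combine(outDir, Path.GetFileName(untrustedName));
File.WriteAllBytes(writePath, storedBytes);

Console.WriteLine($"{writePath} identical: {File.ReadAllBytes(writePath).SequenceEqual(storedBytes)}");
```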

Multiple databases with a single context

Contrary to popular belief, you can in fact access entities from multiple databases with a single data context, provided they live on the same server. This isn’t officially supported, but I’ve used it on my own projects without issue :)

The first part is the tricky bit: getting the definitions of your entities into your DBML. You have two options here:

Create a temporary view

If you have the rights, you can temporarily create views in your primary database for each table in your non-primary database.

CREATE VIEW MyOtherTable AS SELECT * FROM MyOtherDatabase.dbo.MyOtherTable

Once the views are created, add them to your DBML by dragging them from Server Explorer onto the LINQ to SQL designer surface. Then delete the views you created from the database.

Create a temporary DBML

If you can’t or don’t want to create temporary views, add a second (temporary) LINQ to SQL classes file (DBML) to your project. Use Server Explorer to find your secondary database and drag all the tables you want to access onto the LINQ to SQL designer surface.

Now save and close any open files. Right-click your original DBML and the new temporary one, choose Open With… and select XML Editor for each. Then go to the Window menu and select New Vertical Tab Group to make the next step easier.

Looking through the DBML, you will see that each entity has a <Table> block inside the <Database> element. Select all the Table tags and their children (but not Database or Connection) and copy and paste them into your existing DBML file. Then close the files and check that everything looks right in the designer again.

If it does, delete the temporary DBML file you created. If not, go back and check the DBML file for duplicate names, mismatched XML and so on.

Finally, the easy bit

Open the designer and for each table that comes from the other database select it and change the Source property in the Properties window from dbo.MyOtherTable to MyOtherDatabase.dbo.MyOtherTable.

Hit play and run!

Check out part 1 of LINQ to SQL tips

[)amien

SQL Server query plan cache – what is it and why should you care?  

What is a query plan?

SQL Server, like all databases, goes through a number of steps when it receives a command. Besides parsing and validating the command text and parameters, it looks at the database schema, statistics and indexes to come up with a plan to efficiently query or change your data.

You can view the plan SQL Server comes up with for a given query in SQL Server Management Studio by selecting Include Actual Execution Plan from the Query menu before running your query.

A query plan in SQL Server Management Studio

Show me the cache!

Query plans are cached so subsequent identical operations can reuse them for further performance gains. You can see the query plans in use on your server with the following SQL:

SELECT objtype, p.size_in_bytes, t.[text], usecounts
     FROM sys.dm_exec_cached_plans p
     OUTER APPLY sys.dm_exec_sql_text (p.plan_handle) t
     WHERE objtype IN ('Prepared', 'Adhoc')
     ORDER BY usecounts DESC

Hitting the cache

DBAs know the value of hitting the query plan cache often, and this is one of the reasons they like stored procedures. You can, however, achieve the same thing with parameterized queries. This works provided the query text and the parameter definitions are identical, so that you execute the same statement over and over with just different parameter values.

If your ORM uses parameterized queries, it too can take advantage of the cache. Remember, though, that the query text and parameter definitions need to be identical for this to happen.

How this applies to ORMs

In .NET 3.5 SP1, neither LINQ to SQL nor Entity Framework set the length of variable-length parameters (varchar, nvarchar, text, ntext and varbinary), so SqlClient sets it to the actual content length. This means the cache is often missed and is instead populated with plans that differ only in their parameter lengths.

In .NET 4.0, variable-length parameters honour the defined length in both LINQ to SQL and Entity Framework where possible. When the actual content doesn’t fit in the defined length, they fall back to the maximum length.
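Here is a toy model that shows why sizing parameters by their values fills the cache. It uses a set keyed on the parameter declarations plus the query text, loosely modeled on how parameterized statements appear in the [text] column of the cache query above. It is not SQL Server’s actual plan-matching logic, and the Customers table, the nvarchar(50) column and the names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: "plans" are keyed on parameter declarations plus query text.
const string sql = "SELECT * FROM Customers WHERE Name = @p0";
string[] names = { "Ann", "Bob", "Charlotte", "Dmitri" };

var cacheBySize = new HashSet<string>();   // length taken from each value (3.5 SP1 behavior)
var cacheByColumn = new HashSet<string>(); // length taken from the column definition (4.0 behavior)

foreach (var name in names)
{
    cacheBySize.Add($"(@p0 nvarchar({name.Length})){sql}");
    cacheByColumn.Add($"(@p0 nvarchar(50)){sql}"); // assumes a hypothetical nvarchar(50) column
}

Console.WriteLine($"Plans when sized by value: {cacheBySize.Count}");
Console.WriteLine($"Plans when sized by column: {cacheByColumn.Count}");
```

Four lookups produce three distinct entries when sized by value (lengths 3, 9 and 6). They produce a single reusable entry when every call declares the column’s length.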

[)amien