Include for LINQ to SQL (and maybe other providers)

It’s quite common that when you issue a query you’re going to want to join some additional tables.

In LINQ this can be a big issue as associations are properties and it’s easy to end up issuing a query every time you hit one. This is referred to as the SELECT N+1 problem and tools like LINQ to SQL Profiler can help you find them.

An example

Consider the following section of C# code that displays a list of blog posts and also wants the author name.

foreach(Post post in db.Posts)
  Console.WriteLine("{0} {1}", post.Title, post.Author.Name);

This code looks innocent enough and will issue a query like “SELECT * FROM [Posts]” but iterating over the posts causes the lazy-loading of the Author property to trigger and each one may well issue a query similar to “SELECT * FROM [Authors] WHERE [AuthorID] = 1″.

In the case of LINQ to SQL it’s not always an extra load as it will check the posts AuthorID foreign key in its internal identity map (cache) to see if it’s already in-memory before issuing a query to the database.

LINQ to SQL’s LoadWith

Most object-relational mappers have a solution for this – Entity Framework’s ObjectQuery has an Include operator (that alas takes a string), and NHibernate has a fetch mechanism. LINQ to SQL has LoadWith which is used like this:

var db = new MyDataContext();
var dlo = new DataLoadOptions();
dlo.LoadWith<Posts>(p => p.Blog);
db.LoadOptions = dlo;

This is a one-time operation for the lifetime of this instance of the data context which can be inflexible and LoadWith has at least one big bug with inheritance issuing multiple joins.

A flexible alternative

This got me thinking and I came up with a useful extension method to provide Include-like facilities on-demand in LINQ to SQL (and potentially other LINQ providers depending on what they support) in .NET 4.0.

public static IEnumerable<T> Include<T, TInclude>(this IQueryable<T> query, Expression<Func<T, TInclude>> sidecar) {
   var elementParameter = sidecar.Parameters.Single();
   var tupleType = typeof(Tuple<T, TInclude>);
   var sidecarSelector =  Expression.Lambda<Func<T, Tuple<T, TInclude>>>(
      Expression.New(tupleType.GetConstructor(new[] { typeof(T), typeof(TInclude) }),
         new Expression[] { elementParameter, sidecar.Body  },
         tupleType.GetProperty("Item1"), tupleType.GetProperty("Item2")), elementParameter);
   return query.Select(sidecarSelector).AsEnumerable().Select(t => t.Item1);
}

To use simply place at the end of your query and specify the property you wish to eager-load, e.g.

var oneInclude = db.Posts.Where(p => p.Published).Include(p => p.Blog));
var multipleIncludes = db.Posts.Where(p => p.Published).Include(p => new { p.Blog, p.Template, p.Blog.Author }));

This technique only works for to-one relationships not to-many. It is also quite untested so evaluate it properly before using it.

How it works

How it works is actually very simple – it projects into a Tuple that contains the original item and all additional loaded elements and then just returns the query back the original item. It is a dynamic version of:

var query = db.Posts.Where(p => p.Published)
   .Select(p => new Tuple<Post, Blog>(p, p.Blog))
   .AsEnumerable()
   .Select(t => t.Item1);

This is why it has to return IEnumerable<T> and belong at the end (and the use of Tuple is why it is .NET 4.0 only although that should be easy enough to change). Not all LINQ providers will necessarily register the elements with their identity map to prevent SELECT N+1 on lazy-loading but LINQ to SQL does :)

[)amien

12 responses  

  1. Good info!
    1. Do you know if this will work for EF4? The string-based includes are a pet-peeve.
    2. “Not all LINQ providers will necessarily register the elements with their identity map to prevent SELECT N+1 on lazy-loading” – Do you mean that just because the object is materialised, that doesn’t mean that object will make it into the identity map? Is this another way of saying that the provider wont necessarily do relationship fix-up for you? I’m not sure I understand this point.

    Matt – May 21st, 2010
  2. I haven’t tried it with EF so not sure if it will work there.

    LINQ to SQL when it attempts to follow an association property first looks up the foreign key in it’s own internal dictionary (identity map). It also registers all projected elements it received from the database into the map. Both of these are required for this technique to work and there’s no guarantee that every LINQ provider does this so you’ll have to try it and see whether it is still SELECT N+1 or not.

    [)amien

    Damien GuardMay 21st, 2010
  3. Great idea! Will definitely try it out. I agree with the identity map/N+1 comment though.

    gunteman – May 24th, 2010
  4. pingback

    [...] pasikeitė prieš kelias savaites, kada Damien Guard (dirba Microsoft Data Platform padalinyje) savo straipsnyje aprašė dar vieną būdą, leidžianti išnaudoti tą patį DataContext. Damieno pavyzdys veikia [...]

    LINQ to SQL meistriškumo klasė #1 (LoadOptions)May 30th, 2010
  5. The Microsoft connect bug was from 2008. This bug still reminds in .NET 4.0 ??

    Cecil – June 16th, 2010
  6. Yes it still exists in .NET 4.0.

    Damien GuardJune 16th, 2010
  7. You said: “Not all LINQ providers”

    Hmm I’ve tought, that Linq2SQL does not support the provider model (only EF does). Has anything changed in .NET 4.0 and we can use custom providers (without nasty TransparentProxy/remoting tricks)?

    Matra

    matra – July 20th, 2010
  8. This trick does not rely upon any kind of provider mechanism – it relies on whether the LINQ provider optimizes out what is happening here and how it handles identity resolution.

    LINQ to SQL’s provider model is internal and will always be. The bulk of what constitutes LINQ to SQL is the SQL Provider – if you take that away you are left with very little (identity map, lazy-loading semantics).

    [)amien

    Damien GuardJuly 20th, 2010
  9. Thanks, I understand. With “Linq to SQL provider” you are reffering to the classes inside System.Data.Linq.SqlClient namespace (which can not be replaced).

    Matra

    matra – July 20th, 2010
  10. Just out of curiosity, why not use a projection to get the properties you need. That is, if you KNOW you’ll need Post.Author.Name when you retrieve post why not just say:

    foreach(var projection in db.Posts.Select(p=>new {post.Title, AuthorName=post.Author.Name}))
    Console.WriteLine(“{0} {1}”, projection.Title, projection.AuthorName);

    I believe that the projection will be resolved by the L2S provider (and it also works with EF) and be transformed to the appropriate SQL. That being said, you’ve got a pretty cool utility method there.

    To answer an earlier question, I DID see a strongly typed Include operator for EF4 but can’t find it unfortunately. As I mentioned, I don’t do a lot of Includes in my work with EF so I didn’t keep track of that one.

    Mike Brown – August 27th, 2010
  11. Nice one D! I love L2S and I so hope it does not just fizzle out… So easy and so fast. Have you been moved permanently? Or will you still contribute to L2S?

    Quooston – October 26th, 2010
  12. You forget to write a ‘.AsEnumerable()’ in the code sample after “How it works”. The code should read:

    var query = db.Posts.Where(p => p.Published)
    .Select(p => new Tuple(p, p.Blog)).AsEnumerable()
    .Select(t => t.Item1);

    Cheers

    StevenFebruary 21st, 2011

Respond to this