Include for LINQ to SQL (and maybe other providers)

It’s quite common that when you issue a query you’re going to want to join some additional tables.

In LINQ this can be a big issue as associations are properties and it’s easy to end up issuing a query every time you hit one. This is referred to as the SELECT N+1 problem and tools like EF Profiler can help you find them.

An example

Consider the following section of C# code that displays a list of blog posts and also wants the author name.

foreach(Post post in db.Posts)
  Console.WriteLine("{0} {1}", post.Title, post.Author.Name);

This code looks innocent enough and will issue a query like “SELECT * FROM [Posts]” but iterating over the posts causes the lazy-loading of the Author property to trigger and each one may well issue a query similar to “SELECT * FROM [Authors] WHERE [AuthorID] = 1”.

In the case of LINQ to SQL it’s not always an extra load as it will check the posts AuthorID foreign key in its internal identity map (cache) to see if it’s already in-memory before issuing a query to the database.

LINQ to SQL LoadWith

Most object-relational mappers have a solution for this – Entity Framework’s ObjectQuery has an Include operator (that alas takes a string), and NHibernate has a fetch mechanism. LINQ to SQL has LoadWith which is used like this:

var db = new MyDataContext();
var dlo = new DataLoadOptions();
dlo.LoadWith<Posts>(p => p.Blog);
db.LoadOptions = dlo;

This is a one-time operation for the lifetime of this instance of the data context which can be inflexible and LoadWith has at least one big bug with inheritance issuing multiple joins.

A flexible alternative

This got me thinking and I came up with a useful extension method to provide Include-like facilities on-demand in LINQ to SQL (and potentially other LINQ providers depending on what they support) in .NET 4.0.

public static IEnumerable<T> Include<T, TInclude>(this IQueryable<T> query, Expression<Func<T, TInclude>> sidecar) {
  var elementParameter = sidecar.Parameters.Single();
  var tupleType = typeof(Tuple<T, TInclude>);
  var sidecarSelector =  Expression.Lambda<Func<T, Tuple<T, TInclude>>>(
    Expression.New(tupleType.GetConstructor(new[] { typeof(T), typeof(TInclude) }),
       new Expression[] { elementParameter, sidecar.Body  },
       tupleType.GetProperty("Item1"), tupleType.GetProperty("Item2")), elementParameter);
  return query.Select(sidecarSelector).AsEnumerable().Select(t => t.Item1);
}

To use simply place at the end of your query and specify the property you wish to eager-load, e.g.

var oneInclude = db.Posts.Where(p => p.Published).Include(p => p.Blog));
var multipleIncludes = db.Posts.Where(p => p.Published).Include(p => new { p.Blog, p.Template, p.Blog.Author }));

This technique only works for to-one relationships not to-many. It is also quite untested so evaluate it properly before using it.

How it works

How it works is actually very simple – it projects into a Tuple that contains the original item and all additional loaded elements and then just returns the query back the original item. It is a dynamic version of:

var query = db.Posts.Where(p => p.Published)
  .Select(p => new Tuple<Post, Blog>(p, p.Blog))
  .AsEnumerable()
  .Select(t => t.Item1);

This is why it has to return IEnumerable and belong at the end (and the use of Tuple is why it is .NET 4.0 only although that should be easy enough to change). Not all LINQ providers will necessarily register the elements with their identity map to prevent SELECT N+1 on lazy-loading but LINQ to SQL does :)

[)amien

12 responses

  1. Avatar for Matt

    Good info!

    1. Do you know if this will work for EF4? The string-based includes are a pet-peeve.
    2. "Not all LINQ providers will necessarily register the elements with their identity map to prevent SELECT N+1 on lazy-loading" - Do you mean that just because the object is materialised, that doesn't mean that object will make it into the identity map? Is this another way of saying that the provider wont necessarily do relationship fix-up for you? I'm not sure I understand this point.
    Matt 21 May 2010
  2. Avatar for Damien Guard

    I haven't tried it with EF so not sure if it will work there.

    LINQ to SQL when it attempts to follow an association property first looks up the foreign key in it's own internal dictionary (identity map). It also registers all projected elements it received from the database into the map. Both of these are required for this technique to work and there's no guarantee that every LINQ provider does this so you'll have to try it and see whether it is still SELECT N+1 or not.

    Damien Guard 21 May 2010
  3. Avatar for gunteman

    Great idea! Will definitely try it out. I agree with the identity map/N+1 comment though.

    gunteman 24 May 2010
  4. Avatar for Cecil

    The Microsoft connect bug was from 2008. This bug still reminds in .NET 4.0 ??

    Cecil 16 June 2010
  5. Avatar for Damien Guard

    Yes it still exists in .NET 4.0.

    Damien Guard 16 June 2010
  6. Avatar for matra

    You said: “Not all LINQ providers"

    Hmm I've tought, that Linq2SQL does not support the provider model (only EF does). Has anything changed in .NET 4.0 and we can use custom providers (without nasty TransparentProxy/remoting tricks)?

    Matra

    matra 20 July 2010
  7. Avatar for Damien Guard

    This trick does not rely upon any kind of provider mechanism - it relies on whether the LINQ provider optimizes out what is happening here and how it handles identity resolution.

    LINQ to SQL's provider model is internal and will always be. The bulk of what constitutes LINQ to SQL is the SQL Provider - if you take that away you are left with very little (identity map, lazy-loading semantics).

    Damien Guard 20 July 2010
  8. Avatar for matra

    Thanks, I understand. With "Linq to SQL provider" you are reffering to the classes inside System.Data.Linq.SqlClient namespace (which can not be replaced).

    Matra

    matra 20 July 2010
  9. Avatar for Mike Brown

    Just out of curiosity, why not use a projection to get the properties you need. That is, if you KNOW you'll need Post.Author.Name when you retrieve post why not just say:

    foreach(var projection in db.Posts.Select(p=>new {post.Title, AuthorName=post.Author.Name})) Console.WriteLine("{0} {1}", projection.Title, projection.AuthorName);

    I believe that the projection will be resolved by the L2S provider (and it also works with EF) and be transformed to the appropriate SQL. That being said, you've got a pretty cool utility method there.

    To answer an earlier question, I DID see a strongly typed Include operator for EF4 but can't find it unfortunately. As I mentioned, I don't do a lot of Includes in my work with EF so I didn't keep track of that one.

    Mike Brown 27 August 2010
  10. Avatar for Quooston

    Nice one D! I love L2S and I so hope it does not just fizzle out... So easy and so fast. Have you been moved permanently? Or will you still contribute to L2S?

    Quooston 26 October 2010
  11. Avatar for Steven

    You forget to write a '.AsEnumerable()' in the code sample after "How it works". The code should read:

    var query = db.Posts.Where(p => p.Published) .Select(p => new Tuple(p, p.Blog)).AsEnumerable() .Select(t => t.Item1);

    Cheers

    Steven 21 February 2011
  12. Avatar for Robert Taylor

    I have a pet hate of 'magic strings' and found a way of usinga lambda rather than a string for EF's IncludeProperties and converting it to a string in a base class. This also handles the 1:M case; so p => new {p.Person , p.Person.Creditcards.Single().Bank} becomes "Person, Person.CreditCards.Bank" where CreditCards is a collection of CreditCard. The 'Single()' is simply a way of getting access the a CreditCard. (I also handle 'First() and the' ...OrDefault()' options for completeness.) All really I do is process the Lambda expression as a string. Seems to work well; avoids typos and can be refactored.

    Robert Taylor 27 May 2015