Posts tagged with linq

Creating OR expressions in LINQ

As everybody who has read my blog before knows, I love LINQ and miss it when coding in other languages, so it’s nice when I get a chance to use it again. When I come back to it with fresh eyes, I notice some things aren’t as easy as they should be - and this time is no exception.

Background

People often need to build up LINQ expressions at runtime based on filters or criteria a user has selected. Adding criteria is incredibly easy, as you can chain operations together on the IQueryable interface, e.g.

if (customerActive)
  query = query.Where(c => c.IsActive);
if (customerCountry != null)
  query = query.Where(c => c.Country == customerCountry);

This is great if you want to AND things together, which is often the case in building up filters. Other operations, like Contains, are useful in allowing many options against a specific field.

Problem

But what about offering a choice involving either of two properties? One way to write it would be:

if (customerActive) {
  if (customerEnterprise) {
    query = query.Where(c => c.IsActive || c.IsEnterprise);
  } else {
    query = query.Where(c => c.IsActive);
  } else {
    if (customerEnterprise) {
      query = query.Where(c => c.IsEnterprise);
    }
  }
}

This code is already hard to read and is going to increase in complexity for each new option.

Solution

With some helper code, we can combine two expression predicates into a single OrElse for use in a Where condition.

class Or {
  public static Expression<Func<T, bool>> Combine<T>(Expression<Func<T, bool>> left, Expression<Func<T, bool>> right) {
    if (left == null && right == null) throw new ArgumentException("At least one argument must not be null");
    if (left == null) return right;
    if (right == null) return left;
    
    var parameter = Expression.Parameter(typeof(T), "p");
    var combined = new ParameterReplacer(parameter).Visit(Expression.OrElse(left.Body, right.Body));
    return Expression.Lambda<Func<T, bool>>(combined, parameter);
  }

  class ParameterReplacer : ExpressionVisitor {
    readonly ParameterExpression parameter;

    internal ParameterReplacer(ParameterExpression parameter) {
      this.parameter = parameter;
    }

    protected override Expression VisitParameter(ParameterExpression node) {
      return parameter;
    }
  }
}

Now we can:

Expression<Func<Customer, bool>> criteria = null;
if (customerActive)
    criteria = Or.Combine(criteria, c => c.IsActive == customerActive);
if (customerEnterprise)
    criteria = Or.Combine(criteria, c => c.IsEnterprise == customerEnterprise);

How it works

There are two parts to this working. The first combines the two separate predicates into a single OR lambda, taken care of by the Expression.OrElse method.

That alone can’t quite do what we need because the two expressions each have a separate parameter variable. The visitor replaces both of these with a new parameter we’ll pass into the new Lambda expression we’re creating that combines both conditions.

Caveats

The code, as it stands, creates an unbalanced tree. Ideally, the tree should be balanced to limit the depth LINQ providers must traverse to translate the query. I doubt you’ll run into the dreaded Stack Overflow (the message, not the site), but if you do, that’s why.

An overload that takes an Enumerable of Expression<Func<T, bool>> and produces a balanced tree is left as an exercise to the reader :p

[)amien

Optimizing Sum, Count, Min, Max and Average with LINQ

LINQ is a great tool for C# programmers letting you use familiar syntax with a variety of back-end systems without having to learn another language or paradigm for many query operations.

Ensuring that the queries still perform well can be a bit of a chore and one set that fails quite badly are the aggregate operations when you want more than one.

Multiple sequential queries (bad)

var count = db.Invoices.Count();
var total = db.Invoices.Sum(i => i.Paid);
var average = db.Invoices.Average(i => i.Paid);

Will issue three separate requests. There is nothing a LINQ provider can do to optimize that pattern as they are three discrete statements.

Background

If we wanted these values by country we could do this in LINQ:

var a = db.Invoices.GroupBy(i => i.Country)
      .Select(g => new { Country = g.Key,
           Count = g.Count(),
           Total = g.Sum(i => i.Paid),
           Average = g.Average(i => i.Paid) });

Which gets us everything in a single statement broken down by country. In SQL this is:

SELECT Country, Count(*), Sum(Paid), Average(Paid)
    FROM Invoices GROUP BY Country

Many data sources including SQL are happy to provide aggregate values without a group by so how do we generate that from LINQ?

In the absence of a Group method that doesn’t take a property we need to fake it and because of the way many LINQ providers optimize out parts of the tree we can:

Single optimized query (good)

Replacing the property in a GroupBy with a constant value gives us an optimized single query:

var a = db.Invoices.GroupBy(i => 1)
    .Select(g => new { Count = g.Count(),
               Total = g.Sum(i => i.Paid),
               Average = g.Average(i => i.Paid) });

Here are the providers I’ve tried:

  • LINQ to Objects (Works although constant is likely evaluated)
  • LINQ to SQL (Works although passes 1 parameter to SQL)
  • Entity Framework 6 (Works although query is a little obscure)
  • ElasticLINQ (Works and optimizes out totally)

Count+Where optimizations

If we are performing counts with a predicate or against a where we can also optimize these.

var high = db.Invoices.Count(i => i.Paid >= 1000);
var low = db.Invoices.Where(i => i.Paid < 1000).Count();
var sum = db.Invoices.Sum(i => i.Paid);

Then we can express this as:

var a = db.Invoices.GroupBy(g => 1)
    .Select(g => new { High = g.Count(i => i.Paid >= 1000),
                   Low = g.Count(i => i.Paid < 1000),
                   Sum = g.Sum(i => i.Paid) });

[)amien

Include for LINQ to SQL (and maybe other providers)

It’s quite common that when you issue a query you’re going to want to join some additional tables.

In LINQ this can be a big issue as associations are properties and it’s easy to end up issuing a query every time you hit one. This is referred to as the SELECT N+1 problem and tools like EF Profiler can help you find them.

An example

Consider the following section of C# code that displays a list of blog posts and also wants the author name.

foreach(Post post in db.Posts)
  Console.WriteLine("{0} {1}", post.Title, post.Author.Name);

This code looks innocent enough and will issue a query like “SELECT * FROM [Posts]” but iterating over the posts causes the lazy-loading of the Author property to trigger and each one may well issue a query similar to “SELECT * FROM [Authors] WHERE [AuthorID] = 1”.

In the case of LINQ to SQL it’s not always an extra load as it will check the posts AuthorID foreign key in its internal identity map (cache) to see if it’s already in-memory before issuing a query to the database.

LINQ to SQL LoadWith

Most object-relational mappers have a solution for this – Entity Framework’s ObjectQuery has an Include operator (that alas takes a string), and NHibernate has a fetch mechanism. LINQ to SQL has LoadWith which is used like this:

var db = new MyDataContext();
var dlo = new DataLoadOptions();
dlo.LoadWith<Posts>(p => p.Blog);
db.LoadOptions = dlo;

This is a one-time operation for the lifetime of this instance of the data context which can be inflexible and LoadWith has at least one big bug with inheritance issuing multiple joins.

A flexible alternative

This got me thinking and I came up with a useful extension method to provide Include-like facilities on-demand in LINQ to SQL (and potentially other LINQ providers depending on what they support) in .NET 4.0.

public static IEnumerable<T> Include<T, TInclude>(this IQueryable<T> query, Expression<Func<T, TInclude>> sidecar) {
  var elementParameter = sidecar.Parameters.Single();
  var tupleType = typeof(Tuple<T, TInclude>);
  var sidecarSelector =  Expression.Lambda<Func<T, Tuple<T, TInclude>>>(
    Expression.New(tupleType.GetConstructor(new[] { typeof(T), typeof(TInclude) }),
       new Expression[] { elementParameter, sidecar.Body  },
       tupleType.GetProperty("Item1"), tupleType.GetProperty("Item2")), elementParameter);
  return query.Select(sidecarSelector).AsEnumerable().Select(t => t.Item1);
}

To use simply place at the end of your query and specify the property you wish to eager-load, e.g.

var oneInclude = db.Posts.Where(p => p.Published).Include(p => p.Blog));
var multipleIncludes = db.Posts.Where(p => p.Published).Include(p => new { p.Blog, p.Template, p.Blog.Author }));

This technique only works for to-one relationships not to-many. It is also quite untested so evaluate it properly before using it.

How it works

How it works is actually very simple – it projects into a Tuple that contains the original item and all additional loaded elements and then just returns the query back the original item. It is a dynamic version of:

var query = db.Posts.Where(p => p.Published)
  .Select(p => new Tuple<Post, Blog>(p, p.Blog))
  .AsEnumerable()
  .Select(t => t.Item1);

This is why it has to return IEnumerable and belong at the end (and the use of Tuple is why it is .NET 4.0 only although that should be easy enough to change). Not all LINQ providers will necessarily register the elements with their identity map to prevent SELECT N+1 on lazy-loading but LINQ to SQL does :)

[)amien

LINQ to SQL tips and tricks #3

Another set of useful and lesser-known LINQ to SQL techniques.

Lazy loading with stored procedures

LINQ to SQL supports stored procedures for retrieving entities, insert, update and delete operations, as you know. But you can also use them to perform lazy-loading of navigation properties.

Let’s show an example of a bi-directional relationship between a Post and a Comment. We have two stored procedures shown below, bringing them into the DBML by dragging them from Server Explorer into the LINQ to SQL designer surface, setting the return type property for each to the expected entity (Post and Comment, respectively).

CREATE PROCEDURE LoadPost (@PostID int) AS SELECT * FROM Posts WHERE ID = @PostID
CREATE PROCEDURE LoadComments(@PostID int) AS SELECT * FROM Comments WHERE Parent_Post_ID = @PostID

This generates two method stubs named LoadPost and LoadComments that we can use to programmatically retrieve entities:

var post = dataContext.LoadPost(1).First();
Console.WriteLine("{0}", post.Title);

Now to replace LINQ to SQL’s lazy-loading query generation. We add methods to the data context subclass with a specific signature.

partial class DataClasses1DataContext {
  protected IEnumerable<Comment> LoadComments(Post post) {
    return this.LoadComments(post.ID);
  }

  protected Post LoadParentPost(Comment comment) {
    return this.LoadPost(comment.Post_ID).First();
  }
}

To get the signature of the method names right:

  1. Visibility can be anything (protected or private recommended)
  2. Return type must be the type of the other side of the association (wrapped in IEnumerable when that side can be many)
  3. Method name must start with the word “Load”
  4. Method name must then continue with the name of the navigation property you want to intercept
  5. Parameter type must be the type that has the named navigation property (step 4)

Storing and retrieving binary files

LINQ to SQL supports the SQL Server’s varbinary type but storing something practical like a file in there isn’t clearly documented. Map your varbinary(max) column from your table into your entity, which exposes the column as using the System.Data.Linq.Binary type (effectively a wrapper for a byte array but better change tracking).

File to database

To store a file in the database, read those bytes in and assign them to the property (Binary knows how to create itself from a byte array automatically). e.g.

var readPath = @"c:\test.jpg";
var storedFile = new StoredFile();
storedFile.Binary = File.ReadAllBytes(readPath);
storedFile.FileName = Path.GetFileName(readPath);
data.StoredFiles.InsertOnSubmit(storedFile);

I recommend storing the file name as well as the binary contents for two reasons. Firstly, writing the file back to disk or streaming it to a browser requires you know the file type (eg. .jpg or image/jpeg), and, secondly, nobody likes downloading a file called ‘download’ or ‘1’ :)

Database to file

Writing the file back to disk is just as easy although, you have to use the ToArray() method of System.Data.Linq.Binary to turn it back into a byte array.

var writePath = @"c:\temp";
var storedFile = data.StoredFiles.First();
File.WriteAllBytes(Path.Combine(writePath, storedFile.FileName), storedFile.Binary.ToArray());
Always ensure when writing to the file system based on data that filenames are sanitized! You don’t want users overwriting critical files on your system.

Multiple databases with a single context

Contrary to popular belief, you can, in fact, access entities from multiple databases with a single data context - providing they live on the same server. This is unsupported, but I’ve used it on my own projects without issue :)

The first part is the tricky bit which involves getting the definition of your entity into your DBML. You have two options here:

Create a temporary view

If you have the rights, you can temporarily create views in your primary database for each table in your non-primary database.

CREATE VIEW MyOtherTable AS SELECT * FROM MyOtherDatabase.dbo.MyOtherTable

Once views are created, add them to your DBML by dragging them from Server Explorer into the LINQ to SQL designer surface and delete the views you created from the database.

Create a temporary DBML

If you can’t or don’t want to create temporary views, then add a second (temporary) LINQ to SQL classes file (DBML) to your project. Use Server Explorer to find your secondary database and drag all the tables you want to access to the LINQ to SQL designer surface.

Now save & close open files and use the right-mouse-button context menu to Open With… and choose XML Editor on your original DBML and the new temporary one. Head to the Window menu and select New Vertical Tab Group to make the next step easier.

Looking through the DBML you see each entity has a <Table> block inside the . Select all the Table tags and their children (but not Database or Connection) and copy/paste them into your existing DBML file. Then close the files and check all looks well in the designer again.

If it does, delete the temporary DBML file you created. If not, go back and check the DBML file for duplicate names, mismatched XML etc.

Finally, the easy bit

Open the designer and for each table that comes from the other database select it and change the Source property in the Properties window from dbo.MyOtherTable to MyOtherDatabase.dbo.MyOtherTable.

Hit play and run!

Check out part 1 of LINQ to SQL tips

[)amien