Optimizing Sum, Count, Min, Max and Average with LINQ

LINQ is a great tool for C# programmers letting you use familiar syntax with a variety of back-end systems without having to learn another language or paradigm for many query operations.

Ensuring that the queries still perform well can be a bit of a chore and one set that fails quite badly are the aggregate operations when you want more than one.

Multiple sequential queries (bad)

var count = db.Invoices.Count();
var total = db.Invoices.Sum(i => i.Paid);
var average = db.Invoices.Average(i => i.Paid);

Will issue three separate requests. There is nothing a LINQ provider can do to optimize that pattern as they are three discrete statements.

Background

If we wanted these values by country we could do this in LINQ:

var a = db.Invoices.GroupBy(i => i.Country)
      .Select(g => new { Country = g.Key,
           Count = g.Count(),
           Total = g.Sum(i => i.Paid),
           Average = g.Average(i => i.Paid) });

Which gets us everything in a single statement broken down by country. In SQL this is:

SELECT Country, Count(*), Sum(Paid), Average(Paid)
    FROM Invoices GROUP BY Country

Many data sources including SQL are happy to provide aggregate values without a group by so how do we generate that from LINQ?

In the absence of a Group method that doesn’t take a property we need to fake it and because of the way many LINQ providers optimize out parts of the tree we can:

Single optimized query (good)

Replacing the property in a GroupBy with a constant value gives us an optimized single query:

var a = db.Invoices.GroupBy(i => 1)
    .Select(g => new { Count = g.Count(),
               Total = g.Sum(i => i.Paid),
               Average = g.Average(i => i.Paid) });

Here are the providers I’ve tried:

  • LINQ to Objects (Works although constant is likely evaluated)
  • LINQ to SQL (Works although passes 1 parameter to SQL)
  • Entity Framework 6 (Works although query is a little obscure)
  • ElasticLINQ (Works and optimizes out totally)

Count+Where optimizations

If we are performing counts with a predicate or against a where we can also optimize these.

var high = db.Invoices.Count(i => i.Paid >= 1000);
var low = db.Invoices.Where(i => i.Paid < 1000).Count();
var sum = db.Invoices.Sum(i => i.Paid);

Then we can express this as:

var a = db.Invoices.GroupBy(g => 1)
    .Select(g => new { High = g.Count(i => i.Paid >= 1000),
                   Low = g.Count(i => i.Paid < 1000),
                   Sum = g.Sum(i => i.Paid) });

[)amien

2 responses

  1. Avatar for Victor Kornov

    I'm wondering if you can do

    var invoices = db.Invoices;
    var result = invoices.Select(x => new { count = invoices.Count(), sum = ... });
    
    Victor Kornov 4 September 2014
  2. Avatar for Damien Guard

    LINQ to SQL does support that syntax, not sure about other providers but could be worth trying.

    Damien Guard 5 September 2014