Skip to content

Elasticsearch articles

Optimizing Sum, Count, Min, Max and Average with LINQ  

LINQ is a great tool for C# programmers letting you use familiar syntax with a variety of backend systems without having to learn another language or paradigm for many query operations.

Ensuring that the queries still perform well can be a bit of a chore and one set that fails quite badly are the aggregate operations when you want more than one.

Multiple sequential queries (bad)

var count = db.Invoices.Count();
var total = db.Invoices.Sum(i => i.Paid);
var average = db.Invoices.Average(i => i.Paid);

Will issue three separate requests. There is nothing a LINQ provider can do to optimize that pattern as they are three discrete statements.

Background

If we wanted these values by country we could do this in LINQ:

var a = db.Invoices.GroupBy(i => i.Country)
          .Select(g => new { Country = g.Key,
                             Count = g.Count(),
                             Total = g.Sum(i => i.Paid),
                             Average = g.Average(i => i.Paid) });

Which gets us everything in a single statement broken down by country. In SQL this is:

SELECT Country, Count(*), Sum(Paid), Average(Paid)
    FROM Invoices GROUP BY Country

Many data sources including SQL are happy to provide aggregate values without a group by so how do we generate that from LINQ?

In the absence of a Group method that doesn’t take a property we need to fake it and because of the way many LINQ providers optimize out parts of the tree we can:

Single optimized query (good)

Replacing the property in a GroupBy with a constant value gives us an optimized single query:

var a = db.Invoices.GroupBy(i => 1)
          .Select(g => new { Count = g.Count(),
                             Total = g.Sum(i => i.Paid),
                             Average = g.Average(i => i.Paid) });

Here are the providers I’ve tried:

  • LINQ to Objects (Works although constant is likely evaluated)
  • LINQ to SQL (Works although passes 1 parameter to SQL)
  • Entity Framework 6 (Works although query is a little obscure)
  • Elasticsearch (Works and optimizes out totally)

Count+Where optimizations

If we are performing counts with a predicate or against a where we can also optimize these.

var high = db.Invoices.Count(i => i.Paid >= 1000);
var low = db.Invoices.Where(i => i.Paid < 1000).Count();
var sum = db.Invoices.Sum(i => i.Paid);

Then we can express this as:

var a = db.Invoices.GroupBy(g => 1)
          .Select(g => new { High = g.Count(i => i.Paid >= 1000),
                             Low = g.Count(i => i.Paid < 1000),
                             Sum = g.Sum(i => i.Paid) });

[)amien