Optimizing Sum, Count, Min, Max and Average with LINQ
LINQ is a great tool for C# programmers: it lets you use familiar syntax against a variety of back-end systems without having to learn another language or paradigm for many query operations.
Ensuring that the queries still perform well can be a bit of a chore, and one set of operations that fares quite badly is the aggregates when you want more than one of them.
Multiple sequential queries (bad)
var count = db.Invoices.Count();
var total = db.Invoices.Sum(i => i.Paid);
var average = db.Invoices.Average(i => i.Paid);
These three statements issue three separate requests. There is nothing a LINQ provider can do to optimize this pattern, as they are three discrete statements.
Background
If we wanted these values by country we could do this in LINQ:
var a = db.Invoices.GroupBy(i => i.Country)
    .Select(g => new { Country = g.Key,
                       Count = g.Count(),
                       Total = g.Sum(i => i.Paid),
                       Average = g.Average(i => i.Paid) });
That gets us everything in a single statement, broken down by country. In SQL this is:
SELECT Country, COUNT(*), SUM(Paid), AVG(Paid)
FROM Invoices GROUP BY Country
Many data sources, including SQL, are happy to provide aggregate values without a GROUP BY, so how do we generate that from LINQ?
LINQ has no Group method that works without a key selector, so we need to fake one, and because of the way many LINQ providers optimize out parts of the expression tree, we can.
Single optimized query (good)
Replacing the property in a GroupBy with a constant value gives us an optimized single query:
var a = db.Invoices.GroupBy(i => 1)
    .Select(g => new { Count = g.Count(),
                       Total = g.Sum(i => i.Paid),
                       Average = g.Average(i => i.Paid) });
Here are the providers I’ve tried:
- LINQ to Objects (Works although constant is likely evaluated)
- LINQ to SQL (Works although passes 1 parameter to SQL)
- Entity Framework 6 (Works although query is a little obscure)
- ElasticLINQ (Works and optimizes out totally)
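One thing worth keeping in mind is that the grouped query returns a sequence of groups, not a single value, so the result still has to be unwrapped. Here is a minimal sketch against LINQ to Objects, assuming C# 9 or later. The Invoice and InvoiceSummary records and the Summarize method are hypothetical names made up for illustration, not part of any provider. With an in-memory source an empty sequence produces no groups at all, which is why the sketch uses SingleOrDefault. A database provider may behave differently, so check the empty case on yours.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Invoices entity used in this post.
public record Invoice(string Country, decimal Paid);

// Hypothetical result type holding all three aggregates.
public record InvoiceSummary(int Count, decimal Total, decimal Average);

public static class AggregateDemo
{
    // Computes Count, Total and Average by grouping on a constant key.
    // Returns null when the source is empty, because grouping an empty
    // in-memory sequence yields no groups and so nothing to project.
    public static InvoiceSummary Summarize(IEnumerable<Invoice> invoices)
    {
        return invoices
            .GroupBy(i => 1)
            .Select(g => new InvoiceSummary(
                g.Count(),
                g.Sum(i => i.Paid),
                g.Average(i => i.Paid)))
            .SingleOrDefault();
    }
}
```

Calling AggregateDemo.Summarize with two invoices paid 100 and 300 gives a Count of 2, a Total of 400 and an Average of 200, while an empty array gives null rather than a row of zeros.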
Count+Where optimizations
Counts that take a predicate, or that follow a Where, can be combined in the same way. Say we have these three separate queries:
var high = db.Invoices.Count(i => i.Paid >= 1000);
var low = db.Invoices.Where(i => i.Paid < 1000).Count();
var sum = db.Invoices.Sum(i => i.Paid);
We can express them as a single query:
var a = db.Invoices.GroupBy(i => 1)
    .Select(g => new { High = g.Count(i => i.Paid >= 1000),
                       Low = g.Count(i => i.Paid < 1000),
                       Sum = g.Sum(i => i.Paid) });
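To make the conditional counts easy to try out, here is a small sketch of the same shape running against LINQ to Objects, again assuming C# 9 or later. The Invoice and PaidBreakdown records, the Breakdown method and its threshold parameter are all hypothetical names chosen for this example. It falls back to zeros when there are no invoices, which is one possible choice rather than something a provider does for you.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Invoices entity, repeated here so the
// example stands on its own.
public record Invoice(string Country, decimal Paid);

// Hypothetical result type for the two conditional counts and the sum.
public record PaidBreakdown(int High, int Low, decimal Sum);

public static class ConditionalCountDemo
{
    // Folds two predicate counts and a sum into one grouped query.
    // The threshold parameter is an illustrative addition; the post
    // hard-codes 1000.
    public static PaidBreakdown Breakdown(IEnumerable<Invoice> invoices, decimal threshold = 1000m)
    {
        return invoices
            .GroupBy(i => 1)
            .Select(g => new PaidBreakdown(
                g.Count(i => i.Paid >= threshold),
                g.Count(i => i.Paid < threshold),
                g.Sum(i => i.Paid)))
            // An empty source yields no groups, so supply zeros ourselves.
            .SingleOrDefault() ?? new PaidBreakdown(0, 0, 0m);
    }
}
```

With invoices paid 500, 1000 and 2500 this gives High = 2, Low = 1 and Sum = 4000.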
[)amien