Sequence averages in Scala

I’ve been learning Scala and decided to put together a C# to Scala cheat sheet. All was going pretty well until I got stuck on the equivalent of Average.

Enumerable.Average in .NET calculates the arithmetic mean of your sequence. It sums the values and counts them in a single pass, then returns the sum divided by the count as a floating-point value (or a decimal).

The problem

Given that Scala has nothing built in, there are more than a few suggestions online that boil down to:

val average = seq.sum / seq.length

This has a few problems:

  1. Visiting a sequence twice can be inefficient
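  2. Sum can overflow as its result is the same type as the sequence's elements
  3. Applied to a sequence of integers without converting to a floating-point type first, it performs integer division and returns a truncated integer average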
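
To make the last two problems concrete, here is a small sketch. The object and method names (NaiveAverage, naive, naiveAsDouble) are made up for illustration and are not part of any library.

```scala
object NaiveAverage {
  // The commonly suggested approach, specialised to Int for illustration.
  // Both the sum and the division stay in Int.
  def naive(seq: Seq[Int]): Int = seq.sum / seq.length

  // Converting the sum to Double fixes the truncation,
  // but the sum has already been computed as an Int, so it can still overflow.
  def naiveAsDouble(seq: Seq[Int]): Double = seq.sum.toDouble / seq.length
}
```

With these, NaiveAverage.naive(Seq(1, 2)) gives 1 rather than 1.5, and NaiveAverage.naive(Seq(Int.MaxValue, Int.MaxValue)) gives -1 because the Int sum wraps around to -2 before the division.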

A solution

Scala provides a useful higher-order function called foldLeft. Its job is to take an initial state and a function, then keep applying the function to the state and each value in turn, carrying the result forward as the new state. So a more efficient solution to the problem is:

val average = seq.foldLeft((0.0, 1)) ((acc, i) => ((acc._1 + (i - acc._1) / acc._2), acc._2 + 1))._1

How does this work?

What we do here is calculate the average as we go, nudging the running average towards each new value by that value's share of the items seen so far.

It achieves this by setting up a tuple to contain our initial state with (0.0, 1). This specifies our starting average of 0.0 and our starting position of 1.

The next part specifies the function. It takes that state as acc (for accumulator) and the next value in the sequence as i, calculates the new rolling average and increases the position as it goes along.

Finally at the end of our call we specify ._1 which tells the compiler we want the first value from the tuple – the average – as we no longer care about the position.
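
If it helps to see the intermediate states, the same step can be run through scanLeft, which keeps every state instead of only the last one. Each step applies: new average = old average + (value - old average) / position. This is only a sketch for tracing. The names RunningAverageTrace and trace are made up, and the step function is written with a case pattern purely for readability.

```scala
object RunningAverageTrace {
  // Same step as the foldLeft version, but scanLeft returns
  // the initial state followed by the state after each element.
  def trace(seq: Seq[Int]): Seq[(Double, Int)] =
    seq.scanLeft((0.0, 1)) { case ((avg, pos), next) =>
      (avg + (next - avg) / pos, pos + 1)
    }
}
```

For Seq(2, 4, 9) the states are (0.0, 1), (2.0, 2), (3.0, 3) and (5.0, 4), so the final average is 5.0 and the final position is one more than the number of values.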

If you wanted to make this function more reusable you could do this:

def average(s: Seq[Int]): Double = s.foldLeft((0.0, 1)) ((acc, i) => ((acc._1 + (i - acc._1) / acc._2), acc._2 + 1))._1

Be aware that, as written, this only accepts Seq[Int]. You could add an overload for each numeric sequence type you want to be able to average, or write it once against Scala's Numeric type class, whose toDouble method converts each value to a Double before the subtraction and division.
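
One way to avoid the overloads is a minimal sketch like the following. It assumes you are happy to convert every value to Double, which can lose precision for very large Long or BigDecimal values. The object name GenericAverage is made up for illustration, while Numeric and its toDouble method come from the Scala standard library.

```scala
object GenericAverage {
  // Accepts any element type that has a Numeric instance (Int, Long, Double, BigDecimal, ...).
  // Each value is converted to Double before it is folded into the running average.
  def average[T](s: Seq[T])(implicit num: Numeric[T]): Double =
    s.foldLeft((0.0, 1)) { case ((avg, pos), next) =>
      (avg + (num.toDouble(next) - avg) / pos, pos + 1)
    }._1
}
```

So GenericAverage.average(Seq(1, 2, 3, 4)) and GenericAverage.average(Seq(1.5, 2.5)) both compile without extra overloads.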

Precision and rounding

There is some slight variance in results between this approach and total / count due to floating-point rounding. If you want results that match total / count, you can still sum and count in a single pass and divide once at the end, much like .NET does, but with Scala’s foldLeft rather than a foreach.

def average(s: Seq[Int]): Double = {
  val (total, count) = s.foldLeft((0.0, 0)) ((acc, i) => (acc._1 + i, acc._2 + 1))
  total / count
}
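
To compare the two approaches side by side, here is a small sketch that puts both in one place. The names AverageComparison, running and sumThenDivide are made up for illustration. The bodies are the same two folds shown above.

```scala
object AverageComparison {
  // Running average: updates the mean at every step.
  def running(s: Seq[Int]): Double =
    s.foldLeft((0.0, 1)) ((acc, i) => ((acc._1 + (i - acc._1) / acc._2), acc._2 + 1))._1

  // Sum and count in one pass, then a single division at the end.
  def sumThenDivide(s: Seq[Int]): Double = {
    val t = s.foldLeft((0.0, 0)) ((acc, i) => (acc._1 + i, acc._2 + 1))
    t._1 / t._2
  }
}
```

For ordinary inputs the two should agree to within floating-point rounding. One difference worth knowing about is the empty sequence: running returns its starting average of 0.0, while sumThenDivide divides 0.0 by 0 and returns NaN.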


3 responses

  1. Harold

    Welcome to Scala. You can use a case statement to deconstruct the tuple and avoid the ugly _1 and _2.

    seq.foldLeft((0.0, 1)) { case ((avg, idx), next) => (avg + (next - avg)/idx, idx + 1) }._1
    Harold December 12, 2014
  2. Rik Hemsley

    For fun, I knocked up a similar implementation in C#

    using System.Collections.Generic;
    using System.Linq;

    public static class EnumerableExtensions {
        // Carries the running total and the number of values seen so far.
        private struct ValueAndCount {
            public decimal Value;
            public ulong Count;
            public decimal Mean { get { return Value/Count; } }
        }
        public static decimal Mean(this IEnumerable<int> sequence) {
            return sequence
                .Select(n => new ValueAndCount {Value = n, Count = 1})
                .Aggregate ((agg, valueAndCount) =>
                new ValueAndCount {Value = agg.Value + valueAndCount.Value, Count = agg.Count + 1} )
                .Mean;
        }
    }
    Rik Hemsley December 12, 2014
  3. Damien Guard

    @Rik That's close to the second option, although I think you want the initial state declared in the Aggregate call rather than in the Select? For fun, here's the C# version of the first one, which calculates a running average:

    var average = s.Aggregate(Tuple.Create(0.0, 1), (acc, i) => Tuple.Create((acc.Item1 + (i - acc.Item1) / acc.Item2), acc.Item2 + 1)).Item1;
    Damien Guard December 12, 2014