Refactoring to LINQ Part 2: Aggregate is Great

In part one, death to the foreach, I discussed some of the quick and easy refactorings you can use to convert foreach loops to LINQ. In this post I look at the Aggregate extension method that LINQ provides. This method is used less frequently than the other LINQ extension methods but is just as powerful. Aggregate examples often revolve around primitive types such as int and string. Below I will show some more interesting object-based examples.

LINQ provides several aggregation extension methods: Aggregate, Average, Count, LongCount, Max, Min and Sum. The aggregation methods all take a list of objects and reduce that list to a single result. The Aggregate method is the most flexible and generic of them. Conceptually it helps to think of Aggregate as a generic building block and the other aggregation methods (Average, Max, etc.) as special cases of Aggregate. In functional programming languages, such as F#, Aggregate is usually named fold (or inject in Ruby). The SQL-like name Aggregate leads developers to write off Aggregate as purely for numeric aggregation purposes. In fact, Aggregate can be used whenever we want to build a single object from a group of objects.
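
To make the "building block" idea concrete, here is a rough sketch that re-creates Sum, Max, Count and Average using nothing but Aggregate. The prices array and the variable names are made up for illustration, and the snippet is written as top-level statements (C# 9 or later). In real code the dedicated methods are the clearer choice; the point is only that each one is a special case of Aggregate.

```csharp
using System;
using System.Linq;

// Hypothetical sample data, not taken from the post.
var prices = new[] { 49.95m, 28.95m, 57.95m, 30.95m };

// Sum: start from zero and add each item to the running total.
var sum = prices.Aggregate(0m, (total, next) => total + next);

// Max: keep whichever of the two values is larger.
var max = prices.Aggregate((best, next) => next > best ? next : best);

// Count: ignore the item itself and add one per element.
var count = prices.Aggregate(0, (n, item) => n + 1);

// Average: accumulate a (Sum, Count) pair, then divide once at the end
// using the overload that takes a result selector.
var average = prices.Aggregate(
    (Sum: 0m, Count: 0),
    (acc, next) => (Sum: acc.Sum + next, Count: acc.Count + 1),
    acc => acc.Sum / acc.Count);

Console.WriteLine($"{sum} {max} {count} {average}");
```

Average needs two pieces of state (a running sum and a running count), which is why it uses a tuple as the accumulator and only collapses it to a single number in the final step.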

So how does Aggregate work?

Looking at the Aggregate method signature is a pretty scary experience:

public static TAccumulate Aggregate<TSource, TAccumulate>(
    this IEnumerable<TSource> source,
    TAccumulate seed,
    Func<TAccumulate, TSource, TAccumulate> func)

The seeded overload of the Aggregate method takes a list of source objects, a seed value and an accumulator function, which it processes as follows:

  • The accumulator function is called for each item in the list and returns a value
  • The first time the accumulator function is called the seed and the first item in the list are passed to it
  • The accumulator function is called again with the result of the first accumulator function call and the second item in the list as its parameters
  • This continues until all items in the list are processed
  • The result of the last call to the accumulator function is returned as the result of the entire Aggregate method
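
The steps above can be sketched as an ordinary foreach loop. This is an illustrative re-implementation only, not the framework's actual source; the name Fold and the sample data are hypothetical, and the snippet uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the seeded Aggregate overload.
static TAccumulate Fold<TSource, TAccumulate>(
    IEnumerable<TSource> source,
    TAccumulate seed,
    Func<TAccumulate, TSource, TAccumulate> func)
{
    // The seed is the accumulated value before any item has been seen.
    var accumulated = seed;

    foreach (var item in source)
    {
        // Each call receives the previous result and the next item.
        accumulated = func(accumulated, item);
    }

    // The result of the last call is the result of the whole operation.
    return accumulated;
}

var names = new[] { "Ardbeg 1998", "Glenmorangie", "Talisker" };
var joined = Fold(names, "Whiskies:", (acc, next) => acc + " " + next);
Console.WriteLine(joined);
```

Seen this way, Aggregate is just the "declare a variable, loop, update the variable" pattern with the variable and the loop hidden away.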

This can take some getting used to. Here is an example:

var whiskeyNames = new [] {"Ardbeg 1998", "Glenmorangie","Talisker", "Cragganmore"};

var listOfWhiskies = whiskeyNames.Aggregate("Whiskies: ", (accumulated, next) =>
{
	Console.Out.WriteLine("(Adding [{0}] to the list [{1}])", next, accumulated);
	return accumulated + " " + next;
});

Console.Out.WriteLine(listOfWhiskies);

This outputs:

(Adding [Ardbeg 1998] to the list [Whiskies: ])
(Adding [Glenmorangie] to the list [Whiskies:  Ardbeg 1998])
(Adding [Talisker] to the list [Whiskies:  Ardbeg 1998 Glenmorangie])
(Adding [Cragganmore] to the list [Whiskies:  Ardbeg 1998 Glenmorangie Talisker])
Whiskies:  Ardbeg 1998 Glenmorangie Talisker Cragganmore

In this example a string listing several brands of whiskies is being built. The seed passed to the Aggregate method is the string “Whiskies: ”. A lambda that adds the whiskey names together (and outputs to the console) is passed as the accumulator function. This accumulator function is called for each item in the list of whiskies. On the first execution of the accumulator function the seed, “Whiskies: ”, is passed as the accumulated parameter and the first item in the list, “Ardbeg 1998”, is passed as the next parameter. On the second execution the return value from the first call, “Whiskies:  Ardbeg 1998”, is passed as the accumulated parameter and the second item in the list, “Glenmorangie”, is passed as the next parameter.

The seed parameter is optional. If it is omitted, the first item in the list acts as the seed, so the first call to the accumulator function receives the first two items in the list, as this example demonstrates:

listOfWhiskies = whiskeyNames.Aggregate((accumulated, next) =>
{
	Console.Out.WriteLine("(Adding [{0}] to the list [{1}])", next, accumulated);
	return accumulated + " " + next;
});

Console.Out.WriteLine(listOfWhiskies);

This outputs:

(Adding [Glenmorangie] to the list [Ardbeg 1998])
(Adding [Talisker] to the list [Ardbeg 1998 Glenmorangie])
(Adding [Cragganmore] to the list [Ardbeg 1998 Glenmorangie Talisker])
Ardbeg 1998 Glenmorangie Talisker Cragganmore
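
The difference between the two overloads matters more than the output above suggests. The sketch below (hypothetical numbers and variable names, top-level statements, C# 9 or later) sums the squares of a list with and without a seed, counts how often the accumulator runs, and shows what happens with an empty list.

```csharp
using System;
using System.Linq;

var numbers = new[] { 10, 5, 1 };

// Count how many times the accumulator runs in each overload.
var seededCalls = 0;
var seedlessCalls = 0;

// With a seed every item gets a turn as "next": 100 + 25 + 1.
var sumOfSquares = numbers.Aggregate(0, (sum, next) =>
{
    seededCalls++;
    return sum + next * next;
});

// Without a seed the first item is used as-is and never squared: 10 + 25 + 1.
var seedless = numbers.Aggregate((sum, next) =>
{
    seedlessCalls++;
    return sum + next * next;
});

// An empty sequence has no first item to start from, so the seedless
// overload throws, while the seeded overload simply returns the seed.
var empty = Array.Empty<int>();
var emptyWithSeed = empty.Aggregate(0, (sum, next) => sum + next * next);
var threw = false;
try
{
    empty.Aggregate((sum, next) => sum + next * next);
}
catch (InvalidOperationException)
{
    threw = true;
}

Console.WriteLine($"{sumOfSquares} ({seededCalls} calls), {seedless} ({seedlessCalls} calls), {emptyWithSeed}, threw: {threw}");
```

As a rule of thumb, omit the seed only when the accumulator combines two items of the same kind symmetrically (as with string concatenation or picking the larger of two values) and the list is known to be non-empty.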

Finding the best item in a list

A common coding pattern using foreach is finding the “best” item in a list based on some criteria. This example reuses the Whiskey class from the previous post:

public class Whiskey
{
	public Whiskey()
	{
		Ingredients = new List<Whiskey>();
	}

	public string Name { get; set; }
	public int Age { get; set; }
	public decimal Price { get; set; }
	public string Country { get; set; }

	public List<Whiskey> Ingredients {get; set;}

	public string IngredientsAsString
	{
		get
		{
			return String.Join(",", Ingredients.Select(x=> x.Name).ToArray());
		}
	}
}

Whiskey ardbeg = new Whiskey { Name = "Ardbeg 1998", Age = 12, Price = 49.95m, Country = "Scotland" };
Whiskey glenmorangie = new Whiskey { Name = "Glenmorangie", Age = 10, Price = 28.95m, Country = "Scotland" };
Whiskey talisker = new Whiskey { Name = "Talisker", Age = 18, Price = 57.95m, Country = "Scotland" };
Whiskey cragganmore = new Whiskey { Name = "Cragganmore", Age = 12, Price = 30.95m, Country = "Scotland" };
Whiskey redbreast = new Whiskey { Name = "Redbreast", Age = 12, Price = 27.95m, Country = "Ireland" };
Whiskey greenspot = new Whiskey { Name = "Green spot", Age = 8, Price = 44.48m, Country = "Ireland" };

List<Whiskey> whiskies = new List<Whiskey> { ardbeg, glenmorangie, talisker, cragganmore, redbreast, greenspot };

In the code snippet below we are searching for the most expensive whiskey in a list. This is done by storing the first whiskey of the list in a mostExpensiveWhiskey variable. Then the price of each subsequent whiskey object is compared to the mostExpensiveWhiskey object and if its price is higher that whiskey is stored in the mostExpensiveWhiskey variable.

Whiskey mostExpensiveWhiskey = null;
foreach (var challenger in whiskies)
{
    if (mostExpensiveWhiskey == null)
    {
        mostExpensiveWhiskey = challenger;
    }
    if (challenger.Price > mostExpensiveWhiskey.Price)
    {
        mostExpensiveWhiskey = challenger;
    }
}
Console.WriteLine("Most expensive is {0}", mostExpensiveWhiskey.Name);

This outputs:

Most expensive is Talisker

This can be refactored to some remarkably concise code by using the Aggregate method with a lambda expression and the ternary operator:

Whiskey mostExpensiveWhiskey = whiskies.Aggregate((champion, challenger) => challenger.Price > champion.Price ? challenger : champion);
Console.WriteLine("Most expensive is {0}", mostExpensiveWhiskey.Name);
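
The champion/challenger pattern works for any criterion, not just price. As a rough sketch, it can be wrapped in a small helper that takes a key selector. The helper name Best is hypothetical, tuples stand in for the Whiskey class to keep the snippet self-contained, and it is written as top-level statements (C# 9 or later). Because it uses the seedless overload it throws on an empty list, and newer versions of .NET (6 and later) ship a built-in MaxBy method that covers the same ground.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: returns the item with the highest key.
// On ties the earlier item stays champion.
static T Best<T, TKey>(IEnumerable<T> items, Func<T, TKey> key)
    where TKey : IComparable<TKey> =>
    items.Aggregate((champion, challenger) =>
        key(challenger).CompareTo(key(champion)) > 0 ? challenger : champion);

// Tuples used in place of the Whiskey class from the post.
var whiskies = new[]
{
    (Name: "Ardbeg 1998", Age: 12, Price: 49.95m),
    (Name: "Glenmorangie", Age: 10, Price: 28.95m),
    (Name: "Talisker", Age: 18, Price: 57.95m),
    (Name: "Green spot", Age: 8, Price: 44.48m),
};

var mostExpensive = Best(whiskies, w => w.Price);

// Negating the key turns "highest" into "lowest".
var youngest = Best(whiskies, w => -w.Age);

Console.WriteLine("Most expensive is {0}", mostExpensive.Name);
Console.WriteLine("Youngest is {0}", youngest.Name);
```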

Creating a new ‘aggregated’ object

In this example (which also uses the whiskey domain!) we will create a new object from several other objects. The majority of whiskies sold are blended whiskies, which are made by mixing together several single malt whiskies. In the code below we loop through a list of whiskies. Where the whiskey is Scottish we add it to our new blendedWhiskey object and update the price of the blendedWhiskey object accordingly.

var blendedWhiskey = new Whiskey() { Name="Tesco value whiskey", Age=3, Country="Scotland" };
foreach (var whiskey in whiskies)
{
    if (whiskey.Country != "Scotland")
    {
        continue;
    }

    blendedWhiskey.Ingredients.Add(whiskey);
    blendedWhiskey.Price = blendedWhiskey.Price + (whiskey.Price / 10);
}

Console.WriteLine("Blended Whiskey Name: {0}", blendedWhiskey.Name);
Console.WriteLine("Blended Whiskey Price: {0}", blendedWhiskey.Price);
Console.WriteLine("Blended Whiskey Ingredients: {0}", blendedWhiskey.IngredientsAsString);

This outputs:

Blended Whiskey Name: Tesco value whiskey
Blended Whiskey Price: 16.780
Blended Whiskey Ingredients: Ardbeg 1998,Glenmorangie,Talisker,Cragganmore

This can be refactored in a few steps. First we use the Where extension method to filter the list down to only Scottish whiskies. Then we pass the object we are building as the seed to the Aggregate method. The accumulator function then adds each single malt whiskey to the blended whiskey and updates the price.

var blendedWhiskey = whiskies.Where(x => x.Country == "Scotland")
	.Aggregate(new Whiskey() { Name="Tesco value whiskey", Age=3, Country="Scotland" },
		(newWhiskey, nextWhiskey) =>
		{
			newWhiskey.Ingredients.Add(nextWhiskey);
			newWhiskey.Price += (nextWhiskey.Price / 10);
			return newWhiskey;
		});

Console.WriteLine("Blended Whiskey Name: {0}", blendedWhiskey.Name);
Console.WriteLine("Blended Whiskey Price: {0}", blendedWhiskey.Price);
Console.WriteLine("Blended Whiskey Ingredients: {0}", blendedWhiskey.IngredientsAsString);

Unfortunately, the refactored code in this example is not as concise as the previous refactored examples. When the accumulator function consists of multiple statements it can be a bad idea to refactor the code to use Aggregate, as the code becomes less clear.
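
One possible way to get back to a single-expression accumulator is to stop mutating the seed and build a new immutable value on each step instead. The sketch below is only one option under stated assumptions: tuples stand in for the Whiskey class, the accumulator is a (Price, Ingredients) pair rather than a Whiskey object, and the snippet is written as top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;

// Tuples stand in for the Whiskey class to keep the sketch self-contained.
var whiskies = new[]
{
    (Name: "Ardbeg 1998", Price: 49.95m, Country: "Scotland"),
    (Name: "Glenmorangie", Price: 28.95m, Country: "Scotland"),
    (Name: "Talisker", Price: 57.95m, Country: "Scotland"),
    (Name: "Cragganmore", Price: 30.95m, Country: "Scotland"),
    (Name: "Redbreast", Price: 27.95m, Country: "Ireland"),
    (Name: "Green spot", Price: 44.48m, Country: "Ireland"),
};

var blend = whiskies
    .Where(w => w.Country == "Scotland")
    .Aggregate(
        // Seed: an empty blend with no price and no ingredients.
        (Price: 0m, Ingredients: Array.Empty<string>()),
        // Each step returns a brand new pair instead of changing the old one.
        (acc, next) => (
            Price: acc.Price + next.Price / 10,
            Ingredients: acc.Ingredients.Append(next.Name).ToArray()));

Console.WriteLine("Blended Whiskey Price: {0}", blend.Price);
Console.WriteLine("Blended Whiskey Ingredients: {0}", string.Join(",", blend.Ingredients));
```

The trade-off is that the ingredients array is copied on every step, which is fine for a handful of whiskies but wasteful for long lists; in that case the foreach version or the mutating accumulator is the more practical choice.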

Summing up

I hope this post has demystified the Aggregate method for you. Learning to use Aggregate can help you to better understand functional programming ideas, and for several common programming tasks it is a compelling improvement on the traditional foreach statement. However, for some scenarios using foreach still makes more sense.

6 comments

  1. Paul Harrington · March 16, 2010

    Great post Jason, I’m loving reading through your experiments with Linq.

    Have you played around with refactoring your resulting Linq code to make it more readable? I’ve been through these concepts before with some of the developers in my team (who are unfamiliar with functional programming paradigms) and this always seems to be the sticking point.

    Great stuff though, looking forward to the next one!

    • jasonneylon · March 17, 2010

      Thanks Paul,

      Readability is the issue with using Aggregate in C#. When you have multiple statements the resulting code is quite clunky due to C# syntax. In functional languages like F# it reads much better.

      If you use good OO techniques (Value objects and immutability) you could improve the code above by refactoring this code from

      var blendedWhiskey = whiskies.Where(x=> x.Country == "Scotland")
      .Aggregate(new Whiskey() { Name="Tesco value whiskey", Age=3, Country="Scotland" },
      (newWhiskey, nextWhiskey) =>
      {
      newWhiskey.Ingredients.Add(nextWhiskey);
      newWhiskey.Price += (nextWhiskey.Price / 10);
      return newWhiskey;
      });

      by using a Blend method:

      var blendedWhiskey = whiskies.Where(x=> x.Country == "Scotland")
      .Aggregate(new Whiskey() { Name="Tesco value whiskey", Age=3, Country="Scotland" },
      (newWhiskey, nextWhiskey) => newWhiskey.Blend(nextWhiskey));

      where Blend creates a new object from the existing object and does the addition (the “blending”):

      public Whiskey Blend(Whiskey otherWhiskey)
      {
      var blendedWhiskey = this.Clone();
      blendedWhiskey.Ingredients.Add(otherWhiskey);
      blendedWhiskey.Price += (otherWhiskey.Price / 10);
      return blendedWhiskey;
      }

  2. Thomas Morrison · December 8, 2015

    public static TResult Aggregate(
    this IEnumerable source,
    TAccumulate seed,
    Func func)

    How did you know that Func takes two arguments, the accumulator and the next item? Surely you can’t tell this from the signature alone?

    Nice and simple post 🙂

    Thanks

  3. Thomas Morrison · December 8, 2015

    public static TResult Aggregate(
    this IEnumerable source,
    TAccumulate seed,
    Func func)

  4. Nicholas Layton · September 23, 2016

    Aggregate is not great. It does not treat each item the same. Just a quick example, if you want to add up the squares of every integer in a collection, aggregate would not work for you. myints.Aggregate((a,b) => a*b*b). Every value in the collection would have a “turn” being the “b” parameter in this lambda, except for the very 1st one. So if your collection was {1,10,5} you’d get the same answer as if it was {1, 5, 10}, but you’d get a different answer with {10, 5, 1}.

  5. Nicholas Layton · September 23, 2016

    The 1st bullet point under “So how does Aggregate work?” is incorrect. You wrote “The accumulator function is called for each item in the list and returns a value”, but the accumulator function is *not* called for the 1st item in the list. Try it with a multi-line lambda where you output a message inside of the accumulator function. If you have n items in the list, you’ll only see your output message n-1 times.
