Refactoring to LINQ Part 1: Death to Foreach

Language Integrated Query (LINQ) was first introduced to the .NET Framework three years ago, in 2007. It has taken time for developers to become familiar with it and to fully grasp its usefulness. When it was released, LINQ was promoted as a way to integrate database and XML querying into .NET languages. Although LINQ makes heavy use of SQL terminology (SELECT/WHERE/AGGREGATE), its implementation is in fact heavily based on functional programming concepts. In the last year, I have become more familiar with functional programming ideas by exploring and using languages such as JavaScript, Ruby and F#. As a result, I now regularly use the LINQ extension methods to write C# in a more functional style.

The advantages and disadvantages of a functional programming style

What advantages does a functional programming style bring?

  • shorter, more concise code
  • fewer temporary variables and less state
  • fewer errors
  • easier to understand, more declarative code

There are some disadvantages, however:

  • debugging is harder
  • performance gotchas due to deferred execution/lazy evaluation
  • steeper learning curve
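
The deferred execution gotcha is easiest to see in a small experiment. The following is only a sketch: the numbers list and the evaluations counter are hypothetical, added purely to make the lazy behaviour visible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class DeferredExecutionDemo
{
    static void Main()
    {
        var numbers = new List<int> { 1, 2, 3, 4 };
        var evaluations = 0; // hypothetical counter, only here to expose the laziness

        // Nothing runs yet: Where only describes the query.
        var evens = numbers.Where(n => { evaluations++; return n % 2 == 0; });
        Console.WriteLine("After Where: {0} evaluations", evaluations);         // 0

        // Each enumeration re-runs the predicate over the whole source.
        var count = evens.Count();
        var sum = evens.Sum();
        Console.WriteLine("After Count and Sum: {0} evaluations", evaluations); // 8

        // ToList forces a single evaluation; the resulting list can be reused freely.
        var cached = numbers.Where(n => n % 2 == 0).ToList();
        Console.WriteLine("Count {0}, sum {1}, cached items {2}", count, sum, cached.Count);
    }
}
```

If a query result is going to be enumerated more than once, materializing it with ToList (or ToArray) is usually the simplest way to avoid repeating the work.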

At first, functional-style code can provoke a WTF? response. To realise the advantages of functional techniques in a team environment, all the developers on the team need to become comfortable with these approaches. Hopefully, this should become less of an issue over time, as functional constructs are now built into most mainstream languages. Additionally, tools such as ReSharper now support automatic refactoring of loops to LINQ expressions.

Thinking functional

LINQ introduced a higher-level, declarative way to work with collections. Instead of manipulating individual items in a collection, operations are performed on the entire collection. Mark Needham refers to this as the transformational mindset in his talk on Functional C#. Using the transformational mindset, we think in terms of the transformations that can be applied to a collection to produce a result. Several transformations can be chained together to replace a complex foreach loop made up of many imperative statements. Typically, a collection is passed through a pipeline of one or more LINQ extension methods such as Where, Select and Aggregate to produce the desired result.
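
As a rough illustration of such a pipeline, the sketch below filters, transforms and then folds a collection into a single value. The prices array and the 10% discount are hypothetical, chosen only to give each stage something to do.

```csharp
using System;
using System.Linq;

class PipelineDemo
{
    static void Main()
    {
        var prices = new[] { 49.95m, 28.95m, 57.95m, 30.95m }; // hypothetical sample data

        // Each stage takes the whole sequence and hands a new one to the next stage.
        var total = prices
            .Where(p => p > 30m)                 // keep prices above 30
            .Select(p => p * 0.9m)               // apply a 10% discount to each
            .Aggregate(0m, (acc, p) => acc + p); // add up what is left

        Console.WriteLine("Discounted total: {0}", total);
    }
}
```

No loop variable, temporary list or running total appears in the calling code; each step states what happens to the collection as a whole.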

I have now become deeply suspicious of foreach loops and regard them as a code smell. In most cases the foreach loop can be refactored into a shorter, more meaningful functional version. Often this will reduce the amount of code by 70% to 80%. A key skill in performing these refactorings is recognizing which type of looping construct maps to which functional operation. Below are several examples of foreach statements refactored to LINQ extension methods.

Functional refactorings

In these examples we will manipulate a list of whiskies. Each whiskey has a name, an age, a price and a country of origin:

public class Whiskey
{
	public string Name { get; set; }
	public int Age { get; set; }
	public decimal Price { get; set; }
	public string Country { get; set; }
}

Whiskey ardbeg = new Whiskey { Name = "Ardbeg 1998", Age = 12, Price = 49.95m, Country = "Scotland" };
Whiskey glenmorangie = new Whiskey { Name = "Glenmorangie", Age = 10, Price = 28.95m, Country = "Scotland" };
Whiskey talisker = new Whiskey { Name = "Talisker", Age = 18, Price = 57.95m, Country = "Scotland" };
Whiskey cragganmore = new Whiskey { Name = "Cragganmore", Age = 12, Price = 30.95m, Country = "Scotland" };
Whiskey redbreast = new Whiskey { Name = "Redbreast", Age = 12, Price = 27.95m, Country = "Ireland" };
Whiskey greenspot = new Whiskey { Name = "Green spot", Age = 8, Price = 44.48m, Country = "Ireland" };

List<Whiskey> whiskies = new List<Whiskey> { ardbeg, glenmorangie, talisker, cragganmore, redbreast, greenspot };

Creating one list of objects from another

In our first example we create a list of whiskey names from the list of whiskies.

var whiskeyNames = new List<string>();
foreach (var whiskey in whiskies) {
    whiskeyNames.Add (whiskey.Name);
}

Console.WriteLine("Whiskey names: {0}", String.Join(", ", whiskeyNames.ToArray()));

This outputs:
Whiskey names: Ardbeg 1998, Glenmorangie, Talisker, Cragganmore, Redbreast, Green spot

Converting a list of objects to another list of objects is called mapping or projection in functional programming. The LINQ extension method to do this is Select.

var whiskeyNames = whiskies.Select(x => x.Name).ToList();

Filtering a list

In our second example we want to get a list of “good value” whiskies that cost 30 pounds or less. To do this, a new list is created, the list of whiskies is iterated over and any whiskey costing 30 pounds or less is added to the new list:

var goodValueWhiskies = new List<Whiskey>();
foreach (var whiskey in whiskies) {
  if (whiskey.Price <= 30m) {
    goodValueWhiskies.Add (whiskey);
  }
}
Console.WriteLine("Found {0} good value whiskeys", goodValueWhiskies.Count);

This outputs:
Found 2 good value whiskeys

This operation is called filtering in functional programming. In LINQ we use the Where extension method to filter a list. A predicate is passed to Where to determine which items to keep:

var goodValueWhiskies = whiskies.Where(x => x.Price <= 30m).ToList();
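
Filtering and projection combine naturally. The sketch below chains Where and Select to list the names of the good value whiskies; it uses a cut-down Whiskey class with only the two properties it needs, and the class and variable names beyond those in the post are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Whiskey
{
    public string Name { get; set; }
    public decimal Price { get; set; }
}

class FilterThenProjectDemo
{
    static void Main()
    {
        // Names and prices copied from the whiskey list defined earlier in the post.
        var whiskies = new List<Whiskey>
        {
            new Whiskey { Name = "Ardbeg 1998", Price = 49.95m },
            new Whiskey { Name = "Glenmorangie", Price = 28.95m },
            new Whiskey { Name = "Talisker", Price = 57.95m },
            new Whiskey { Name = "Cragganmore", Price = 30.95m },
            new Whiskey { Name = "Redbreast", Price = 27.95m },
            new Whiskey { Name = "Green spot", Price = 44.48m }
        };

        var goodValueNames = whiskies
            .Where(x => x.Price <= 30m) // filter first
            .Select(x => x.Name)        // then project each match to its name
            .ToArray();

        // prints "Good value: Glenmorangie, Redbreast"
        Console.WriteLine("Good value: {0}", String.Join(", ", goodValueNames));
    }
}
```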

Counting the number of items matching a condition in a list

In this example we count the number of 12-year-old whiskies in our list by incrementing a counter variable inside an if statement.

var howMany12YearOldWhiskies = 0;
foreach (var whiskey in whiskies) {
  if (whiskey.Age == 12) {
    howMany12YearOldWhiskies++;
  }
}

Console.WriteLine ("How many 12-year-old whiskies do we have {0}", howMany12YearOldWhiskies);

This outputs:
How many 12-year-old whiskies do we have 3

In this case two of the extension methods, Where and Count, can be combined to produce the desired result:

  var howMany12YearOldWhiskies = whiskies.Where(x => x.Age == 12).Count();

This can be shortened further as Count optionally takes a predicate to filter the items:

  var howMany12YearOldWhiskies = whiskies.Count(x => x.Age == 12);

Checking if some or all of the items in a list match a criterion

In this example we check if a condition is true for all items in a list. This involves creating a boolean variable initialized to true, looping through each item in the list and setting the variable to false if any item violates the condition.

var allAreScottish = true;
foreach (var whiskey in whiskies) {
  if (whiskey.Country != "Scotland") {
     allAreScottish = false;
     break;
  }
}

Console.WriteLine("All are scottish? {0}", allAreScottish);

This outputs:
All are scottish? False

This can be replaced with the All extension method, which conveys the intent of the code more clearly.

   var allAreScottish = whiskies.All(x => x.Country == "Scotland");

All is one of three quantifier extension methods available in LINQ, the other two being Any and Contains. For example, it is easy to find if there are any Irish whiskies in our list by using the Any extension method:

  var isThereIrishWhiskey = whiskies.Any(x => x.Country == "Ireland");
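
Contains, the third quantifier, checks for a specific value rather than testing a predicate. A minimal sketch, assuming we only have the country names to hand (the countries array is illustrative and simply repeats the countries from the whiskey list):

```csharp
using System;
using System.Linq;

class ContainsDemo
{
    static void Main()
    {
        // Country values taken from the whiskey list in the post.
        var countries = new[] { "Scotland", "Scotland", "Scotland", "Scotland", "Ireland", "Ireland" };

        // Contains compares each element with the given value using default equality.
        var hasIrish = countries.Contains("Ireland");
        var hasJapanese = countries.Contains("Japan");

        // prints "Irish? True, Japanese? False"
        Console.WriteLine("Irish? {0}, Japanese? {1}", hasIrish, hasJapanese);
    }
}
```

With a list of Whiskey objects, the same check can be written by projecting first, for example whiskies.Select(x => x.Country).Contains("Ireland"), although Any with a predicate is usually the more direct choice there.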

Splitting up complex foreach statements

Often you will be faced with quite complex foreach statements that may initially seem hard to refactor. For example, how would you refactor the following code to use LINQ extension methods?

var scottishWhiskiesCount = 0;
var scottishWhiskeyTotal = 0m;
foreach (var whiskey in whiskies) {
    if (whiskey.Country == "Scotland") {
        scottishWhiskiesCount++;
        scottishWhiskeyTotal += whiskey.Price;
    }
}

This code is doing three things: determining if a whiskey is Scottish, counting those whiskies and then totalling the price of those whiskies.
The first step is to split the single foreach statement into multiple foreach statements. This is usually a good idea anyway, as doing multiple things in a single loop can be difficult to understand.

List<Whiskey> scottishWhiskies = new List<Whiskey>();
foreach (var whiskey in whiskies) {
    if (whiskey.Country == "Scotland") {
        scottishWhiskies.Add (whiskey);
    }
}
foreach (var whiskey in scottishWhiskies) {
    scottishWhiskiesCount++;
}

foreach (var whiskey in scottishWhiskies) {
    scottishWhiskeyTotal += whiskey.Price;
}

The code is then easily refactored to use LINQ extension methods:

var scottishWhiskies = whiskies.Where(x => x.Country == "Scotland");
scottishWhiskiesCount = scottishWhiskies.Count();
scottishWhiskeyTotal = scottishWhiskies.Sum(x => x.Price);
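
Because Where is evaluated lazily, calling both Count and Sum on the query runs the filter twice. One possible variation, shown here as a sketch rather than the definitive form, is to materialize the filtered list once with ToList. Anonymous objects stand in for the Whiskey class to keep the example short.

```csharp
using System;
using System.Linq;

class SplitLoopDemo
{
    static void Main()
    {
        // Countries and prices copied from the whiskey list in the post.
        var whiskies = new[]
        {
            new { Country = "Scotland", Price = 49.95m },
            new { Country = "Scotland", Price = 28.95m },
            new { Country = "Scotland", Price = 57.95m },
            new { Country = "Scotland", Price = 30.95m },
            new { Country = "Ireland",  Price = 27.95m },
            new { Country = "Ireland",  Price = 44.48m }
        };

        // ToList runs the filter once; Count and Sum then reuse the stored result.
        var scottishWhiskies = whiskies.Where(x => x.Country == "Scotland").ToList();
        var scottishWhiskiesCount = scottishWhiskies.Count;
        var scottishWhiskeyTotal = scottishWhiskies.Sum(x => x.Price);

        Console.WriteLine("{0} Scottish whiskies, total price {1}",
            scottishWhiskiesCount, scottishWhiskeyTotal);
    }
}
```

For a list of six items the difference is negligible; the habit matters more when the source is large or expensive to enumerate.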

In my next post I will discuss some of the more advanced but seldom used LINQ extension methods: Aggregate and SelectMany.


4 comments

  1. Paul Harrington · February 25, 2010

    Great post Jason, I’ve just started working with some of these concepts as well at the moment & this is an excellent overview of the basic functionality – I’ll be sure to pass it on to my work colleagues.

  2. Daniel Scott · September 11, 2010

    I’m a little confused by the statement “The first step is to split the single foreach statement into multiple foreach statements. This is usually a good idea anyway, as doing multiple things in a single loop can be difficult to understand.”

    Yes it does make the code easier to understand, however surely this is a significant performance penalty? (especially considering there are various objects, such as iterators being instantiated each time a foreach loop is being generated).

    Does this then mean that LINQ is another layer of abstraction that will encourage writing poor performing code?

    • jasonneylon · September 12, 2010

      Hi Daniel,

      95% of the time it probably won’t matter. Object instantiation time is very quick compared to the real performance killers – file/network/database operations. Generally I monitor/measure the performance of the code after it’s written rather than worry too much about the performance up front.

      Yes, LINQ is a higher abstraction level, so if you don’t use it correctly and are unaware of what is going on under the hood, it can bite you. However, I think the productivity it brings makes it well worth using.

      I watch out for two performance gotchas:

      • When using LINQ against a database (via LINQ to NHibernate, LINQ to SQL etc.) I check the queries actually being executed to make sure I’m not running excessive or slow queries.
      • Watch out for deferred execution issues described here: LINQ Performance Pitfall – Deferred Execution.
