A little while ago I dipped my toe into pattern matching and recursion with both F# and Elixir. I’d like to take a deeper dive into pattern matching in F# with a two-part series.

  1. F# Pattern Matching – Part 1 – Introduction (this post)
  2. F# Pattern Matching – Part 2 – Active Patterns

Pattern Matching Introduction

Pete Linforth

Before we can dive into the wonderful world of active patterns, we first must understand how basic pattern matching works. In this post we will explore the concept of pattern matching and work through several examples beginning with very basic patterns and moving through increasingly more complicated and powerful examples.

Let’s begin our journey by first defining what pattern matching actually means.

Definition

So what is pattern matching? According to MSDN, the definition is as follows:

Patterns are rules for transforming input data. They are used throughout the F# language to compare data with a logical structure or structures, decompose data into constituent parts, or extract information from data in various ways.

At a first pass, this may sound complicated. And while the use of pattern matching unlocks wonderful possibilities that would make MacGuyver drool, the basic idea is rather simple. Pattern matching is a conditional operation that compares some “thing” (usually data or a data structure) to a pattern. A successful match then transforms that thing in some way. The transformation can be as simple as binding the thing as-is to something else.

Let Bindings

OpenClipart-Vectors

Let’s start with the simplest example.

In the programming language Elixir, it is often said that the equal sign is not an assignment operator, but rather a pattern matching operator. In F#, the same could be said.

This can be thought of as: If the pattern _ (wildcard) matches the shape of the data on the right hand side of the equal sign, take that data and do something with it. In this case, the _ means we don’t want to do anything with it, so this let binding is rather useless. However, a wildcard matches anything, so this is a successful match.

How about we use an identifier this time.

This can be thought of as: If the pattern n matches the shape of the data on the right hand side of the equal sign, take that data and bind it to n. The identifier n is really just a wildcard with a name, so that it can be used later. I repeat, this is a wildcard, so it matches anything and is bound to whatever is on the right hand side of the equals sign (in this case, the integer 42).

We can do the same thing for binding another data structure, such as a list.

If the pattern n matches the shape of the data on the right hand side of the equal sign, take that data and bind it to n. Since the identifier n is a wildcard, it matches the shape of the data and is bound to the list [1; 2; 3; 4; 5; 6; 7; 8; 9; 10].

Now let’s get a little more complicated.

In this case, the shape of the pattern on the left (two named identifiers that are both wildcards) matches the shape of the data on the right, so 1 is bound to x and 2 is bound to y. To hammer this point home, we could have done this as well.

Both x and y are wild cards, so the pattern will match. The integer 1 will be bound to x and the list containing integers 1 to 10 will be bound to y.

And let’s do one last example with simple let bindings, using destructured assignment of a three element tuple (also called a triple).

Here we are first binding point3D to the tuple (1, 2, -5) (point3D is a wildcard with a name remember). In the next line, the value for point3D is evaluated and compared against the shape of x, y, z (three wild cards). The shape matches, since three values can be extracted from the evaluated point3D triple. 1 is bound to x, 2 is bound to y and -5 is bound to z.

Match Expressions

OpenClipart-Vectors

Now that I’ve beaten simple let bindings to death from the point of view of pattern matching, let’s move on to a useful F# feature called a match expression.

At first glance, a match expression appears to be like a switch/case statement (I’ve often heard it referred to as a switch statement on steroids). However, it is much more than that.

This is the match expression as defined by MSDN:

A match expression is a branching structure that lets you combine a set of patterns together. A comparison against each pattern is made (from top to bottom or left to right) until a match is found, at which point the associated code is returned as the result of the expression. As an added bonus, the compiler will warn you if your match expression has the possibility to fall through without a match.

The key is the set of patterns bit. This is very different from a simple switch statement. This will become apparent as we dig into some examples. But first, let’s start simple.

The end result of this code is essentially doing the same thing as we did in the last example. We’ve created a function called getXYZ that does the following:

  • Accepts a three element tuple as a parameter (binding it to the identifier p3D).
  • Takes the triple and passes it into the match expression.
  • The match expression has a single pattern that expects the shape of the data to match a triple.
  • The triple then uses destructured assignment, binding the values 1, 2 and -5 to the local identifiers x, y and z.
  • The bound identifiers are then passed to the right side of the -> operator and used to construct a new triple.
  • The new triple is then returned as the result of the entire getXYZ function expression.

Yes, this is a silly function that decomposes a three element tuple and then reconstructs it.

In the line below the function we are binding x, y and z to the result of getXYZ with (1, 2 -5) passed in as the parameter. The result is that x, y and z are bound to 1, 2 and -5 respectively.

Let’s try something a little more interesting.

The function examinePoint expects a two element tuple, which it then passes to a match expression. Within the match expression I have included nine specific patterns that examine the tuple using when clauses (Note: I put parenthesis around the variable patterns (x, y) just to show that you can do it that way too. I actually find this clearer.). Each pattern extracts the two elements from the tuple and binds them to the x and y identifiers. Then the x and y are compared against the when clause. If a match is found, the expression on the right hand side of the -> for the given pattern is returned as the function result. If the pattern does not match, it tries the next pattern, continuing until a match is found.

While we have covered all possible scenarios to make a match, the compiler is still not certain that a integer tuple won’t fall through the bottom, so it warns us that we have an incomplete match expression. In this case we’d be fine with just the nine patterns, but to be safe I have included a catch-all at the bottom. There are actually quite a few different ways that could have appeased the compiler.

  • (x, y) -> [expression] will match on any two element tuple, allowing me to use x and y on the right hand side of the -> operator.
  • (x, _) -> [expression] will match on any two element tuple, allowing me to use x on the right hand side of the -> operator.
  • (_, y) -> [expression] will match on any two element tuple, allowing me to use y on the right hand side of the -> operator.
  • (_, _) -> [expression] will match on any two element tuple, ignoring both elements (they are not bound, so can’t be used in a when clause or in the right hand side of the -> operator).
  • _ -> [expression] will match anything. This is a generic catch all in a match expression. There are times when this can have drawbacks (I will discuss soon).

There are a couple more things I should mention. First, the x and y identifiers could be called anything (I could have used (y, x) or (foo, bar)). They are simply bound the the first and second element of the tuple. With that in mind we could have just as easily done something like the following:

In this case, the patterns match the entire tuple (binding it to the identifier p) and then we’re using the functions fst and snd to manually extract the two elements for comparison purposes.

Let’s do one more example with a modified version of this function before moving on. I want to show that you can mix and match these approaches (though for readability, I’d pick one approach and stick with it) and that you can match on only the data you find relevant for your purposes.

In this version of the function, we only have three patterns:

  1. Bind p to the two element tuple, extract the first element and then check if it is greater than zero. Since we have bound p to the two element tuple, we have access to it on the right hand side of the -> operator if we want to use it (I have not used it in this example).
  2. Bind x to the first element in the tuple and ignore the second element. Then check if x is less than zero. Again, x could have been used in the expression on the right hand side of the -> if we wanted to.
  3. Bind x and y to the first and second element in the tuple respectively. This will match all two element tuples, so works nicely as a catch all. In this case, we’re also using x and y on the right hand side of the expression.

Before we move onto another example, it is worth mentioning a few things. As with all conditionals in F#, every branch of a match expression must evaluate to the same type. In other words, you can’t have one branch return a string and have another that returns an integer. Also, our examples are using a tuple that must contain two integers (based on what we are doing in the match expression). The first pattern is specifically comparing the first element against the integer zero. This tells the type inference engine that the first element must be an integer. In the second pattern, we are also comparing the first element against the integer zero. Had we tried to treat the first element as something else, we would get a compile error because the first pattern has already told us we must have an integer. In the third pattern, our sprintf expression is telling us that both the first and second element (which we have bound to the identifiers x and y) must be an integer because of the formatted string containing a %i as the placeholder for each element. Until that point, we could have had the second element be anything else.

Ok, let’s move on shall we. Match expressions can be done against many other data structures. Let’s try matching against a record type.

There are a couple new things going on here in the greet function, so let’s break the code down by each pattern. For this function we are passing in a new record type called Person that has fields FirstName, LastName and Age.

  1. Records can be pattern matched against individual fields using the syntax { Field1 = value; Field2 = value; FieldN = value; } In this case we are matching any Person record that has a FirstName of Jason and a LastName of Down. It doesn’t matter that we haven’t matched against the Age field. As stated in the examples with the tuples, we only need to match on as much information as we find relevant for a given pattern.
  2. We are matching on anyone with the Firstname of Jason (without the LastName of Down, since we can only get to pattern 2 after pattern 1 fails to match).
  3. This is similar to the previous pattern where we are matching on FirstName (this time with Bob). However, we are also using the as keyword to bind the entire record to the identifier p. This allows us to make use of the entire record on the right hand side of the -> operator through the p identifier.
  4. We are matching the field Age with the value 18.
  5. This is another way to bind an identifier (we are again using p) to any Person record. Then we are using a when clause to compare the record’s Age field against the value 18. Any Person record with Age less than 18 will match. We also have access to the Person record via the identifier p on the right hand side of the -> operator.
  6. A catch all that will match any Person record type. The actual Person record is then ignored (not bound to anything because of the _ identifier), but the code on the right hand side of the -> operator is returned as the value of the match expression (and then returned as the expression of the greet function).

Hopefully that all made sense. Let’s continue with yet another data structure, discriminated unions. I’ll begin with a simple disriminated union (enumeration style with no associated data).

Here we have a straightforward discriminated union called AdventurerAction that represents four actions an adventurer may take. Then we have a function called interactWithDragon that takes an AdventurerAction and feeds it into a pattern expression.

The patterns should be self-explanatory at this point. What is nice, is that the compiler will warn us if we miss one of the union cases (particularly useful if we add a new action, such as Open). Earlier I had mentioned that using a _ catchall was not always ideal. This is one of those cases. If we had decided to put a catchall in the match expression, then we would never be warned by the compiler that we have an incomplete match expression when we add a new AdventurerAction. Likely if you add a new union case, you want to handle it explicitly. Having the compiler tell you all the places you need to do this is a good thing.

Ok, let’s try a slightly more complicated discriminated union now. This will be using the classic example with shapes (a favourite in object oriented programming examples for class hierarchies and/or interfaces).

Now we are starting to see a combination of some of the concepts we have discussed so far. First, let’s discuss the Shape discriminated union.

A Shape can be created using one of three constructors. Each constructor takes different associated data. A Circle takes a float representing its radius. A Rectangle takes a two integer tuple, representing its height and width. And lastly, a Point takes a two integer tuple representing its coordinates.

Next we have the area function that takes any Shape as a parameter. The Shape is passed to a match expression, which has a set of patterns to match on each union case. Additionally, we are extracting the associated data for each type of Shape and using it to calculate the area.

A few things to note: Each pattern must return the same type. Since the area of a Circle is a float, we must cast the area of a Rectangle to a float as well (or we could cast the area of a Circle to an integer). For the Point, since it doesn’t have an area, we can throw an exception. This will make the compiler happy. Of course, that makes developers less happy, so a better approach would be to use an Option type. We’ll look more at that in the next blog post.

Speaking of which, this post is getting rather lengthy, so we’ll finish off with one last type. Let’s add a collection type in for good measure.

Here we have function getEquipment that contains an inner function buildList. This function builds a string as it recurses through a list of strings (representing an adventurer’s equipment).

The relevance to this post is, of course, the match expression. We have two patterns that we are trying to match on for each recursive call into the internal buildList function.

  1. An empty list pattern, denoted by []. This pattern successfully matches when we are left with an empty list. It is the base case that terminates the recursion and allows the function to evaluate the final result and return it. In our case, we then return the string bound to listSoFar, which contains a list of all items formatted how we want it.
  2. The cons pattern, denoted by x::xs. The x represents the head of the list and the xs represents a list of the remaining elements (or the tail). As noted before, the identifier names are not important. Often you will see head::tail used. When this pattern successfully matches (because we have at least one element in our list, followed by a tail, which can be an empty list), we grab the head and append it to our accumulator string listSoFar. Then we recursively call buildList with the remaining items in the list and the accumulator string.

So that about does it for this post. There are more patterns available that I did not cover (such as the or and and patterns). I will likely sneak these in part 2 while we explore active patterns.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.