How I got into functional programming

About a year and a half ago I decided that I wanted to change course and go back to being an engineer. I knew that I wanted to code, I just didn’t know where to start.

In this post I will go over:

  • how I picked functional programming (FP) as something I want to study
  • what FP is and why I like it
  • a handful of examples to give you an idea of how you could use it in your day-to-day job

In closing, I will try to summarize what I learned about personal development as I went through this process.

How I got started

Right after I decided that I wanted to give up on management and go back to engineering, I had no idea what I should do next… All I could think of were the things I didn’t like about my job instead of the things I liked :).

My team was (and still is) heavily involved with distributed systems (mainly batch and real-time analytics built on top of the Hadoop stack), but simply joining those projects seemed very intimidating at that moment.

I didn’t have any experience with enterprise Java and the system and tech stack are both incredibly complex; on top of that I am very motivated by concrete results and enjoy working on more high-level problems — none of which can be found in building and optimizing distributed infrastructure.

Lastly, I really wanted to have fun and score a few quick wins, in order to check whether I was still on the right track, career-wise. That’s when two opportunities presented themselves:

  1. While we had a very powerful system that could process millions of events per second in real time, we were sorely lacking a great UI to display and interact with these metrics
    • most of my colleagues were passionate about the “big data” challenges and distributed algorithms, so I decided that I could really help and get familiar with the data flow in the project by building the web app that allows users to consume our real-time analytics
  2. The Scala programming language was starting to get some traction in the industry and a few members of my core team signed up for the Coursera course on functional programming with Scala, taught by Martin Odersky. I also joined the online course, hoping that I could benefit from it in the following ways:
    • Scala blends object-oriented and functional programming in one language; coupled with Odersky’s great teaching style, I would not only learn about functional programming but also strengthen my understanding of advanced OOP topics like generics and design patterns
    • FP ideas sit at the core of many parallel data processing models, so I knew this would give me a head start with the MapReduce programming model
    • I used JavaScript in the past, but I always felt like I didn’t fully grasp concepts like closures, anonymous functions and higher-order functions – I knew my JS skills would benefit a lot, especially since most of the work for the new project had to do with querying and manipulating large data sets on the client

Why functional programming?

In this section I will try to give you an overview of what FP is, why I like it and where it can be successfully applied.

At a very high level, FP is a programming paradigm (typically contrasted with imperative programming) that has the following characteristics:

  1. Focuses on data flow (what to do), not instructions (how to do it)
  2. Functions are first-class citizens in the language: there is a Function type, you can pass function values around, and higher-order functions take other functions as parameters (which makes it easy to define pluggable, composable algorithms)
  3. There is a focus on minimizing mutability and side effects. In a pure functional language, all values are immutable and functions don’t have any side effects, as they return new values instead of modifying state. This makes algorithms much easier to parallelize, as there is no shared mutable state to keep in sync via synchronization mechanisms.

Instead of continuing with the many benefits and things that I love about this programming model, let me illustrate some of the above with some code samples in Scala and JavaScript.
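
As a quick warm-up for points 2 and 3, here is a minimal JavaScript sketch, assuming an ES2015+ runtime. The names applyTwice, addTen and withSeverity are made up for illustration; the only point is that functions can be passed around as values, and that a pure function returns a new value instead of modifying its input.

```javascript
// Hypothetical helpers, for illustration only.

// A higher-order function: it takes another function as a parameter.
const applyTwice = (f, x) => f(f(x));
const addTen = n => n + 10;

console.log(applyTwice(addTen, 1)); // 21

// A pure function: it builds a new object instead of mutating its input.
const withSeverity = (message, severity) =>
  Object.assign({}, message, { severity: severity });

const original = Object.freeze({ severity: "INFO", message: "low disk" });
const updated = withSeverity(original, "WARN");

console.log(original.severity); // INFO (the input is unchanged)
console.log(updated.severity);  // WARN
```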

Show me!

Let’s say we have a fictional log file from a service that contains messages with the following structure:

unixTime,errorLevel,details
...
1402168686,INFO,process started
1402168886,ERROR,disk full
...

Let’s say that we want to pretty print all the ERRORs that happened in the last 2 hours, or that we want to obtain a count of warnings per minute in the last 30 minutes.

Here’s what a straightforward iterative algorithm could look like (in JS):

// some fake log lines and the output array
var logLines = [
      "1402155886,INFO,process started",
      "1402156886,ERROR,network unavailable",
      "1402157886,INFO,network restored",
      "1402158886,WARN,low disk",
      "1402159886,ERROR,disk full"
    ],
    recentErrors = [];

// iterate through all the log entries
for (var i = 0; i < logLines.length; i++) {

  // extract log fields and parse date/time
  var message = logLines[i].split(",");

  // time limit omitted for brevity
  if (message[1] == "ERROR") {
    recentErrors.push({
      severity: message[1],
      message: message[2],
      time: new Date(message[0] * 1000)
    });
  }
}

// finally, output the result one by one
for (var i = 0; i < recentErrors.length; i++) {
  console.log(recentErrors[i]);
}

Using the FP model, the same algorithm can be decomposed into a series of smaller, simpler steps:

  • from the original list of log lines, split the fields into a JS hash and parse the unix time into a Date object (map operation)
  • from this array, keep only ERROR messages (filter operation)
  • for each entry in the resulting collection, print it to the console (forEach or each operation).

Let’s re-do the JavaScript implementation, using the popular underscore.js library:

// transform the initial log into structured data
var messageObjects = _.map(logLines,
    function(line) {
      var message = line.split(",");
      return {
        severity: message[1],
        message: message[2],
        time: new Date(message[0] * 1000)
      };
    });
// keep only ERROR messages
var errors = _.filter(messageObjects,
    function(message) {
      return message.severity == "ERROR";
    });
// "act" on all remaining messages
_.each(errors, function(error) {
  console.log(error);
});

And here’s the same code in Scala, using the built-in FP APIs:

import java.util.Date

// simple message structure
case class Message(severity: String,
                   message: String,
                   time: Date)

val logLines = List(
  "1402155886,INFO,process started",
  "1402156886,ERROR,network unavailable",
  "1402157886,INFO,network restored",
  "1402158886,WARN,low disk",
  "1402159886,ERROR,disk full"
)

// transform log lines to structured messages
val messages = logLines.map(line => {
  val fields = line.split(",")
  // case classes can be built without `new`
  Message(fields(1), fields(2),
    new Date(fields(0).toLong * 1000)
  )
})
// keep only errors
val errors = messages.filter(
  m => m.severity == "ERROR"
)
// print them out to the console
errors.foreach(println)

The Scala code is more readable thanks to the arrow notation for anonymous functions and the built-in support for FP on the native collections.
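
For comparison, modern JavaScript narrows that gap: arrays have built-in map, filter and forEach methods (since ES5), and ES2015 added arrow functions. The sketch below assumes an ES2015+ runtime and no library; parseLine is a name introduced here for illustration, and the sample lines are the ones used above.

```javascript
const logLines = [
  "1402155886,INFO,process started",
  "1402156886,ERROR,network unavailable",
  "1402157886,INFO,network restored",
  "1402158886,WARN,low disk",
  "1402159886,ERROR,disk full"
];

// parseLine is a hypothetical helper: one log line in, one structured message out
const parseLine = line => {
  const fields = line.split(",");
  return {
    severity: fields[1],
    message: fields[2],
    time: new Date(Number(fields[0]) * 1000)
  };
};

// the same steps as before, chained: map, then filter
const errors = logLines
  .map(parseLine)
  .filter(m => m.severity === "ERROR");

// "act" on the remaining messages
errors.forEach(e => console.log(e.severity, e.message));
// prints "ERROR network unavailable" and "ERROR disk full"
```

Whether this reads as well as the Scala version is a matter of taste, but the shape of the pipeline is the same in both languages.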

More advanced processing

Ok, so what’s the big deal, I hear you say? Well, let’s say that as soon as you put out your implementation, you get new requirements:

  • can you also look for ERROR in the message details?
  • can you limit the search to messages from the last hour?
  • can you generate different arrays for each message type instead of filtering (SQL-ish GROUP BY instead of WHERE)?
  • can you create a bar chart that shows the counts for each message type?

It turns out that most of the requirements above can be implemented with the same primitives: map, filter, groupBy and a few user-specified functions that are easy to write and plug into the overall data pipeline.
Let’s wrap up the code samples with a simple Scala snippet that computes the counts needed for the last requirement:

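// count messages of each type
// (mapValues is deprecated since Scala 2.13;
//  there, use .view.mapValues(mm => mm.size).toMap)
val countBySeverity = messages
  .groupBy(m => m.severity)
  .mapValues(mm => mm.size)

// => Map(ERROR -> 2, INFO -> 2, WARN -> 1), entry order may vary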
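
The other requirements can be sketched in the same style. The JavaScript below is a rough illustration under a few assumptions: an ES2015+ runtime, messages that are already parsed with the time kept as Unix seconds, a fixed now value so the result is reproducible, and a hand-rolled groupBy helper (similar in spirit to underscore's _.groupBy). One sample message was changed so that the details search has something to find.

```javascript
// Sample messages, already parsed; times are Unix seconds.
// The third entry is made up so that the details search finds a match.
const messages = [
  { time: 1402155886, severity: "INFO",  message: "process started" },
  { time: 1402156886, severity: "ERROR", message: "network unavailable" },
  { time: 1402157886, severity: "INFO",  message: "retry after ERROR 503" },
  { time: 1402158886, severity: "WARN",  message: "low disk" },
  { time: 1402159886, severity: "ERROR", message: "disk full" }
];

// Assumed "current time", fixed so the example is reproducible.
const now = 1402160000;

// Small predicates, one per requirement.
const mentionsError = m =>
  m.severity === "ERROR" || m.message.includes("ERROR");
const inLastHour = m => now - m.time <= 3600;

// Hypothetical groupBy helper built on reduce.
const groupBy = (items, keyOf) =>
  items.reduce((groups, item) => {
    const key = keyOf(item);
    (groups[key] = groups[key] || []).push(item);
    return groups;
  }, {});

// WHERE-style: recent messages that mention an error.
const recentErrors = messages.filter(inLastHour).filter(mentionsError);

// GROUP BY-style: one array per severity.
const bySeverity = groupBy(messages, m => m.severity);

// Counts per severity, i.e. the data behind the bar chart.
const counts = {};
Object.keys(bySeverity).forEach(severity => {
  counts[severity] = bySeverity[severity].length;
});

console.log(recentErrors.map(m => m.message));
console.log(counts);
```

With this sample data, recentErrors holds three messages (the two ERROR entries plus the INFO entry whose details mention ERROR), and counts holds 2 for INFO, 2 for ERROR and 1 for WARN.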

Hopefully you agree by now that the FP approach starts to pay off. As I mentioned at the beginning, you don’t have to think about data structures and processing steps; instead, you think about the operations you want to apply to the data.

That’s the beauty of this model: it gives you a new toolbox of primitive operations that you can enhance with your own simple algorithms. Each of those focuses on one thing, keeps the complexity low, is easy to test in isolation, and can be composed or reused in other places.
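
To make the "test in isolation, then compose" point concrete, here is a small JavaScript sketch (ES2015+ assumed). The names hasSeverity, since and both are hypothetical helpers introduced for this example, and the time values are arbitrary numbers.

```javascript
// Each helper does one thing and returns a predicate
// (a function from a message to true/false).
const hasSeverity = level => m => m.severity === level;
const since = cutoff => m => m.time >= cutoff;

// Combine two predicates into a new one.
const both = (p, q) => m => p(m) && q(m);

// Each piece can be checked on its own, with one hand-written message...
const sample = { severity: "ERROR", time: 200 };
console.log(hasSeverity("ERROR")(sample)); // true
console.log(since(300)(sample));           // false

// ...and then composed and reused in any filter call.
const recentError = both(hasSeverity("ERROR"), since(100));
const log = [
  { severity: "ERROR", time: 50 },
  { severity: "INFO", time: 150 },
  sample
];
console.log(log.filter(recentError).length); // 1
```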

META section

I promised at the beginning of the article that I would spend some time reflecting on the process I went through as I was re-evaluating my career path. You may also remember a time in your career when you were no longer happy with how things were going, but you didn’t know what to change or what to try next.

The typical strategy that I’ve seen work for most people is to go back to your passions and your strengths: What are you good at? What parts of your job do you genuinely enjoy? Can you find something new that builds on your strengths and that you think you will enjoy doing?
If the answer is yes, then you’re all set!

Let’s call this strategy “becoming great at your job by building on your strengths and working on things you love doing”.

Unfortunately, that’s not always easy to pull off:

  • you may not have the liberty to choose your own projects
  • you may not know what you like or what you’re good at — this is something that happens a lot with people early in their career

For these situations, I’d like to suggest an alternative strategy, by flipping around the one above: try to become really good at something (new); as you master the new domain it’s going to bring you satisfaction and you’ll start to love what you do once again.

I know it sounds counter-intuitive, so here’s my reasoning around it:

  • you have to start somewhere — change needs a catalyst, it may as well be a random one; you can figure out the rest of the plan later
  • you will gain a new appreciation of your job — by going more in depth than usual you may find a specific area that you like
  • you’ll get back into the habit of learning new things: this alone is worth the experience, because you’ll remember that you can learn pretty much anything and that you’re not stuck in the current job, role or company
  • even if it doesn’t turn out to be exactly what you want, you have gained a new skill that you can use later

In my particular situation, I mixed both strategies and it turned out great in the end:

  • I started with something that I knew I could finish, in order to boost my confidence and keep me on track
  • I also picked a new field that I had struggled with before and invested a lot of time in learning it, to the point that I can apply it in my day-to-day job, and I absolutely love doing so.