Cyclomatic Complexity

As programmers, we spend our days battling the enemy. The enemy is tough, devious and sneaky. The enemy entrenches itself into your own lines (pun not intended) only to be found when it's already too late. This enemy is complexity. It increases bugs, makes it harder to reason about your code and to change stuff later on.

But fear not, brave soldier! You have powerful weapons to fight this enemy! Design patterns, architectures, programming techniques, all of them try to reduce complexity, or at least push it around.

However, all great battles require a strategy. And in order to create a strategy, you need to know thy enemy. What you need is a way to find and measure the complexity of your code. Of course, we can all intuitively feel roughly how complex a piece of code is. You know that an if statement is simpler to understand than a bunch of GOTO commands.

But it's hard to base decisions on a feeling. Science (or war) doesn't work on rough estimates. In order to really start fighting our enemy, we need an objective way to measure how complex something is. Otherwise how do we know if we're even making progress?

Thankfully, other fields aside from computer science also deal with complexity. Graph theory, for instance, is a field in which people study, well, graphs. A graph is a mathematical structure that represents relationships between things. They're similar to a tree in that they're made up of connected nodes, but graphs are more general. For instance, a node in a graph can be connected to any other node (not just its children), and can loop back in on itself.

If this sounds a bit too general: it's because it is! The beauty of graphs is that they can represent a bunch of different things like communication networks, website maps, atomic structures, social groups, electrical circuits, road networks and even neuro-degenerative diseases. As you can see, graph theory is pretty darn useful!

But most importantly for us, graphs can also represent computer programs. For instance, a simple if-then-else structure can be represented as a graph with a branching path. Similarly, a while loop can be shown as a node that either loops back on itself, or goes to a different node.

By representing your programs as a graph, you can borrow concepts from graph theory to study your program. That is exactly what Thomas J. McCabe, Sr. did in 1976 in his paper where he explained the idea of cyclomatic complexity.

See, graphs have certain properties which can be studied. One of them is called the cyclomatic number. The cyclomatic number counts the number of cycles inside the graph. You can easily get this number by noting down the number of branchings inside the graph (when you have two arrows going outside a single node), and adding one to it. For instance, the following graph has 2 branches, so its cyclomatic number is 3.

This number is pretty cool because it can roughly tell you how hard it is to follow a graph. A graph with a small cyclomatic number is much simpler to reason about than a really convoluted graph. Since computer programs can be represented as a graph, the cyclomatic number of the program's graph gives us a good candidate for how complex that piece of software is.

To get the cyclomatic complexity of a program, simply represent it as a graph and count the cycles. Here are some examples of how complex different programs are:

// CC = 1
print(str)

Simply printing a string has a complexity of one. This is the smallest amount of complexity a program doing something can have.

// CC = 2
if (!str.isEmpty)
  print(str)

By adding a check before our print statement, we have added another path trough our program. This is why the complexity has increased by one.

// CC = 3
if (a < 3)
  performFunctionA()
else
  performFunctionB()

if (b < 3)
  performFunctionC()
else
  performFunctionD()

You might have noticed by now an easy way to calculate cyclomatic complexity of a program: It's the number of branches, plus one. If you're too lazy to draw graphs (like I am), this is an easy rule of thumb to follow if you're wondering what's the cyclomatic complexity of your function.

Okay, we established what cyclomatic complexity is and why it should be reduced. How can we go about doing that?

We usually measure cyclomatic complexity within a subset of our program, like a single function. For instance, the following function has a complexity of 4.

def myFunction(x):
  if (x < 10)
    if (x % 2 == 0)
      print("even")
    else
      print("odd")
  else
    if (x % 3 == 0)
      print("divisble by 3")
    else
      print("not divisible by 3")

It's pretty hard to follow what's going on here. To fix this, we can take the lines inside the then and else blocks and put them into separate functions. If we do that, our original function will be much easier to follow.

In fact, cyclomatic complexity is a direct function of the number of branches in your program. With each if, for, or case, you add to the cyclomatic complexity of the program. By removing branchings from a function, you can make it less complex.

def myFunction(x):
  if (x < 10)
    isEven(x)
  else
    isDivBy3(x)

The function is visually much simpler and has a lower cognitive load. This is because we have reduced the cyclomatic complexity of the function from 4 to 2. Of course, the complexity of the whole code is still the same, but to our brains it's much easier to follow three simple functions than one complex function.

I like to think that our brain has a limited RAM memory. If the code you are currently reading doesn't fit inside your RAM, you will start to lose data, and you won't be able to make sense of the function. If, however, the function is split into three smaller functions, our brain can look at them one at a time, and each of the little pieces fits inside our limited RAM. We can understand the code piece by piece and assemble a coherent picture from those pieces.

Of course, you don't need to know the exact cyclomatic complexity of your functions. It's good enough to have an intuitive feel of what increases the complexity. Each branch, each condition, each new combination you add to your code increases its complexity. If you add too much, soon your RAM will start overflowing and you will have a hard time working in your own code.

One thing you should also realise, though, is that cyclomatic complexity is just one of the ways we measure code complexity. It's also debatable whether it's a good measure or not. Since cyclomatic complexity is heavily correlated with the line count, some say line count is a good enough measure for complexity. People also often use maximum nesting depth as a measure of complexity.

No matter which measure you use, it's important to think about complexity. Keep that in mind next time you're adding an if statement. So grab your helmets and shine your boots. It's time for battle.

References:

A Complexity Measure - Thomas J. McCabe (McCabe's original paper) http://www.literateprogramming.com/mccabe.pdf

Cyclomatic complexity - Wikipedia
https://en.wikipedia.org/wiki/Cyclomatic_complexity

McCabe's Cyclomatic Complexity and Why We Don't Use It https://www.cqse.eu/en/blog/mccabe-cyclomatic-complexity/

Referential Transparency

Accidental and Essential Complexity