I was recently asked to give a talk about Hadley’s package testthat. Any discussion of testthat cannot happen in isolation - we have to put it into the wider software development context. I think this is the first time that I have used the words software development on this blog, and for many R users, thinking of their code as software development may not be familiar. In particular, I think that those of us who have come to Data Science or statistical computing from a statistical (especially academic) background tend to have a poor understanding of some common practices among developers that would actually make our lives much easier[^1]. Version control is one I’ve talked about before (recognise this?), but another is unit testing.
Having a second person look over your code: QA or 2i, whatever you want to call it, is an incredibly important part of the analytical process in many organisations. Sadly, in my experience this is often done badly, especially if there is not a strong community of R users (assuming we are talking about R - although we needn’t be) and a strong idea of ‘what good looks like’.
What if we could:
- write our code in a way that makes QA much easier, and
- actually automate much of this testing in a more thorough and consistent way, as a matter of course?
As is often the case, software developers have been grappling with this problem for a long time, under much more trying conditions, and Test Driven Development (TDD) is one approach to doing this.
In this post, I am going to give a very simple example of TDD in an analytical context, using testthat.
A simple example
Imagine that we want to normalise a vector, which is a common task for machine learning.
Let’s think for a second what we would need to do. Mathematically it might look something like this:
\[\vec{y} = \frac{\vec{x}-\min(\vec{x})}{\max(\vec{x})-\min(\vec{x})}\]

where:
- $\vec{y}$ is our normalised vector,
- $\vec{x}$ is our target vector.
Before setting out to write the code, there are a few things we already expect to be true of $\vec{y}$. We can list some of these expectations, for example:
- Elements of $\vec{y}$ should all be between $0$ and $1$
- $\vec{y}$ should be the same length as $\vec{x}$
- $\vec{x}$ should be scalar (`numeric` in R)
Let’s try this. First, let’s create a vector `x` to play with, based on a uniform distribution with a minimum of $10$ and a maximum of $100$:
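In R, that might look something like this (a sketch: the vector length and the seed are arbitrary choices on my part):

```r
set.seed(1234)  # arbitrary seed, just to make the example reproducible
x <- runif(100, min = 10, max = 100)
```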
Now to normalise:
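Applying the formula above directly to the `x` we just created (a sketch):

```r
y <- (x - min(x)) / (max(x) - min(x))
```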
Now we can test our assumptions:
What is the range of $\vec{y}$?
How long is $\vec{y}$?
What class is $\vec{y}$?
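Checking each of these in turn might look something like this (assuming the `x` and `y` created above):

```r
range(y)   # expect exactly 0 and 1
length(y)  # expect the same length as x, i.e. 100
class(y)   # expect "numeric"
```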
All good so far, but this is quite long-winded, and if we had to check that our code worked like this every time, we would have to write a lot of code. Second, we would need to check each of these little tests manually, and we might conceivably miss a failed test. If we started to litter our code with lots of these ad-hoc tests, we might also start to break another of the tenets of good software development: DRY, or Don’t Repeat Yourself.
The TDD process
Now let’s do the same again from a TDD perspective. Of course, in order to do this, we need to functionalise our code, which may also be something relatively new to a statistical scripter. The benefits become obvious pretty quickly: wrapping our code up into functions allows us to generalise it, whilst also hiding away the complexities and keeping our code nice and tidy.
The TDD process looks something like this:
- Write a series of tests
- Run tests and see if the tests pass
- Write the code
- Run tests again
- Repeat
Write a series of tests & Run them
Just as in the previous section, where I discussed some of our expectations of what our code would produce, in TDD we formalise these expectations by explicitly writing tests first, before we even start to think about the code in our function. We know that our tests will fail the first time round, as there is no code to make them pass, but spelling out our expectations of what our function will produce is an important step in its own right.
This is where testthat comes in. Testthat is a unit testing framework for R (we think of our function as a unit of code) that makes TDD very easy.
So, to re-use my normalisation example: first I define my tests:
A simple test from testthat looks like this:
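For example (a minimal sketch using the `expect_equal()` expectation and the `y` computed earlier):

```r
library(testthat)

# After normalisation, the vector should run from exactly 0 to exactly 1
expect_equal(range(y), c(0, 1))
```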
This test encapsulates our expectation that after normalisation, our vector will have a minimum of $0$ and maximum of $1$.
Usually we want to package up a series of tests together to test a particular function. Here’s how we can do that:
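A sketch of how that might look for the expectations listed earlier, wrapped up in a `test_that()` block (note that it calls a `normalise()` function which does not exist yet):

```r
test_that("normalise() behaves as expected", {

  x <- runif(100, min = 10, max = 100)
  y <- normalise(x)

  expect_equal(range(y), c(0, 1))     # all elements between 0 and 1
  expect_equal(length(y), length(x))  # same length as the input
  expect_true(is.numeric(y))          # still a numeric vector

})
```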
So we run this code, and the tests fail.
Why?
Because `could not find function "normalise"`.

Of course - we have not written it yet!
Write the code
So let’s write our function now. We define our function to take one input, which should be a vector.
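A first, minimal version might look like this (a sketch which simply applies the formula from the start of the post):

```r
normalise <- function(x) {

  (x - min(x)) / (max(x) - min(x))

}
```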
Run tests again
So with that written, let’s try again with our tests.
This time we don’t get anything returned. These tests are silent. If they pass, they pass silently; if they fail they let us know. So all good.
Repeat
At this point, we might start to think about the other ways that we might test our function. What would happen for instance if someone passed something to it that it was not expecting, for example:
- A single scalar instead of a vector.
How might we want our function to work in this case? Let’s say we want to handle it very simplistically, by just returning an error. We expand our test suite to include this:
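A sketch of the extra test, using `expect_error()`:

```r
test_that("normalise() rejects malformed input", {

  # A single scalar cannot sensibly be normalised, so we expect an error
  expect_error(normalise(1))

})
```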
These tests fail.
Why?
The error message says `normalise(1) did not throw an error`; so we know what we have to do…
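One simple way to update the function (a sketch: the length check and the wording of the error message are illustrative choices of mine):

```r
normalise <- function(x) {

  # A single value (or an empty vector) cannot sensibly be normalised
  if (length(x) < 2) stop("x must be a vector of length greater than 1")

  (x - min(x)) / (max(x) - min(x))

}
```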
Now that we have updated the function, let’s run the tests again…
So all the tests pass silently!
Tidying this all up
So this is all a bit long-winded.
You don’t really want to be repeating these long strings of tests over and over again; it would be nice if we could just get these to run automatically, and this is possible using the function `testthat::test_dir()`, which will look into a directory and run all the tests in that directory beginning with the prefix `test_`.
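For example (assuming the test files live in a folder called `tests/`; the path is an arbitrary choice here):

```r
# Run all the test files (those with names beginning "test") found in tests/
testthat::test_dir("tests")
```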
Better yet, we can write a package, and when we check the package containing our collection of functions, our tests will be run automatically. In fact, a good example of how to organise your tests can be seen in the automated tests that ship with the testthat package itself; there are a lot of them!
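If you use the devtools workflow for package development (an assumption on my part), the whole suite can be run with a single call:

```r
# Run all of the tests in tests/testthat/ for the package
# in the current working directory
devtools::test()
```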
We can take this even further, and use some code coverage tools (like https://codecov.io) to look at our code and tell us which functions have been tested, and which haven’t. For example, in my govstyle package we can see that around 80% (at time of writing) of the code is covered by tests:
Drilling deeper, I can see that of the two functions that make up the package, one of them is only 60% covered:
I can even see the offending lines of code, which are in this case an untested `if` statement:
I don’t think I need to worry about this too much.
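The coverage figures themselves can be generated locally with the covr package (an assumption about tooling on my part: covr is what is typically used to send coverage results to codecov.io):

```r
# install.packages("covr")
library(covr)

# Measure test coverage for the package in the current working directory
cov <- package_coverage()
cov

# List the lines of code that no test touches
zero_coverage(cov)
```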
I’ve written about package development before - and there are a lot of other benefits to writing packages, not least the fact that for teams, it can allow us to enshrine business knowledge into a corpus of code that is inseparable from the documentation of that code. It’s well worth the relatively modest learning curve.
[^1]: Props to Software Carpentry for righting this wrong.