Writing algorithms is one of the hardest but also most satisfying tasks in software development. During our career we’ve been lucky to have had the chance to design and implement many novel and interesting algorithms. A few examples of the things we have worked on:
- Signal processing to align signals from different time captures of an electrophoretic trace
- Image processing to identify and calibrate robotics in three-dimensions
- Signal analysis to check the quality of biological samples
- Multivariate analysis of manufacturing processes to detect product issues before they occur
- Real-time feature recognition in consumable manufacturing
While each algorithm is unique, there are many reoccurring principles we’ve found that, if followed, help to produce a reliable algorithm.
Inputs and Outputs
An algorithm is an effective method that can be expressed within a finite amount of space and time and in a well-defined formal language for calculating a function.
OK, thanks Wikipedia.
We find it useful to start by treating the algorithm as a magic black box that takes input data then outputs the result.
- Input: Before you consider writing code, gather a data set. This must include everything that will be thrown at the algorithm in the real world – not only the full range of expected data, but also the unexpected. For instance, if you have an algorithm that intends to take a picture of an animal and returns a cuteness value, how does this handle a picture submitted of an inanimate object? What about a fully black or white image? Muffins or Chihuahuas?
- Output: In some cases you know for any given input exactly what the output should be. Other times however, you might be asked to design an algorithm where the experts themselves are not able to provide the answer. An example we had of this was the development of a signal processing algorithm that returned a single value metric. Once we had gathered a data set, we then asked several experts to evaluate a subset of the data. It quickly became apparent that there were large differences in how they each rated signals (sometimes even the same person on different days). As a result, a large part of this project was a separate piece of software to gather the expert’s opinions and weigh them against each other, providing a sort of crowd sourced intelligence. Expect more details on this in the future!
- Automated Testing: It shouldn’t need to be said: testing is essential to creating a robust and reliable algorithm. If you only write unit tests for one thing, it should be to cover your algorithms. You should write these tests before it’s even implemented – you are doing TDD, right?
You now know what the algorithm should do. Time to figure out how to do it.
- Don’t reinvent the wheel: Spending a few hours of research is always worth it. Even where there is no pre-existing algorithm that fits your given problem, you can often tailor a more general purpose algorithm to your needs. Most of the algorithms we have written in the past have started with a more widely known algorithm, which has then been modified to handle our specific domain. This not only saves time, but is also likely to lead to a more reliable and maintainable solution. It’s also worth considering machine learning, where a lot of recent improvements have made it viable for non-experts to integrate third-party products into production software. We saw a great talk at the GOTO Conference 2017 which demonstrated using TensorFlow & Google Cloud
- ‘Rubber Duck’ the Solution: Get one or two other software developers (the more critical the better!) and talk them through your intended solution. This will often identify design flaws in the algorithm, which is an order of magnitude simpler to fix at this stage than after you have written the code. Also, as the algorithm is likely to need bug fixes and updates in years to come, at least you won’t be the only one able to understand it.
- Documentation: The users of the system often want to understand what the algorithm is doing at a conceptual level (and why wouldn’t they – how can they trust the results if they can’t understand it?), so it’s worth spending a bit of time documenting it (in addition to code level documentation). If you can’t write a simple 2-3 paragraph description (perhaps with a few diagrams) you are going to cause yourself a lot of pain.
Focus on writing an algorithm that provides the correct answer. Then refactor it to improve maintainability – your automated tests should make this safe and easy. Only then should you consider optimisation. As Donald Knuth said:
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil
Or from Michael A. Jackson in Principles of Program Design:
Rules of Optimization:
Rule 1: Don’t do it.
Rule 2 (for experts only): Don’t do it yet.
In some domains performance is not just a nice to have, but an essential requirement – for instance, in a crash detection system in a car. Before diving into development, ask the required performance: does it need to provide an answer within 10ms, 1 second, 10 seconds, sometime in the next day? This will set some constraints on the implementation.
Before you start optimising, make sure you profile the code. It will help you focus on the parts of the algorithm that are causing the biggest performance bottlenecks. It will also tell you if your optimisations are working. Compilers are pretty smart these days, and sometimes they will be doing the optimisation behind the scenes anyway.
Keep It Simple
Saving the best advice to last: keep it simple! You must be able to reason about how the algorithm is working on a specific data input, otherwise you will never be able to debug it. We have found that an algorithm will grow in complexity over time as each new edge case is found and handled. It’s important that these are not just ‘hacked’ in, but given careful consideration how they integrate with the core algorithm. Where necessary, break the algorithm down into separate stages with well-defined intent and boundaries. This keeps it simple to understand and gives easy points for inspection to help with debugging.