Firefinch took a couple of days out of the office recently to attend ODSC (Open Data Science Conference) in London. There were plenty of exciting new machine learning techniques on display. However, what really stood out to us was how much of the focus was on issues that arise when integrating with real-world use cases (deployment, explainability, responsibility, consistency), which shows how much the field has matured in recent years.
We’ve picked out a few of our favourite talks and summarised them below. There were 8 parallel tracks, so we were only able to attend a fraction of the talks; we are eagerly waiting for the videos to be released so we can catch the rest!
Learning with Limited Labelled Data
Shioulin Sam, Research Engineer, Cloudera (Slides)
This talk demonstrated the concept of active learning, which aims to reduce the effort spent labelling data while still retaining high accuracy. It does this by running repeated training loops, with each iteration taking in more input from an expert. The important detail is that at each iteration, an algorithm selects which data points in the dataset would be most useful to the training engine; typically these are data points that sit near the classification boundaries.
This talk was probably our favourite from the conference, as the technique of active learning is very relevant to a project we were recently working on, where we had a large unlabelled dataset but limited access to experts to classify the training data.
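To make the loop concrete, here is a minimal sketch of active learning with uncertainty sampling. It is our own toy illustration, not code from the talk: the 1-D threshold classifier, the `oracle` function standing in for the expert, and the hidden boundary at 0.623 are all assumptions chosen to keep the example small.

```python
def fit_threshold(labelled):
    """Fit a 1-D threshold classifier: the midpoint between the two classes."""
    negatives = [x for x, y in labelled if y == 0]
    positives = [x for x, y in labelled if y == 1]
    return (max(negatives) + min(positives)) / 2


def most_uncertain(pool, threshold):
    """Uncertainty sampling: pick the unlabelled point nearest the boundary."""
    return min(pool, key=lambda x: abs(x - threshold))


def active_learning(pool, oracle, seed_points, budget):
    """Repeatedly train, then ask the expert to label the most useful point."""
    labelled = [(x, oracle(x)) for x in seed_points]
    pool = [x for x in pool if x not in seed_points]
    threshold = fit_threshold(labelled)
    for _ in range(budget):
        if not pool:
            break
        query = most_uncertain(pool, threshold)
        pool.remove(query)
        labelled.append((query, oracle(query)))  # one more expert label
        threshold = fit_threshold(labelled)      # retrain on the larger set
    return threshold, len(labelled)


# Hypothetical setup: 1000 unlabelled points and an "expert" who knows
# that the true class boundary sits at 0.623.
pool = [i / 1000 for i in range(1000)]
oracle = lambda x: int(x > 0.623)
seeds = [min(pool), max(pool)]  # one example of each class to start from

threshold, n_labels = active_learning(pool, oracle, seeds, budget=15)
print(f"learned boundary {threshold:.4f} using {n_labels} labels")
```

In this toy setup the learned boundary should land right next to the hidden one after labelling 17 of the 1000 points, because every query is spent where the model is least sure. Real problems replace the threshold with a proper classifier and the distance-to-boundary rule with a model confidence score, but the shape of the loop is the same.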
ML in Production: Serverless and Painless
Oliver Gindele, Head of ML, Datatonic (Slides)
This talk really illustrated how far managed ML services have come, and how much easier it is becoming to move complex ML pipelines to production.
The use case illustrated in this talk is very challenging. In an effort to reduce plastic packaging, Lush prototyped an app that could recognise a product from a photo and retrieve the product information, and brought in the Datatonic team to improve the ML model and productise it. When you consider that a significant number of their products look very similar (bath bombs, soaps, etc.), and that the app needs to work both in store and at home under very different lighting conditions, the level of accuracy they achieved (~97%) is remarkable.
The most impressive part of the system, though, is that it has a fully automated re-training loop, hosted in managed services, to accommodate new product introductions. When a new product is added to the catalogue, images of the new product are uploaded along with the product details to a cloud service. This starts a new training loop, where the image data is augmented to increase the training size and added to the training set, then the model is re-trained and its accuracy measured. Assuming the model accuracy meets pre-defined targets, the new model is made available to the app developers to include in their next app update.
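The control flow of a loop like this can be sketched in a few lines. This is only our reading of the steps described above, not Datatonic's implementation: every name here (`retrain_for_new_product`, `augment`, `train`, `make_evaluator`) is hypothetical, and the stand-in "model" is just the set of product labels it has seen.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetrainResult:
    accuracy: float
    released: bool
    model: Optional[object]  # None when the accuracy gate is not met


def retrain_for_new_product(training_set, new_examples, augment, train,
                            evaluate, target_accuracy):
    """Augment new images, extend the training set, retrain, then gate on accuracy."""
    augmented = [variant for example in new_examples for variant in augment(example)]
    training_set.extend(augmented)
    model = train(training_set)
    accuracy = evaluate(model)
    released = accuracy >= target_accuracy
    return RetrainResult(accuracy, released, model if released else None)


# --- Toy stand-ins so the sketch runs end to end (all hypothetical) ---
def augment(example):
    image, label = example
    # Pretend augmentation: the original plus two altered copies.
    return [(image, label), (f"{image}|flipped", label), (f"{image}|darker", label)]


def train(examples):
    return {label for _, label in examples}  # "model" = set of known labels


def make_evaluator(holdout):
    def evaluate(model):
        hits = sum(1 for _, label in holdout if label in model)
        return hits / len(holdout)
    return evaluate


training_set = [("img1", "soap")]
holdout = [("h1", "soap"), ("h2", "bath bomb")]
result = retrain_for_new_product(
    training_set, [("img2", "bath bomb")], augment, train,
    make_evaluator(holdout), target_accuracy=0.95,
)
print(result.released, result.accuracy, len(training_set))
```

The part worth noticing is the gate at the end: a retrained model is only handed over when it clears the pre-defined target, so a bad batch of product photos cannot silently degrade the app.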
Practical Methods to Optimise Model Stability: A Case Study Using Customer-Lifetime Value at Farfetch
Davide Sarra, Data Scientist & Kishan Manani, Senior Data Scientist, Farfetch (Slides)
Accuracy is by far the most commonly used metric to evaluate an ML model. This talk goes in depth into another model metric: stability. Stability refers to the variability of the model predictions arising from the training process, changing training data, etc. For example, if some new data is introduced into the training set, does the model still produce the same/similar results compared to the previously trained model?
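One simple way to put a number on this, which is our own illustration and not necessarily the measure used in the talk, is to retrain the same model on resampled versions of the training data and look at how much its predictions move at a fixed set of evaluation points. The sketch below assumes a toy straight-line model, synthetic data, and a hypothetical `instability` helper.

```python
import random
import statistics


def fit_line(points):
    """Ordinary least squares fit of y = a*x + b."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def instability(predictions_by_run):
    """Mean, over evaluation points, of the std dev of predictions across runs."""
    per_point = zip(*predictions_by_run)
    return statistics.fmean(statistics.pstdev(p) for p in per_point)


rng = random.Random(42)
# Synthetic data: a noisy straight line (an assumption for the demo).
data = [(x, 2.0 * x + 1.0 + rng.gauss(0, 1.0)) for x in range(50)]
eval_xs = [0, 10, 20, 30, 40]

runs = []
for _ in range(20):
    sample = [rng.choice(data) for _ in data]  # bootstrap resample
    a, b = fit_line(sample)
    runs.append([a * x + b for x in eval_xs])

score = instability(runs)
print(f"instability: {score:.3f}")
```

A score of zero would mean every retrained model gives identical predictions; larger values mean the predictions depend more heavily on exactly which data the model happened to see. Computing the same score for several candidate models gives a stability figure to set alongside accuracy.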
Having written many image and signal analysis algorithms for Life Science applications, we have seen this requirement for stability come up many times. If an algorithm occasionally produces significantly different outputs from similar-looking inputs, this can erode the users' trust in the software. This is particularly relevant with ML, where the explainability of some algorithms can be low.
An interesting aspect of this talk was how it took a step back from the ML details, and instead focused on the assessment and selection of the final ML model based on its metrics – in this case selecting the algorithm that gave a good balance between accuracy and stability.
Other Notable Talks
ODSC Keynote: The Future is Multiagent! – Michael Wooldridge, PhD – Programme Co-Director of the Alan Turing Institute. Professor, Head of Department of Computer Science, University of Oxford
Industrial Artificial Intelligence – The Driving Force – Diego Galar, Professor of Condition Monitoring, Luleå University of Technology
Practical, Rigorous Explainability in AI – Tsvi Lev – General Manager, NEC Corporation
Explainable AI – Methods, Applications & Recent Developments – Dr. Wojciech Samek – Head of Machine Learning, Fraunhofer Heinrich Hertz Institute
Tools for High Performance Python – Ian Ozsvald – Principal Data Scientist, Co-Founder, PyData London
Generative Adversarial Networks for Finance – Alexandre Combessie – Data Science Lead, Dataiku