Sniffing Out Errors

As seasoned analysts will know, it can be difficult to identify when to draw a line under your predictive modelling, accept its performance as sufficient for your purposes, and move on to deployment.

This article considers an established method for determining when enough is enough: Error Analysis.

Why use Error Analysis?

Analysts and data scientists will be familiar with examining residual plots for their models, looking for outlier errors that may indicate the model's underlying assumptions have been broken, or that some extreme data points are causing grave problems when a model is built on the whole data set.

But while examining residual plots is great from a qualitative point of view, as data natives we should always be looking for quantitative methods of describing, classifying and understanding these errors.

What we need is a statistical analysis that lets us quantitatively understand the weaknesses in our models. One simple practice that meets this need, and can help the indecisive data practitioner direct and allocate their limited time, is Error Analysis.

In short, Error Analysis builds a model out of your existing model’s errors!

From this you gain an understanding of where your model is succeeding, where it is failing, and what can be amended to improve performance. Additionally, this process can be trivially integrated into your data science pipeline and run multiple times to iteratively improve model performance.

This methodology is in line with Agile Data Science principles: build simple, fail fast, and iterate.

As an example, say we are building a production-grade model to predict distances jumped by Olympic long jumpers. We start with a very simple model, taking into account only the jumpers’ physical vital statistics. We use this model to make initial predictions, then subtract those predictions from the actual observed jumps to calculate errors, on which we build a very simple error analysis model.
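As a minimal sketch of that residual calculation (using scikit-learn, with made-up data and hypothetical feature names), the first two steps might look like this:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: each row is one jumper's vital statistics
# (height_cm, weight_kg, shoe_size); y holds observed jump distances (m).
X = np.array([
    [185, 78, 44],
    [178, 72, 42],
    [190, 82, 46],
    [175, 70, 41],
    [182, 76, 45],
    [188, 80, 43],
])
y = np.array([8.10, 7.62, 8.31, 7.45, 7.90, 8.20])

# A deliberately simple first-pass model.
model = LinearRegression().fit(X, y)

# Errors = actual observed jumps minus the simple model's predictions.
errors = y - model.predict(X)
print(errors)
```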

We now ask ourselves: “What differentiates samples for which we are producing very accurate results (low error) from those with very inaccurate results?”

From this model we establish that the feature creating the most error is our jumpers’ shoe sizes. Whilst shoe size may be broadly indicative of distance jumped, it also creates a lot of noise. From here we can decide how to attempt an improvement: perhaps reducing this variable to a few classes of shoe sizes, or trying to curb errors through the addition of more data sources, such as leg measurements.
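A hedged sketch of the first option, reducing the continuous feature to a few coarse classes with pandas (the size boundaries here are invented for illustration):

```python
import pandas as pd

# Hypothetical shoe sizes for a handful of jumpers.
shoe_sizes = pd.Series([40.5, 42.0, 43.5, 44.0, 46.0, 47.5])

# Bin the noisy continuous feature into a few coarse classes.
shoe_class = pd.cut(shoe_sizes,
                    bins=[0, 42, 45, 50],
                    labels=["small", "medium", "large"])
print(shoe_class)
```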

After attempting an improvement to the first iteration of the model, we test that our deployment still functions, rerun our error analysis, and continue to iterate and improve. All the while, our error analysis model gives us handy pointers on what to focus on in the next iteration.

Types of Errors

You might then ask, “how exactly can we interpret our results in order to get an idea of what to iterate on and improve?”

Building a simple model that uses all your input features to explain the error will give an indication of which features are driving the majority of the error. While you may intend to use a “trendier” non-linear algorithm in your main predictive model, there is no requirement for such things in Error Analysis. In fact, a highly interpretable linear model will yield much faster and clearer direction. Sophisticated ML algorithms have revolutionized many processes, but using one in Error Analysis is the equivalent of using satellite navigation to get to your next-door neighbor’s house!
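One way this could look in practice is sketched below, with synthetic data standing in for the main model’s inputs and residuals (the feature names and the shoe-size-dependent noise are assumptions for illustration). Regressing the absolute error on standardized inputs and ranking the coefficients points straight at the error-driving feature:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200
feature_names = ["height_cm", "weight_kg", "shoe_size"]  # hypothetical

# Synthetic stand-ins: the main model's inputs and its residuals,
# with residuals that grow noisier as shoe size increases.
height = rng.normal(180, 7, n)
weight = rng.normal(75, 8, n)
shoe = rng.uniform(38, 48, n)
X = np.column_stack([height, weight, shoe])
errors = rng.normal(0.0, 0.05 * (shoe - 37))

# Standardize the inputs so coefficient magnitudes are comparable.
X_scaled = StandardScaler().fit_transform(X)

# The error analysis model: a plain, interpretable linear regression
# of the absolute error on the inputs.
error_model = LinearRegression().fit(X_scaled, np.abs(errors))

# Rank features by how strongly they drive the error.
for name, coef in sorted(zip(feature_names, error_model.coef_),
                         key=lambda pair: -abs(pair[1])):
    print(f"{name}: {coef:+.3f}")
```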

Now it’s time to think about what sorts of things could be causing that pesky error-prone feature to misbehave:

  • Messy Data – Perhaps the feature has a large number of missing values, or inconsistent data collection?
  • Insufficient Information – Would collecting more data give better performance?
  • Feature Overfit – Maybe this feature fits very well for a certain portion of the data but hands us great big errors for the rest? Would one-hot encoding help draw out where the feature is useful vs. where it is spurious? Is there an upper or lower threshold beyond which the feature behaves non-linearly? (See the sketch after this list.)
  • Feature Overfit within the Algorithm – Perhaps your chosen algorithm is being too generous with this feature? Would a simpler algorithm, such as a random forest, settle on a different and simpler set of features?
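For the one-hot encoding and threshold checks mentioned above, a minimal sketch (with an invented 45-size cut-off and the hypothetical classes from earlier) could be:

```python
import pandas as pd

# Hypothetical jumpers: binned shoe-size class plus the raw size.
df = pd.DataFrame({
    "shoe_class": ["small", "medium", "medium", "large"],
    "shoe_size": [40.5, 43.0, 44.0, 47.5],
})

# One-hot encoding lets the error model reveal *where* the feature
# is useful versus where it is spurious.
encoded = pd.get_dummies(df["shoe_class"], prefix="shoe")

# A simple flag to probe non-linear behavior beyond a suspected threshold.
encoded["shoe_above_45"] = (df["shoe_size"] > 45).astype(int)
print(encoded)
```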

Following an iterative approach to predictive model building, such as error analysis, allows you to consistently test and improve, with handy pointers on where to direct your attention. Having evaluated the causes of your errors, you know not only where your predictive model will be successful but also where it is prone to deviate.

About the Author

Naomi Beckett is a Senior Data Scientist, applying SparkBeyond’s technology to business challenges across multiple industries. Past solutions include: risk scoring for financial services, predictive maintenance in the resources industry and e-commerce pricing. With a Masters degree in Statistical Science from University College London, Naomi enjoys group exercise, happy hour and blanket forts.
