Building an F1 prediction engine – Problem Definition

In this series of posts I’m going to explain the process behind building a machine-learning model capable of predicting F1 race outcomes. This is not intended to be a complete guide, so some background in building ML pipelines is assumed.

At a high level, these are the steps of any such pipeline:

  1. Problem Definition
  2. Data Acquisition
  3. Feature Engineering
  4. Predictive Modelling
  5. Model Deployment

In this post I’m going to talk about step 1.

Before you even start thinking about the availability of data or how many layers your deep neural net is going to have, you should take a step back and think about what problem you are trying to solve.

In this specific project, since there was no predefined scope, I had to decide for myself what I wanted to do, and that turned out to be the most difficult part. Would I try to predict the race winner? Would I try to predict the probability of each driver finishing in every possible position? Would I try to simulate the race many times and come up with the most probable result?

Of course, whichever option I choose heavily affects the whole pipeline I’m going to build later. For instance, in some cases (such as predicting the winner) it would be more appropriate to have the drivers as the targets in a classification task, while in others it’s better to have drivers as rows and treat their race position as a target for regression or classification. It also affects the evaluation criterion I’ll need to design.
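To make the difference between these two framings concrete, here is a minimal sketch of how the same race could be laid out in each case. The driver names, the grid-position feature and both helper functions are hypothetical and only there for illustration; they are not the actual data or code used in this project.

```python
# Hypothetical toy data: one race, three drivers (all values are made up).
race = {
    "race_id": "race_01",
    "grid": {"driver_a": 1, "driver_b": 2, "driver_c": 3},    # starting positions
    "finish": {"driver_a": 2, "driver_b": 1, "driver_c": 3},  # finishing positions
}


def winner_framing(race):
    """One row per race; the target is the winning driver (multi-class classification)."""
    features = {f"grid_{driver}": pos for driver, pos in race["grid"].items()}
    winner = min(race["finish"], key=race["finish"].get)
    return features, winner


def per_driver_framing(race):
    """One row per (race, driver); the target is that driver's finishing position."""
    rows = []
    for driver, grid_pos in race["grid"].items():
        features = {"race_id": race["race_id"], "driver": driver, "grid": grid_pos}
        rows.append((features, race["finish"][driver]))
    return rows


features, winner = winner_framing(race)
rows = per_driver_framing(race)
```

In the first layout a race contributes a single training example, while in the second it contributes one example per driver, which is what makes a per-position prediction possible.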

I finally decided to stick with predicting the exact finishing position of each driver in each race (i.e. not the probability of finishing in a specific position). I did that because it just made more sense. How useful would it be if I said that, say, Alonso is going to come 1st with x% probability, 2nd with y% probability and so on? Also, it didn’t make sense to predict just the winner of a race (the guys at the back of the pack deserve some attention as well). Finally, I decided to go this route because it just sounded more difficult!
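Predicting exact positions also means deciding how to score a predicted finishing order against the real one. As a rough sketch of one possible criterion (an assumption for illustration, not necessarily the metric used in this project), per-driver model scores can be turned into positions 1..N and compared with the actual result using the mean absolute position error. The function names and the toy numbers below are hypothetical.

```python
def scores_to_positions(scores):
    """Rank drivers by predicted score (lower is better) into positions 1..N."""
    ordered = sorted(scores, key=scores.get)
    return {driver: pos for pos, driver in enumerate(ordered, start=1)}


def mean_absolute_position_error(predicted, actual):
    """Average number of places by which each driver's prediction is off."""
    return sum(abs(predicted[d] - actual[d]) for d in actual) / len(actual)


# Made-up model outputs and result for a three-driver race.
scores = {"driver_a": 1.4, "driver_b": 2.9, "driver_c": 2.1}
actual = {"driver_a": 2, "driver_b": 1, "driver_c": 3}

predicted = scores_to_positions(scores)
error = mean_absolute_position_error(predicted, actual)
```

Ranking the raw scores guarantees that every position is assigned to exactly one driver, which a plain per-driver regression output would not.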

In the next posts I’m going to describe the technical details behind building this project. Stay tuned!

5 thoughts on “Building an F1 prediction engine – Problem Definition”

  1. It’s a brilliant idea to forecast the exact finishing positions. Quite challenging area though! Good luck!

    1. Hi Blazej,

      Thanks for your comments. It’s indeed challenging.

      Currently, I’m making the assumption that all drivers are going to complete the race (i.e. no retirements). Although this is probably incorrect, I’ve incorporated this into the error metric I’m using for local cross-validation of the model.

      I’ll share more details in the future.

  2. It all depends on what perspective you choose. If you are working for a driver, you want to be able to tell him how likely he is to finish in one position or another.
