Building an F1 prediction engine – Problem Definition

March 16, 2017, by Stergios (5 comments)

In this series of posts I’m going to explain the process behind building a machine-learning model capable of predicting F1 race outcomes. This is not intended to be a complete guide, so some background on building ML pipelines is assumed.

On a high level, these are the steps of any such pipeline:

1. Problem Definition
2. Data Acquisition
3. Feature Engineering
4. Predictive Modelling
5. Model Deployment

In this post I’m going to talk about step 1.

Before you even start thinking about the availability of data or how many layers your deep neural net is going to have, you should take a step back and think about what problem you are trying to solve.

In this specific project there was no predefined scope, so I had to come up with what I wanted to do, and that turned out to be the most difficult part. Would I try to predict the race winner? Would I try to predict the probability of each driver finishing in every possible position? Would I try to simulate the race many times and come up with the most probable result?

Of course, any decision I make heavily affects the whole pipeline I’m going to build later. For instance, in some cases it would be more appropriate to have the drivers as the target classes in a classification task, while in others it’s better to have drivers as rows and treat their race position as the target for regression or classification. It also affects the evaluation criterion I’ll need to design.
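To make the difference between these two framings concrete, here is a minimal sketch in plain Python. The data, the driver labels ("A", "B", "C"), the single `grid` feature and both function names are invented for illustration; they are assumptions, not the dataset or code used in this project.

```python
# Hypothetical toy data: one race, three drivers, made-up values.
race = [
    {"driver": "A", "grid": 3, "finish": 2},
    {"driver": "B", "grid": 1, "finish": 1},
    {"driver": "C", "grid": 2, "finish": 3},
]


def winner_framing(race):
    """One row per race; the target is the winning driver (a class label)."""
    features = {f"grid_{r['driver']}": r["grid"] for r in race}
    target = next(r["driver"] for r in race if r["finish"] == 1)
    return features, target


def driver_row_framing(race):
    """One row per driver per race; the target is the finishing position."""
    rows = [{"driver": r["driver"], "grid": r["grid"]} for r in race]
    targets = [r["finish"] for r in race]
    return rows, targets


race_features, winner = winner_framing(race)
driver_rows, positions = driver_row_framing(race)
```

In the first layout a whole race collapses into a single training example, so the model can only say who wins. In the second, every driver contributes a row, which is what makes it possible to say something about the back of the pack too.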

I finally decided to stick with predicting the exact finishing position of each driver in each race (i.e. not the probability of finishing in a specific position). I did that because it simply made more sense for this project. How useful would it be if I said that, say, Alonso is going to come 1st with x% probability, 2nd with y% probability and so on? Also, it didn’t make sense to predict just the winner of a race (the guys in the back of the pack deserve some attention as well). Finally, I decided to go this route because it just sounded more difficult!
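As a rough illustration of what "exact finishing position" can mean in practice, the sketch below ranks per-driver scores into positions 1..N and compares them with an actual result. This is only one possible approach, not necessarily the one used in this project: the scores are made up, the function names are hypothetical, the mean absolute position error is an assumed metric, and the sketch takes for granted that every driver finishes the race.

```python
def scores_to_positions(scores):
    """Rank drivers by predicted score (lower is better) into positions 1..N."""
    ordered = sorted(scores, key=scores.get)
    return {driver: pos for pos, driver in enumerate(ordered, start=1)}


def mean_abs_position_error(predicted, actual):
    """Average number of places by which each prediction is off."""
    return sum(abs(predicted[d] - actual[d]) for d in actual) / len(actual)


# Hypothetical model outputs and a hypothetical race result.
scores = {"A": 2.4, "B": 1.1, "C": 2.9}
actual = {"A": 1, "B": 2, "C": 3}

predicted = scores_to_positions(scores)
error = mean_abs_position_error(predicted, actual)
```

Ranking the scores guarantees that no two drivers are assigned the same position, which a plain per-driver regression would not enforce on its own.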

In the next posts I’m going to describe the technical details behind building this project. Stay tuned!

5 thoughts on “Building an F1 prediction engine – Problem Definition”

Blazej Maksym says:

March 20, 2017 at 9:18 am

It’s a brilliant idea to forecast the exact finishing positions. Quite challenging area though! Good luck!

1. Stergios says:
  
  March 20, 2017 at 10:37 am
  
  Hi Blazej,
  
  Thanks for your comments. It’s indeed challenging.
  
  Currently, I’m making the assumption that all drivers are going to complete the race (i.e. no retirements). Although this is probably incorrect, I’ve incorporated this into the error metric I’m using for local cross-validation of the model.
  
  I’ll share more details in the future.
  
MakeItHappen says:

March 24, 2017 at 8:55 pm

It all depends on what perspective you choose. If you are working for a driver, you want to be able to tell him what’s the likelihood of finishing in a position or another.

1. Stergios says:
  
  March 24, 2017 at 10:11 pm
  
  Sure! But would it make sense for a blog like this one? 🙂
  

F1 predictor

Machine-learning based F1 race prediction engine
