In this series of posts I’m going to explain the process behind building a Machine-Learning model capable of predicting F1 race outcomes. This is not intended to be a complete guide, so some background on building ML pipelines is assumed. The series follows these steps:
- Problem Definition
- Data Acquisition
- Feature Engineering
- Predictive Modelling
- Model Deployment
In this post I’m going to talk about step 1.
Before you even start thinking about the availability of data or how many layers your deep neural net is going to have, you should take a step back and think about what problem you are trying to solve.
In this specific project, since there was no predefined project scope, I had to come up with what I wanted to do, and this turned out to be the most difficult part. Would I try to predict the race winner? Would I try to predict the probability of each driver finishing in every possible position? Would I try to simulate the race many times and come up with the most probable result?
Of course, any decision I take heavily affects the whole pipeline I’m going to build later. For instance, in some cases it would be more appropriate to have the drivers as the targets in a classification task while in others it’s better to have drivers as rows and treat their race position as a target for regression or classification. It also affects the evaluation criterion I’ll need to design.
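To make the difference between the two framings concrete, here is a minimal Python sketch. It is only an illustration: the driver names, the race identifier and the function names are all hypothetical, and the result is made-up toy data rather than anything from the actual project.

```python
# Toy, made-up result for a single race (not real data).
# Keys are hypothetical driver names, values are finishing positions.
race_result = {"driver_a": 1, "driver_b": 2, "driver_c": 3}


def winner_framing(race_id, result):
    """Framing 1: one row per race, the target is the winning driver.

    Here the drivers are the classes of a classification task.
    """
    winner = min(result, key=result.get)  # driver with the lowest position
    return {"race_id": race_id, "target": winner}


def driver_row_framing(race_id, result):
    """Framing 2: one row per (race, driver), the target is the position.

    The target can then be treated as a number (regression) or as a
    class label (classification).
    """
    ordered = sorted(result.items(), key=lambda item: item[1])
    return [
        {"race_id": race_id, "driver": driver, "target": position}
        for driver, position in ordered
    ]


race_level = winner_framing("race_1", race_result)
driver_level = driver_row_framing("race_1", race_result)
```

The first framing produces a single training example per race, while the second produces one example per driver per race, which is why the choice shapes everything downstream.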
I finally decided to stick with predicting the exact finishing position for each driver in each race (i.e. not the probability of finishing in a specific position). I did that because it just made more sense. How useful would it be if I said that, say, Alonso is going to come 1st with x% probability, 2nd with y% probability and so on? Also, it didn’t make sense to predict just the winner of a race (the guys at the back of the pack deserve some attention as well). Finally, I decided to go this route because it just sounded more difficult!
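Once the target is an exact finishing position per driver, the evaluation criterion has to compare predicted and actual positions. As one possible example (an assumption on my part for illustration, not necessarily the criterion used in this project), the sketch below computes the mean absolute position error for a single race. The driver names and positions are made-up toy values.

```python
def mean_absolute_position_error(predicted, actual):
    """Average of |predicted - actual| finishing position over one race.

    Both arguments map a driver name to a finishing position.
    """
    if set(predicted) != set(actual):
        raise ValueError("predicted and actual must cover the same drivers")
    if not actual:
        raise ValueError("at least one driver is required")
    total = sum(abs(predicted[driver] - actual[driver]) for driver in actual)
    return total / len(actual)


# Hypothetical predictions and results for one race (toy values).
predicted = {"driver_a": 1, "driver_b": 3, "driver_c": 2}
actual = {"driver_a": 1, "driver_b": 2, "driver_c": 3}

error = mean_absolute_position_error(predicted, actual)
```

A value of 0 would mean every driver was placed exactly right; larger values mean the predictions were, on average, further from the real finishing order.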
In the next posts I’m going to describe the technical details behind building this project. Stay tuned!