Building an F1 prediction engine – Data Acquisition

Having defined what we are trying to achieve, it’s time to start thinking about what data we need and how we are going to obtain them in order to create features and train our models.

In this post I’m going to talk about the 2nd step of the prediction engine pipeline:

  1. Problem Definition
  2. Data Acquisition
  3. Feature Engineering
  4. Predictive Modelling
  5. Model Deployment

Since the challenge of predicting F1 race outcomes is so open-ended, you have to think about what data you need and what data are actually available. My first thought was that I needed historical data for the Grands Prix: who won each race, where they started, what car they drove, how experienced they were, and so on. Where would I get this data from? Should I go and scrape F1-related websites, or is there some ready-to-use data source?

Fortunately, there is, and it is called Ergast. Ergast.com is “an experimental web service which provides a historical record of motor racing data”. It provides historical data going all the way back to 1950 (!!) and is updated after the end of each race. Since I needed a lot of historical data (usually, the more the better for ML algorithms), I decided to use the database image instead of making many queries over the API.
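To give a sense of what querying the API looks like (and why pulling decades of history that way would mean thousands of calls), here is a minimal sketch using Python’s requests library; the season and round values are illustrative:

    import requests

    # Fetch the classified results of a single race from the Ergast API.
    # Season (2019) and round (1) are illustrative values.
    url = "http://ergast.com/api/f1/2019/1/results.json"
    response = requests.get(url, timeout=10)
    response.raise_for_status()

    race = response.json()["MRData"]["RaceTable"]["Races"][0]
    print(race["raceName"], race["date"])
    for result in race["Results"]:
        driver = result["Driver"]
        print(f'{result["position"]:>2}. {driver["givenName"]} {driver["familyName"]} '
              f'(grid {result["grid"]}, status: {result["status"]})')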

I am more acquainted with PostgreSQL, so I decided to load the data into a local PostgreSQL database. However, the database images provided are in MySQL format. First problem: as I found out, there is no easy way to import this format into a PostgreSQL DB. There are several type mismatches between the two formats that you have to fix manually. Since this is a procedure I’ll have to repeat after each Grand Prix (the DB is updated and I’ll need to download it again in order to update the prediction model), I had to somehow automate it. So I searched for a script that could do that for me and found this great MySQL to PostgreSQL converter. With a few additions to the code, my PostgreSQL-compatible .sql file was ready!
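To give a flavour of the kind of rewriting involved, here is a toy sketch of a few MySQL-isms that trip up PostgreSQL. This is not the converter linked above (which handles many more cases), and the file names are placeholders:

    import re

    # A few illustrative MySQL -> PostgreSQL rewrites. A real converter
    # handles many more cases (keys, indexes, charsets, sequences, ...).
    RULES = [
        (re.compile(r"`([^`]*)`"), r'"\1"'),                   # backticks -> double quotes
        (re.compile(r"\b(?:tiny|small|medium)?int\(\d+\)", re.I), "integer"),
        (re.compile(r"\bdouble\b", re.I), "double precision"),
        (re.compile(r"\bAUTO_INCREMENT\b", re.I), ""),         # Postgres uses sequences instead
        (re.compile(r"\bENGINE=\w+", re.I), ""),               # MySQL-only table option
    ]

    def convert(line: str) -> str:
        for pattern, replacement in RULES:
            line = pattern.sub(replacement, line)
        return line

    # File names are placeholders for the Ergast dump and the converted output.
    with open("f1db.sql") as src, open("f1db_postgres.sql", "w") as dst:
        for line in src:
            dst.write(convert(line))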

After running the .sql file, my DB was full of data in 13 tables. The DB has a star-like schema with the results table at its centre, holding all information about the outcome of each race. This table can be joined with the rest of the tables to get information like driver name, circuit name, race date, finishing status (e.g. retired), etc.
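As an example, a join like the following pulls a human-readable results listing. This is a minimal sketch using psycopg2; the connection parameters and LIMIT are placeholders, and the table and column names follow the Ergast schema. If your import preserved the original camelCase identifiers with quotes, you will need to double-quote them here too:

    import psycopg2

    # Connection parameters are placeholders; adjust to your local setup.
    conn = psycopg2.connect(dbname="f1", user="postgres", host="localhost")

    # Join the central results table with its dimension tables.
    # Unquoted identifiers assume the import lowercased the original
    # camelCase column names (raceId, driverId, ...).
    QUERY = """
        SELECT ra.year, ra.name AS race, ci.name AS circuit,
               dr.forename || ' ' || dr.surname AS driver,
               re.grid, re.positionOrder AS finish, st.status
        FROM results re
        JOIN races ra    ON re.raceId = ra.raceId
        JOIN drivers dr  ON re.driverId = dr.driverId
        JOIN circuits ci ON ra.circuitId = ci.circuitId
        JOIN status st   ON re.statusId = st.statusId
        ORDER BY ra.date DESC, re.positionOrder
        LIMIT 20;
    """

    with conn, conn.cursor() as cur:
        cur.execute(QUERY)
        for row in cur.fetchall():
            print(row)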

The next step after getting the data into the DB is to play around with it and see exactly what’s available and what’s not. This is what I’ll discuss in my next post!

4 thoughts on “Building an F1 prediction engine – Data Acquisition”

  1. I think you did a great job in collecting the data, especially in the format you need. My thoughts go to the point that you don’t need only who won and which place he started from, but also: 1. the championship standings should be embedded in the data, i.e. which place the driver held in the championship before the start of the race; 2. if you want to predict who will win the next race, you should leave out the name of the driver and work with the position he held in the championship standings.

  2. Hi Taso,

    Indeed, you are correct! In this post I was just mentioning a few data that will be useful for the predictive model.

    In fact, I’m using the features you said and many many more! I’ll share more details on what data I’m actually using in a later blog post.

  3. Hello,

    I’ve been trying to get this data into a postgres database on docker and have been running into some issues. Thank you for the link you provided to the mysql to postgres converter! However, the db dump from ergast is simply a .sql file and the python converter looks to want a .mysql file to convert into a .psql file. Are you going through the process of importing this into mysql, dumping the mysql database to a .mysql file, and then converting it with the python script?

    Is there no way to directly convert the files from ergast into a postgres compatible format without having to spin up a mysql instance and go through the extra steps? It sounds like you found a way:

    “With a few additions to the code, my PostgreSQL-compatible .sql file was ready!”
    Would you mind sharing how you did this?
    Thank you in advance!
