As I was brainstorming possible features for the ML model, I created a lot of graphs that helped me get a better understanding of what is hidden in the data and what features may be helpful for the ML algorithms. Here, I share some of the most interesting and weird ones!
All driver retirements have been excluded from the charts below so that the figures are not skewed.
Below is a box plot of the current F1 drivers’ finishing positions across their entire F1 careers. Current teammates are placed next to each other so that comparisons are easier to draw. Each driver’s box covers his most typical finishing positions, while the small dots mark outliers, i.e. finishing positions that are unusual for that driver.
It is interesting to note the consistency of Vettel and Hamilton in the top positions. Somewhat surprising is Alonso’s box; I’d expect it to be closer to the podium positions, but it is dragged down by the weaker seasons he has had recently (the past two). It is also clear that Magnussen’s podium on his debut was a one-off surprise (much like Stroll’s podium in Baku this year) and that Alonso, Hulkenberg and Massa are the only drivers to clearly outperform their teammates.
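For readers who want to reproduce the box and the outlier dots numerically, here is a minimal sketch. It assumes the common Tukey convention (box from the first to the third quartile, outliers beyond 1.5 times the interquartile range), which is the default in many plotting libraries but is not confirmed as the setting behind the chart above. The function name `box_stats` and the finishing positions are made up for illustration.

```python
from statistics import quantiles


def box_stats(positions):
    """Return (q1, median, q3, outliers) using the Tukey 1.5 * IQR rule (assumed)."""
    q1, median, q3 = quantiles(positions, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Anything outside the whisker limits is drawn as a separate dot.
    outliers = sorted(p for p in positions if p < low or p > high)
    return q1, median, q3, outliers


# Hypothetical finishing positions for one driver (retirements already excluded).
example = [1, 2, 2, 3, 3, 3, 4, 4, 5, 15]
q1, median, q3, outliers = box_stats(example)
```

With these invented numbers, the single 15th-place finish is the only value flagged as an outlier.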
The idea behind the next graph was to see whether the starting position is a better indicator of the finishing position at some races than at others. The chart shows the correlation between the drivers’ starting and finishing positions for the races held at these circuits since 2000.
It is amazing how far back the Monaco GP is! This could be due to the many accidents that happen there and shuffle the finishing order. The Australian GP, the first GP of the season, has one of the lowest correlations, probably because the cars are new and still have glitches, and the teams are still getting used to setting up their strategy (especially when there are rule changes, as there were this year). The Mexico GP has been held just two times, so let’s see if it stays at the top after this year’s GP. The Belgian GP is the only one with a correlation below 0.5; this could be due to the tricky weather conditions in the Ardennes countryside.
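A rough sketch of how such a per-circuit correlation could be computed is shown below. It assumes the Pearson coefficient and pools all results at a circuit into one calculation; the post does not say which correlation measure or aggregation was actually used. The function names and the rows are hypothetical, not real race results.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def grid_finish_correlation(results):
    """Map each circuit to the correlation between grid and finishing position.

    `results` is an iterable of (circuit, grid, finish) tuples.
    """
    by_circuit = defaultdict(lambda: ([], []))
    for circuit, grid, finish in results:
        grids, finishes = by_circuit[circuit]
        grids.append(grid)
        finishes.append(finish)
    return {c: pearson(g, f) for c, (g, f) in by_circuit.items()}


# Made-up rows: circuit "A" keeps the grid order, circuit "B" shuffles it.
rows = [
    ("A", 1, 1), ("A", 2, 2), ("A", 3, 3), ("A", 4, 4),
    ("B", 1, 3), ("B", 2, 1), ("B", 3, 4), ("B", 4, 2),
]
correlations = grid_finish_correlation(rows)
```

A circuit where the grid order always holds would score 1, while one where the order is thoroughly shuffled would score close to 0.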
The following chart shows the average finishing position of each driver at each of this year’s GPs. You can see that the drivers have some favourite and some less favourite circuits. For instance, Vettel is consistently better than Raikkonen except in the Belgian and Hungarian GPs.
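The aggregation behind a chart like this is a plain group-and-average. The sketch below shows one way to do it; `mean_finish` and the driver and GP names are placeholders, and the numbers are invented.

```python
from collections import defaultdict


def mean_finish(results):
    """Average finishing position per (driver, grand prix) pair.

    `results` is an iterable of (driver, gp, finish) tuples.
    """
    totals = defaultdict(lambda: [0, 0])  # (driver, gp) -> [sum, count]
    for driver, gp, finish in results:
        entry = totals[(driver, gp)]
        entry[0] += finish
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Invented results: "Driver X" raced "GP 1" twice, "Driver Y" once.
rows = [("Driver X", "GP 1", 1), ("Driver X", "GP 1", 3), ("Driver Y", "GP 1", 4)]
averages = mean_finish(rows)
```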
Although the next plots do not provide insights as clear as the ones above, it’s interesting to see whether there is some latent structure in the data; something that distinguishes those who finish first from those who finish further back. In fact, I added these just because they are cool 🙂
The classes shown in the plots are:
- 0 -> driver finished on the podium
- 1 -> driver finished 4th to 7th
- 2 -> driver finished 8th to 14th
- 3 -> driver finished 15th or further back
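The four classes above can be written as a small mapping function. This is only an illustrative sketch; the name `finish_class` is hypothetical and is not taken from the model’s code.

```python
def finish_class(position):
    """Map a finishing position (1 = winner) to the class labels listed above."""
    if position < 1:
        raise ValueError("finishing position must be 1 or greater")
    if position <= 3:
        return 0  # podium
    if position <= 7:
        return 1  # 4th to 7th
    if position <= 14:
        return 2  # 8th to 14th
    return 3  # 15th or further back


# Positions chosen to hit both edges of every class.
labels = [finish_class(p) for p in (1, 3, 4, 7, 8, 14, 15, 20)]
```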
As you can see, all methods identified some latent patterns in the data, since there are clearly darker and lighter areas in all six plots. Even the random projections managed to capture patterns in the data. Although these points could be used as additional features for the model, I’m not currently using any of them.
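To give an idea of how simple a random projection is, here is a minimal sketch that maps feature rows to 2-D points by multiplying them with a random Gaussian matrix. The Gaussian variant, the scaling and the feature values are all assumptions made for illustration; the post does not specify which implementation produced the plots.

```python
import random


def gaussian_random_projection(rows, n_components=2, seed=0):
    """Project each row onto `n_components` random Gaussian directions."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    # Entries drawn from N(0, 1 / n_components), a common scaling choice.
    scale = n_components ** -0.5
    matrix = [[rng.gauss(0.0, scale) for _ in range(n_components)]
              for _ in range(n_features)]
    # Plain matrix product: (n_rows x n_features) @ (n_features x n_components).
    return [[sum(x * matrix[i][j] for i, x in enumerate(row))
             for j in range(n_components)]
            for row in rows]


# Made-up feature rows; real inputs would be the model's feature vectors.
features = [[1.0, 5.0, 0.2, 3.0], [2.0, 4.0, 0.1, 7.0], [10.0, 1.0, 0.9, 2.0]]
points_2d = gaussian_random_projection(features)
```

The resulting pairs could then be scattered and coloured by finishing class to look for darker and lighter regions.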
Did you find the charts interesting, or did you spot something unexpected? I’m looking forward to your suggestions and ideas.