An analytical approach – Part 1 – Intro
Since no one else has ever tried to answer the question of who the greatest grand prix driver of all time is (sarcasm), I thought it would be fun to do so. My hope is that this exercise will help me learn more about ranking systems and Formula 1 in general.
My plan is to start by building an Elo rating system and then develop a model that normalizes for car performance and reliability, different tracks, weather, strategy, and the other drivers in the field. Ideally, the model will let me control for factors other than the driver and answer hypotheticals like who is the best rain driver or who is the best road course driver.
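For readers unfamiliar with Elo, here is a minimal sketch of how a single race could update ratings. It is only an illustration and not necessarily what this series will end up using. The standard Elo expected-score formula is applied to every pair of drivers, with the driver who finished ahead counted as the winner of that pairing. The starting rating of 1500, the K-factor of 32, the averaging over opponents, and the function names `expected_score` and `update_ratings` are all my assumptions for the example.

```python
from itertools import combinations


def expected_score(rating_a, rating_b):
    """Standard Elo expected score for A against B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(ratings, finishing_order, k=32.0, start=1500.0):
    """Return new ratings after one race.

    ratings: dict mapping driver -> rating before the race.
    finishing_order: list of drivers, winner first.
    k and start are assumed values, not tuned for Formula 1.
    """
    # Drivers without a rating yet begin at the assumed starting value.
    current = {d: ratings.get(d, start) for d in finishing_order}
    deltas = {d: 0.0 for d in finishing_order}

    # combinations() preserves list order, so `ahead` finished before `behind`.
    for ahead, behind in combinations(finishing_order, 2):
        surprise = 1.0 - expected_score(current[ahead], current[behind])
        deltas[ahead] += k * surprise
        deltas[behind] -= k * surprise

    # Average over opponents so a large grid does not inflate the swing.
    n_opponents = max(len(finishing_order) - 1, 1)
    updated = dict(ratings)
    for d in finishing_order:
        updated[d] = current[d] + deltas[d] / n_opponents
    return updated


if __name__ == "__main__":
    # Hypothetical three-driver race with no prior ratings.
    print(update_ratings({}, ["driver_a", "driver_b", "driver_c"]))
```

Treating a race as a set of head-to-head results is only one way to adapt Elo to a multi-competitor event, and it ignores everything the later model is meant to capture, such as the car, reliability, and weather.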
Since this is a fun exercise, I will throw out a couple of predictions (biases?). Senna will be the best, especially in the rain, with Verstappen close behind but held back by having fewer years in the sport. Hamilton and Schumacher will be in the next tier.
Beyond the analytics, I am interested in learning more about pre-’90s drivers, different tracks and setups, and whether Kimi Räikkönen is as underrated as I suspect :).
One disclaimer: from a data science standpoint, I believe a driver is a proxy feature for the overall team as well as for their own driving talent. For example, James Vowles recently stated that Lewis Hamilton has more driving talent than Michael Schumacher, but that Schumacher was a master at getting the entire team on the same page. It is possible that Vowles is right and that the model still rates Schumacher higher. Another thing the model won’t pick up on: it is rumored that Schumacher did not share data with his teammates. If so, he might look stronger against his teammates than other drivers do against theirs, and the model will be ‘blind’ to the fact that his teammates may have been at a disadvantage compared with other drivers’ teammates.