Kaggle Playground Retrospective and Next Up

Taking a step back to review my performance in the Rwanda CO2 Kaggle competition: I placed 841st out of 1,442 submissions (59th percentile). That is certainly nothing to write home about, but given the relatively small amount of analysis and the extensibility of the tools developed, it is not completely discouraging.

So what could I have done better? After reviewing some of the posted notebooks, the most obvious (and painful) miss is that the time period includes COVID. Including an indicator for that period, or even excluding that data, would likely have improved model performance. I also one-hot encoded the month and week numbers to try to account for seasonality, but a couple of notebooks used sine and cosine transforms to better capture the data's cyclicality. One-hot columns treat the last week of one year and the first week of the next as unrelated categories, while a sine/cosine pair places them right next to each other. There is a really good explanation of the approach in this Kaggle notebook.
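For reference, here is a rough sketch of what those two features might look like in pandas. It is not taken from any of the notebooks mentioned above. The column names (`year`, `week_no`), the 53-week period, and the choice of 2020 as the COVID year are all assumptions made for illustration and would need to be checked against the actual data.

```python
import numpy as np
import pandas as pd


def add_cyclical_week(df, week_col="week_no", period=53):
    """Add sine/cosine features for a week number.

    Assumes weeks are numbered 0..period-1; `week_col` and `period`
    are illustrative guesses, not confirmed properties of the dataset.
    """
    out = df.copy()
    angle = 2 * np.pi * out[week_col] / period
    out["week_sin"] = np.sin(angle)
    out["week_cos"] = np.cos(angle)
    return out


def add_covid_flag(df, year_col="year", covid_years=(2020,)):
    """Add a 0/1 indicator for rows that fall in the assumed COVID years."""
    out = df.copy()
    out["is_covid"] = out[year_col].isin(list(covid_years)).astype(int)
    return out


# Tiny made-up frame, just to show the shape of the output.
demo = pd.DataFrame(
    {"year": [2019, 2019, 2020, 2021], "week_no": [0, 52, 10, 26]}
)
features = add_covid_flag(add_cyclical_week(demo))
print(features)
```

With this encoding, week 0 and week 52 end up close together in the (`week_sin`, `week_cos`) plane, which is the property one-hot encoding cannot express. A finer-grained COVID flag based on dates rather than whole years would probably be a better fit if the data supports it.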

So what’s next on this deliberately hacky adventure…

On the data science front, there is a new competition focused on a sleep study for children, which sounds interesting. I would also like to learn more front-end development, so I want to work through the Odin Project to start building those skills and maybe take on a public-facing full-stack project.
