Alright folks, let me tell you about this little side project I tackled while stuck indoors during a Cubs rain delay. Yeah, classic Chicago, right? Anyway, I was bored out of my skull and figured, “Hey, why not try something new?” So, I decided to mess around with a bit of data and see what I could whip up.

First thing I did was grab some data. I found a public dataset online with historical weather info for Chicago, focusing on days with rain. You know, the kind of days that make you want to stay in and binge-watch Netflix. After I downloaded the dataset, I loaded it into a pandas DataFrame, which is my go-to for data manipulation in Python.
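The post doesn't name the dataset or its columns, so here's a minimal sketch of the loading step that assumes a CSV with hypothetical `date`, `precip_in`, `temp_f`, and `humidity_pct` columns. A few made-up rows stand in for the real file so the snippet runs on its own; you'd swap `io.StringIO(RAW_CSV)` for the path to your actual download.

```python
import io

import pandas as pd

# Made-up sample rows standing in for the downloaded file.
# Column names are assumptions; the real dataset may differ.
RAW_CSV = """date,precip_in,temp_f,humidity_pct
2023-06-01,0.12,68,81
2023-06-02,0.00,75,55
2023-06-03,,71,77
"""


def load_weather(source) -> pd.DataFrame:
    """Read a weather CSV (file path or file-like object) into a DataFrame."""
    return pd.read_csv(source)


df = load_weather(io.StringIO(RAW_CSV))
print(df.shape)  # (3, 4)
print(df.isna().sum())  # the blank precip_in cell shows up as one NaN
```

Note that without `parse_dates`, pandas reads the `date` column as plain text, which is one reason a cleaning pass with `to_datetime()` comes next.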
Then I cleaned the data. There were some missing values and weird formatting issues, so I used pandas functions like fillna() and to_datetime() to sort them out, and I removed the rows I couldn't use for the analysis. It was a bit tedious, but hey, that's data science for ya!
Once it was clean, I explored the data. I wanted to get a feel for what I was working with, so I created some histograms and scatter plots to visualize the distribution of rainfall and temperature. I saw some interesting patterns that helped frame my focus.
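The cleaning step could look roughly like the sketch below. It's not the exact code from this project: the column names are hypothetical, and the choices to treat a missing precipitation reading as "no rain" and to drop rows with an unparseable date or a missing temperature are assumptions made for illustration.

```python
import io

import pandas as pd

# Made-up messy rows for illustration; column names are assumptions.
RAW_CSV = """date,precip_in,temp_f,humidity_pct
2023-06-01,0.12,68,81
2023-06-02,,75,55
not a date,0.30,70,90
2023-06-04,0.05,,77
"""


def clean_weather(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: parsed dates, filled precip, unusable rows dropped."""
    out = df.copy()
    # Parse dates; anything unparseable becomes NaT instead of raising.
    out["date"] = pd.to_datetime(out["date"], errors="coerce")
    # Assumption: a missing precipitation reading means no rain was recorded.
    out["precip_in"] = out["precip_in"].fillna(0.0)
    # Drop rows that still can't be used: a bad date or no temperature.
    out = out.dropna(subset=["date", "temp_f"]).reset_index(drop=True)
    return out


clean = clean_weather(pd.read_csv(io.StringIO(RAW_CSV)))
print(clean)
```

Whether to fill or drop a given column is a judgment call. Filling precipitation with 0 keeps more days but can hide genuinely missing readings.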
I wanted to see if there was any correlation between rainfall and other factors like temperature, humidity, or even the day of the week. I calculated the correlation coefficients using the pandas corr() method. Turns out, there wasn't a super strong correlation, but I did notice a slight tendency for rain to be more frequent on certain days of the week, specifically weekends. Nothing earth-shattering, but interesting nonetheless.
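Here's one way to run both checks. `corr()` gives Pearson coefficients between numeric columns. Day of the week is categorical, though, so a plain correlation doesn't really fit it. Grouping by weekday and taking the share of rainy days is a simple alternative I've chosen for this sketch, not necessarily what the original analysis did. The single made-up week below (with hypothetical column names) only exercises the code. Its numbers mean nothing about Chicago.

```python
import pandas as pd

# One made-up week, Monday 2023-06-05 through Sunday 2023-06-11.
week = pd.DataFrame({
    "date": pd.date_range("2023-06-05", periods=7, freq="D"),
    "precip_in": [0.0, 0.1, 0.0, 0.0, 0.0, 0.3, 0.2],
    "temp_f": [80, 70, 78, 79, 81, 65, 68],
    "humidity_pct": [50, 85, 55, 52, 48, 92, 88],
})


def rain_correlations(df: pd.DataFrame) -> pd.Series:
    """Pearson correlation of every other numeric column with precipitation."""
    numeric = df.select_dtypes("number")  # skips the datetime column
    return numeric.corr()["precip_in"].drop("precip_in")


def rain_rate_by_weekday(df: pd.DataFrame, threshold: float = 0.0) -> pd.Series:
    """Fraction of days, per weekday name, with precipitation above `threshold`."""
    rained = df["precip_in"] > threshold
    return rained.groupby(df["date"].dt.day_name()).mean()


print(rain_correlations(week))
print(rain_rate_by_weekday(week))
```

With years of real data, each weekday gets many samples, so a weekend bump in the rain rate would be worth checking against random variation before reading much into it.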
Next, I decided to build a simple predictive model to see if I could predict the probability of rain on a given day based on the available data. I opted for a Logistic Regression model using scikit-learn. I split the data into training and testing sets, trained the model on the training data, and then evaluated its performance on the testing data.
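A minimal scikit-learn version of the split, train, and evaluate loop might look like this. The feature list, the 25% test split, and the synthetic data generator are all assumptions for illustration. The generated rows follow a made-up rule and are not real weather. Day of the week is fed in as a plain integer to keep the sketch short. One-hot encoding it would be the more principled choice.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

FEATURES = ["temp_f", "humidity_pct", "day_of_week"]  # assumed feature set


def make_synthetic_days(n: int = 500, seed: int = 0) -> pd.DataFrame:
    """Synthetic stand-in data generated from a made-up rule; NOT real weather."""
    rng = np.random.default_rng(seed)
    humidity = rng.uniform(30, 100, n)
    temp = rng.uniform(40, 90, n)
    day = rng.integers(0, 7, n)  # 0 = Monday ... 6 = Sunday
    # Made-up rule: rain gets likelier with humidity, slightly less with heat.
    logit = 0.08 * (humidity - 70) - 0.02 * (temp - 65)
    rained = rng.random(n) < 1 / (1 + np.exp(-logit))
    return pd.DataFrame({
        "temp_f": temp,
        "humidity_pct": humidity,
        "day_of_week": day,
        "rained": rained.astype(int),
    })


def train_rain_model(df: pd.DataFrame, seed: int = 42):
    """Split into train/test sets, fit logistic regression, return (model, accuracy)."""
    X, y = df[FEATURES], df["rained"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    return model, acc


model, acc = train_rain_model(make_synthetic_days())
print(f"test accuracy: {acc:.2f}")
# Probability of rain (class 1) for a few new synthetic days:
print(model.predict_proba(make_synthetic_days(3, seed=1)[FEATURES])[:, 1])
```

`stratify=y` keeps the rain/no-rain ratio the same in both splits, which matters when rainy days are the minority.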
To my surprise, the model actually performed reasonably well: not perfect by any means, but good enough for a little rain delay project. Accuracy on the test set was around 70%, meaning it correctly predicted whether it would rain about 7 out of 10 times. Not bad, right?
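A quick sanity check on a number like 70% is to compare it with the "dumb" baseline of always guessing the most common outcome. If most days in the test set are dry, always answering "no rain" can already score well. The ten labels below are hypothetical, chosen only to show the arithmetic.

```python
from collections import Counter


def accuracy(y_true, y_pred) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_true) -> float:
    """Accuracy of always predicting the most common label."""
    most_common = Counter(y_true).most_common(1)[0][0]
    return accuracy(y_true, [most_common] * len(y_true))


# Ten hypothetical test days: 1 = rain, 0 = no rain.
truth = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
preds = [1, 0, 1, 0, 0, 0, 1, 0, 1, 0]
print(accuracy(truth, preds))    # 0.7
print(majority_baseline(truth))  # 0.7
```

In this made-up case, the model's 70% only ties "always say no rain", so the baseline is worth printing alongside the real score.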
Finally, I deployed the model using Flask. I built a simple web interface where you enter a date and get back a predicted probability of rain. It's nothing fancy, but it works. I can now check my super scientific rain model before heading to Wrigley. It was a fun little project that kept me entertained during a typical Chicago rain delay. Plus, I learned a thing or two about data analysis and machine learning along the way!
So, there you have it. That’s how I spent my rain-soaked afternoon. Maybe next time I’ll tackle something even more ambitious. But for now, I’m happy with my little rain prediction model. Go Cubs!