As I work to improve my python skills, I’ll keep adding my efforts to the blog. This week, I saw a nice article showing a basic approach to coding a neural network in R to predict yards after catch in NFL receivers from the folks at www.opensourcefootball.com. So, I decided to take that idea and try and create my own neural network in Python.
Some notes:
- This is not a to say that the neural network was the best method to answer the question. Rather, it was just a way for me to try and take stuff I’d already do in R and see if I could learn it in Python.
- This is not a blog post to cover all aspects of neural networks (not even close). It just so happened that the original article used a neural network via tensorflow in R and I happened to be doing some work in tensorflow in Python this week, so it was an easy connection to make.
- My Python coding is pretty messey and at times I feel like it takes me several steps to do what someone might do in a few lines. Feel free to comment and offer suggestions.
- Harking back to point one, I finish the script by coding a linear regression model to answer the same question (predict yards after catch) as it is a simpler and more interpretable than a neural network. I construct the regression model in two ways. First, I do it in sklearn, which seems to be the preferred approach to coding models by pythoners. I then do it in the statsmodels library. I’m sure this is more a function of my poor python programming skills but I do feel like the model output from statsmodels is more informative than what I can get returned from sklearn (and I show this in the script).
The data came from nflfastR, which is an R package for obtaining NFL play-by-play data created by Ben Baldwin et al.
I provide a step-by-step explanation for coding the model on my GITHUB page.