Ellis Hughes and I continue to work with the MLB pitcher data, courtesy of the {Lahman} baseball package.
This week we walk through using a random forest model to estimate the probability that a pitcher will make it to the Hall of Fame, given several performance stats.
In this episode we cover:
- Splitting data into training and testing sets
- Splitting training sets into cross-validated folds
- Using {tidyverse} and {purrr} to construct a tuning grid and tune the random forest models to identify the optimal mtry and ntrees for the prediction task
- Fitting a final model with the optimized parameters and exploring predictions
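
For readers who want a feel for the workflow before watching, the steps listed above can be sketched roughly as follows. This is a minimal illustration and not the code from the episode: it assumes the {randomForest} package (whose tree-count argument is `ntree`), uses made-up synthetic data in place of the {Lahman} pitcher tables, and the column names (`wins`, `era`, `strikeouts`, `hof`), the 75/25 split, the 5 folds, and the grid values are all hypothetical choices for the example.

```r
library(tidyr)         # expand_grid()
library(purrr)         # map_dbl(), pmap_dbl()
library(randomForest)  # randomForest()

set.seed(42)

# Synthetic stand-in data (NOT the Lahman data): three invented pitcher
# stats and an invented Hall of Fame label driven mostly by wins.
n <- 300
pitchers <- data.frame(
  wins       = rnorm(n, mean = 100, sd = 40),
  era        = rnorm(n, mean = 4, sd = 0.6),
  strikeouts = rnorm(n, mean = 1200, sd = 500)
)
pitchers$hof <- factor(ifelse(pitchers$wins + rnorm(n, sd = 20) > 140, "yes", "no"))

# 1. Split into training (75%) and testing (25%) sets.
train_idx <- sample(seq_len(n), size = floor(0.75 * n))
train <- pitchers[train_idx, ]
test  <- pitchers[-train_idx, ]

# 2. Randomly assign each training row to one of k cross-validation folds.
k <- 5
fold_id <- sample(rep(seq_len(k), length.out = nrow(train)))

# 3. Build a tuning grid of candidate parameter combinations.
grid <- expand_grid(mtry = 1:3, ntree = c(100, 300))

# Mean held-out accuracy across the k folds for one parameter combination.
cv_accuracy <- function(mtry, ntree) {
  mean(map_dbl(seq_len(k), function(f) {
    fit <- randomForest(hof ~ ., data = train[fold_id != f, ],
                        mtry = mtry, ntree = ntree)
    held_out <- train[fold_id == f, ]
    mean(predict(fit, held_out) == held_out$hof)
  }))
}

# Evaluate every row of the grid and keep the best-scoring combination.
grid$accuracy <- pmap_dbl(grid, cv_accuracy)
best <- grid[which.max(grid$accuracy), ]

# 4. Refit on the full training set and predict Hall of Fame probabilities
#    for the held-out test set.
final_fit <- randomForest(hof ~ ., data = train,
                          mtry = best$mtry, ntree = best$ntree)
test_prob <- predict(final_fit, test, type = "prob")[, "yes"]
```

Here `grid` ends up with one row per parameter combination and its cross-validated accuracy, and `test_prob` holds one predicted probability per test-set row. Accuracy is used as the tuning metric only for simplicity; with a rare outcome like Hall of Fame induction, a different metric may be a better fit.
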
To watch our screencast, CLICK HERE.
To access our code, CLICK HERE.