TidyX Episode 175: Predicting Hall of Fame Pitchers using Random Forests

Ellis Hughes and I continue to work with the MLB pitcher data, courtesy of the {Lahman} baseball package.

This week we walk through using a random forest model to estimate the probability that a pitcher will be inducted into the Hall of Fame, based on several performance statistics.

In this episode we cover:

  • Splitting data into training and testing sets
  • Splitting the training set into cross-validation folds
  • Using {tidyverse} and {purrr} to build a tuning grid and tune the random forest models, identifying the optimal mtry (number of variables tried at each split) and ntrees (number of trees) for the prediction task
  • Fitting a final model with the optimized parameters and exploring predictions
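The workflow in the list above can be sketched end to end in R. This is not the code from the episode. It is a minimal illustration that assumes the {randomForest} package as the random forest engine, career totals aggregated from the Lahman `Pitching` table, players inducted under the "Player" category of `HallOfFame` as the positive class, and an arbitrary predictor set, games filter (G >= 100), 75/25 split, 5 folds, Brier-score metric, and grid values. The helper `cv_brier()` and the other object names are hypothetical.

```r
# Sketch only: the predictors, G >= 100 filter, 75/25 split, 5 folds,
# Brier-score metric and grid values are illustrative assumptions.
library(Lahman)        # Pitching and HallOfFame tables
library(dplyr)
library(tidyr)
library(purrr)
library(randomForest)  # assumed random forest engine

set.seed(175)

# Aggregate season rows into career totals per pitcher
career <- Pitching %>%
  group_by(playerID) %>%
  summarise(across(c(W, L, G, SV, SO, ER, IPouts), ~ sum(.x, na.rm = TRUE)),
            .groups = "drop") %>%
  filter(G >= 100, IPouts > 0) %>%
  mutate(ERA = 27 * ER / IPouts)  # ERA = 9 * ER / innings, innings = IPouts / 3

# Players inducted as players (includes anyone who pitched 100+ games)
hof_ids <- HallOfFame %>%
  filter(category == "Player", inducted == "Y") %>%
  distinct(playerID) %>%
  pull(playerID)

pitchers <- career %>%
  mutate(hof = factor(if_else(playerID %in% hof_ids, "Y", "N"),
                      levels = c("N", "Y")))

# Split data into training (75%) and testing (25%) sets
train_idx <- sample(nrow(pitchers), floor(0.75 * nrow(pitchers)))
train <- pitchers[train_idx, ]
test  <- pitchers[-train_idx, ]

# Assign each training row to one of k cross-validation folds
k <- 5
train$fold <- sample(rep(seq_len(k), length.out = nrow(train)))

rf_formula <- hof ~ W + L + G + SV + SO + ERA

# Mean held-out Brier score across the k folds for one (mtry, ntree) pair
cv_brier <- function(mtry, ntree) {
  map_dbl(seq_len(k), function(f) {
    fit <- randomForest(rf_formula, data = filter(train, fold != f),
                        mtry = mtry, ntree = ntree)
    held <- filter(train, fold == f)
    p_yes <- predict(fit, newdata = held, type = "prob")[, "Y"]
    mean((p_yes - (held$hof == "Y"))^2)
  }) %>% mean()
}

# Build the tuning grid and score every combination with purrr
grid <- expand_grid(mtry = 2:4, ntree = c(100, 300)) %>%
  mutate(brier = map2_dbl(mtry, ntree, cv_brier))

best <- slice_min(grid, brier, n = 1, with_ties = FALSE)

# Refit on the full training set with the chosen parameters, then predict
final_fit <- randomForest(rf_formula, data = train,
                          mtry = best$mtry, ntree = best$ntree)
test$p_hof <- predict(final_fit, newdata = test, type = "prob")[, "Y"]

# Explore the test-set pitchers with the highest predicted probability
test %>%
  arrange(desc(p_hof)) %>%
  select(playerID, W, SO, ERA, hof, p_hof) %>%
  head(10)
```

A lower Brier score means better-calibrated probabilities, which is why `slice_min()` picks the best row; swapping in another metric (for example log loss or AUC) only requires changing the last line of `cv_brier()` and the selection direction.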

To watch our screencast, CLICK HERE.

To access our code, CLICK HERE.