# TidyX Episode 176: Extracting all predictions from a Random Forest, Building Uncertainty Intervals, and Creating a Player Compare Tool

In this installment of predicting whether MLB pitchers will make the Hall of Fame, Ellis Hughes and I take the Random Forest model we built in Episode 175 and show how to extract the predictions from each individual tree. Having every tree's prediction gives you an entire distribution of predictions for each player, from which you can build uncertainty intervals. We then use these distributions in a custom function that compares the predicted Hall of Fame chances of two players.
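As a rough illustration of the idea (not the code from the episode), here is a minimal sketch using {ranger}, which may differ from the package used on screen. The data are simulated and the column names (`wins`, `strikeouts`, `era`, `hof`) and the `compare_players()` helper are hypothetical. To keep things simple it fits a regression forest to the 0/1 label, so each tree returns a leaf average that can be read as a probability.

```r
library(ranger)

set.seed(176)

# Simulated stand-in for the pitcher data (hypothetical columns, not {Lahman})
n <- 400
pitchers <- data.frame(
  wins       = rpois(n, 120),
  strikeouts = rpois(n, 1500),
  era        = rnorm(n, 3.8, 0.5)
)
lin <- -1.5 + 0.03 * (pitchers$wins - 120) +
  0.002 * (pitchers$strikeouts - 1500) - 1.5 * (pitchers$era - 3.8)
pitchers$hof <- rbinom(n, 1, plogis(lin))

# Assumption: regression forest on the numeric 0/1 outcome
fit <- ranger(hof ~ wins + strikeouts + era, data = pitchers, num.trees = 500)

# predict.all = TRUE returns one prediction per tree: a players x trees matrix
new_players <- pitchers[1:2, ]
tree_preds <- predict(fit, data = new_players, predict.all = TRUE)$predictions

# Summarise each player's distribution of tree predictions
summarise_player <- function(p) c(mean = mean(p), quantile(p, c(0.05, 0.95)))
intervals <- t(apply(tree_preds, 1, summarise_player))
print(intervals)

# Hypothetical comparison helper: share of trees that rate player i above player j
compare_players <- function(preds, i, j) mean(preds[i, ] > preds[j, ])
compare_players(tree_preds, 1, 2)
```

The interval here is simply the 5th to 95th percentile of the tree-level predictions, which describes disagreement among trees rather than a formal confidence interval.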

# TidyX Episode 175: Predicting Hall of Fame Pitchers using Random Forests

Ellis Hughes and I continue to work with the MLB pitcher data, courtesy of the {Lahman} baseball package.

This week we walk through using a random forest model to calculate the probability a pitcher will make it to the Hall of Fame given several different performance stats.

In this episode we cover:

• Splitting data into training and testing sets
• Splitting the training set into cross-validation folds
• Using {tidyverse} and {purrr} to construct a tuning grid and tune the random forest models to identify the optimal mtry and ntrees for the prediction task
• Fitting a final model with the optimized parameters and exploring predictions
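
The steps above can be sketched roughly as follows. This is an illustrative outline on simulated data, not the episode's code: it assumes {ranger} for the forest (where the number of trees is called `num.trees`), uses hypothetical column names, and scores each grid row with a Brier score, which is just one reasonable choice of metric.

```r
library(purrr)
library(ranger)

set.seed(175)

# Simulated stand-in for the pitcher data (hypothetical columns)
n <- 300
pitchers <- data.frame(
  wins       = rpois(n, 120),
  strikeouts = rpois(n, 1500),
  era        = rnorm(n, 3.8, 0.5)
)
lin <- -1 + 0.03 * (pitchers$wins - 120) - 1.5 * (pitchers$era - 3.8)
pitchers$hof <- factor(ifelse(rbinom(n, 1, plogis(lin)) == 1, "yes", "no"),
                       levels = c("no", "yes"))

# 1. Train/test split (70/30)
train_idx <- sample(n, size = round(0.7 * n))
train <- pitchers[train_idx, ]
test  <- pitchers[-train_idx, ]

# 2. Assign each training row to one of 5 cross-validation folds
fold_id <- sample(rep(1:5, length.out = nrow(train)))

# 3. Tuning grid over mtry and the number of trees
grid <- expand.grid(mtry = 1:3, num.trees = c(100, 300))

# Mean Brier score across folds for one parameter combination
cv_brier <- function(mtry, num.trees) {
  mean(map_dbl(1:5, function(k) {
    fit <- ranger(hof ~ ., data = train[fold_id != k, ],
                  mtry = mtry, num.trees = num.trees, probability = TRUE)
    held_out <- train[fold_id == k, ]
    p <- predict(fit, data = held_out)$predictions[, "yes"]
    mean((p - (held_out$hof == "yes"))^2)
  }))
}
grid$brier <- map2_dbl(grid$mtry, grid$num.trees, cv_brier)
best <- grid[which.min(grid$brier), ]

# 4. Refit on the full training set with the best parameters, then predict
final_fit <- ranger(hof ~ ., data = train, mtry = best$mtry,
                    num.trees = best$num.trees, probability = TRUE)
test_pred <- predict(final_fit, data = test)$predictions[, "yes"]
summary(test_pred)
```

Keeping the test set out of the tuning loop means the final predictions give an honest look at out-of-sample performance.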

# TidyX Episode 174: Forecasting MLB Hall of Fame Pitchers using AI

Continuing to build on our modeling strategy from the previous few weeks, this week Ellis Hughes and I show how to set up a neural network prediction model in R using {tensorflow}.

# TidyX Episode 173: Predicting Hall of Fame MLB Pitchers with Bayesian Logistic Regression

Last week, Ellis Hughes and I did our “Teach me something in R in 20min or less” series on predicting Hall of Fame MLB pitchers using logistic regression. This week, we complete the same task with Bayesian logistic regression using {rstanarm}. Since it is a 20min or less episode, we don’t get into choosing priors and simply use the default, weakly informative priors specified in the function. However, we do cover:

• Specifying the syntax of the model
• Plotting the model results
• Making out-of-sample predictions
• Plotting posterior distributions for individual players
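
For readers who want a starting point, the workflow in the list above might look something like this. It is a hedged sketch on simulated data with hypothetical column names, not the code from the episode, and it relies on the default priors in `stan_glm()`.

```r
library(rstanarm)

set.seed(173)

# Simulated stand-in for the pitcher data (hypothetical columns)
n <- 300
pitchers <- data.frame(
  wins = rpois(n, 120),
  era  = rnorm(n, 3.8, 0.5)
)
lin <- -1 + 0.03 * (pitchers$wins - 120) - 1.5 * (pitchers$era - 3.8)
pitchers$hof <- rbinom(n, 1, plogis(lin))

# Model syntax: same formula interface as glm(), with default priors
fit <- stan_glm(hof ~ wins + era,
                data = pitchers,
                family = binomial(link = "logit"),
                seed = 173,
                refresh = 0)  # silence sampler progress output

# Plot the coefficient estimates with their uncertainty intervals
plot(fit)

# Out-of-sample predictions: posterior draws of P(hof = 1) for new players
new_players <- data.frame(wins = c(110, 160), era = c(4.2, 3.1))
post_prob <- posterior_epred(fit, newdata = new_players)  # draws x players

# Summarise and plot the posterior distribution for an individual player
apply(post_prob, 2, quantile, probs = c(0.05, 0.5, 0.95))
hist(post_prob[, 2],
     main = "Posterior probability of induction, player 2",
     xlab = "P(Hall of Fame)")
```

Because `posterior_epred()` returns one row per posterior draw, each column is a full distribution for a player rather than a single point estimate.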