Author Archives: Patrick

TidyX 57: Naive Bayes Classifier for pitchf/x classification

For the fifth model in our pitchf/x classifier series, Ellis Hughes and I build a Naive Bayes classifier and discuss its results on the test data and how it might be used as part of an ensemble of models for prediction.
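To make the idea concrete, here is a minimal sketch of a Gaussian Naive Bayes classifier. It is not the code from the episode (which is written in R): it is plain Python, the pitch values are made up, and the function names `fit_gaussian_nb` and `predict` are hypothetical. It assumes each feature is normally distributed within a pitch type and independent of the other features, which is the "naive" part.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a prior plus per-feature mean and variance for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)

    model = {}
    for label, members in grouped.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # A small floor on the variance avoids division by zero.
        variances = [
            max(sum((x - m) ** 2 for x in col) / n, 1e-9)
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one observation."""

    def log_posterior(params):
        prior, means, variances = params
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            # Log of the normal density, summed across features.
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return score

    return max(model, key=lambda label: log_posterior(model[label]))


# Made-up pitches: (velocity in mph, spin rate in rpm).
train_x = [(95.1, 2250), (94.3, 2310), (96.0, 2280),
           (78.2, 2650), (79.5, 2700), (77.8, 2590)]
train_y = ["FF", "FF", "FF", "CU", "CU", "CU"]

nb_model = fit_gaussian_nb(train_x, train_y)
print(predict(nb_model, (94.8, 2300)))
print(predict(nb_model, (78.9, 2620)))
```

Because the model works with per-class log posteriors, those scores could in principle be combined with the outputs of other classifiers, which is one way a model like this might feed into an ensemble.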

To watch our screen cast, CLICK HERE.

To access our code, CLICK HERE.

For previous episodes in this classification series:

  1. Episode 53: Pitch Classification & EDA
  2. Episode 54: KNN & UMAP
  3. Episode 55: Decision Trees, Random Forest, & Optimization
  4. Episode 56: XGBoost

TidyX 56: XGBoost for pitchf/x classification

Building on the previous weeks, Ellis Hughes and I work on pitchf/x classification using the popular XGBoost algorithm.

We discuss:

  • The basics of XGBoost
  • Training an XGBoost model
  • Evaluating variable importance within the model
  • Tuning the parameters of an XGBoost model using the {caret} package

To watch our screen cast, CLICK HERE.

To access our code, CLICK HERE.

For previous episodes in this classification series:

  1. Episode 53: Pitch Classification & EDA
  2. Episode 54: KNN & UMAP
  3. Episode 55: Decision Trees, Random Forest, & Optimization

TidyX 55: Decision Trees, Random Forest, & Optimization

Ellis Hughes and I continue our series on classification of MLB pitch types by working with Decision Trees and Random Forests.

We discuss:

  • Building a decision tree
  • Building a random forest
  • The advantage of random forests over decision trees
  • Tuning the random forest with the {caret} package and parallel processing
  • Evaluating the model’s classification accuracy overall and within each pitch type
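The last point, accuracy overall versus within pitch, can be sketched in a few lines. This is an illustrative Python example rather than the R code from the episode; the labels are made up and `accuracy_report` is a hypothetical helper name. Per-pitch accuracy here means the share of each actual pitch type that was predicted correctly.

```python
from collections import Counter


def accuracy_report(actual, predicted):
    """Return overall accuracy and per-class accuracy keyed by actual label."""
    totals = Counter(actual)
    correct = Counter(a for a, p in zip(actual, predicted) if a == p)
    overall = sum(correct.values()) / len(actual)
    per_class = {label: correct[label] / totals[label] for label in totals}
    return overall, per_class


# Made-up test-set labels and predictions.
actual = ["FF", "FF", "FF", "FF", "SL", "SL", "CU", "CU"]
predicted = ["FF", "FF", "FF", "SL", "SL", "FF", "CU", "CU"]

overall, per_pitch = accuracy_report(actual, predicted)
print(overall)
print(per_pitch)
```

Looking at both numbers matters because a model can score well overall while doing poorly on a less common pitch type.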

To watch our screen cast, CLICK HERE.

To access our code, CLICK HERE.

Previous screen casts in this series:

  1. Episode 53: Pitch Classification & EDA
  2. Episode 54: KNN & UMAP

TidyX 53: MLB Pitch Classification Series — EDA & Hierarchical Clustering

This week, Ellis Hughes and I begin a multi-week series on MLB pitch classification. Each week we will go over a different classification model using pitchf/x data, which we accessed via the {mlbgameday} R package.

In this first video of the series, we cover the following steps:

  • Loading the data
  • Data cleaning
  • Exploratory data analysis
  • Hierarchical cluster analysis
  • Interactive plotting of the dendrogram in {plotly}
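For readers new to hierarchical clustering, the sketch below shows the core idea: start with every pitch as its own cluster and repeatedly merge the two closest clusters. It is a plain-Python illustration, not the episode's R code. The data are made up and assumed to be already standardized, the `agglomerate` function is hypothetical, and average linkage is just one of several possible choices.

```python
import math


def agglomerate(points, n_clusters):
    """Merge the two closest clusters (average linkage) until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    merges = []  # (members_a, members_b, distance), in merge order

    def linkage(a, b):
        # Average pairwise Euclidean distance between two clusters.
        dists = [math.dist(points[i], points[j]) for i in a for j in b]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        candidates = [
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        distance, i, j = min(candidates)
        merges.append((clusters[i], clusters[j], distance))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, merges


# Made-up, standardized (velocity, vertical break) values for six pitches.
points = [(1.2, -0.9), (1.0, -1.1), (1.1, -1.0),
          (-1.3, 1.0), (-1.1, 1.2), (-1.2, 0.9)]

clusters, merges = agglomerate(points, n_clusters=2)
print(sorted(sorted(c) for c in clusters))
```

The recorded merge distances play the role of the heights in a dendrogram: small values mean very similar pitches were joined, and a large jump suggests a natural place to cut the tree.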

To watch our screen cast, CLICK HERE.

To access our code, CLICK HERE.