Author Archives: Patrick

TidyX 59: pitchf/x classification with class imbalance

In the past six episodes of this series, Ellis Hughes and I have been developing a pitchf/x classification model using data from the {mlbgameday} package. Along the way we’ve mentioned that the data has a large class imbalance, with the majority class being the four-seam fastball (FF).

This week, we discuss up- and down-sampling the data as a potential way to address this class imbalance. We conclude with an introduction to log-loss, Brier score, and AUC, which we use to compare the up- and down-sampled models. Next week, we will wrap up this series by comparing all of our models on those three measures.
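
To make the resampling step concrete, here is a minimal sketch using {caret}’s upSample() and downSample() helpers, plus hand-rolled versions of log-loss and the Brier score. The data frame and column names below (pitch_df, pitch_type, start_speed, pfx_x, pfx_z) are illustrative stand-ins, not the exact objects from our scripts.

    library(caret)

    set.seed(59)

    # pitch_df: pitch-level data with a pitch_type factor column (illustrative)
    features <- pitch_df[, c("start_speed", "pfx_x", "pfx_z")]

    # Up-sampling: resample minority classes with replacement until every
    # class is as large as the majority class (FF)
    train_up <- upSample(x = features, y = pitch_df$pitch_type, yname = "pitch_type")

    # Down-sampling: randomly drop majority-class rows until every class
    # is as small as the rarest class
    train_down <- downSample(x = features, y = pitch_df$pitch_type, yname = "pitch_type")

    table(train_up$pitch_type)    # every class now equally represented
    table(train_down$pitch_type)

    # Multiclass log-loss: mean of -log(probability assigned to the true class).
    # Assumes the columns of prob_matrix are ordered like levels(obs).
    log_loss <- function(prob_matrix, obs) {
      p <- prob_matrix[cbind(seq_along(obs), as.integer(obs))]
      mean(-log(pmax(p, 1e-15)))
    }

    # Multiclass Brier score: mean squared distance between the predicted
    # probability vector and the one-hot encoding of the observed class
    brier <- function(prob_matrix, obs) {
      onehot <- model.matrix(~ obs - 1)
      mean(rowSums((prob_matrix - onehot)^2))
    }

Lower is better for both of those metrics; AUC is easiest to pull from an existing package such as {pROC}.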

To watch our screen cast, CLICK HERE.

To access our code, CLICK HERE.

For previous episodes in this series:

  1. Episode 53: Pitch Classification & EDA
  2. Episode 54: KNN & UMAP
  3. Episode 55: Decision Trees, Random Forest, & Optimization
  4. Episode 56: XGBoost
  5. Episode 57: Naive Bayes Classifier
  6. Episode 58: Tensorflow Neural Network

TidyX 58: Tensorflow Neural Network for pitchf/x classification

In our sixth model of the pitchf/x classification series, Ellis Hughes and I build a neural network using {tensorflow}. We then go on to discuss whether the steak is worth the sizzle when it comes to using a model as complex as a neural network versus one of the simpler models covered in prior episodes of this series.
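
For a taste of what this looks like, below is a minimal sketch of a small multiclass network via the {keras} interface to TensorFlow. The layer sizes are arbitrary, and x_train/y_train (a scaled feature matrix and a one-hot pitch-type matrix) are assumed to already exist; see the episode code for the real data prep.

    library(keras)

    # A small feed-forward classifier: two hidden layers, softmax output
    model <- keras_model_sequential() %>%
      layer_dense(units = 32, activation = "relu", input_shape = ncol(x_train)) %>%
      layer_dense(units = 16, activation = "relu") %>%
      layer_dense(units = ncol(y_train), activation = "softmax")

    model %>% compile(
      loss = "categorical_crossentropy",   # multiclass log-loss
      optimizer = "adam",
      metrics = "accuracy"
    )

    history <- model %>% fit(
      x_train, y_train,
      epochs = 30,
      batch_size = 256,
      validation_split = 0.2
    )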

To watch our screen cast, CLICK HERE.

To access our code, CLICK HERE.

For previous episodes in this series:

  1. Episode 53: Pitch Classification & EDA
  2. Episode 54: KNN & UMAP
  3. Episode 55: Decision Trees, Random Forest, & Optimization
  4. Episode 56: XGBoost
  5. Episode 57: Naive Bayes Classifier

TidyX 57: Naive Bayes Classifier for pitchf/x classification

For the fifth model in our pitchf/x classifier series, Ellis Hughes and I build a Naive Bayes Classifier and discuss its results on the test data and how it might be used as part of an ensemble of models for prediction.
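
As a rough illustration rather than our exact code, a classifier of this kind can be fit in a few lines with {e1071}; the train/test data frames and the column names below are stand-ins.

    library(e1071)

    # train/test: data frames with a pitch_type factor plus numeric
    # pitchf/x features (illustrative names)
    nb_fit <- naiveBayes(pitch_type ~ start_speed + pfx_x + pfx_z, data = train)

    # Hard class predictions on the held-out test set
    nb_pred <- predict(nb_fit, newdata = test)
    mean(nb_pred == test$pitch_type)                        # overall accuracy
    table(predicted = nb_pred, observed = test$pitch_type)

    # Posterior probabilities, the piece you would average into an ensemble
    nb_prob <- predict(nb_fit, newdata = test, type = "raw")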

To watch our screen cast, CLICK HERE.

To access our code, CLICK HERE.

For previous episodes in this series:

  1. Episode 53: Pitch Classification & EDA
  2. Episode 54: KNN & UMAP
  3. Episode 55: Decision Trees, Random Forest, & Optimization
  4. Episode 56: XGBoost

TidyX 56: XGBoost for pitchf/x classification

Building on the previous weeks’ episodes, Ellis Hughes and I work on pitchf/x classification using the popular XGBoost algorithm.

We discuss:

  • The basics of XGBoost
  • Training an XGBoost model
  • Evaluating the variables of importance within the model
  • Tuning model parameters of an XGBoost model using the {caret} package (sketched below)
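
Here is that sketch, a rough illustration of training and tuning the xgbTree method through {caret} rather than our exact code; the grid values and the train data frame with its pitch_type column are stand-ins.

    library(caret)

    set.seed(56)

    # 5-fold cross-validation, keeping class probabilities
    ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)

    # A small illustrative grid over a few of xgbTree's tuning parameters
    grid <- expand.grid(
      nrounds          = c(100, 300),
      max_depth        = c(3, 6),
      eta              = c(0.05, 0.3),
      gamma            = 0,
      colsample_bytree = 0.8,
      min_child_weight = 1,
      subsample        = 0.8
    )

    xgb_fit <- train(
      pitch_type ~ .,     # pitch_type plus numeric features in `train`
      data      = train,
      method    = "xgbTree",
      trControl = ctrl,
      tuneGrid  = grid
    )

    xgb_fit$bestTune    # winning parameter combination
    varImp(xgb_fit)     # variables of importance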

To watch our screen cast, CLICK HERE.

To access our code, CLICK HERE.

For previous episodes in this classification series:

  1. Episode 53: Pitch Classification & EDA
  2. Episode 54: KNN & UMAP
  3. Episode 55: Decision Trees, Random Forest, & Optimization

TidyX 55: Decision Trees, Random Forest, & Optimization

Ellis Hughes and I continue our series on classification of MLB pitch types by working with Decision Trees and Random Forests.

We discuss:

  • Building a decision tree
  • Building a random forest
  • The advantage of random forests over decision trees
  • Tuning the random forest with the {caret} package, using parallel processing (see the sketch after this list)
  • Evaluating the model’s classification accuracy overall and within each pitch type
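
Here is that sketch, a rough illustration of the {caret} + {doParallel} pattern rather than our exact code; the train data frame and its pitch_type column are stand-ins.

    library(caret)
    library(doParallel)

    # Spin up a parallel backend so caret can fit the resamples concurrently
    cl <- parallel::makePSOCKcluster(parallel::detectCores() - 1)
    registerDoParallel(cl)

    set.seed(55)

    # Tune mtry, the number of features tried at each split, with 5-fold CV
    rf_fit <- train(
      pitch_type ~ .,
      data      = train,
      method    = "rf",
      trControl = trainControl(method = "cv", number = 5),
      tuneGrid  = expand.grid(mtry = c(2, 4, 6))
    )

    stopCluster(cl)

    rf_fit$bestTune           # best value of mtry
    confusionMatrix(rf_fit)   # resampled accuracy overall and by pitch type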

To watch our screen cast, CLICK HERE.

To access our code, CLICK HERE.

Previous screen casts in this series:

  1. Episode 53: Pitch Classification & EDA
  2. Episode 54: KNN & UMAP