Category Archives: TidyX Screen Cast

TidyX 56: XGBoost for pitchf/x classification

Building on the previous weeks, Ellis Hughes and I work on pitchf/x classification using the popular XGBoost algorithm.

We discuss:

  • The basics of XGBoost
  • Training an XGBoost model
  • Evaluating variable importance within the model
  • Tuning XGBoost model parameters using the {caret} package
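
For readers who want a feel for the workflow before opening the episode code, here is a minimal sketch of training a multi-class XGBoost model and inspecting variable importance. It is not the code from the screen cast: the built-in iris data stands in for the pitchf/x data, and the parameter values (max_depth, eta, nrounds) are arbitrary choices for illustration.

```r
library(xgboost)

set.seed(56)

# Stand-in data (assumption): iris measurements instead of pitchf/x features
x <- as.matrix(iris[, 1:4])
y <- as.integer(iris$Species) - 1L  # xgboost expects class labels starting at 0

# Simple train/test split
train_idx <- sample(nrow(x), 100)
dtrain <- xgb.DMatrix(data = x[train_idx, ], label = y[train_idx])
dtest  <- xgb.DMatrix(data = x[-train_idx, ], label = y[-train_idx])

# Illustrative (untuned) parameters for a 3-class problem
params <- list(
  objective = "multi:softmax",  # predict() returns the predicted class index
  num_class = 3,
  max_depth = 3,
  eta = 0.3
)

bst <- xgb.train(params = params, data = dtrain, nrounds = 25)

# Hold-out accuracy
pred <- predict(bst, dtest)
accuracy <- mean(pred == y[-train_idx])

# Which features contribute most to the fitted trees?
importance <- xgb.importance(model = bst)
print(importance)
```

For tuning, {caret} exposes XGBoost through train(method = "xgbTree"), whose tuning grid covers nrounds, max_depth, eta, gamma, colsample_bytree, min_child_weight, and subsample.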

To watch our screen cast, CLICK HERE.

To access our code, CLICK HERE.

For previous episodes in this classification series:

  1. Episode 53: Pitch Classification & EDA
  2. Episode 54: KNN & UMAP
  3. Episode 55: Decision Trees, Random Forest, & Optimization

 

TidyX 55: Decision Trees, Random Forest, & Optimization

Ellis Hughes and I continue our series on classification of MLB pitch types by working with Decision Trees and Random Forests.

We discuss:

  • Building a decision tree
  • Building a random forest
  • The advantage of random forests over decision trees
  • Tuning the random forest with the {caret} package and parallel processing
  • Evaluating the model’s classification accuracy overall and within each pitch type
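
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the episode code: iris stands in for the pitch data, the cluster uses two workers, and the mtry grid and 5-fold cross-validation are arbitrary choices.

```r
library(rpart)
library(randomForest)
library(caret)
library(doParallel)

set.seed(55)

# Stand-in data (assumption): iris instead of pitchf/x pitches
train_idx <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
train_df <- iris[train_idx, ]
test_df  <- iris[-train_idx, ]

# A single decision tree for comparison
tree_fit <- rpart(Species ~ ., data = train_df, method = "class")

# Register a small parallel backend so caret can fit resamples in parallel
cl <- parallel::makePSOCKcluster(2)
registerDoParallel(cl)

# Tune the number of variables tried at each split (mtry) with 5-fold CV
rf_tuned <- train(
  Species ~ .,
  data = train_df,
  method = "rf",
  trControl = trainControl(method = "cv", number = 5),
  tuneGrid = expand.grid(mtry = 1:4)
)

# Shut the workers down and fall back to sequential processing
parallel::stopCluster(cl)
foreach::registerDoSEQ()

# Overall accuracy and per-class performance on the hold-out set
rf_pred <- predict(rf_tuned, newdata = test_df)
cm <- confusionMatrix(rf_pred, test_df$Species)
cm$overall["Accuracy"]
cm$byClass[, "Sensitivity"]
```

The confusion matrix object gives both views mentioned in the list: cm$overall for the model as a whole and cm$byClass for each class (here species, in the episode pitch type).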

To watch our screen cast, CLICK HERE.

To access our code, CLICK HERE.

Previous screen casts in this series:

  1. Episode 53: Pitch Classification & EDA
  2. Episode 54: KNN & UMAP

TidyX 53: MLB Pitch Classification Series — EDA & Hierarchical Clustering

This week, Ellis Hughes and I begin a multi-week series on MLB pitch classification. Each week we will go over a different classification model using pitchf/x data, which we accessed via the {mlbgameday} R package.

In this first video of the series, we cover the following steps:

  • Loading the data
  • Data cleaning
  • Exploratory data analysis
  • Hierarchical cluster analysis
  • Interactive plotting of the dendrogram in {plotly}
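
As a rough sketch of the clustering step, the base R version below scales the features, builds a distance matrix, and cuts the resulting tree into groups. It assumes iris as stand-in data, Ward linkage, and three clusters, none of which come from the episode, and it uses a static base plot rather than the interactive {plotly} dendrogram shown in the video.

```r
# Stand-in data (assumption): iris measurements instead of pitchf/x features
features <- scale(iris[, 1:4])  # put all variables on a common scale

# Pairwise Euclidean distances, then agglomerative clustering
dist_mat <- dist(features, method = "euclidean")
hc <- hclust(dist_mat, method = "ward.D2")

# Cut the dendrogram into three groups (arbitrary choice for this sketch)
clusters <- cutree(hc, k = 3)

# Compare cluster membership against the known labels
print(table(cluster = clusters, species = iris$Species))

# Static dendrogram; uncomment to draw it
# plot(hc, labels = FALSE, main = "Hierarchical clustering (stand-in data)")
```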

To watch our screen cast, CLICK HERE.

To access our code, CLICK HERE.

TidyX 52: Creating Presentations in R with {Xaringan}

Tired of making all your nice tables and plots in R and then having to copy and paste them into PowerPoint for your next presentation?

If you answered “yes” to that question, then you are going to love TidyX 52. This week, Ellis Hughes and I discuss how to make reproducible presentations in R using the {Xaringan} (pronounced cha-reen-gan) package.

Using R Markdown, you can build an entire presentation deck without having to copy and paste your work into PowerPoint. Moreover, {Xaringan} makes reproducing your work and updating your slides as simple as changing a few lines of code and hitting Knit.

To watch our screen cast, CLICK HERE.

To access our code, CLICK HERE.

TidyX 51: Building Statistical Models into Shiny Apps

We’ve done a fair bit of {shiny} over the past year, detailing different approaches to provide an interactive data environment for end users. However, one thing we haven’t done yet is talk about embedding statistical models into your {shiny} apps. This week, Ellis Hughes and I do just that.

We create a random forest classifier, which uses body size features to predict the probability that a penguin is one of three different species, and embed this into a {shiny} web app. For this, we will be using the penguins data set from the {palmerpenguins} R package.

In addition to making an interactive statistical app, we cover a few other nuances of {shiny} app development:

  • Organizing the user components horizontally across the top of the app versus the commonly used sidebar panel.
  • How to increase the size and width of the plot in the main panel.
  • How to center the plot within the main panel.
  • How to add a title to the top of your {shiny} app.
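
To make the idea concrete, here is a minimal sketch of a {shiny} app with an embedded random forest. It is not the app from the episode: the input ids, slider ranges, plot size, and the use of column offsets for centering are all assumptions made for illustration.

```r
library(shiny)
library(randomForest)
library(palmerpenguins)

set.seed(51)

# Fit the model once, outside the server, so it is not refit on every input change
penguins_clean <- na.omit(penguins)
rf_fit <- randomForest(
  species ~ bill_length_mm + bill_depth_mm + flipper_length_mm + body_mass_g,
  data = penguins_clean
)

ui <- fluidPage(
  # Title at the top of the app
  titlePanel("Penguin species classifier"),
  # Inputs laid out horizontally across the top instead of in a sidebar
  fluidRow(
    column(3, sliderInput("bill_length", "Bill length (mm)", min = 30, max = 60, value = 45)),
    column(3, sliderInput("bill_depth", "Bill depth (mm)", min = 13, max = 22, value = 17)),
    column(3, sliderInput("flipper_length", "Flipper length (mm)", min = 170, max = 235, value = 200)),
    column(3, sliderInput("body_mass", "Body mass (g)", min = 2700, max = 6300, value = 4200))
  ),
  # An 8-wide column offset by 2 centers the plot in the 12-column grid
  fluidRow(
    column(8, offset = 2, plotOutput("prob_plot", height = "500px"))
  )
)

server <- function(input, output) {
  output$prob_plot <- renderPlot({
    # Build a one-row data frame from the current slider values
    new_obs <- data.frame(
      bill_length_mm = input$bill_length,
      bill_depth_mm = input$bill_depth,
      flipper_length_mm = input$flipper_length,
      body_mass_g = input$body_mass
    )
    # Predicted probability of each species for that penguin
    probs <- predict(rf_fit, newdata = new_obs, type = "prob")
    barplot(probs[1, ], ylim = c(0, 1), ylab = "Predicted probability")
  })
}

app <- shinyApp(ui, server)
# runApp(app)  # uncomment to launch the app locally
```

Fitting the model at the top level keeps the reactive code cheap: each slider change only triggers a single predict() call.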

To watch our screen cast, CLICK HERE.

To access our code, CLICK HERE.