TidyX 53: MLB Pitch Classification Series — EDA & Hierarchical Clustering

This week, Ellis Hughes and I begin a multi-week series on MLB pitch classification. Each week we will cover a different classification model using PITCHf/x data, which we accessed via the {mlbgameday} R package.

In this first video of the series, we cover the following steps:

  • Loading the data
  • Data cleaning
  • Exploratory data analysis
  • Hierarchical cluster analysis
  • Interactive plotting of the dendrogram in {plotly}
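For readers who want to see what hierarchical clustering does under the hood, here is a minimal Python sketch of agglomerative clustering with average linkage on standardized features. The episode itself works in R. The linkage method, the two features (release speed in mph and spin rate in rpm), and the pitch values below are hypothetical choices made for illustration. They are not taken from the episode's data or code.

```python
import math
from itertools import combinations


def standardize(rows):
    """Scale each feature column to mean 0 and (population) SD 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for a constant column to avoid dividing by zero
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]


def average_linkage(points):
    """Naive agglomerative clustering with average linkage.

    Returns the merge history as (cluster_a, cluster_b, height) tuples,
    where each cluster is a frozenset of point indices.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest mean pairwise distance
        best = None
        for a, b in combinations(clusters, 2):
            d = sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, _ = best
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append(best)
    return merges


def cut(merges, n_points, k):
    """Replay the first n_points - k merges to obtain k flat clusters."""
    clusters = [frozenset([i]) for i in range(n_points)]
    for a, b, _ in merges[: n_points - k]:
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


# Hypothetical pitches: (release speed in mph, spin rate in rpm)
pitches = [(95.1, 2300), (94.3, 2250), (96.0, 2400),
           (84.5, 2600), (85.2, 2700),
           (78.0, 1700), (77.5, 1750)]
merges = average_linkage(standardize(pitches))
for a, b, height in merges:
    print(sorted(a), "+", sorted(b), f"at height {height:.2f}")
groups = cut(merges, len(pitches), 3)
print([sorted(g) for g in groups])
```

The merge history plays the same role as the merge/height information behind a dendrogram. Cutting the tree at a chosen number of clusters turns it into flat groups, which is one way to look for candidate pitch types. Standardizing first matters because spin rate (thousands of rpm) would otherwise swamp release speed in the distance calculation.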

To watch our screencast, CLICK HERE.

To access our code, CLICK HERE.

TidyX 52: Creating Presentations in R with {xaringan}

Tired of making all your nice tables and plots in R and then having to copy and paste them into PowerPoint for your next presentation?

If you answered “yes” to that question, then you are going to love TidyX 52. This week, Ellis Hughes and I discuss how to make reproducible presentations in R using the {xaringan} (pronounced cha-reen-gan) package.

Using R Markdown, you can build an entire slide deck without copying and pasting your work into PowerPoint. Because the slides are generated directly from your code, your presentation stays reproducible, and updating it is as simple as changing a few lines of code and hitting Knit.

To watch our screencast, CLICK HERE.

To access our code, CLICK HERE.

TidyX 51: Building Statistical Models into Shiny Apps

We’ve done a fair bit of {shiny} over the past year, detailing different approaches to provide an interactive data environment for end users. However, one thing we haven’t done yet is talk about embedding statistical models into your {shiny} apps. This week, Ellis Hughes and I do just that.

We create a random forest classifier that uses body size measurements to predict the probability that a penguin belongs to each of three species, and we embed it in a {shiny} web app. For this, we use the penguins data set from the {palmerpenguins} R package.

In addition to making an interactive statistical app, we cover a few other nuances of {shiny} app development:

  • Arranging the user input controls horizontally across the top of the app instead of in the commonly used sidebar panel.
  • How to increase the height and width of the plot in the main panel.
  • How to center the plot within the main panel.
  • How to add a title to the top of your {shiny} app.

To watch our screencast, CLICK HERE.

To access our code, CLICK HERE.

TidyX 50: James-Stein Estimator for MLB Batting Averages

We made it! 50 episodes in one year!

To celebrate, Ellis Hughes and I take the baseball data we used in Episode 49 and build a James-Stein estimator to estimate each hitter’s true batting average from a limited number of observations (at-bats).
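To make the shrinkage idea concrete, here is a Python sketch of one common textbook form of the James-Stein estimator. It is the positive-part version, which pulls each observed average y_i toward the grand mean ȳ: estimate_i = ȳ + c(y_i − ȳ), with c = 1 − (k − 3)σ² / Σ(y_i − ȳ)². This sketch assumes that every player has the same number of at-bats and that σ² is the binomial variance ȳ(1 − ȳ)/n. The hit totals are made up. The episode's R code may use a different variance estimate or data setup.

```python
from statistics import mean


def james_stein(averages, at_bats):
    """Positive-part James-Stein shrinkage of batting averages toward their grand mean.

    Assumes every player has the same number of at-bats, so a single binomial
    sampling variance (computed from the grand mean) applies to all players.
    Returns the shrunken estimates and the shrinkage factor c.
    """
    k = len(averages)
    if k < 4:
        raise ValueError("need at least 4 players for shrinkage toward an estimated mean")
    grand_mean = mean(averages)
    # Binomial sampling variance of one player's observed average
    sigma2 = grand_mean * (1 - grand_mean) / at_bats
    # Total squared spread of the observed averages around the grand mean
    spread = sum((p - grand_mean) ** 2 for p in averages)
    if spread == 0:
        return [grand_mean] * k, 0.0
    # (k - 3) rather than (k - 2) because the grand mean is itself estimated.
    # Clamping at 0 (the "positive-part" variant) keeps estimates from overshooting the mean.
    c = max(0.0, 1 - (k - 3) * sigma2 / spread)
    return [grand_mean + c * (p - grand_mean) for p in averages], c


# Hypothetical hit totals for 10 players, each after 200 at-bats
hits = [70, 65, 62, 58, 55, 52, 50, 47, 44, 40]
at_bats = 200
averages = [h / at_bats for h in hits]
estimates, c = james_stein(averages, at_bats)
for obs, est in zip(averages, estimates):
    print(f"observed {obs:.3f} -> shrunk {est:.3f}")
print(f"shrinkage factor c = {c:.3f}")
```

Every estimate moves the same fraction (1 − c) of the way toward the group average. When the observed spread between hitters is small relative to the sampling noise, c is small and the estimates are pulled in hard. When the spread is large, c approaches 1 and the observed averages are mostly kept.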

After building estimates for the players, we show how to combine {gt} tables with {ggplot2} figures using {patchwork} to produce a single figure that displays a table and a plot side by side.

To watch our screencast, CLICK HERE.

To access our code, CLICK HERE.

NOTE: This isn’t the first time I’ve applied the James-Stein estimator to this sort of problem. Nearly 2 years ago, I wrote an R tutorial on this method, comparing it with a Bayesian approach (CLICK HERE).