Category Archives: TidyX Screen Cast

TidyX 51: Building Statistical Models into Shiny Apps

We’ve done a fair bit of {shiny} over the past year, detailing different approaches to provide an interactive data environment for end users. However, one thing we haven’t done yet is talk about embedding statistical models into your {shiny} apps. This week, Ellis Hughes and I do just that.

We create a random forest classifier that uses body size features to predict the probability that a penguin belongs to each of three species, and we embed the model in a {shiny} web app. For this, we use the penguins data set from the {palmerpenguins} R package.
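
As a rough sketch of the modeling step (not the code from the screen cast), the snippet below fits a random forest and returns species probabilities for one penguin. It assumes the {randomForest} package, four body size predictors picked for illustration, and a made-up `new_penguin` standing in for values a user would enter in the app.

```r
library(palmerpenguins)  # provides the penguins data set
library(randomForest)    # assumption: the episode may use a different random forest package

# Keep the species label and four body size measurements, dropping incomplete rows
penguin_dat <- na.omit(as.data.frame(
  penguins[, c("species", "bill_length_mm", "bill_depth_mm",
               "flipper_length_mm", "body_mass_g")]
))

set.seed(51)
fit_rf <- randomForest(species ~ ., data = penguin_dat, ntree = 500)

# Hypothetical penguin, standing in for values a user would enter in the app
new_penguin <- data.frame(
  bill_length_mm = 45, bill_depth_mm = 15,
  flipper_length_mm = 215, body_mass_g = 5000
)

# type = "prob" returns one column of class probabilities per species
species_prob <- predict(fit_rf, newdata = new_penguin, type = "prob")
species_prob
```

Inside a {shiny} app, the `predict()` call would typically sit in a reactive or `renderPlot()` block so the probabilities update whenever the inputs change.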

In addition to making an interactive statistical app, we cover a few other nuances of {shiny} app development:

  • Organizing the user components horizontally across the top of the app versus the commonly used sidebar panel.
  • How to increase the size and width of the plot in the main panel.
  • How to center the plot within the main panel.
  • How to add a title to the top of your {shiny} app.
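
The four layout points above can be sketched in a few lines of {shiny} UI code. This is one possible arrangement, not necessarily the one used in the episode: the input names, the 80% width, the 600px height, and the placeholder bar plot are all assumptions for illustration.

```r
library(shiny)

ui <- fluidPage(
  # Title across the top of the app
  titlePanel("Penguin Species Classifier"),

  # Inputs laid out horizontally in one row instead of a sidebarPanel
  fluidRow(
    column(4, numericInput("bill_length", "Bill length (mm)", value = 45)),
    column(4, numericInput("flipper_length", "Flipper length (mm)", value = 200)),
    column(4, numericInput("body_mass", "Body mass (g)", value = 4200))
  ),

  # align = "center" centers the plot; width and height set its size
  fluidRow(
    column(12, align = "center",
           plotOutput("prob_plot", width = "80%", height = "600px"))
  )
)

server <- function(input, output) {
  output$prob_plot <- renderPlot({
    # Placeholder plot; a real app would draw the model's predicted probabilities
    barplot(c(Adelie = 1, Chinstrap = 1, Gentoo = 1) / 3,
            ylim = c(0, 1), ylab = "Probability")
  })
}

# shinyApp(ui, server)  # uncomment to launch the app
```

The twelve-column grid behind `fluidRow()` and `column()` is what lets the inputs sit side by side, so changing the column widths is the easiest way to rebalance the top row.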

To watch our screen cast, CLICK HERE.

To access our code, CLICK HERE.

TidyX Episode 50: James-Stein Estimator for MLB Batting Averages

We made it! 50 episodes in one year!

To celebrate, Ellis Hughes and I take the baseball data we used in Episode 49 and build a James-Stein Estimator to attempt to estimate a hitter’s true batting average given some number of observations (at bats).
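
For readers who want the gist of the estimator before watching, here is a minimal base R sketch. It uses simulated hitters rather than the MLB data from the episode, assumes every hitter has the same number of at bats, and uses the positive-part form of the shrinkage factor, which may differ from the version we build on screen.

```r
set.seed(50)

# Simulated stand-in data (not the MLB data from the episode):
# 18 hypothetical hitters, each observed over the same number of at bats
n_players <- 18
at_bats   <- 45
true_avg  <- rbeta(n_players, 80, 220)   # unknown in practice
hits      <- rbinom(n_players, at_bats, true_avg)
obs_avg   <- hits / at_bats

# James-Stein: shrink each observed average toward the grand mean
grand_mean <- mean(obs_avg)
sigma2     <- grand_mean * (1 - grand_mean) / at_bats   # binomial sampling variance
shrink     <- 1 - (n_players - 3) * sigma2 / sum((obs_avg - grand_mean)^2)
shrink     <- max(0, shrink)                            # positive-part version
js_avg     <- grand_mean + shrink * (obs_avg - grand_mean)

data.frame(obs_avg, js_avg, true_avg)
```

A `shrink` value near 0 pulls every hitter almost all the way to the grand mean, while a value near 1 leaves the observed averages mostly untouched.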

After building estimates for the players, we show how you can combine {gt} tables with {ggplot2} figures in {patchwork} to produce a single combined figure.

To watch our screen cast, CLICK HERE.

To access our code, CLICK HERE.

NOTE: This isn’t the first time I’ve applied the James-Stein Estimator to this sort of problem. Nearly 2 years ago I wrote an R tutorial on this approach, comparing it to a Bayesian approach (CLICK HERE).

TidyX 49: rowwise simulation of MLB Batting Average, {gt} table & sparklines

This week, Ellis Hughes and I continue our work on simulation methods in R by introducing the rowwise() function from {dplyr}, part of the {tidyverse}. We use this function to simulate batting average at the player level. Basically, it lets us easily apply a simulation to each row of the data set, where each row represents one player’s seasons. We then build our results into a {gt} table with sparklines.
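
A small sketch of the row-by-row idea, using three hypothetical players with made-up at bats and hit probabilities (the episode uses real MLB data). Each row gets its own set of simulated batting averages stored in a list column, followed by a simple 90% interval.

```r
library(dplyr)

set.seed(49)
n_sims <- 1000

# Hypothetical players: one row each, with at bats and an assumed hit probability
players <- tibble(
  player   = c("Player A", "Player B", "Player C"),
  at_bats  = c(500, 350, 120),
  hit_prob = c(0.300, 0.265, 0.240)
)

sim_results <- players %>%
  rowwise() %>%   # treat each row as its own group
  # Simulated batting averages for this row's player, kept as a list column
  mutate(sim_avg = list(rbinom(n_sims, size = at_bats, prob = hit_prob) / at_bats)) %>%
  # Summarize each player's simulations with a 90% interval
  mutate(
    sim_low  = quantile(sim_avg, 0.05, names = FALSE),
    sim_high = quantile(sim_avg, 0.95, names = FALSE)
  ) %>%
  ungroup()

sim_results
```

Keeping the raw simulations in a list column is what makes it possible to hand each player's distribution to a sparkline later on.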

To watch our screen cast, CLICK HERE.

To access our code, CLICK HERE.

TidyX 48: Monte Carlo Simulation for NBA Match Ups

Last week, we used an R optimizer to build a model for predicting game outcomes in the NHL. This week, Ellis Hughes and I continue that work and build a Monte Carlo simulation for forecasting NBA games. We use the model to obtain the probability that one team beats the other. We then extract the estimated margin of victory from our simulation and show the entire distribution of estimated values, rather than just a single point estimate.
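
The core of the simulation can be sketched in base R as below. The expected margin and standard deviation are made-up inputs, and drawing margins from a normal distribution is an assumption for illustration rather than a description of the model in the episode.

```r
set.seed(48)

# Hypothetical inputs: a model's expected margin for Team A over Team B
# and the game-to-game standard deviation (both values are made up)
expected_margin <- 3.5
margin_sd       <- 12
n_sims          <- 10000

# Simulate the point margin for many replays of the same game
sim_margin <- rnorm(n_sims, mean = expected_margin, sd = margin_sd)

# Share of simulated games that Team A wins
win_prob <- mean(sim_margin > 0)

# Summarize the whole distribution rather than a single point estimate
margin_summary <- quantile(sim_margin, probs = c(0.05, 0.25, 0.50, 0.75, 0.95))

win_prob
margin_summary
hist(sim_margin, breaks = 50,
     main = "Simulated margin of victory", xlab = "Team A margin")
```

The histogram is the payoff: it shows how often a predicted winner still loses, and by how much, which a single expected margin hides.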

To watch the screen cast, CLICK HERE.

To access our code, CLICK HERE.

TidyX 47: NHL Win Probability, R optimizer, & gt tables

This week, Ellis Hughes and I discuss using an optimization algorithm in R to find team strength ratings for the 2019-2020 NHL season. We then show how to use these ratings to forecast the probability that one team beats another while accounting for home-ice advantage. Finally, we output the team strength ratings into a {gt} table.
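
To give a feel for the approach, here is a minimal sketch using base R's optim() on a simulated schedule. It is not the code from the episode: the four teams, their "true" ratings, the squared-error loss, and the normal approximation used for win probability are all assumptions for illustration.

```r
set.seed(47)

# Simulated stand-in schedule (not real NHL results): 4 hypothetical teams
teams <- c("A", "B", "C", "D")
true_rating <- c(A = 0.6, B = 0.2, C = -0.3, D = -0.5)
games <- data.frame(
  home = sample(teams, 400, replace = TRUE),
  away = sample(teams, 400, replace = TRUE),
  stringsAsFactors = FALSE
)
games <- games[games$home != games$away, ]
games$goal_diff <- rnorm(
  nrow(games),
  mean = true_rating[games$home] - true_rating[games$away] + 0.25,
  sd = 2
)

# Sum of squared errors between predicted and observed home goal differential;
# par holds one rating per team followed by the home-ice edge
sse <- function(par) {
  rating    <- setNames(par[seq_along(teams)], teams)
  home_edge <- par[length(par)]
  pred <- rating[games$home] - rating[games$away] + home_edge
  sum((games$goal_diff - pred)^2)
}

fit <- optim(par = rep(0, length(teams) + 1), fn = sse, method = "BFGS")
rating    <- setNames(fit$par[seq_along(teams)], teams)
rating    <- rating - mean(rating)   # center: only rating differences are identified
home_edge <- fit$par[length(fit$par)]

# Probability the home team wins, assuming normally distributed goal differentials
resid_sd <- sqrt(fit$value / nrow(games))
win_prob_home <- function(home, away) {
  unname(pnorm((rating[home] - rating[away] + home_edge) / resid_sd))
}

win_prob_home("A", "D")
```

Swapping the arguments, as in `win_prob_home("D", "A")`, gives the weaker team the home-ice edge, which is a quick way to see how much that edge is worth in this toy setup.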

To watch the screen cast, CLICK HERE.

To access our code, CLICK HERE.