TidyX 23: for loops & functions

This week, Ellis Hughes and I delve into our mailbag and answer a viewer’s question about writing for loops and custom functions. To explain these two topics we walk through our script on how to optimize the exponent in Bill James’ Pythagorean Win Formula for baseball.
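
As a rough illustration of the idea (not the script from the episode), here is a minimal sketch that combines a custom function with a for loop to search for the exponent that best fits the Pythagorean Win Formula. The `teams` data frame holds made-up season totals, and the names `pythag_win_pct`, `exponents`, and `rmse` are my own choices for this example.

```r
# Hypothetical season totals for five teams (illustrative numbers, not real data)
teams <- data.frame(
  runs_scored  = c(800, 750, 700, 650, 600),
  runs_allowed = c(650, 700, 700, 720, 780),
  win_pct      = c(0.600, 0.530, 0.500, 0.450, 0.380)
)

# Custom function: expected win percentage for a given exponent
pythag_win_pct <- function(rs, ra, exponent) {
  rs^exponent / (rs^exponent + ra^exponent)
}

# Candidate exponents to try, and a vector to store the error of each one
exponents <- seq(1, 3, by = 0.01)
rmse <- numeric(length(exponents))

# For loop: compute the root mean squared error for every candidate exponent
for (i in seq_along(exponents)) {
  pred <- pythag_win_pct(teams$runs_scored, teams$runs_allowed, exponents[i])
  rmse[i] <- sqrt(mean((teams$win_pct - pred)^2))
}

# The exponent with the smallest error on this toy data
best_exponent <- exponents[which.min(rmse)]
best_exponent
```

With real team data in place of the toy values, the same loop would show how sensitive the fit is to the choice of exponent.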

To watch our screen cast, CLICK HERE.

To get our code, CLICK HERE.

If you have any questions or topics you’d like to have explained, feel free to reach out to us at tidy.explained@gmail.com or connect with us on Twitter at @tidy_explained.

TidyX 22: Dumbbell Plots for NBA Player Stats

This week, Ellis Hughes and I go over how to create dumbbell plots. Dumbbell plots are a nice way of showing differences between two time points, or between two variables that are on the same scale, within a group or within an individual.

To explain dumbbell plots, we first go through code provided by Kelly Cotton, who used this data visualization technique on the latest TidyTuesday data set to show changes in energy production over time within different European countries.

Ellis and I then take Kelly’s approach and apply it to NBA players to show differences between their Assists per Game and Shots per Game within the 2019 season, a report we termed the “Ball Hog Report”.
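
For readers who have not built one before, a dumbbell plot in {ggplot2} is essentially a line segment joining two points per row. The sketch below is one way to do it, assuming made-up players and values (this is not Kelly’s code or the code from the episode).

```r
library(ggplot2)

# Hypothetical per-game values for three made-up players (not real NBA data)
players <- data.frame(
  player  = c("Player A", "Player B", "Player C"),
  assists = c(8.1, 3.2, 5.5),
  shots   = c(15.0, 19.4, 11.2)
)

dumbbell <- ggplot(players, aes(y = player)) +
  # the "bar" of the dumbbell: a segment between the two values
  geom_segment(aes(x = assists, xend = shots, yend = player), colour = "grey60") +
  # the two "weights": one point per variable
  geom_point(aes(x = assists, colour = "Assists per game"), size = 4) +
  geom_point(aes(x = shots, colour = "Shots per game"), size = 4) +
  labs(x = "Per-game value", y = NULL, colour = NULL) +
  theme_classic()

dumbbell
```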

To watch our screen cast, CLICK HERE.

To access our code, CLICK HERE.

TidyX Episode 21: Pairs Plots for Exploring Data

Pairs plots arrange plots of data in a matrix so that you can visualize many pairwise relationships between numeric variables at once. Often the data are plotted as scatter plots; however, this format can be extended to include other visualizations such as histograms, boxplots, and even correlation coefficients.

This week, Ellis Hughes and I explore these types of plots using the {palmerpenguins} data set pulled together by Allison Horst for the TidyTuesday Project.

This is a really fun data set for starting out in R because the data is very clean and has a few columns with nice relationships that can be exploited with statistical models. Ellis and I discuss some potential use cases for modeling this data and then we create a number of plots of the data to show various statistical relationships. We create pair plots using the {GGally} package and then we build some interactive {plotly} graphics with the data and explain how to build interactive visualizations of regression coefficients from a linear model.
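
As a starting point, a basic pairs plot of the penguin measurements can be sketched with `ggpairs()` from {GGally}. This is a minimal example rather than the code from the episode; it assumes both {GGally} and {palmerpenguins} are installed, and the object names `penguin_measures` and `pairs_plot` are my own.

```r
library(GGally)
library(palmerpenguins)

# Keep species plus the four numeric measurements and drop incomplete rows
penguin_measures <- na.omit(
  penguins[, c("species", "bill_length_mm", "bill_depth_mm",
               "flipper_length_mm", "body_mass_g")]
)

# Matrix of plots for the four numeric columns, coloured by species
pairs_plot <- ggpairs(
  penguin_measures,
  columns = c("bill_length_mm", "bill_depth_mm",
              "flipper_length_mm", "body_mass_g"),
  mapping = ggplot2::aes(colour = species)
)

pairs_plot
```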

Finally, we wrap up by going through the code of Roman Link. Roman created his own package, {corrmorant}, for creating pairs plots. This package offers a ton of flexibility and allows you to style the visualization in a more customized way. We had a lot of fun playing around with the package and the final project that Roman created using the {palmerpenguins} data set was this:

To listen to the screen cast, CLICK HERE.

To check out our code, CLICK HERE.

TidyX Episode 20: Special Guest David Robinson

This week, Ellis Hughes and I are joined by someone who has had a big influence on both of us, David Robinson!

For those that don’t know, David has been the developer of several R packages, such as {broom}, {tidytext}, {fuzzyjoin}, and {widyr}. Additionally, each week David does a live screen cast of himself working through the TidyTuesday data set from scratch. For anyone that has never watched these, they are excellent. David covers a lot of ground in 60 minutes of these live screen casts and shows you how he quickly extracts as much information as possible from a data set that he is seeing for the first time ever.

Today, we walk through one of David’s TidyTuesday screen casts where he does some text analysis of a data set consisting of cocktail ingredients. The screen cast features the use of his newest R package, {widyr}. After some exploratory data analysis and data cleaning, David calculates correlations between categorical variables (phi coefficient) and shows us how to plot the results in a network graph. The screen cast wraps up with David showing us how {widyr} can be used for Principal Components Analysis. We then have a short discussion on David’s journey into data science and how blogging and public work can be incredibly valuable for developing your professional network and career.

To check out the screen cast, CLICK HERE.

To check out David’s blog, CLICK HERE.

R Tips & Tricks: Building a Shiny Training Load Dashboard

In TidyX Episode 19 we discussed a way of building a dashboard with the {formattable} package. The dashboard included both a data table and a small visual using {sparkline}. Such a table is great when you need to make a report for a presentation but there are times when you might want to have something that is more interactive and flowing. In the sports science setting, this often comes in the form of evaluating athlete training loads. As such, I decided to spin up a quick {shiny} web app to show how easy it is to create whatever you want without sinking a ton of money into an athlete management system.

The code is available on my GITHUB page.

Packages & Custom Functions

Before doing anything, always start by loading the packages you will need for your work. In this tutorial, we will be using the {tidyverse} and {shiny} packages, so be sure to install them if you haven’t already. I also like to set my plot theme to classic so that I get rid of grid lines for the {ggplot2} figures that I create.

Finally, I also wrote a custom function for calculating a z-score. This will come in handy when we go to visualize our data.

# custom function for calculating z-score
z_score <- function(x){
  # center on the mean and scale by the standard deviation, ignoring missing values
  z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
  return(z)
}
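
To see what the function returns, here is a quick check on a small made-up vector of weekly loads (the values are only for illustration). The function is repeated so the snippet runs on its own.

```r
# Same z-score function as above, repeated so this snippet is self-contained
z_score <- function(x){
  z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
  return(z)
}

# Hypothetical weekly loads, including one missing week
loads <- c(1200, 1500, 1800, NA)

z_score(loads)
# -1  0  1 NA
```

The mean of the three observed values is 1500 and their standard deviation is 300, so the loads sit one SD below, at, and one SD above the mean, and the missing week stays missing.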

 

Data

Next we simulate a bunch of fake data for a basketball team so that we have something to build our {shiny} app against. We will simulate total weekly training loads for 10 athletes across three different positions (fwd, guard, center), for a 3-week pre-season and a 12-week in-season (15 weeks in total).

set.seed(55)
athlete <- rep(LETTERS[1:10], each = 15)
position <- rep(c("fwd", "fwd", "fwd", "fwd", "guard", "guard", "guard", "center", "center", "center"), each = 15)
week <- rep(c("pre_1", "pre_2", "pre_3", 1:12), times = 10)
training_load <- round(rnorm(n = length(athlete), mean = 1500, sd = 350), 1)

df <- data.frame(athlete, position, week, training_load)
# Note: the flag column (high / normal / low) does not exist yet; it is created later, in the server, from each athlete's z-score
df$week <- factor(df$week, levels = c("pre_1", "pre_2", "pre_3", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12")) 

df %>% head()

The first few rows of the data look like this:

Shiny App

There are two components to building a {shiny} app:

1) The user interface, which defines the input that the user specifies on the web app.
2) The server, which reacts to what the user does and then produces the output that the user sees.

The user interface I will define as tl_ui (training load user interface). I start by creating a fluid page (since the user is going to be defining what they want). I set the parameters of what the user can control. Here, I provide them with the ability to select the position group of interest and then the week(s) of training they would like to see in their table and plot.
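
A minimal sketch of what such a user interface could look like is below. It is not the exact code from my GitHub page: the input IDs (`position`, `week`), the output IDs (`tl_plot`, `tl_table`), and the choice of a sidebar layout are assumptions made for this example.

```r
library(shiny)

tl_ui <- fluidPage(
  titlePanel("Training Load Dashboard"),
  sidebarLayout(
    sidebarPanel(
      # position group of interest
      selectInput(inputId = "position", label = "Position group:",
                  choices = c("fwd", "guard", "center"), selected = "fwd"),
      # one or more weeks of training to display
      checkboxGroupInput(inputId = "week", label = "Training week(s):",
                         choices = c("pre_1", "pre_2", "pre_3", 1:12),
                         selected = c("pre_1", "pre_2", "pre_3"))
    ),
    mainPanel(
      plotOutput("tl_plot"),    # filled in by the server
      tableOutput("tl_table")   # filled in by the server
    )
  )
)
```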

The server, which I call tl_server (training load server), is a little more involved, as it needs to take an input from the user interface, get the correct data from our stored data frame, and then produce an output. Below you can see the code and I’ll try and articulate a few of the main things that are taking place.

1) First, I get the data for the visualization the user wants. I’m using our simulated data and I calculate the z-score for each individual athlete, using the custom function I wrote. After that I create my flag (±1 SD from the individual athlete’s mean) and then I finally filter the data specific to the inputs provided in the user interface. This last part is critical. If I filter before applying my z-score and flags, then the mean and SD used to calculate the z-score will only be relative to the data that has been filtered down. This probably isn’t what we want. Rather, we will want to calculate our z-score using all data, season-to-date. So we retain everything and perform our calculation and then we filter the data.

2) After getting the data, I build my plot using {ggplot2}.

3) Then I create another data set specific to the weeks of training the user selects for the table. This time, we will retain the raw values (non-standardized) for each athlete as the user may want to see raw values in a table. Notice that I also pivot that data using the pivot_wider() function as the data is currently stored in a long format with each row representing a new week for the athlete. Instead, I’d like the user to see this data across their screen, with each week representing a new column. As the user selects new weeks of data to visualize, our server will tell {shiny} to populate the next column for that athlete.

4) Once I have the data we need, I simply render the table.
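
The four steps above can be sketched roughly as follows. This is an illustrative version under stated assumptions, not the exact code from my GitHub page: the helper names `plot_data()` and `table_data()`, the IDs `position`, `week`, `tl_plot`, and `tl_table`, and the styling of the plot are all choices made for this example. The flag uses the ±1 SD rule described in step 1.

```r
library(shiny)
library(dplyr)
library(tidyr)
library(ggplot2)

# Simulated data, built the same way as in the Data section
set.seed(55)
week_levels <- c("pre_1", "pre_2", "pre_3", 1:12)
df <- data.frame(
  athlete  = rep(LETTERS[1:10], each = 15),
  position = rep(c(rep("fwd", 4), rep("guard", 3), rep("center", 3)), each = 15),
  week     = factor(rep(week_levels, times = 10), levels = week_levels)
)
df$training_load <- round(rnorm(nrow(df), mean = 1500, sd = 350), 1)

z_score <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# Step 1: standardize within athlete using ALL weeks, flag, and only then filter
plot_data <- function(data, positions, weeks) {
  data %>%
    group_by(athlete) %>%
    mutate(z = z_score(training_load),
           flag = case_when(z > 1 ~ "high", z < -1 ~ "low", TRUE ~ "normal"),
           flag = factor(flag, levels = c("high", "normal", "low"))) %>%
    ungroup() %>%
    filter(position %in% positions, week %in% weeks)
}

# Step 3: raw (non-standardized) loads, one column per selected week
table_data <- function(data, positions, weeks) {
  data %>%
    filter(position %in% positions, week %in% weeks) %>%
    select(athlete, position, week, training_load) %>%
    pivot_wider(names_from = week, values_from = training_load)
}

tl_server <- function(input, output) {
  # Step 2: plot of z-scores for the selected position and weeks
  output$tl_plot <- renderPlot({
    ggplot(plot_data(df, input$position, input$week),
           aes(x = week, y = z, group = athlete)) +
      geom_hline(yintercept = c(-1, 1), linetype = "dashed") +
      geom_line() +
      geom_point(aes(colour = flag), size = 3) +
      facet_wrap(~ athlete) +
      theme_classic()
  })
  # Step 4: render the wide table of raw values
  output$tl_table <- renderTable(table_data(df, input$position, input$week))
}
```

Keeping the data preparation in plain functions makes the filter-after-standardizing order easy to check outside of {shiny}: the z-scores returned for a single week are still relative to that athlete’s full set of weeks.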

Those four steps complete the server and, with it, the building of our {shiny} app!

Finally, the last step is to run the {shiny} app using the shinyApp() function, which takes two arguments: the name of your user interface (tl_ui) and the name of your server (tl_server). This will open up a shiny app in a new window within RStudio that you can then open in your browser.
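
The call itself is short. In the sketch below, `tl_ui` and `tl_server` are bare placeholders so that the snippet runs on its own; in practice you would pass in the real user interface and server objects.

```r
library(shiny)

# Placeholder UI and server (assumptions for this example only);
# substitute the real tl_ui and tl_server from the app
tl_ui <- fluidPage(titlePanel("Training Load Dashboard"))
tl_server <- function(input, output) {}

# Bundle the two components into an app object
app <- shinyApp(ui = tl_ui, server = tl_server)

# Launch the app only when working interactively (e.g. in RStudio)
if (interactive()) {
  runApp(app)
}
```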

Visually, you can see what the app looks like below. If you want to see it in action, aside from getting my code and running it yourself, I put together this short video >> R Tips & Tricks – Training Load Dashboard

Conclusion

As you can tell, it is pretty easy to build and customize your own {shiny} app. I’ll admit, the syntax for {shiny} can get a bit frustrating but once you get the hang of it, like most things, it isn’t that bad. Also, unless you are doing really complex tasks, most of the syntax for something as simple as a training load dashboard won’t be too challenging. Finally, such a quick and easy method not only allows you to customize things exactly as you want them, but it also saves you the cost of purchasing an expensive athlete management system.

As always, you can obtain the code for this tutorial on my GITHUB page.