Author Archives: Patrick

TidyX Episode 21: Pairs Plots for Exploring Data

Pairs plots are plots of data in a matrix format to allow one to visualize as many numeric relationships as they would like. Oftentimes the data are plotted as scatter plots, however, this format can be extended to include other visualizations such as histograms, boxplots, and even correlation coefficients.

This week, Ellis Hughes and I explore these types of plots using the {palmerpenguins} data set pulled together by Allison Horst for the TidyTuesday Project.

This is a really fun data set for starting out in R because the data is very clean and has a few columns with nice relationships that can be exploited with statistical models. Ellis and I discuss some potential use cases for modeling this data and then we create a number of plots of the data to show various statistical relationships. We create pair plots using the {GGally} package and then we build some interactive {plotly} graphics with the data and explain how to build interactive visualizations of regression coefficients from a linear model.

Finally, we wrap up by going through the code of Roman Link. Roman created his own package, {corrmorant}, for creating pairs plots. This package offers a ton of flexibility and allows you to style the visualization in a more customized way. We had a lot of fun playing around with the package and the final project that Roman created using the {palmerpenguins} data set was this:

To listen to the screen cast, CLICK HERE.

To check out our code, CLICK HERE.

TidyX Episode 20: Special Guest David Robinson

This week, Ellis Hughes and I are joined by someone who has had a big influence on both of us, David Robinson!

For those that don’t know, David has been the developer of several R packages, such as {broom}, {tidytext}, {fuzzyjoin}, and {widyr}. Additionally, each week David does a live screen cast of himself working through the TidyTuesday data set from scratch. For anyone that has never watched these, they are excellent. David covers a lot of ground in 60 minutes of these live screen casts and shows you how he quickly extracts as much information as possible from a data set that he is seeing for the first time ever.

Today, we walk through one of David’s TidyTuesday screen casts where he does some text analysis of a data set consisting of cocktail ingredients. The screen cast features the use of his newest R package, {widyr}. After some exploratory data analysis and data cleaning, David calculates correlations between categorical variables (phi coefficient) and shows us how to plot the results in a network graph. The screen cast wraps up with David showing us how {widyr} can be used for Principal Components Analysis and then a short discussion on David’s journey into data science and how blogging and public work can be incredibly valuable for developing your professional network and career.

To check out the screen cast, CLICK HERE.

To check out David’s blog, CLICK HERE.

R Tips & Tricks: Building a Shiny Training Load Dashboard

In TidyX Episode 19 we discussed a way of building a dashboard with the {formattable} package. The dashboard included both a data table and a small visual using {sparkline}. Such a table is great when you need to make a report for a presentation but there are times when you might want to have something that is more interactive and flowing. In the sports science setting, this often comes in the form of evaluating athlete training loads. As such, I decided to spin up a quick {shiny} web app to show how easy it is to create whatever you want without sinking a ton of money into an athlete management system.

The code is available on my GITHUB page.

Packages & Custom Functions

Before doing anything, always start by loading the packages you will need for your work. In this tutorial, we will using the {tidyverse} and {shiny} packages, so be sure to install them if you haven’t already. I also like to set my plot theme to classic so that I get rid of grid lines for the {ggplot2} figures that I create.

Finally, I also wrote a custom function for calculating a z-score. This will come in handy when we go to visualize our data.

# custom function for calculating z-score
z_score <- function(x){
  z = (x - mean(x, na.rm = T)) / sd(x, na.rm = T)
  return(z)
}

 

Data

Next we simulate a bunch of fake data for a basketball team so that we have something to build our {shiny} app against. We will simulate total weekly training loads for 10 athletes across three different positions (fwd, guard, center), for a 3 week pre-season and a 15 week in-season.

 

set.seed(55)
athlete <- rep(LETTERS[1:10], each = 15)
position <- rep(c("fwd", "fwd", "fwd", "fwd", "guard", "guard", "guard", "center", "center", "center"), each = 15)
week <- rep(c("pre_1", "pre_2", "pre_3", 1:12), times = 10)
training_load <- round(rnorm(n = length(athlete), mean = 1500, sd = 350), 1)

df <- data.frame(athlete, position, week, training_load)
df$flag <- factor(df$flag, levels = c("high", "normal", "low"))
df$week <- factor(df$week, levels = c("pre_1", "pre_2", "pre_3", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12")) 

df %>% head()

The first few rows of the data look like this:

Shiny App

There are two components to building a {shiny} app:

1) The user interface, which defines the input that the user specifies on the web app.
2) The server, which reacts to what the user does and then produces the output that the user sees.

The user interface I will define as tl_ui (training load user interface). I start by creating a fluid page (since the user is going to be defining what they want). I set the parameters of what the user can control. Here, I provide them with the ability to select the position group of interest and then the week(s) of training they would like to see in their table and plot.

The server, which I call tl_server (training load server), is a little more involved as we need to have it take an input from the user interface , get the correct data from our stored data frame, and then produce an output. This part can be a bit involved. Below you can see the code and I’ll try and articulate a few of the main things that are taking place.

1) First, I get the data for the visualization the user wants. I’m using our simulated data and I calculate the z-score for each individual athlete, using the custom function I wrote. After that I create my flag (±1 SD from the individual athlete’s mean) and then I finally filter the data specific to the inputs provided in the user interface. This last part is critical. If I filter before applying my z-score and flags, then the mean and SD used to calculate the z-score will only be relative to the data that has been filtered down. This probably isn’t what we want. Rather, we will want to calculate our z-score using all data, season-to-date. So we retain everything and perform our calculation and then we filter the data.

2) After getting the data, I build my plot using {ggplot2}.

3) Then I create another data set specific to the table weeks of training selected by the user. This time, we will retain the raw values (non-standardized) for each athlete as the user may want to see raw values in a table. Notice that I also pivot that data using the pivot_wider() function as the data is currently stored in a long format with each row representing a new week for the athlete. Instead, I’d like the user to see this data across their screen, with each week representing a new column. As the user selects new weeks of data to visualize our server will tell {shiny} to populate the next column for that athlete.

4) Once I have the data we need, I simply render the table.

Those 2 steps complete the building of our {shiny} app!

Finally, the last step is to run the {shiny} app using the shinyApp() function which takes two arguments, the name of your user interface (tl_ui) and the name of your server (tl_server). This will open up a shiny app in a new window within R Studio that you can then open in your browser.

Visually, you can see what the app looks like below. If you want to see it in action, aside from getting my code and running it yourself, I put together this short video >> R Tips & Tricks – Training Load Dashboard

Conclusion

As you can tell, it is pretty easy to build and customize your own {shiny} app. I’ll admit, the syntax for {shiny} can get a bit frustrating but once you get the hang of it, like most things, it isn’t that bad. Also, unless you are doing really complex tasks, most of the syntax for something as simple as a training load dashboard wont be too challenging. Finally, such a quick and easy method not only allows you to customize things exactly as you want them, but it also saves you money from having to purchase an expensive athlete management system.

As always, you can obtain the code for this tutorial on my GITHUB page.

TidyX Episode 19: {formattable} for Dashboard Design

This week, Ellis Hughes and I explain coded written by Lauren Pandori, who built a nice dashboard for data on astronaut missions provided by the TidyTuesday Project.

Lauren had a few questions for us regarding other elements she wanted to add to her dashboard so Ellis and I tackle those and how you how you can build your own dashboard using the {formattable} and {sparkline} packages.

The end product is a dashboard that has trend lines, conditional formatting, and conditional bar plots all in one. The nice thing is that we show you how you can save the dashboard to an HTML so that it is easy to share with colleagues and co-workers. Finally, during the data exploration phase we walk through a cool way to use {plotly} to visualize trend lines in an interactive manner.

To watch our screen cast, CLICK HERE.

To get our code, CLICK HERE.

R Tips & Tricks: Force-Velocity-Power Profile Graphs in R Shiny

A colleague of mine was asking about how he could produce plots of force-velocity-power profiles for his coaches based on their Gymaware testing. Rather than plotting static plots for this stuff it is sometimes easier to build out a nice shiny app so that the coaches can interact with it or the practitioner can quickly change between players or position groups when giving a presentation to the staff.

All code and data is available at my GITHUB page (R Script and Data).

This tutorial will cover:

1) Building polynomial regression to represent the average team trend
2) Iterative approaches to building static plots
3) Iterative approaches to building Shiny web apps

This tutorial has a number of different iterations of plots and web apps so working through the R-code on your own is advised so you can see how every step is performed.

Data

After loading the two required packages, {tidyverse} and {shiny}, we load in the data set and we see that it consists of Force, Power, and Velocity values across 5 different external loads for 3 different athletes:

If you want to use the R-script to produce reports for yourself going forward just ensure that your data has the above columns (named the same way, since R is case-sensitive). If you have data with different column names, your two choices are: (1) change my code to match the column names of your data; or, (2) once you pull the data into R change the columns names in your code to match mine.

Average Trend Line (2nd Order Polynomial)

Eventually, we are going to want to compare our athletes to the team average (or position group average if your sport is more heterogeneous). This type of data is often modeled as a. 2nd order polynomial regression. Thus, we will build this type of regression to predict both Velocity and Power from Force. Once I have these two regressions built I can create a data frame that consists of a sequence of Force values (from the minimum to the maximum observed in the team’s data) and predict the velocity and power at each unit of force.

fit_velo <- lm(Velocity ~ poly(Force, 2), data = fv_profile)
fit_power <- lm(Power ~ poly(Force, 2), data = fv_profile)

avg_tbl <- data.frame(Force = seq(from = min(fv_profile$Force),
                                  to = max(fv_profile$Force),
                                  by = 0.5))

avg_tbl$Velocity_Avg <- predict(fit_velo, newdata = avg_tbl)
avg_tbl$Power_Avg <- predict(fit_power, newdata = avg_tbl)
colnames(avg_tbl)[1] <- "Force_Grp"

 

Static Plots

I’ll walk through a few iterations of static plots. See the GITHUB code to walk through how these were produced.

1) All athletes with an average team trend line

This plot gives us a sense for the trend in Velocity as it relates to increasing amounts of force. Below you will see one with a solid trend line and one with a dashed trend line. Feel free to use which ever one you prefer.

2) Team trend for Velocity and Power

The common way this type of data is visualized in the sports science literature it is a bit tricky in R because it requires a dual y-axis. To obtain this, within my ggplot call I shrink Power down to a scale more similar to Velocity (the main y-axis) by dividing it by 1000. Then when I call the second y-axis with the sec_axis() function I multiply Power by 1000 to put it back on its normal scale.

3) Accounting for individuals

The above plots are a look at the entire team (in this case only 3 athletes). However, we may want to look at the individuals more explicitly. As such, we build four plots to show individual differences:

1) All athletes all on the same plot with their corresponding individualized trend lines (NOTE: if you have a lot of athletes, this plot can get pretty busy and ultimately become useless).
2) Plot each individual as a facet to look at the athletes separately.
3) Create the same plot as #2 but add in the group average trend line (which we created using our 2nd order polynomial regression) to allow us to compare each athlete to the groupx.
4) Plot each individual as a facet with velocity and power on separate y-axes.

Shiny App Development

The still figures above are nice but making things interactive is much more useful. I have four {shiny} web iterations that we can go through. Again, using the R code while reading this will help you understand what is going on under the hood. Additionally, you’ll want to run these in R so that R can open up the webpage on your computer and allow you to play with the app. Below is just still shots of each app iteration.

1) Version 1: Independent Player Plots

This version allows you to select one player at a time.

2) Version 2: Add Players to Facets

This version lets you select however many players you want and it will add them in as facets. This is a useful app if you are presenting to the staff and want to select or de-select several players to show how they compare to each other.

3) Version 3: Same as Version 2 but add Power to the Plot on a second y-axis

4) Version 4: Combine all plot details

Version 4 combines everything from this tutorial in one single web app. You can select the  force-velocity-power profile for individual athletes (added to the plot as facets) and see the team average trend line (added to the plot as a dashed line) for velocity and power, to allow you to make comparisons between each player and to the group average.

Conclusion

{tidyverse} makes it incredibly easy to manipulate data and quickly iterate plots to our liking while {shiny} offers an easy way for us to turn our plots into an interactive webpage.

Again, for the code and data see my GITHUB page (R Script and Data).