
Validity, Reliability, & Responsiveness — A few papers on measurement in sport science

I had the pleasure of speaking at the National Strength and Conditioning Association's (NSCA) National Conference this summer, and while there I made it a point to attend the Sport Science & Performance Technology Special Interest Group meeting as well.

One thing that immediately stood out to me was the number of questions about which types of technology to purchase (e.g., “Which brand of force plates should we buy?”, “Does anyone have a list comparing and contrasting different technologies so that we can determine what would be best for us?”, etc.).

While these are fine questions, I do feel they are a bit like putting the cart before the horse. Before thinking about what technology to purchase, we should spend a good bit of time gaining clarity on the question(s) we are attempting to answer. Once we have a firm understanding of the question, we can then begin evaluating whether a technology exists that can help us collect the necessary data to explore that question. In fact, this was the crux of my lecture at the conference, as I spoke about using the PPDAC Framework in practice (I wrote an article about this framework a couple of years ago).

A force plate, a GPS unit, or an accelerometer won't solve all of our problems. In fact, depending on our question, they might not solve any of our problems! Moreover, as sport scientists we need to concern ourselves not only with the research question but also with whether the desired technology is useful within our ecological setting. Just because something worked in a controlled lab environment, or was valid in a different sport, does not mean it will be useful for our sport, in our setting, with our athletes, or given our unique constraints.

So, I decided to share a few resources pertaining to measurement theory concepts such as validity, reliability, and responsiveness/sensitivity for those working in the sport science space who are interested in more critical approaches to evaluating the technology we use in practice.

Additionally, for those interested, several years ago I wrote a full R code blog for the last paper above (Swinton et al.), which can be accessed HERE.

Happy reading!

The High Performance Hockey Podcast Interview

This week, I had the great pleasure of being interviewed by my good friend and colleague Anthony Donskov for his High Performance Hockey Podcast.

Anthony has done a tremendous job for the sports science and strength and conditioning community in his teaching, writing, and podcasting. He brings a wealth of knowledge from both the applied strength coach realm all the way through to his PhD work.

In this podcast interview, Anthony and I discuss:

  • Data analysis
  • The PPDAC Framework for conducting research
  • My criticisms of applied sport science
  • The challenge of measuring hard things and things that matter in applied sport

Check out the podcast HERE.

R Tips & Tricks: Normalizing test dates & calculating test differences

A friend of mine was downloading some force plate data from the software provider so that he could evaluate test data for a few of his athletes during return to play. The issue he was running into was that the athletes all had different numbers of tests and different start and end testing dates. The software exports the test outputs by date, and he was wondering how he could normalize the dates to numeric values (e.g., Test 1, Test 2, etc.) so that he could model them (since we can't really use a Date in a regression model).

I'll be the first to admit that working with dates and times can be an incredible pain in the butt. For reference, I covered the topic of converting Catapult GPS practice duration strings to actual training minutes HERE. To help him out, I provided a few different solutions depending on the research question. I also added some code for calculating changes in test performance between tests and from each test to baseline.

The full code is available on my GITHUB page.

Loading Packages & Simulating Data

## load packages ----------------------------------------------
library(tidyverse)
library(lubridate)

## Simulate data ----------------------------------------------
set.seed(78)
dat <- tibble(
  athlete = rep(c("Tom", "Bob", "Franklin"), times = c(5, 10, 3)),
  test_dates = c(
    seq(as.Date("2023-01-01"), as.Date("2023-01-05"), by = "days"),
    seq(as.Date("2023-02-15"), as.Date("2023-02-24"), by = "days"),
    as.Date(c("2023-01-19", "2023-01-30", "2023-02-26"))
  ),
  jump_height = round(rnorm(n = 18, mean = 28, sd = 2.5), 1)
)

dat


We can see that Tom has 5 tests, Bob has 10, and Franklin has only 3. Additionally, Tom and Bob tested on consecutive days, while Franklin was less compliant and has longer gaps between his tests.
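
If you want to confirm those test counts in code, dplyr's count() gives a quick summary (a small sketch using the data we just simulated):

### Check the number of tests per athlete --------------------------
dat %>%
  count(athlete)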

Create a test number

First, let's normalize the dates so that they are numeric. Instead of dates, we want a value indicating whether a test was test 1, test 5, or test N. We will do this by creating a row_number() id/counter for each individual athlete.

### Create a test number ------------------------------------------
dat <- dat %>%
  group_by(athlete) %>%
  mutate(test_day = row_number())

dat

Calculating the time between tests

Alternatively, we may want to know not just the test number of each test but also the number of days between tests.

The code to do this is a bit ugly looking, so let's unpack it.

  1. Since we are dealing with dates, we use the difftime() function, which takes the two times you are looking to calculate the difference between. Here, we calculate the difference in days between each test date and the date preceding it (obtained with lag()) for each individual athlete.
  2. The difftime() function produces a variable of class difftime. To make this numeric, we first convert it to a character with the as.character() function.
  3. Once the variable is a character, we use the as.numeric() function to convert it to a numeric value.
  4. Finally, since the first value for each athlete will be an NA (there is no date preceding the first test), we use the coalesce() function to fill in a 0 for each of the NAs, indicating that this was the first test and thus there was no time between it and any prior test.

### Calculate the time between tests -------------------------------
dat <- dat %>%
  group_by(athlete) %>%
  mutate(time_btw_tests = coalesce(
    as.numeric(as.character(difftime(test_dates, lag(test_dates)))),
    0
  ))

dat

Notice that Tom and Bob have 1 day between all of their tests while Franklin’s second test was 11 days after his first and his third test was 27 days after his second.
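
As a side note (this is an alternative sketch, not the approach used above), difftime() also has a units argument, which lets us skip the character conversion and go straight to numeric, with tidyr's replace_na() filling the leading NA:

### Alternative: difftime() with a units argument ------------------
dat %>%
  group_by(athlete) %>%
  mutate(time_btw_tests2 = replace_na(
    as.numeric(difftime(test_dates, lag(test_dates), units = "days")),
    0
  ))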

Calculate the difference in jump height from one test to the next

### Calculate difference in jump height from one day to the next -------------------
dat <- dat %>%
  group_by(athlete) %>%
  mutate(test_to_test_diff = jump_height - lag(jump_height))

dat

Here, we use the lag() function to calculate the difference between each value and the value before it within the same column. Since we grouped by athlete, which is what we want, each athlete's first test will always have an NA in this new column, as there was no test preceding it.
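
If lag() is new to you, here is a toy example (made-up numbers, purely for illustration) showing how it shifts a vector down one position and pads the front with NA:

## lag() in isolation: the first element becomes NA
lag(c(30.1, 28.4, 31.2))
## [1]   NA 30.1 28.4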

Calculating the difference in jump height from the baseline test

Finally, we might also be interested in evaluating the performance on each test relative to the athlete's baseline test. To do this we simply subtract the baseline jump_height (the value indexed in row one for each athlete) from each jump_height.

### Calculate difference in jump height from each test to the baseline test -------------

dat <- dat %>%
  group_by(athlete) %>%
  mutate(test_to_baseline_diff = jump_height - jump_height[1])

dat
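
As a small sketch of an equivalent approach, dplyr's first() helper does the same thing as indexing row one and makes the intent a bit more explicit:

### Same calculation using first() ---------------------------------
dat %>%
  group_by(athlete) %>%
  mutate(test_to_baseline_diff2 = jump_height - first(jump_height))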

Wrapping Up

Dates and times are always tricky to deal with. Most sports technology providers will provide data as dates (or unix timestamps), meaning that we have to do some cleaning of the data to codify the dates as numeric values representing the test number or the days between tests (depending on the research question). Additionally, lag functions can be helpful for calculating the difference from one test to the next or from each test to the baseline.
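
For the unix timestamp case mentioned above, {lubridate} (which we already loaded) can handle the conversion. A minimal sketch, with an example timestamp:

## Convert a unix timestamp to a date-time, then to a date
as_datetime(1672531200)            # "2023-01-01 00:00:00 UTC"
as_date(as_datetime(1672531200))   # "2023-01-01"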

The entire code is available on my GITHUB page.

If you have any data cleaning issues that you are dealing with from various sports science technologies, feel free to reach out!

Displaying Tables & Plots Together

A common question that I get asked is for a simple way of displaying tables and plots together in the same one-page report. Many in the sport science space who are new to R will copy and paste their plot and table outputs into a word document and then share that with their colleagues. But this creates extra work: copying, pasting, making sure you don't mess up and forget to paste the latest plot, and so on. So, today's blog article will walk through a really easy way to create a single-page document for combining tables and plots into one report, which you can save to PDF or jpeg directly from RStudio. This same approach is also useful for researchers looking to combine tables and plots into a single figure for publication. I'll show how to do this using both ggarrange() and {patchwork}.

As always, the full code is available on my GITHUB page.

Load Libraries and Set a Plotting Theme

### Load libraries
library(tidyverse)
library(ggpubr)
library(gridExtra)
library(patchwork)
library(broom)
library(palmerpenguins)

## set plot theme
theme_set(theme_classic() +
            theme(axis.text = element_text(size = 11, face = "bold"),
                  axis.title = element_text(size = 13, face = "bold"),
                  plot.title = element_text(size = 15),
                  legend.position = "top"))


Load Data

We will use the {palmerpenguins} data that is freely available in R.

## load data
data("penguins")
d <- penguins %>%
  na.omit()


Build the Plots & Table

First we will build our plots. We are going to create two plots and one table. The table will store the information from a linear regression which regresses bill length on flipper length and penguin sex. The plots will be our visualization of these relationships.

## Create Plots
plt1 <- d %>%
  ggplot(aes(x = flipper_length_mm, y = bill_length_mm)) +
  geom_point(aes(fill = sex),
             size = 4,
             shape = 21,
             color = "black",
             alpha = 0.5) +
  geom_smooth(method = "lm",
              aes(color = sex)) +
  scale_fill_manual(values = c("female" = "green", "male" = "blue")) +
  scale_color_manual(values = c("female" = "green", "male" = "blue")) +
  labs(x = "Flipper Length (mm)",
       y = "Bill Length (mm)",
       title = "Bill Length ~ Flipper Length")


plt2 <- d %>%
  ggplot(aes(x = sex, y = bill_length_mm)) +
  geom_violin(alpha = 0.5,
              aes(fill = sex)) +
  geom_boxplot(width = 0.2) +
  geom_jitter(alpha = 0.5) +
  labs(x = "Sex",
       y = "Bill Length (mm)",
       title = "Bill Length Conditional on Penguin Gender")


## Create table
fit <- d %>%
  lm(bill_length_mm ~ flipper_length_mm + sex, data = .) %>%
  tidy() %>%
  mutate(across(.cols = estimate:statistic,
                ~round(.x, 3)),
         term = case_when(term == "(Intercept)" ~ "Intercept",
                          term == "flipper_length_mm" ~ "Flipper Length (mm)",
                          term == "sexmale" ~ "Male"))

Convert the table into a ggtexttable format

Right now the table is in a tibble/data frame format. To get it into a format that is usable within the display grid, we will convert it to a ggtexttable() and use some styling to make it look pretty.

## Build table into a nice ggtextable() to visualize it
tbl <- ggtexttable(fit, rows = NULL, theme = ttheme("blank")) %>%
  tab_add_hline(at.row = 1:2, row.side = "top", linewidth = 2) %>%
  tab_add_hline(at.row = 4, row.side = "bottom", linewidth = 3, linetype = 1)

Display the Table and Plots using ggarrange

We simply add our plot and table elements to the ggarrange() function and get a nice looking report.

## Plots & Table together using ggarrange()
ggarrange(plt1, plt2, tbl,
          ncol = 2, nrow = 2)

Display the Table and Plots using patchwork

We can accomplish the same goal using the {patchwork} package. The only trick here is that we can't pass a ggtexttable element into patchwork. We need to convert the table into a tableGrob() to make this work. A tableGrob() is simply a way for us to capture all of the information that is required for the table structure we'd like. Also note that we can pass the same tableGrob() into ggarrange() above and it will work.

## Plots & Table together using patchwork
# Need to build the table as a tableGrob() instead of ggtexttable()
# to make it work with {patchwork}
tbl2 <- tableGrob(fit, rows = NULL, theme = ttheme("blank")) %>%
  tab_add_hline(at.row = 1:2, row.side = "top", linewidth = 2) %>%
  tab_add_hline(at.row = 4, row.side = "bottom", linewidth = 3, linetype = 1)
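
As noted above, the same tableGrob() can also be passed to ggarrange():

## The tableGrob() works with ggarrange() as well
ggarrange(plt1, plt2, tbl2,
          ncol = 2, nrow = 2)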

Now we wrap the tableGrob and our plots into the wrap_plots() function and we are all set!

# now visualize together
wrap_plots(plt1, plt2, tbl2, 
           ncol = 2,
           nrow = 2)

Wrapping Up

Instead of copying and pasting tables and plots into word, try using these two simple approaches to create a single report page that stores all of the necessary information your colleagues need to see!
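
If you'd rather save from code than from the RStudio export menu, ggsave() writes out the most recently displayed plot (the file name and dimensions here are just placeholders):

## Save the report to PDF (swap the extension for .jpeg if preferred)
ggsave("penguin_report.pdf", width = 12, height = 8)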

All of the code is available on my GITHUB page.

Using randomized controlled trials in the sports medicine and performance environment: is it time to reconsider and think outside the methodological box?

I recently had the chance to work on a fun viewpoint paper for the Journal of Orthopaedic & Sports Physical Therapy about ideas for analyzing data in applied sports and rehab environments. While randomized controlled trials are considered a gold standard in medicine, the applied environment is a bit messy, given the inability to control a host of factors and the fact that the daily cadence and structure are dictated by coaches and other decision-makers.

Given these constraints, practitioners often lament that, “Research deals with group analysis but I deal with N-of-1!”. Indeed, it can sometimes be challenging to see the connection between group-based research and the person standing in front of you, whose performance and health you are in charge of managing. I discussed this issue a bit back in 2018 with Aaron Coutts, Richard Pruna, and Allan McCall, in our paper Putting the ‘i’ back in team, where we laid out some approaches to handling individual-based analysis.

In this recent viewpoint, a group of great collaborators (Garrett Bullock, Tom Hughes, Charles A Thigpen, Chad E Cook, and Ellen Shanley) and I discuss ideas around natural experiments and N-of-1 methodology as they apply to the sports and rehabilitation environments.

Using randomized controlled trials in the sports medicine and performance environment: is it time to reconsider and think outside the methodological box?