Category Archives: Strength & Conditioning

Validity, Reliability, & Responsiveness — A few papers on measurement in sport science

I had the pleasure of speaking at the National Strength and Conditioning Association‘s (NSCA) National Conference this summer and while there I made it a point to attend the Sport Science & Performance Technology Special Interest Group meeting as well.

One thing that immediately stood out to me was the number of questions raised specific to what types of technologies to purchase (e.g. “Which brand of force plates should we buy?”, “Does anyone have a list comparing and contrasting different technologies so that we can determine what would be best for us?”, etc.).

While these are fine questions, I do feel they are a bit like putting the cart before the horse. Before thinking about what technology to purchase, we should spend a good bit of time gaining clarity on the question(s) we are attempting to answer. Once we have a firm understanding of the question we can then begin the process of evaluating whether a technology exists that can help us collect the necessary data to explore that question. In fact, this was the main crux of my lecture at the conference, as I spoke about using the PPDAC Framework in practice (I wrote an article about this framework a couple of years ago).

A force plate, a GPS unit, or an accelerometer won’t solve all of our problems. In fact, depending on our question, they might not solve any of our problems! Moreover, as sport scientists we need to concern ourselves not only with the research question but, also whether the desired technology is useful within our ecological setting. Just because something worked in a controlled lab environment or was valid in a different sport does not mean it will be useful for our sport, or in our setting, or with our athletes, or given our unique constraints.

So, I decided to share a few resources pertaining to measurement theory concepts such as validity, reliability, and responsiveness/sensitivity for those working in the sport science space who are interested in more critical approaches to evaluating the technology we use in practice.

Additionally, for those interested, several years ago I wrote a full R code blog for the last paper above (Swinton et al) ,which can he accessed HERE.

Happy reading!

The High Performance Hockey Podcast Interview

This week, I had the great pleasure of being interviewed by my good friend and colleague Anthony Donskov for his High Performance Hockey Podcast.

Anthony has done a tremendous job for the sports science and strength and conditioning community in his teaching, writing, and podcasting. He brings a wealth of knowledge from both the applied strength coach realm all the way through to his PhD work.

In this podcast interview, Anthony and I discuss:

Data analysis
The PPDAC Framework for conducting research
My criticisms of applied sport science
The challenge of measuring hard things and things that matter in applied sport.

Check out the podcast HERE.

R Tips & Tricks: Normalizing test dates & calculating test differences

A friend of mine was downloading some force plate data from the software provider so that he could evaluate test data in a few of his athletes during return to play. The issue he was running into was that the different athletes all had different numbers of tests and different start and end testing times. The software exports the test outputs by date and he was wondering how he could normalize the dates to numeric values (e.g. Test 1, Test 2, etc.) so that he could model the date (since we can’t really use a Date in a regression model).

I’ll be the first to admit that working with dates and times can be an incredible pain in the butt. For reference, I covered the topic of converting Catapult GPS practice duration strings to actual training minutes, HERE. To help him out, I provided a few different solutions depending on the research question. I also add some code for calculating changes in test performance between tests and from each test to baseline.

The full code is available on my GITHUB page.

Loading Packages & Simulating Data

## load packages ----------------------------------------------
library(tidyverse)
library(lubridate)

## Simulate data ----------------------------------------------
set.seed(78)
dat <- tibble(
  
  athlete = rep(c("Tom", "Bob", "Franklin"), times = c(5, 10, 3)),
  test_dates = c(
    seq(as.Date("2023-01-01"), as.Date("2023-01-5"), by = "days"),
    seq(as.Date("2023-02-15"), as.Date("2023-02-24"), by = "days"),
    as.Date(c("2023-01-19", "2023-01-30", "2023-02-26"))
  ),
  jump_height = round(rnorm(n = 18, mean = 28, sd = 2.5), 1)
  
)

dat

We can see that Tom has 5 tests, Bob has 10, and Franklin has only 3. Additionally, Tom and Bob tested every day, consecutively, while Franklin was less compliant and has larger time frames between his tests.

Create a test number

First, let’s normalize the Dates so that they are numeric. Basically, instead of dates we want a value indicating whether the test was test 1, or test 5, or test N. We will do this by creating a row_number() id/counter for each individual athlete.

### Create a test number ------------------------------------------
dat <- dat %>%
  group_by(athlete) %>%
  mutate(test_day = row_number())

dat

Calculating the time between tests

Alternatively, we may not just want to know the test number of each test but we may want to determine the amount of days between each test.

The code to do this is a bit ugly looking so let’s unpack it.

Since we are dealing with dates we use the difftime() function which takes an argument for the two times you are looking to calculate the difference between. Here, we are trying to calculate the difference in time (days) between one date and the date preceding it for each individual athlete.
The difftime() function will produce a to time variable. If we want to make this numeric we need to convert it to a character so we do that with the as.character() function.
Once the variable is a character we use the as.numeric() function to convert it to a numeric value.
Finally, since the first value for each athlete will be an NA, since there is no date preceding the first test, we use the coalesce() function to fill in a 0 value for each of the NA’s, to indicate that this was the first test and thus there was no time between it and any other test.

### Calculate the time between tests -------------------------------
dat <- dat %>%
  group_by(athlete) %>%
  mutate(time_btw_tests = coalesce(as.numeric(as.character(difftime(test_dates, lag(test_dates)))), 0))

dat

Notice that Tom and Bob have 1 day between all of their tests while Franklin’s second test was 11 days after his first and his third test was 27 days after his second.

Calculate the difference in jump height from one test to the next

### Calculate difference in jump height from one day to the next -------------------
dat <- dat %>%
  group_by(athlete) %>%
  mutate(test_to_test_diff = jump_height - lag(jump_height))

dat

Here, we use the lag() function to calculate the difference in one value from the value before it within in the same column. Since we grouped by athlete, which is what we want, their first test will always have an NA, in this new column, since there was no test preceding it.

Calculating the difference in jump height from the baseline test

Finally, we might also be interested to evaluate the performance on each test relative to the athlete’s baseline test. To do this we simply subtract jump_height from the jump_height indexed in row one for each athlete.

### Calculate difference in jump height from each test to the baseline test -------------

dat <- dat %>%
  group_by(athlete) %>%
  mutate(test_to_baseline_diff = jump_height - jump_height[1])

dat

Wrapping Up

Dates and times are always tricky to deal with. Most of the sports technology providers will proved data as dates (or unix timestamps) meaning that we have to do some cleaning of the data to codify the dates as numeric values representing the test number or the days between tests (depending on the research question). Additionally, using lag functions can be helpful for calculating he difference from one test to the next or from each test to the baseline.

The entire code is available on my GITHUB page.

If you have any data cleaning issues that you are dealing with from various sports science technologies, feel free to reach out!

Can I please be introduced to the Non-Applied Sport Scientist?

A recent discussion on Twitter spurred some thoughts that I had with respect to titles and roles in sport and in particular the title/role of Applied Sport Scientist.

@ScientistSport posed the following question:

It’s an interesting question to ponder. Given that sport science was originally born out of physiologists attempting to study human performance in Olympic sport athletes (which then eventually bled into team sport athletes) the question makes sense. Moreover, it seems like people generally think of sport science as something directed at helping the team “train better” – monitoring training loads, testing strength, power and conditioning, and even entering into the discussion of return to play following injury. Such a role has led many teams to employ an Applied Sport Scientist.

Titles in sport are weird. What does an Applied Sport Scientist do? What is the description of the role? More importantly, is there a Non-Applied Sport Scientist? If so, what are they doing?

Generally, when I’ve been introduced to the Applied Sport Scientist at a team when I’ve found is they are an assistant strength coach or assistant athletic trainer that has been tasked with turning on GPS units, conducting force plate jumps with the players, and coordinating the reports from the team’s Athlete Management System (AMS).

No doubt these are important tasks and critical to helping the staff plan and manage the team’s training! But, why is this a science role? What’s scientific about it? Is the individual ensuring data quality and integrity is being maintained before it is stored in the AMS? Is the individual conducting scientific inquiry of the data within the AMS to understand the measurements being made and determining if the measures are valid, reliable, or responsive? More importantly, how is the individual using the abundance of data being collected to answer larger questions that are relevant to the entire organization?

Perhaps the role shouldn’t be called Applied Sport Scientist? Maybe it should be Data Collection Coordinator or something more descriptive of the task at hand? Titles matter! They define what we do and how we do it. Again, if there is an Applied Sport Scientist is there a Non-Applied Sport Scientist? Maybe the latter is the one doing the real scientific work – identifying the pertinent research questions, planning applied science studies, structuring and establishing best practice data collection methods, analyzing data, and communicating the results to the end users.

What should the role of an Applied Sport Scientist be?

While some may feel like my argument is a bit pedantic here is why it matters.

The aim of the Applied Sport Scientist or the Sport Science Department should be to answer questions across the entire sporting organization. This shouldn’t simply be limited to matters of strength and conditioning. Rather, the goal should be to apply the scientific method to any and all questions in sport – training, return to play, performance evaluation, player acquisition, team tactics, etc. – and work at the intersection of such topics to provide analysis that helps the key stakeholders make decisions. A few colleagues and I wrote a paper about the parallels between Business Intelligence and Sports Science a few years ago <CLICK HERE>.

Science isn’t just a title; it is a framework and process for asking and answering questions. Or, as David Salsburg states, in his brilliant book The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century, “Science, we are often taught, is measurement. We make careful measurements and use them to find mathematical formulas that describe nature.” Consequently, someone that is given the title Applied Sport Scientist should actually have scientific training. The concept of framing a question, collecting data, doing basic statistics, knowing basic physiology and biomechanics, understanding how to run a simple reliability study, etc., are things that should be fundamental skills for this individual. Calling someone a Sport Scientist who doesn’t have these skills – even though they might be a really smart person and they might know a good deal about whatever technology they are using – is like calling me a strength coach. Sure, I can write a program and I can train and coach people. But, that’s probably not why you would hire me. Just as the strength coach can collect data and print reports, but you aren’t hiring them to conduct scientific investigations. You’re also not hiring the Physical Therapist to run the nutrition program.

Being smart and hardworking are important qualities in sport and everyone can help out in various areas of the organization. But titles should matter because they in some way define roles and responsibilities. The best organizations find the right people, with the right skill sets, to work together and create a super team.

As I like to say, Success boils down to four things:

Knowing what you know.
Working to be really good at what you know.
Knowing what you don’t know.
Knowing enough about what you don’t know to ask the right questions to get people in who can help you out with that thing.

Catapult GPS – Converting the practice duration string to minutes

One of the most frustrating things to deal with is date and time strings. Using Catapult GPS, a popular GPS provider for professional and collegiate sports teams, practice duration is reported in their export as a string, hours : minutes : seconds. Unfortunately, we can’t do much with this if we want to perform additional computations, for example calculate player load per minute, we need to convert this column into total minutes.

I’ve had a few people in the sports performance field reach out and ask how to do this in R because they often get frustrated and just resort to changing the data in their CSV download prior to importing it into R, where they then do their plotting and visualizing. Today, I’ll walk through a few steps using the {lubridate} package and show you how you can handle this data cleaning all within you R environment.

Load Packages & Get Data

We start by loading {tidyverse} and {lubridate} and some fake Catpault data that I’ve created.

### Packages ---------------------------------------
library(tidyverse)
library(lubridate)

### Load Data -------------------------------------
catapult <- read.csv("catapult_example.csv", header = TRUE) %>%
janitor::clean_names()

catapult

Adjusting time

We can see the duration string (hour : minute : second) indicating that the session was 97 minutes and 10 seconds long. Before handling the entire column of data, let’s just grab a single observation and work through the functions we need so that we know what is going on.

### Adjust Time ------------------------------------
# hms() function to split out duration to its component parts into a string
single_time <- catapult %>% 
  slice(1) %>% 
  pull(duration)

single_time

The hms() function can be used to convert each of the time components into a named string.

single_time2 <- hms(single_time)
single_time2

Once we have the individual components in a named string we can extract them out with the hour(), minute(), and second() functions and have each returned back as an integer.

# Select each component 
hour(single_time2)
minute(single_time2)
second(single_time2)

Once in integer form, converting this data to a total minutes value we first multiplying hour by 60 and divide second by 60 and then sum those up with minutes.

hour(single_time2)*60 + minute(single_time2) + second(single_time2)/60

The finished product suggests the session was 97.2 minutes long.

Applying the approach to all of our data

Now that we understand what is going on under the hood, we can apply this at scale, to our of our data.

catapult <- catapult %>%
  mutate(hour_min_sec = hms(duration),
    pract_time = hour(hour_min_sec) * 60 + minute(hour_min_sec) + second(hour_min_sec) / 60)

catapult

After getting practice time into minutes we will adjust the date column from a character string to an actual date, using the as.Date() function.

catapult$date <- as.Date(catapult$date, "%m/%d/%y")
catapult

To finish, we will do a bit of clean up and remove the duration and hour_min_sec columns, round the player_load and pract_time columns to one significant digit and create a player_load_per_min column.

catapult %>%
  select(-duration, -hour_min_sec) %>%
  mutate(across(.cols = player_load:pract_time,
                ~round(.x, 1)),
         player_load_per_min = round(player_load / pract_time, 2))

Now we have a cleaned data set that we can worth with!

Access to the full code is available on my GITHUB page.