Category Archives: TidyX Screen Cast

TidyX 126: Keeping duplicates when pivoting wider

Working more on data engineering/data cleaning steps, Ellis and I talk about the pivot_wider() function from {tidyr}, part of the {tidyverse}. One issue is that when you pivot a data set to a wide format and the id columns contain duplicate rows, pivot_wider() warns and collapses the duplicated values into list-columns. This can be a problem if you need to retain every row of the original data set. Thus, Ellis and I discuss a method to pivot_wider() while retaining all rows of data, even when there are duplicate values in the id_cols.
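As a rough illustration of the idea (a minimal sketch, not necessarily the approach from the episode), one common workaround is to number the observations within each duplicated group and include that counter in the id columns. The data set and the `obs` column below are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: athlete "A" has two "speed" rows, so
# (athlete, metric) does not uniquely identify an observation.
dat <- tibble(
  athlete = c("A", "A", "A", "B", "B"),
  metric  = c("speed", "speed", "power", "speed", "power"),
  value   = c(7.1, 7.4, 310, 6.8, 295)
)

# Without the helper column, pivot_wider() would warn and return
# list-columns. Numbering the rows within each (athlete, metric)
# group makes every observation uniquely identifiable.
wide <- dat %>%
  group_by(athlete, metric) %>%
  mutate(obs = row_number()) %>%
  ungroup() %>%
  pivot_wider(
    id_cols     = c(athlete, obs),
    names_from  = metric,
    values_from = value
  )
```

Every original value survives in its own cell, and combinations that were never observed (here, a second "power" value for athlete A) are filled with NA.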

To view the screen cast, CLICK HERE.

To access our code, CLICK HERE.

TidyX 125: Combining Multiple Conditions – Follow Up

In our previous episode, Ellis Hughes and I discussed a few approaches to creating a new column in your data which records multiple conditions that are based on data in a different column.

This week, we are following up on that episode because we got some great viewer feedback. One viewer provided an alternative approach to solving the problem, and another viewer had a follow-up question about a problem he was solving, which involved making some joins to a separate table.

If you have questions or want some help with data engineering/data cleaning issues you are dealing with, be sure to reach out by liking and subscribing on the YouTube channel and dropping us a comment. We will try to answer your question in an episode!

To watch the screen cast, CLICK HERE.

To access our code, CLICK HERE.

TidyX Episode 124: Combining Multiple Conditions in a Single Column

Some of the best TidyX episodes are born out of questions from viewers or colleagues. This week, Ellis Hughes and I dip into the mailbag and answer a question from a colleague about identifying multiple conditions.

The colleague had a data set with specific observations in one column. Their goal was to create a new column which put those observations into one of several conditional groups. The issue they were running into was that case_when() and ifelse() only output the first condition that is met. The colleague was looking for a way to populate the conditional groups column with a comma-separated string, so that observations matching multiple conditional groups could list all of them.

In this episode, we solve the problem in three different ways.
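To give a flavor of the problem, here is one possible way to build such a comma-separated column. This is a minimal sketch under assumed conditions, not necessarily one of the three solutions from the episode; the `value` column and the group labels are made up for illustration.

```r
library(dplyr)

# Hypothetical data and conditions, purely for illustration.
dat <- tibble(value = c(3, 12, 25, 40))

dat <- dat %>%
  rowwise() %>%
  mutate(
    # Each if() returns its label when TRUE and NULL otherwise;
    # c() drops the NULLs and paste() joins whatever is left.
    groups = paste(
      c(
        if (value > 10) "over_10",
        if (value %% 2 == 0) "even",
        if (value %% 5 == 0) "multiple_of_5"
      ),
      collapse = ", "
    )
  ) %>%
  ungroup()
```

Unlike case_when(), every condition is evaluated for every row, so a value such as 40 ends up labelled with all three groups, while a row that matches nothing gets an empty string.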

To watch our screen cast, CLICK HERE.

To access our code, CLICK HERE.

TidyX Episode 123: Using crossing to build data sets for simulation or player tracking data analysis

This week, Ellis and I discuss the crossing() function from {tidyr}, part of the {tidyverse}, and show how it can be used to construct data sets of every possible combination of input variables (the Cartesian product).

This function is very useful when creating data sets, in particular for simulation purposes or for building a data set of every combination of model input variables, so that you can test a model’s predictions and evaluate how it behaves under every circumstance.
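As a small sketch of what that looks like, the parameter names and values below are hypothetical stand-ins for whatever inputs a simulation or model might take.

```r
library(tidyr)

# Hypothetical simulation inputs: 2 sample sizes x 2 effect sizes x
# 2 standard deviations gives 8 rows, one per combination.
sim_grid <- crossing(
  n      = c(10, 50),
  effect = c(0, 0.5),
  sd     = c(1, 2)
)
```

Each row of `sim_grid` can then be fed to a simulation or to a model's prediction function.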

We end with a simple example of how to use crossing() and left_join() to build a data set for player tracking data that allows you to calculate the Euclidean distance between all players on the field/pitch/court/ice.
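A minimal sketch of that idea, assuming a single frame of tracking data with one row per player (the player names, coordinates, and column names are hypothetical and the episode's code may differ), could look like this:

```r
library(dplyr)
library(tidyr)

# Hypothetical single frame of tracking data: one row per player.
positions <- tibble(
  player = c("P1", "P2", "P3"),
  x      = c(0, 3, 6),
  y      = c(0, 4, 8)
)

# Build every ordered pair of players, drop self-pairs, attach each
# player's coordinates, and compute the Euclidean distance.
pairs <- crossing(
  player_a = positions$player,
  player_b = positions$player
) %>%
  filter(player_a != player_b) %>%
  left_join(positions, by = c("player_a" = "player")) %>%
  rename(x_a = x, y_a = y) %>%
  left_join(positions, by = c("player_b" = "player")) %>%
  rename(x_b = x, y_b = y) %>%
  mutate(distance = sqrt((x_a - x_b)^2 + (y_a - y_b)^2))
```

Each pair appears twice here (P1 to P2 and P2 to P1). If only unique pairs are needed, filtering on `player_a < player_b` instead would halve the table.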

To watch our screen cast, CLICK HERE.

To access our code, CLICK HERE.

TidyX Episode 122: Advanced Filtering with Event-Based Data

This week, Ellis and I talk through some advanced filtering techniques for dealing with event-based data. We cover filtering data for time-to-event and survival analysis, as well as working with sensor data that requires you to filter between two discrete events of interest. For example, you might need to keep all the accelerometer data occurring between the start and the end of a walk.
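One way to sketch the "filter between two events" idea is with running counts of start and end markers. This is an assumption-laden illustration rather than the code from the episode; the `sensor` data and the event labels are hypothetical.

```r
library(dplyr)

# Hypothetical accelerometer stream with event markers on some rows.
sensor <- tibble(
  time  = 1:8,
  accel = c(0.1, 0.2, 1.3, 1.5, 1.4, 1.2, 0.2, 0.1),
  event = c(NA, NA, "start_walk", NA, NA, "end_walk", NA, NA)
)

walk <- sensor %>%
  mutate(
    # Running counts of how many starts and ends have been seen so far.
    # %in% treats the NA rows as FALSE.
    started = cumsum(event %in% "start_walk"),
    ended   = cumsum(event %in% "end_walk"),
    # Lag the end count so the row carrying "end_walk" is itself kept.
    in_walk = started > lag(ended, default = 0L)
  ) %>%
  filter(in_walk)
```

A row is kept whenever more walks have started than had ended as of the previous row, so the same logic should also handle several start/end pairs in one recording.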

To watch our screen cast, CLICK HERE.

To access our code, CLICK HERE.