Category Archives: TidyX Screen Cast

TidyX Episode 142: Storing data in for loops – An example using k-fold cross validation

for() loops are a necessary evil: you probably won't love writing them, but they come in handy. I often find myself writing a lot of for() loops, and viewers of our screen cast often have questions about writing them. This week, Ellis Hughes and I discuss four different approaches to storing your data from a for() loop so that you can use it for downstream projects (e.g., data models, visualizations, etc.). We do this with an example of creating a for() loop for k-fold cross validation.

K-fold cross validation is a useful component of the model building process, allowing you to split the data into k folds. Each fold serves as a one-time hold-out set: a model is constructed on the rest of the data and is then used to predict on the hold-out fold. This continues until each fold has been held out and predicted on. The model performance is then compared across the folds. This type of iterative approach to model building makes k-fold cross validation a good candidate for a for() loop.
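As a rough illustration of the idea (not the code from the episode), here is a minimal sketch of one way to store results from a k-fold loop: pre-allocate a list, fill one slot per fold, and bind the pieces together at the end. The data set (mtcars), the model formula, the seed, and the object names are all assumptions chosen just for the example.

```r
# Minimal sketch: 5-fold cross validation on the built-in mtcars data.
set.seed(142)  # arbitrary seed so the fold assignment is reproducible

k <- 5
dat <- mtcars

# Randomly assign each row to one of the k folds
dat$fold <- sample(rep(seq_len(k), length.out = nrow(dat)))

# Pre-allocate a list with one slot per fold
results <- vector("list", length = k)

for (i in seq_len(k)) {
  train <- dat[dat$fold != i, ]  # everything except fold i
  test  <- dat[dat$fold == i, ]  # fold i is the hold-out set

  fit  <- lm(mpg ~ wt + hp, data = train)
  pred <- predict(fit, newdata = test)

  # Store one small data frame per fold
  results[[i]] <- data.frame(
    fold = i,
    n    = nrow(test),
    rmse = sqrt(mean((test$mpg - pred)^2))
  )
}

# Bind the per-fold data frames into a single table
cv_results <- do.call(rbind, results)
cv_results
```

Filling a pre-allocated list and binding once at the end avoids growing a data frame inside the loop, which is one common reason to prefer this pattern.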

To watch our screen cast, CLICK HERE.

To access our code, CLICK HERE.

TidyX Episode 141: Function Factories

Two episodes ago, Ellis Hughes and I briefly discussed function factories (functions that return a function), and we got a few questions about how to use them and how to construct them. The good news is that you've probably been using function factories in your normal R coding and never knew it! This week we go a little deeper into function factories and answer some viewer questions about them.
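For readers who have not seen one before, here is a minimal sketch of a function factory. The names make_power, square, and cube are made up for this example and are not from the episode. The last lines use ecdf() from the stats package, which is a function factory you may already have used, since it returns a function.

```r
# A function factory: a function that returns a function.
make_power <- function(exponent) {
  force(exponent)  # evaluate the argument now rather than lazily
  function(x) x^exponent
}

square <- make_power(2)
cube   <- make_power(3)

square(4)  # 16
cube(2)    # 8

# ecdf() returns a function, so it is a factory too
F_hat <- ecdf(c(1, 2, 3, 4))
F_hat(2)   # 0.5
```

Each returned function remembers the exponent it was created with, which is what makes the pattern useful for building families of related functions.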

To watch our screen cast, CLICK HERE.

To access our code, CLICK HERE.

TidyX Episode 140: Data restructuring via splits

This week, Ellis Hughes and I answer a viewer question about data restructuring using splits in the data.

The viewer has some data sets that include information about competitions, the name of the athlete, the team the athlete is on, and the athlete's gender. The goal was to help the viewer write some code that could output this information into a clean-looking list with all of the information arranged appropriately within each competition.

Ellis and I interpreted and approached it differently, so you get to see several ways of solving the problem.
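To give a flavor of what splitting looks like, here is a minimal sketch using base R's split(). The athletes data frame and its values are invented for the example, and this is only one possible reading of the viewer's question, not the solution from the episode.

```r
# Hypothetical data: one row per athlete entry in a competition
athletes <- data.frame(
  competition = c("Regional", "Regional", "National", "National"),
  athlete     = c("A. Smith", "B. Jones", "C. Lee", "A. Smith"),
  team        = c("Red", "Blue", "Red", "Red"),
  gender      = c("F", "M", "F", "F"),
  stringsAsFactors = FALSE
)

# One list element per competition, each holding that competition's rows
by_competition <- split(
  athletes[, c("athlete", "team", "gender")],
  athletes$competition
)

names(by_competition)         # "National" "Regional"
by_competition[["Regional"]]  # the two Regional rows
```

The result is a named list, so each competition can be pulled out by name or passed to lapply() for further formatting.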

To watch the screen cast, CLICK HERE.

To access our code, CLICK HERE.

TidyX Episode 139: Custom z-score function with pre-specified population parameters

This week, Ellis Hughes and I work through a custom function for calculating z-scores conditional on specific population parameters.

When building models, it is common to normalize the data to get all of the features onto the same scale. Occasionally we are dealing with a situation where we want to scale the most recent data (e.g., this year's data) with the mean and standard deviation of the prior year's data. Another example would be scaling our testing set to the mean and standard deviation of a training set. Thus, we create a custom function to handle this task. We go step-by-step through our process of constructing the function and show the errors along the way and the iterations we went through.
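A minimal sketch of such a function is below. It is not the function we build in the episode; the name z_score, its argument names, and the toy train and test vectors are assumptions for illustration. If no population parameters are supplied, it falls back to the mean and standard deviation of the data itself.

```r
# z-score with optional pre-specified population parameters
z_score <- function(x,
                    pop_mean = mean(x, na.rm = TRUE),
                    pop_sd   = sd(x, na.rm = TRUE)) {
  stopifnot(is.numeric(x), is.numeric(pop_mean), is.numeric(pop_sd))
  stopifnot(length(pop_sd) == 1, pop_sd > 0)
  (x - pop_mean) / pop_sd
}

train <- c(10, 12, 14, 16, 18)
test  <- c(14, 20)

# Scale the test set using the training set's mean and standard deviation
z_test <- z_score(test, pop_mean = mean(train), pop_sd = sd(train))
z_test
```

Here the training mean is 14, so the first test value scales to 0, and the second scales to 6 divided by the training standard deviation.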

To watch the screen cast, CLICK HERE.

To access our code, CLICK HERE.

TidyX Episode 137: Magically Multiplying Tabbed Reports

This week, Ellis Hughes and I answer a viewer question about plotting in different tabs within an {Rmarkdown} report. The issue the viewer was running into was that they had a lot of groups that they wanted to create the same plot for, with a tab representing each of those individual groups. Rather than copying and pasting the tab code hundreds of times (and risking a mistake, such as not changing the data for one of the groups), we walk through a quick approach to have R do this for you in a loop. In literally 20 lines of code you can plot as many tabs as you'd like!
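The general pattern can be sketched as follows. This is not the code from the episode, and it rests on a few assumptions: the code sits in a chunk with the option results = "asis", the chunk is placed under a heading marked {.tabset} (for example "## Plots by group {.tabset}") in an HTML report, and mtcars split by cylinder count stands in for the viewer's groups.

```r
# Body of an R Markdown chunk; the chunk is assumed to set results = "asis"
# and to sit under a parent heading marked {.tabset}.
groups <- split(mtcars, mtcars$cyl)  # stand-in data: one group per cylinder count

for (g in names(groups)) {
  # Emit a sub-heading; under a .tabset parent each sub-heading becomes a tab
  cat("\n\n### ", g, " cylinders\n\n", sep = "")

  # Same plot for every group, drawn from that group's data only
  plot(groups[[g]]$wt, groups[[g]]$mpg,
       xlab = "Weight", ylab = "MPG",
       main = paste(g, "cylinders"))

  cat("\n\n")
}
```

If you use {ggplot2} instead of base graphics, the plot object needs to be wrapped in print() inside the loop, since plots are not printed automatically there.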

To watch the screen cast, CLICK HERE.

To access our code, CLICK HERE.