TidyX Episode 142: Storing data in for loops – An example using k-fold cross validation

for() loops are a necessary evil: you probably won't love writing them, but they come in handy. I often find myself writing a lot of for() loops, and viewers of our screencast often have questions about them. This week, Ellis Hughes and I discuss four different approaches to storing the data from a for() loop so that you can use it in downstream work (e.g., data models, visualizations, etc.). We do this with an example of writing a for() loop for k-fold cross validation.

K-fold cross validation is a useful component of the model building process. It splits the data into K folds, and each fold serves once as a hold-out set: a model is fit on the rest of the data and then used to predict on the hold-out fold. This continues until each of the K folds has been held out and predicted on. Model performance is then compared across the folds. This iterative approach to model building makes k-fold cross validation a good candidate for a for() loop.
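To make the idea concrete, here is a minimal sketch of k-fold cross validation in a for() loop. It is not the code from the episode. The built-in mtcars data, the mpg ~ wt + hp model, the choice of 5 folds, and the names fold_id, cv_results, and cv_summary are all assumptions made for illustration. Results are stored in a pre-allocated list, which is one common way to collect loop output and may or may not match the approaches covered in the screencast.

```r
# Minimal k-fold cross validation sketch using base R and the built-in mtcars data.
# Data set, model formula, and k are illustrative assumptions, not the episode's code.
set.seed(142)                      # arbitrary seed so the fold assignment is reproducible

k <- 5                             # number of folds (illustrative choice)
n <- nrow(mtcars)

# Randomly assign each row to one of k folds of near-equal size
fold_id <- sample(rep(seq_len(k), length.out = n))

# Pre-allocate a list with one slot per fold to hold the results
cv_results <- vector(mode = "list", length = k)

for (i in seq_len(k)) {
  # Fold i is the hold-out set; the remaining folds are the training set
  train_df <- mtcars[fold_id != i, ]
  test_df  <- mtcars[fold_id == i, ]

  # Fit on the training folds and predict on the hold-out fold
  fit  <- lm(mpg ~ wt + hp, data = train_df)
  pred <- predict(fit, newdata = test_df)

  # Store a one-row data frame of results in the slot for this fold
  cv_results[[i]] <- data.frame(
    fold   = i,
    n_test = nrow(test_df),
    rmse   = sqrt(mean((test_df$mpg - pred)^2))
  )
}

# Stack the per-fold results into a single data frame for downstream use
cv_summary <- do.call(rbind, cv_results)
cv_summary
```

Pre-allocating the list with vector() avoids growing an object on every pass through the loop, and binding the rows once at the end leaves a single data frame that can go straight into a plot or a summary table.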

To watch our screencast, CLICK HERE.

To access our code, CLICK HERE.