Author Archives: Patrick

TidyX Episode 118: Calculating summary statistics using window functions

Ellis Hughes and I continue working on data cleaning and data engineering processes in {tidyverse}. This week, we go over various ways of creating window functions using the {zoo} package. These types of functions are very useful when dealing with signal processing data such as GPS, accelerators, or force plates. In our example, we use the Lahman baseball database to get the home runs per season for Albert Pujols (who recently hit his 700th career HR!) to show windows of rolling average and standard deviation.

If you have any data cleaning or engineering problems that are giving your trouble, feel free to reach out to us and maybe we can work on the problem in one of our upcoming episodes!

To watch our screen cast, CLICK HERE.

To access our code, CLICK HERE.

TidyX Episode 117: Creating Unique Participant IDs & Observation Groups in tidyverse

This week, Ellis Hughes and I go back to our {tidyverse} roots to do a series of episodes covering a number of different data engineering and data cleaning types of activities we commonly do.

This week, we show how to set up unique participant IDs using {tidyverse} and then we use integer division to produce groupings of observations by participant so that data can be aggregated for research purposes.

If you have a data engineering or data cleaning issue you are currently working through in {tidyverse} and want some help, feel free to comment on the YouTube channel and we can see about working on it for an upcoming episode.

To watch our screen cast, CLICK HERE.

To access our code, CLICK HERE.

R Tips & Tricks: Write Data to Separate Excel Sheet Tabs

When working with colleagues and managers, sometimes they prefer to have data or analysis written out to an excel workbook so that they can look at the information and sort or filter it, however they see fit. Rather than saving each output to a single csv file and then spending time copying and pasting them all into a single excel sheet (different data outputs on different tabs), let’s see how we can write everything out to a multi-tab excel sheet from R in one shot using the {openxlsx} package.

Data

We will simulate some sports data. In this example, we have two data sources:

  1. Data about teams
  2. Data about players
### Libraries -----------------------
library(tidyverse)
library(openxlsx)


### Create Data -----------------------

teams <- data.frame(
  team = c("Bats", "Sharks", "Tigers", "Bisons"),
  stat1 = round(rnorm(n = 4, mean = 100, sd = 20), 1),
  stat2 = round(rnorm(n = 4, mean = 250, sd = 70), 1),
  stat3 = round(rnorm(n = 4, mean = 0, sd = 3), 2)
)

players <- data.frame(
  player = c("Tom", "Joe", "Karl", "Bob", "Ben", "Albert", "Simon", "Harold", "Ken", "Cal"),
  cool_stat1 = round(rnorm(n = 10, mean = 300, sd = 10), 1),
  cool_stat1 = round(rnorm(n = 10, mean = 20, sd = 5), 1),
  cool_stat1 = round(rnorm(n = 10, mean = 500, sd = 50), 1),
  cool_stat1 = round(rnorm(n = 10, mean = 10, sd = 15), 1)
)

 

Store the data frames in a named list

Next, we need to store each of the data frames we want in our excel output into a named list. Naming of the elements in the list is important as these will be the names given to the excel tabs.

## Store each data frame as a named list -----------------------
list_of_dataframes <- list(teams = teams,
                   players = players)

 

Write the Data to an Excel Workbook

Once we have the data sets we are interested in structured in a named list we are ready to use the {openxlsx} package to write the data to a multi-tab excel workbook.

First, we create an empty excel workbook object to store our results.

## Create an empty excel workbook object -----------------------
blank_excel <- createWorkbook()

 

Next, we loop over our list of data sets and store each data set on its own named tab.

## loop over the list storing each data set on its own tab in the blank excel workbook -----------
Map(function(df, tab_name){     
  
  addWorksheet(blank_excel, tab_name)
  writeData(blank_excel, tab_name, df)
  }, 
  
  list_of_dataframes, names(list_of_dataframes)
)

 

If you’ve done it correctly, in your R console you should see an output indicating each of the data sets that were placed into the workbook.

 

 

Finally, we want to save the results into an excel workbook that we can email out to our co-workers. We use the saveWorkbook() function to save the results to our working directory.

## Save the now populated excel workbook --------------------------------------------
saveWorkbook(blank_excel, file = "league_data.xlsx", overwrite = TRUE)

 

When we open the excel  workbook we see that we have both tabs, properly named for the data they contain.