TidyX 66: Cleaning ugly excel data, Part 3

This week, Ellis Hughes and I finish cleaning our ugly excel data. In parts 1 and 2 we concentrated on:

  • Bringing the data into R
  • Reading multiple excel tabs
  • Cleaning up merged columns
  • Transforming the data into a long format
  • Dealing with the structure of an excel sheet were pieces of information are contained in different areas on the sheet
  • Organizing the data into a logical data frame for analysis
  • Writing a function to automate all of the above processes across all excel tabs

This week, handle the last portion of our ugly excel data — dealing with dates and the training duration column.

  • Dates can be tricky and, if you recall from previous episodes, R read them in as excel numeric values. So we discuss how to turn the numeric values into actual date values by understanding the origin argument within the as.Date() function.
  • The training duration column in our ugly data file had various different values recorded in it (1:05, :30, 45, etc.). So, we have to figure out a way to handle this character string and covert it into a numeric value, representing the minutes of training that the individual performed.

To watch our screen cast, CLICK HERE.

To access our code, CLICK HERE.