This week, Ellis Hughes and I finish cleaning our ugly excel data. In parts 1 and 2 we concentrated on:
- Bringing the data into R
- Reading multiple excel tabs
- Cleaning up merged columns
- Transforming the data into a long format
- Dealing with the structure of an excel sheet were pieces of information are contained in different areas on the sheet
- Organizing the data into a logical data frame for analysis
- Writing a function to automate all of the above processes across all excel tabs
This week, handle the last portion of our ugly excel data — dealing with dates and the training duration column.
- Dates can be tricky and, if you recall from previous episodes, R read them in as excel numeric values. So we discuss how to turn the numeric values into actual date values by understanding the origin argument within the as.Date() function.
- The training duration column in our ugly data file had various different values recorded in it (1:05, :30, 45, etc.). So, we have to figure out a way to handle this character string and covert it into a numeric value, representing the minutes of training that the individual performed.
To watch our screen cast, CLICK HERE.
To access our code, CLICK HERE.