One of the less glamorous tasks in data science is data cleaning. Because data can come in many different forms, it is often “dirty” and requires some level of treatment prior to analysis. One of the more complex data cleaning tasks is working with strings and regular expressions.
Regular expressions can look intimidating because parsing strings requires a lot of weird-looking characters. For example, look at this regular expression Joe Cheng shared on Twitter recently:
As such, this week on TidyX, Ellis Hughes and I begin a series on regular expressions and start with Regular Expressions 101. We cover:
- Searching strings for key words
- Manipulating the string
- Extracting components of the string
- Splitting the string based on a specific character
- Regular expression anchors
- Matching within the string
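
To give a feel for what those six topics look like in practice, here is a minimal sketch. It is written in Python with the standard `re` module, so it is not the code from the screencast, and the sample string and variable names are made up purely for illustration.

```python
import re

# Hypothetical sample string, invented for this illustration
text = "2021-03-15: Joe ran 5 miles; Ellis ran 3 miles"

# Searching strings for key words: is "miles" anywhere in the text?
has_miles = re.search(r"miles", text) is not None

# Manipulating the string: replace every run of digits with "#"
masked = re.sub(r"\d+", "#", text)

# Extracting components of the string: pull out every run of digits
numbers = re.findall(r"\d+", text)

# Splitting the string based on a specific character: split on ";"
# and drop any whitespace that follows it
parts = re.split(r";\s*", text)

# Regular expression anchors: "^" ties the pattern to the start of
# the string and "$" ties it to the end
starts_with_date = re.search(r"^\d{4}-\d{2}-\d{2}", text) is not None
ends_with_miles = re.search(r"miles$", text) is not None

# Matching within the string: capture groups grab the pieces of the
# first "<name> ran <n> miles" phrase
match = re.search(r"(\w+) ran (\d+) miles", text)
first_runner = match.group(1)
first_distance = int(match.group(2))

print(has_miles, masked, numbers, parts, sep="\n")
print(starts_with_date, ends_with_miles, first_runner, first_distance)
```

The same ideas carry over to other languages; only the function names and a few escaping rules change.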
To watch the screencast, CLICK HERE.
To access our code, CLICK HERE.