Author Archives: Patrick

TidyX 64: Cleaning ugly Excel data, Part 1

Continuing the data cleaning theme of our previous three episodes, Ellis Hughes and I work through messy Excel data that was simulated to look like something we’d see in practice.

We decided to take turns with this, to show how each of us might handle the problem. This week is my approach, and next week will be Ellis’s approach.

Things that we cover:

Reading Excel files into R, selecting the tabs to read data from, and setting the data types
Turning messy data into a data frame that can be analyzed
Writing a for loop to operationalize the approach over a larger Excel file with many tabs
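
As a rough illustration of the steps listed above, here is a minimal sketch using the {readxl} package. It is not the code from the episode: the demo workbook that ships with {readxl} stands in for the messy file, and the object names (`path`, `sheets`, `first_tab`, `all_tabs`) are placeholders chosen for this example.

```r
library(readxl)

# Assumption: readxl's bundled demo workbook stands in for the messy file
path <- readxl_example("datasets.xlsx")

# List the tabs (sheets) available in the workbook
sheets <- excel_sheets(path)

# Read one tab and set the data types explicitly; a single "text" value
# is recycled so that every column is read as character
first_tab <- read_excel(path, sheet = sheets[1], col_types = "text")

# Loop over every tab, storing one data frame per tab in a named list
all_tabs <- vector("list", length(sheets))
names(all_tabs) <- sheets

for (s in sheets) {
  all_tabs[[s]] <- read_excel(path, sheet = s)
}
```

Reading everything as text first is one way to keep {readxl} from guessing column types on messy tabs; the columns can be converted once the data has been cleaned.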

To access our code, CLICK HERE.

To watch the screen cast, CLICK HERE.

TidyX 63: Regex Lookarounds

Continuing with our series on regular expressions (TidyX 61: Regex 101, TidyX 62: Applied Regex), this week Ellis Hughes and I discuss regex lookarounds as a way of setting new anchors when parsing text strings. We follow this up by taking NBA play-by-play data, turning the text into minutes played by each player in a game, and plotting the result in a Gantt chart.
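
For readers new to lookarounds, here is a small base R sketch. The play-by-play strings are made up for illustration and are not taken from the episode's data; note that base R needs `perl = TRUE` for lookaround syntax.

```r
# Hypothetical play-by-play line, invented for this example
x <- "SUB: Smith enters for Jones"

# Positive lookbehind: the word that comes right after "SUB: "
player_in <- regmatches(x, regexpr("(?<=SUB: )\\w+", x, perl = TRUE))

# Positive lookahead: the word that comes right before " enters"
also_player_in <- regmatches(x, regexpr("\\w+(?= enters)", x, perl = TRUE))

# Positive lookbehind again: the word that comes right after "for "
player_out <- regmatches(x, regexpr("(?<=for )\\w+", x, perl = TRUE))

# Negative lookahead: flag lines that do NOT start with "SUB"
not_a_sub <- grepl("^(?!SUB)", c(x, "Timeout: Team A"), perl = TRUE)

player_in   # "Smith"
player_out  # "Jones"
not_a_sub   # FALSE TRUE
```

The lookaround itself is never part of the match, which is why only the player name comes back rather than "SUB: Smith".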

To watch our screen cast, CLICK HERE.

To access our code, CLICK HERE.

TidyX 62: Applied Regex

Last week, Ellis Hughes and I introduced some fundamentals of regular expressions in our Regular Expressions 101 screen cast. This week, we expand on those concepts and show how to enumerate regular expressions (regex) into a format that can be analyzed and visualized.

To watch our screen cast, CLICK HERE.

To access our code, CLICK HERE.

TidyX 61: Regular Expressions 101

One of the less glamorous tasks in data science is data cleaning. Because data can come in many different forms, it is often “dirty” and requires some level of treatment prior to analysis. One of the more complex data cleaning tasks is working with strings and regular expressions.

Regular expressions can look intimidating, as parsing strings requires a lot of weird-looking characters. For example, look at this regular expression Joe Cheng shared on Twitter recently:

As such, this week on TidyX, Ellis Hughes and I begin a series on regular expressions and start with Regular Expressions 101. We cover:

Searching strings for key words
Manipulating the string
Extracting components of the string
Splitting the string based on a specific character
Regular expression anchors
Matching within the string
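
The topics above can be sketched with base R functions. This is only an illustration under assumptions: the example strings are invented, and the episode itself may use different functions or packages.

```r
# Hypothetical example strings, invented for this sketch
x <- c("Player A makes 3-pt shot", "Player B misses layup", "Timeout: Team A")

# Searching strings for key words
has_makes <- grepl("makes", x)

# Manipulating the string: replace the first match in each element
shortened <- sub("Player", "P.", x)

# Extracting components: the first run of digits (unmatched elements drop out)
digits <- regmatches(x, regexpr("[0-9]+", x))

# Splitting the string on a specific character
parts <- strsplit(x[3], ":")[[1]]

# Anchors: ^ matches the start of the string, $ matches the end
starts_timeout <- grepl("^Timeout", x)
ends_layup <- grepl("layup$", x)

# Matching within the string: every "Player" or "Team" label plus its letter
labels <- regmatches(x, gregexpr("(Player|Team) [A-Z]", x))
```

`regexpr()` returns only the first match per string, while `gregexpr()` returns all of them, which is why `labels` is a list with one character vector per input string.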

To watch the screen cast, CLICK HERE.

To access our code, CLICK HERE.

TidyX 60: pitchf/x model evaluation

We’ve made it!

Over the past seven weeks, Ellis Hughes and I have been working on various approaches to building a classification model of pitchf/x data using the {mlbgameday} package in R. We are finally ready to compare all of the models we’ve built, display the results in a conditionally formatted {gt} table, and discuss our findings.

To watch our screen cast, CLICK HERE.

To access our code, CLICK HERE.

For previous episodes in this series:

Patrick Ward, PhD

Patrick Ward, PhD | Sports Science & Analytics
