{"id":2031,"date":"2021-06-23T22:43:27","date_gmt":"2021-06-23T22:43:27","guid":{"rendered":"http:\/\/optimumsportsperformance.com\/blog\/?p=2031"},"modified":"2021-06-23T23:59:21","modified_gmt":"2021-06-23T23:59:21","slug":"doing-things-in-python-that-you-would-normally-do-in-excel","status":"publish","type":"post","link":"https:\/\/optimumsportsperformance.com\/blog\/doing-things-in-python-that-you-would-normally-do-in-excel\/","title":{"rendered":"Doing things in Python that you would normally do in Excel"},"content":{"rendered":"<p>Learning a new coding language is always a challenge. One thing that helps me is to create a short tutorial for myself of some of the basic data tasks that I might do when I initially sit down with a data set. I try and think through this in relationship to stuff I might have done (a long, long time ago) in Excel with regards to summarizing data, adding new features, and creating pivot tables.<\/p>\n<p>Since I&#8217;m not that great in Python, here is my <em><strong>Doing things in Python that you would normally do in Excel<\/strong><\/em> tutorial that may help others looking to get started with this coding language.<\/p>\n<p>The tasks that I cover are:<\/p>\n<ol>\n<li>Exploring features of the data<\/li>\n<li>Sorting the columns<\/li>\n<li>Filtering the columns<\/li>\n<li>Creating new features<\/li>\n<li>Calculating summary statistics<\/li>\n<li>Building pivot tables<\/li>\n<li>Data visualization<\/li>\n<\/ol>\n<p>The data I use comes from the <span style=\"color: #0000ff;\"><strong><a style=\"color: #0000ff;\" href=\"https:\/\/pypi.org\/project\/pybaseball\/\">pybaseball<\/a><\/strong><\/span> library, freely available for install in Python. 
I&#8217;ll be using the <em><strong>pitchers<\/strong><\/em> dataset from years 2012 to 2016.<\/p>\n<p>The entire Jupyter notebook is accessible on my <strong><span style=\"color: #0000ff;\"><a style=\"color: #0000ff;\" href=\"https:\/\/github.com\/pw2\/Python-Tips-and-Tricks\/blob\/master\/Doing%20things%20in%20Python%20that%20you%20would%20normally%20do%20in%20Excel.ipynb\">GitHub page<\/a><\/span><\/strong>.<\/p>\n<p><em><strong>Libraries and Data<\/strong><\/em><\/p>\n<p>The libraries I use are:<\/p>\n<ul>\n<li><strong>pandas <\/strong>&#8212; for working with data frames<\/li>\n<li><strong>numpy <\/strong>&#8212; for additional computational support<\/li>\n<li><strong>matplotlib &amp; seaborn <\/strong>&#8212; for plotting data<\/li>\n<li><strong>pybaseball <\/strong>&#8212; for data acquisition<\/li>\n<\/ul>\n<p>I called the data set <em><strong>pitchers<\/strong><\/em><em>. <\/em>The data consists of 408 rows and 334 columns. After a bit of exploration of the data set (seeing how large it is, checking columns for NA values, etc.), we begin by sorting the columns.<\/p>\n<p><strong>Sort Columns<\/strong><\/p>\n<p>Sorting columns is done by calling the name of the data set and using the <strong>sort_values()<\/strong> function, passing it the column you&#8217;d like to sort on (in this case, sorting alphabetically by pitcher name).<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\n## Sort the data by pitcher name\r\n\r\npitchers.sort_values(by='Name')\r\n\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-3.49.27-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2034\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-3.49.27-PM.png\" alt=\"\" width=\"1022\" height=\"421\" 
srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-3.49.27-PM.png 1022w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-3.49.27-PM-300x124.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-3.49.27-PM-768x316.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-3.49.27-PM-624x257.png 624w\" sizes=\"auto, (max-width: 1022px) 100vw, 1022px\" \/><\/a><\/p>\n<p>If you have a specific direction you&#8217;d like to sort by, set the <em><strong>&#8216;ascending&#8217; <\/strong><\/em>argument to either True or False. In this case, setting <em><strong>ascending = False<\/strong><\/em> allows us to sort the ERA from highest to lowest (descending order).<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\n## Sort the data by ERA from highest to lowest\r\n\r\npitchers.sort_values(by = 'ERA', ascending = False)\r\n\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-3.50.12-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2035\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-3.50.12-PM.png\" alt=\"\" width=\"1011\" height=\"558\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-3.50.12-PM.png 1011w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-3.50.12-PM-300x166.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-3.50.12-PM-768x424.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-3.50.12-PM-624x344.png 
624w\" sizes=\"auto, (max-width: 1011px) 100vw, 1011px\" \/><\/a><\/p>\n<p><em>(For additional sorting examples, see the <strong><span style=\"color: #0000ff;\">GitHub<\/span><\/strong> code)<\/em><\/p>\n<p><strong>Filtering Columns<\/strong><\/p>\n<p>Filtering can be performed by explicitly stating the value you&#8217;d like to filter on within the square brackets. Here, I call the data frame (pitchers) and add the <strong>.loc<\/strong> indexer after the data frame name in order to access the rows that meet the condition of interest. In this example, I&#8217;m only interested in looking at the 2012 season. Additionally, I only want a few columns (rather than the 334 from the full data set). As such, I specify those columns AFTER the comma within the square brackets. Everything to the left of the comma is specific to the rows I want to filter (Season == 2012) and everything to the right of the comma represents the columns of interest I&#8217;d like returned.<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\n\r\n## Filter to only see the 2012 season\r\n# Keep only columns: Name, Team, Age, G, W, L, WAR, and ERA\r\n\r\nseason2012 = pitchers.loc&#x5B;pitchers&#x5B;'Season']== 2012, &#x5B;'Name', 'Team', 'Age', 'G', 'W', 'L', 'WAR', 'ERA']]\r\nseason2012.head()\r\n\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-3.57.03-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-2036\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-3.57.03-PM-1024x285.png\" alt=\"\" width=\"625\" height=\"174\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-3.57.03-PM-1024x285.png 1024w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-3.57.03-PM-300x84.png 300w, 
https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-3.57.03-PM-768x214.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-3.57.03-PM-624x174.png 624w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-3.57.03-PM.png 1034w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/p>\n<p>If I only want to look at Clayton Kershaw over this time period, I can filter for his rows like so:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\n\r\n## Filter Clayton Kershaw's seasons\r\n# Keep only columns: Season, Name, Team, Age, G, W, L, WAR, and ERA\r\n# arrange the data set from earliest season to latest\r\n\r\nkershaw = pitchers.loc&#x5B;pitchers&#x5B;'Name']=='Clayton Kershaw', &#x5B;'Season', 'Name', 'Team', 'Age', 'G', 'W', 'L', 'WAR', 'ERA']].sort_values('Season', ascending = True)\r\nkershaw.head()\r\n\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-3.57.57-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2037\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-3.57.57-PM.png\" alt=\"\" width=\"1004\" height=\"252\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-3.57.57-PM.png 1004w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-3.57.57-PM-300x75.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-3.57.57-PM-768x193.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-3.57.57-PM-624x157.png 624w\" sizes=\"auto, (max-width: 1004px) 100vw, 1004px\" \/><\/a><\/p>\n<p>To 
make the data set more palatable for the rest of the tutorial, I&#8217;m going to create a smaller data set, with fewer columns (<strong>pitchers_small<\/strong>).<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\n\r\n## Create a smaller data set\r\n# Keep only columns: Season, Name, Team, Age, G, W, L, WAR, ERA, Start-IP, and Relief-IP\r\n# arrange the data set from earliest season to latest for each pitcher\r\n\r\npitchers_small = pitchers&#x5B;&#x5B;'Season', 'Name', 'Team', 'Age', 'G', 'W', 'L', 'WAR', 'ERA', 'Start-IP','Relief-IP']].sort_values(&#x5B;'Name', 'Season'], ascending = True)\r\npitchers_small.head(30)\r\n\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-3.58.43-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2038\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-3.58.43-PM.png\" alt=\"\" width=\"1012\" height=\"355\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-3.58.43-PM.png 1012w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-3.58.43-PM-300x105.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-3.58.43-PM-768x269.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-3.58.43-PM-624x219.png 624w\" sizes=\"auto, (max-width: 1012px) 100vw, 1012px\" \/><\/a><br \/>\n<strong>Creating New Features<\/strong><\/p>\n<p>I create three new features in the data set:<\/p>\n<ol>\n<li><strong><strong>A sequence counter that counts the season number of each pitcher from 1 to N seasons that they are in the data set.<\/strong><\/strong>This is done by simply grouping the data by the pitcher name and then cumulatively counting 
each row that the pitcher is seen in the data. Notice I add &#8220;+1&#8221; to the end of the code because Python begins counting at &#8220;0&#8221;. <strong>NOTE: <\/strong>To make this work properly, ensure that the data is ordered by pitcher name and season. I did this at the end of my code, in the previous step.<\/li>\n<li><strong><strong>An &#8216;age group&#8217; feature that groups the ages of the pitchers in 5 year bins.<\/strong><\/strong> To accomplish this task, I use a 3 step process. First, I specify where I want the age bins to occur and assign the conditions to the <strong>bins<\/strong> variable. I then create the labels I would like to correspond to each of the bins and assign them to the <strong>age_group<\/strong> variable. Then I use the <strong>np.select()<\/strong> function to combine this information, assigning it to a new column in my data set called <strong>&#8216;age_group&#8217;<\/strong>.<\/li>\n<li><strong><strong>A &#8216;pitcher type&#8217; feature that considers anyone whose starter innings pitched were greater than or equal to the median number of starting innings pitched as a &#8216;starter&#8217; and all others as &#8216;relievers&#8217;.<\/strong><\/strong> To create the <strong>pitcher_type<\/strong> column, I use the <strong>np.where() <\/strong>function, which works like <strong>ifelse()<\/strong> or <strong>case_when()<\/strong> in R or like <strong>IF()<\/strong> in Excel. The first argument is the condition I&#8217;d like checked (<em>&#8220;did this pitcher have starter innings pitched that were greater than or equal to the median number of starter innings pitched?&#8221;<\/em>). If the condition is met, the function will assign the pitcher in that row as a &#8220;starter&#8221;. 
If the condition is not met, then the pitcher in that row is designated as a &#8220;reliever&#8221;.<\/li>\n<\/ol>\n<p>&nbsp;<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\n## Add a sequence counter for each season for each pitcher\r\npitchers_small&#x5B;'season_id'] = pitchers_small.groupby(&#x5B;'Name']).cumcount() + 1\r\n\r\n## Create a new column called 'age_group'\r\n# create conditions for the age_group bins\r\nbins = &#x5B;\r\n    (pitchers_small&#x5B;'Age'] &amp;lt;= 25), (pitchers_small&#x5B;'Age'] &amp;gt; 25) &amp;amp; (pitchers_small&#x5B;'Age'] &amp;lt;= 30), (pitchers_small&#x5B;'Age'] &amp;gt; 30) &amp;amp; (pitchers_small&#x5B;'Age'] &amp;lt;= 35), (pitchers_small&#x5B;'Age'] &amp;gt; 35) &amp;amp; (pitchers_small&#x5B;'Age'] &amp;lt;= 40), (pitchers_small&#x5B;'Age'] &amp;gt; 40)\r\n]\r\n\r\n# create the age_group names to be assigned to each bin\r\nage_group = &#x5B;'&amp;lt;= 25', '26 to 30', '31 to 35', '36 to 40', '&amp;gt; 40']\r\n\r\n# add the age_group bins into the data\r\n# (a string default keeps the result a string column; newer numpy versions reject the numeric default of 0 here)\r\npitchers_small&#x5B;'age_group'] = np.select(bins, age_group, default = 'unknown')\r\n\r\n## Create a pitcher_type column which makes a distinction between starters and relievers\r\npitchers_small&#x5B;'pitcher_type'] = np.where(pitchers_small&#x5B;'Start-IP'] &amp;gt;= pitchers_small&#x5B;'Start-IP'].median(), 'starter', 'reliever') \r\n\r\npitchers_small.head()\r\n\r\n\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-4.11.33-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2039\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-4.11.33-PM.png\" alt=\"\" width=\"695\" height=\"150\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-4.11.33-PM.png 695w, 
https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-4.11.33-PM-300x65.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-4.11.33-PM-624x135.png 624w\" sizes=\"auto, (max-width: 695px) 100vw, 695px\" \/><\/a><br \/>\n<strong>Calculating Summary Statistics<br \/>\n<\/strong><\/p>\n<p>The <strong><span style=\"color: #0000ff;\"><a style=\"color: #0000ff;\" href=\"https:\/\/github.com\/pw2\/Python-Tips-and-Tricks\/blob\/master\/Doing%20things%20in%20Python%20that%20you%20would%20normally%20do%20in%20Excel.ipynb\">GitHub repo<\/a><\/span><\/strong> for this post offers a few ways of obtaining the mean, standard deviation, and counts of values for different columns. For simplicity, I&#8217;ll show a convenient way to get summary stats for every numeric column in a data set using the <strong>describe()<\/strong> function, which is called following the name of the data frame and a period.<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\n\r\n## Get summary stats for each column\r\n\r\npitchers_small.describe()\r\n\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-4.23.31-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2040\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-4.23.31-PM.png\" alt=\"\" width=\"1006\" height=\"317\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-4.23.31-PM.png 1006w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-4.23.31-PM-300x95.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-4.23.31-PM-768x242.png 768w, 
https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-4.23.31-PM-624x197.png 624w\" sizes=\"auto, (max-width: 1006px) 100vw, 1006px\" \/><\/a><\/p>\n<p><strong>Pivot Tables<\/strong><\/p>\n<p>There are a few ways to create a pivot table in Python. One way is to use the <strong>groupby()<\/strong> function for the class you&#8217;d like to summarize over and then call the mathematical operation (e.g., mean) you are interested in. I have an example of this in the <strong><span style=\"color: #0000ff;\"><a style=\"color: #0000ff;\" href=\"https:\/\/github.com\/pw2\/Python-Tips-and-Tricks\/blob\/master\/Doing%20things%20in%20Python%20that%20you%20would%20normally%20do%20in%20Excel.ipynb\">GitHub post<\/a><\/span><\/strong> in code chunk 42 as well as several examples of other pivot table options. Another way is to use the <strong>pivot_table()<\/strong> function from the <strong>pandas<\/strong> library.<\/p>\n<p>Below is a pivot table of the mean and standard deviation of pitcher age across the 5 seasons in the data set.<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\n## Pivot Table of Average and Standard Deviation of Age by Season\r\n# Round the results to 1 decimal place\r\n\r\nround(pitchers_small.pivot_table(values = &#x5B;'Age'], index = &#x5B;'Season'], aggfunc = (np.mean, np.std)), ndigits = 1)\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-4.27.43-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2041\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-4.27.43-PM.png\" alt=\"\" width=\"1007\" height=\"304\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-4.27.43-PM.png 1007w, 
https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-4.27.43-PM-300x91.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-4.27.43-PM-768x232.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-4.27.43-PM-624x188.png 624w\" sizes=\"auto, (max-width: 1007px) 100vw, 1007px\" \/><\/a><\/p>\n<p>You can also make more complicated pivot tables by setting the columns to a second grouping variable, as one would do in Excel. Below, we look at the average WAR across all 5 seasons within the 5 age groups (which we created in the previous section).<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\n## Calculate the average WAR per season across age group\r\n\r\npitchers_small.pivot_table(values = &#x5B;'WAR'], index = &#x5B;'Season'], columns = &#x5B;'age_group'], aggfunc = np.mean)\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-4.30.07-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2042\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-4.30.07-PM.png\" alt=\"\" width=\"994\" height=\"292\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-4.30.07-PM.png 994w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-4.30.07-PM-300x88.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-4.30.07-PM-768x226.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-4.30.07-PM-624x183.png 624w\" sizes=\"auto, (max-width: 994px) 100vw, 994px\" \/><\/a><br \/>\n<strong>Data Visualization<br 
\/>\n<\/strong><\/p>\n<p>In the <strong><span style=\"color: #0000ff;\"><a style=\"color: #0000ff;\" href=\"https:\/\/github.com\/pw2\/Python-Tips-and-Tricks\/blob\/master\/Doing%20things%20in%20Python%20that%20you%20would%20normally%20do%20in%20Excel.ipynb\">GitHub post<\/a><\/span><\/strong> I walk through how to plot 8 different plots:<\/p>\n<ol>\n<li>Histogram<\/li>\n<li>Density plots by group<\/li>\n<li>Boxplots by group<\/li>\n<li>Scatter plot<\/li>\n<li>Scatter plot highlighting groups<\/li>\n<li>Bar plots for counting values<\/li>\n<li>Line plot for a single player over time<\/li>\n<li>Line plots for multiple players over time<\/li>\n<\/ol>\n<p>&nbsp;<\/p>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-4.37.51-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-2043\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-4.37.51-PM-1024x715.png\" alt=\"\" width=\"625\" height=\"436\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-4.37.51-PM-1024x715.png 1024w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-4.37.51-PM-300x210.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-4.37.51-PM-768x536.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-4.37.51-PM-624x436.png 624w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/06\/Screen-Shot-2021-06-23-at-4.37.51-PM.png 1778w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Learning a new coding language is always a challenge. 
One thing that helps me is to create a short tutorial for myself of some of the basic data tasks that I might do when I initially sit down with a data set. I try and think through this in relationship to stuff I might have [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[46],"tags":[],"class_list":["post-2031","post","type-post","status-publish","format-standard","hentry","category-python-tips-tricks"],"_links":{"self":[{"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/posts\/2031","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/comments?post=2031"}],"version-history":[{"count":6,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/posts\/2031\/revisions"}],"predecessor-version":[{"id":2047,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/posts\/2031\/revisions\/2047"}],"wp:attachment":[{"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/media?parent=2031"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/categories?post=2031"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/tags?post=2031"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}