{"id":2144,"date":"2021-10-28T06:39:54","date_gmt":"2021-10-28T06:39:54","guid":{"rendered":"http:\/\/optimumsportsperformance.com\/blog\/?p=2144"},"modified":"2021-10-28T12:51:17","modified_gmt":"2021-10-28T12:51:17","slug":"r-tips-tricks-excel-within-column-iteration-in-r-part-2","status":"publish","type":"post","link":"https:\/\/optimumsportsperformance.com\/blog\/r-tips-tricks-excel-within-column-iteration-in-r-part-2\/","title":{"rendered":"R Tips &#038; Tricks: Excel Within Column Iteration in R (part 2)"},"content":{"rendered":"<p><span style=\"color: #0000ff;\"><strong><a style=\"color: #0000ff;\" href=\"https:\/\/optimumsportsperformance.com\/blog\/r-tips-tricks-recreating-within-column-iteration-as-you-would-do-in-excel\/\">Earlier this week<\/a><\/strong><\/span> I shared a method for doing within column iteration in R, as you might do in excel. For example, you want to create a new column that requires a starting value from a different column (or a 0 value) and then requires new values within that column to iterate over the prior values. The way I handled this was to use the <strong>accumuate()<\/strong> function available in the <strong>{tidyverse}<\/strong> package.<\/p>\n<p>The article got some good feedback and discussion on Twitter. For example, <strong><span style=\"color: #0000ff;\"><a style=\"color: #0000ff;\" href=\"https:\/\/twitter.com\/thomas_mock\">Thomas Mock<\/a><\/span><\/strong>, provided some examples of using the {<strong>slider}<\/strong> package to handle window functions, see <strong><span style=\"color: #0000ff;\"><a style=\"color: #0000ff;\" href=\"https:\/\/twitter.com\/thomas_mock\/status\/1452650389090799640\">HERE<\/a><\/span><\/strong> and <strong><span style=\"color: #0000ff;\"><a style=\"color: #0000ff;\" href=\"https:\/\/gist.github.com\/jthomasmock\/95f2bb15d1d0f4c195276f7d083be892\">HERE<\/a><\/span><\/strong>. The package looks to be very handy and easy to use. I&#8217;m going have to play around with it some more.<\/p>\n<p>Someone else asked, <strong><em>&#8220;how might we do this in a for() loop?&#8221;<\/em><\/strong><\/p>\n<p>It&#8217;s a good question. Sometimes you might need to use base R or sometimes the for() loop might be easier. So, let&#8217;s walk through an example:<\/p>\n<p><strong><span style=\"text-decoration: underline;\">Simulate Data<\/span><\/strong><\/p>\n<p>First, we need to simulate a basic data set:<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\nlibrary(tidyverse)\r\n\r\ndf &lt;- tibble(\r\n  id = 1:5,\r\n  val = c(5,7,3,4,2)\r\n)\r\n\r\ndf\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/10\/Screen-Shot-2021-10-27-at-11.09.25-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-2145\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/10\/Screen-Shot-2021-10-27-at-11.09.25-PM.png\" alt=\"\" width=\"142\" height=\"140\" \/><\/a><\/p>\n<p><span style=\"text-decoration: underline;\"><strong>Setting Up the Problem<\/strong><\/span><\/p>\n<p>Let&#8217;s say we want to create a new value that applies a very simple algorithm:<\/p>\n<p style=\"text-align: center;\"><em><strong>New Value = observed + 2 * lag(new value)<\/strong><\/em><\/p>\n<p>Putting the above data in excel the formula and answer looks like this:<\/p>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/10\/Screen-Shot-2021-10-27-at-11.13.32-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-2146\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/10\/Screen-Shot-2021-10-27-at-11.13.32-PM.png\" alt=\"\" width=\"420\" height=\"198\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/10\/Screen-Shot-2021-10-27-at-11.13.32-PM.png 522w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/10\/Screen-Shot-2021-10-27-at-11.13.32-PM-300x141.png 300w\" sizes=\"auto, (max-width: 420px) 100vw, 420px\" \/><\/a><\/p>\n<p>Notice that the first new value starts with our initial observation (5) and then begins to iterate from there.<\/p>\n<p><span style=\"text-decoration: underline;\"><strong>Writing the for() loop<\/strong><\/span><\/p>\n<p>for() loops can sometimes be scary but if you sequentially think through what you are trying to do you can often come up with a reasonable solution. Let&#8217;s step through this one:<\/p>\n<ol>\n<li>\u00a0We begin outside of the for() loop by creating two elements. We create <strong>N<\/strong> which simply gives us a count of the number of observations in our <strong>val<\/strong> column and we create <strong>new_val<\/strong> which is nothing more than a place holder for the new values we will create. Notice that the <strong>new_val<\/strong> place holder starts with the first element of the <strong>df$val <\/strong>column because, remember, we need to begin the new column with our first value observation in the <strong>val<\/strong> column. After that, I simply concatenate a bunch of NA values that will be populated with the new values that the for() loop will produce. Notice that I have NA repeat for\u00a0<strong>N-1<\/strong> times. This is important, as <strong>N\u00a0<\/strong>represents the number of observations in the <strong>val<\/strong> column and since we&#8217;ve already put a place holder in for the first observation we need to remove one of the NA&#8217;s to ensure the <strong>new_val<\/strong> column will be the same length as the <strong>val <\/strong>column.<\/li>\n<li>Next, we create our loop. I specify that I want to iterate over all <strong>&#8220;i&#8221;<\/strong> iterations from 2 to N. Why 2? Because the first value is already specified, as discussed above. Inside the for() loop, for each iteration that the loop runs it will store the new value, <strong>&#8220;i&#8221;<\/strong> in the <strong>new_val<\/strong> vector we created above. The equation that we specified earlier is within the for loop and I use &#8220;<strong>i<\/strong>&#8221; to index the observations. For example, for the second observation, what the for() loop is doing is saying, <em><strong>df$val[2] + new_val[2 &#8211; 1]*2, <\/strong><\/em>and for the third time through the loop it says,<em> <strong>df$val[3] + new_val[3 &#8211; 1]*2<\/strong>, <\/em>etc. until it goes through all <strong>N <\/strong>observations. Everything in the brackets is simply specifying the row indexes.<\/li>\n<\/ol>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n## We want to create a new value\r\n# New Value = observed + 2 * lag(new value)\r\n# The first value for the new value is the first observation in row one for value\r\n\r\nN &lt;- length(df$val)\r\nnew_val &lt;- c(df$val&#x5B;1], rep(NA, N-1))\r\n\r\nfor(i in 2:N){\r\n\r\n  new_val&#x5B;i] &lt;- df$val&#x5B;i] + new_val&#x5B;i - 1]*2\r\n  \r\n}\r\n<\/pre>\n<p>&nbsp;<\/p>\n<p>Once the loop is done running we can simply attach the results to our data frame and see what it looks like:<\/p>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/10\/Screen-Shot-2021-10-27-at-11.25.33-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-2148\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/10\/Screen-Shot-2021-10-27-at-11.25.33-PM.png\" alt=\"\" width=\"203\" height=\"180\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/10\/Screen-Shot-2021-10-27-at-11.25.33-PM.png 314w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/10\/Screen-Shot-2021-10-27-at-11.25.33-PM-300x266.png 300w\" sizes=\"auto, (max-width: 203px) 100vw, 203px\" \/><\/a><\/p>\n<p>Same results as the excel sheet!<\/p>\n<p><span style=\"text-decoration: underline;\"><strong>Wrapping this into a function<\/strong><\/span><\/p>\n<p>After seeing how the for() loop works, you might want to wrap it up into a function so that you don&#8217;t need to do the first steps of creating an element for the number of iterations and vector place holder. Also, having it in a function might be useful if you need to frequently use it for other data sets.<\/p>\n<p>We simply wrap all of the steps into a single function that takes an input of the data frame name and the value column that has your most recent observations. Run the function on the data set above and you will obtain the same output.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\niterate_column_func &lt;- function(df, val){\r\n  \r\n  N &lt;- length(df$val)\r\n  new_val &lt;- c(df$val&#x5B;1], rep(NA, N-1))\r\n  \r\n  for(i in 2:N){\r\n    \r\n    new_val&#x5B;i] &lt;- df$val&#x5B;i] + new_val&#x5B;i - 1]*2\r\n    \r\n  }\r\n  \r\n  df$new_val &lt;- new_val\r\n  return(df)\r\n}\r\n\r\niterate_column_func(df, val)\r\n<\/pre>\n<p><span style=\"text-decoration: underline;\"><strong><br \/>\nApplying the function to multiple subjects<\/strong><\/span><\/p>\n<p>What if we have more than one subject that we need to apply the function to?<\/p>\n<p>First, we simulate some more data:<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\ndf_2 &lt;- tibble(\r\n  subject = as.factor(rep(1:10, each = 5)),\r\n  id = rep(1:5, times = 10),\r\n  val = round(runif(n = 50, min = 10, max = 20), 0)\r\n)\r\n\r\ndf_2\r\n<\/pre>\n<p>Next, I&#8217;m going to make a slight tweak to the function. I&#8217;m going to have the output get returned as a single column data frame.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\niterate_column_func &lt;- function(x){\r\n  \r\n  N &lt;- length(x)\r\n  new_val &lt;- c(x&#x5B;1], rep(NA, N-1))\r\n  \r\n  for(i in 2:N){\r\n    \r\n    new_val&#x5B;i] &lt;- x&#x5B;i] + new_val&#x5B;i - 1]*2\r\n    \r\n  }\r\n  \r\n  new_val &lt;- as.data.frame(new_val)\r\n  return(new_val)\r\n}\r\n<\/pre>\n<p>Now, I&#8217;m going to apply the custom function to my new data frame, with multiple subjects, using the <strong>group_modify()<\/strong> function in {<strong>tidyverse<\/strong>}. This function allows us to apply other functions to groups of subjects, iterating over them and producing a data frame as a result.<\/p>\n<p>&nbsp;<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\nnew_df &lt;- df_2 %&gt;%\r\n  group_by(subject) %&gt;% \r\n  group_modify(~iterate_column_func(.x$val)) %&gt;%\r\n  ungroup()\r\n<\/pre>\n<p>Then, I simply bind this new data to the original data frame and we have our <strong>new_val<\/strong> produced within individual.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\ndf_2 %&gt;%\r\n  bind_cols(new_df %&gt;% select(-subject)) %&gt;% as.data.frame()\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/10\/Screen-Shot-2021-10-27-at-11.35.52-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-2149\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/10\/Screen-Shot-2021-10-27-at-11.35.52-PM.png\" alt=\"\" width=\"263\" height=\"409\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/10\/Screen-Shot-2021-10-27-at-11.35.52-PM.png 374w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2021\/10\/Screen-Shot-2021-10-27-at-11.35.52-PM-193x300.png 193w\" sizes=\"auto, (max-width: 263px) 100vw, 263px\" \/><\/a><\/p>\n<p><span style=\"text-decoration: underline;\"><strong>Conclusion<\/strong><\/span><\/p>\n<p>And there you go, within column iteration in R, just as you would do in excel. <span style=\"color: #0000ff;\"><strong><a style=\"color: #0000ff;\" href=\"https:\/\/optimumsportsperformance.com\/blog\/r-tips-tricks-recreating-within-column-iteration-as-you-would-do-in-excel\/\">Part 1<\/a><\/strong><\/span> covered an approach in {<strong>tidyverse<\/strong>} while Part 2 used for() loops in base R to accomplish the same task.<\/p>\n<p>The full code for this article is available on my <strong><span style=\"color: #0000ff;\"><a style=\"color: #0000ff;\" href=\"https:\/\/github.com\/pw2\/R-Tips-Tricks\/blob\/master\/Excel%20within%20column%20iteration%20for%20loop.R\">GitHub page<\/a><\/span><\/strong>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Earlier this week I shared a method for doing within column iteration in R, as you might do in excel. For example, you want to create a new column that requires a starting value from a different column (or a 0 value) and then requires new values within that column to iterate over the prior [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[45],"tags":[],"class_list":["post-2144","post","type-post","status-publish","format-standard","hentry","category-r-tips-tricks"],"_links":{"self":[{"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/posts\/2144","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/comments?post=2144"}],"version-history":[{"count":3,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/posts\/2144\/revisions"}],"predecessor-version":[{"id":2152,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/posts\/2144\/revisions\/2152"}],"wp:attachment":[{"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/media?parent=2144"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/categories?post=2144"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/tags?post=2144"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}