{"id":3118,"date":"2023-07-04T19:03:57","date_gmt":"2023-07-04T19:03:57","guid":{"rendered":"http:\/\/optimumsportsperformance.com\/blog\/?p=3118"},"modified":"2023-07-05T13:05:37","modified_gmt":"2023-07-05T13:05:37","slug":"simulations-in-r-part-1-functions-for-simulation-resampling","status":"publish","type":"post","link":"https:\/\/optimumsportsperformance.com\/blog\/simulations-in-r-part-1-functions-for-simulation-resampling\/","title":{"rendered":"Simulations in R Part 1: Functions for Simulation &#038; Resampling"},"content":{"rendered":"<p>Simulating data is something I find myself doing all the time. Not only to explore uncertainty in data but also to explore model assumptions, understand how models behave under different circumstances, or to try and understand how a future analysis might work given some underlying data generating process. Thus, I decided to put together a series on simulations and resampling using R (I&#8217;ll also add a few analog scripts using Python to the <strong><span style=\"color: #0000ff;\"><a style=\"color: #0000ff;\" href=\"https:\/\/github.com\/pw2\/Constructing-Simulations-Tutorial\">GitHub repository<\/a><\/span><\/strong>).<\/p>\n<p>In Part 1, I&#8217;ll provide some thoughts around why you might want to simulate or resample data and then show how you can simply do this in R. 
Additionally, I&#8217;ll walk through several helper functions for conducting and summarizing simulations\/resamples as well as some basics around <strong>for()<\/strong> and <strong>while()<\/strong> loops, as we will use these extensively in our simulation and resampling processes.<\/p>\n<p>My <span style=\"color: #0000ff;\"><strong><a style=\"color: #0000ff;\" href=\"https:\/\/github.com\/pw2\/Constructing-Simulations-Tutorial\">Github repository<\/a><\/strong><\/span> will contain all of the scripts in this series.<\/p>\n<p><span style=\"text-decoration: underline;\"><strong>Why do we simulate or resample data?<\/strong><\/span><\/p>\n<ul>\n<li>The data generating process is what defines the properties of our data and dictates the type of distribution we are dealing with. For example, the mean and standard deviation reflect the two parameters of the data generating process for a normal distribution. We rarely know what the data generating process of our data is in the real world, thus we must infer it from our sample data. Both resampling and simulation offer methods of understanding the data generating process of data.<\/li>\n<li>Sample data represents a small sliver of what\u00a0<em>might<\/em> be occurring in the broader population. Using resampling and simulation, we are able to build larger data sets based on information contained in the sample data. 
Such approaches allow us to explore our uncertainty around what we have observed in our sample and the inferences we might be able to make about that larger population.<\/li>\n<li>Creating samples of data allows us to assess patterns in the data and evaluate those patterns under different circumstances, which we can directly program.<\/li>\n<li>By coding a simulation, we are able to reflect a desired data generating process, allowing us to evaluate assumptions or limitations of data that we have collected or are going to collect.<\/li>\n<li>The world is full of randomness, meaning that every observation we make comes with some level of uncertainty. The uncertainty that we have about the true value of our observation can be expressed via various probability distributions. Resampling and simulation are ways that we can mimic this randomness in the world and help calibrate our expectations about the probability of certain events or observations occurring.<\/li>\n<\/ul>\n<p><span style=\"text-decoration: underline;\"><strong>Difference between resampling and simulation<\/strong><\/span><\/p>\n<p>Resampling and simulation are both useful for generating data sets and reflecting uncertainty. However, they accomplish this task in different ways.<\/p>\n<ul>\n<li>Resampling deals with techniques that take the observed sample data and randomly draw observations from that data to construct a new data set. 
This is often done thousands of times, building thousands of new data sets, and then summary statistics are produced on those data sets as a means of understanding the data generating properties.<\/li>\n<li>Simulation works by assuming a data generating process (e.g., making a best guess or estimating a plausible mean and standard deviation for the population from previous literature) and then generating multiple samples of data, randomly, from the data generating process features.<\/li>\n<\/ul>\n<p><span style=\"text-decoration: underline;\"><strong>Sampling from common distributions<\/strong><\/span><\/p>\n<p>To create a distribution in R we can use any one of the four primary prefixes, which define the type of information we want returned about the distribution, followed by the suffix that defines the distribution we are interested in.<\/p>\n<p>Here is a helpful cheat sheet I put together for some of the common distributions one might use:<\/p>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/r_distributions.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3119\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/r_distributions-1024x357.png\" alt=\"\" width=\"671\" height=\"234\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/r_distributions-1024x357.png 1024w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/r_distributions-300x104.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/r_distributions-768x267.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/r_distributions-624x217.png 624w\" sizes=\"auto, (max-width: 671px) 100vw, 671px\" \/><\/a><\/p>\n<p>Some examples:<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n# The probability that a random variable is less than or equal to 1.645 has a cumulative 
density of 95% (CDF)\r\npnorm(q = 1.645, mean = 0, sd = 1)\r\n\r\n# What is the exact probability (PMF) that we flip 10 coins, each with a 50% chance of heads, and get exactly 1 head?\r\ndbinom(x = 1, size = 10, prob = 0.5)\r\n\r\n# What is the z-score for the 95th percentile when the data is Normal(0, 1)?\r\nqnorm(p = 0.95, mean = 0, sd = 1)\r\n\r\n# randomly draw 10 values from a uniform distribution with a min of 5 and max of 10\r\nrunif(n = 10, min = 5, max = 10)\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.43.13-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3120\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.43.13-AM-1024x309.png\" alt=\"\" width=\"668\" height=\"202\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.43.13-AM-1024x309.png 1024w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.43.13-AM-300x91.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.43.13-AM-768x232.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.43.13-AM-624x189.png 624w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.43.13-AM.png 1694w\" sizes=\"auto, (max-width: 668px) 100vw, 668px\" \/><\/a><\/p>\n<p>We can completely simulate different distributions and properties of those distributions using these functions. 
For several examples of different distributions see the <strong><span style=\"color: #0000ff;\"><a style=\"color: #0000ff;\" href=\"https:\/\/github.com\/pw2\/Constructing-Simulations-Tutorial\/blob\/main\/Simulations%20in%20R%20Part%201%20-%20Functions%20for%20Simulation%20%26%20Resampling.Rmd\">GitHub code<\/a><\/span><\/strong>. Below, we draw 1,000 random observations from a normal distribution with a mean of 30 and a standard deviation of 15 and plot the results.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n## set the seed for reproducibility\r\nset.seed(10)\r\nnorm_dat &lt;- rnorm(n = 1000, mean = 30, sd = 15)\r\n\r\nhist(norm_dat,\r\n     main = &quot;Random Simulation from a Normal Distribution&quot;,\r\n     xlab = &quot;N(30, 15^2)&quot;)\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.46.17-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3121\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.46.17-AM-971x1024.png\" alt=\"\" width=\"517\" height=\"545\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.46.17-AM-971x1024.png 971w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.46.17-AM-284x300.png 284w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.46.17-AM-768x810.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.46.17-AM-624x658.png 624w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.46.17-AM.png 1022w\" sizes=\"auto, (max-width: 517px) 100vw, 517px\" \/><\/a><\/p>\n<p>We can produce a number of summary statistics on this vector of random 
values:<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n# sample size\r\nlength(norm_dat)\r\n\r\n# mean, standard deviation, and variance\r\nmean(norm_dat)\r\nsd(norm_dat)\r\nvar(norm_dat)\r\n\r\n# median, median absolute deviation\r\nmedian(norm_dat)\r\nmad(norm_dat)\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.47.52-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3122\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.47.52-AM.png\" alt=\"\" width=\"505\" height=\"444\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.47.52-AM.png 664w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.47.52-AM-300x264.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.47.52-AM-624x549.png 624w\" sizes=\"auto, (max-width: 505px) 100vw, 505px\" \/><\/a><\/p>\n<p><span style=\"text-decoration: underline;\"><strong>for &amp; while loops<\/strong><\/span><\/p>\n<p>Typically, we are going to want to resample data more than once or to run multiple simulations. Often, we will want to do this thousands of times. 
We can use R to help us in this endeavor by programming <strong>for()<\/strong> and <strong>while()<\/strong> loops to do the heavy lifting for us and store the results in a convenient format (e.g., vector, data frame, matrix, or list) so that we can summarize them later.<\/p>\n<p><strong><em>for loops<\/em><\/strong><\/p>\n<p><strong>for()<\/strong> loops are an easy way to tell R that we want it to do some sort of task <em><span style=\"text-decoration: underline;\">for<\/span><\/em> a specified number of iterations.<\/p>\n<p>For example, let&#8217;s create a <strong>for()<\/strong> loop that adds 5 to every value from 1 to 10, <strong>for(i in 1:10)<\/strong>.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n# program the loop to add 5 to every value from 1:10\r\nfor(i in 1:10){\r\n  \r\n  print(i + 5)\r\n  \r\n}\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.52.35-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3123\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.52.35-AM.png\" alt=\"\" width=\"514\" height=\"356\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.52.35-AM.png 812w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.52.35-AM-300x208.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.52.35-AM-768x532.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.52.35-AM-624x432.png 624w\" sizes=\"auto, (max-width: 514px) 100vw, 514px\" \/><\/a><\/p>\n<p>We notice that the result is printed directly to the console. 
If we are doing thousands of iterations or if we want to store the results to plot and summarize them later, this won&#8217;t be a good option. Instead, we can allocate an empty vector or data frame to store these values.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n## storing values as vector\r\nn &lt;- 10\r\nvector_storage &lt;- rep(NA, times = n)\r\n\r\nfor(i in 1:n){\r\n  vector_storage&#x5B;i] &lt;- i + 5\r\n}\r\n\r\nvector_storage\r\n\r\n## store results back to a data frame\r\nn &lt;- 10\r\ndf_storage &lt;- data.frame(n = 1:10)\r\n\r\nfor(i in 1:n){\r\n  df_storage$n2&#x5B;i] &lt;- i + 5\r\n}\r\n\r\ndf_storage\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.54.16-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3124\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.54.16-AM.png\" alt=\"\" width=\"137\" height=\"278\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.54.16-AM.png 208w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.54.16-AM-148x300.png 148w\" sizes=\"auto, (max-width: 137px) 100vw, 137px\" \/><\/a><\/p>\n<p><span style=\"text-decoration: underline;\"><strong>while loops<\/strong><\/span><\/p>\n<p><strong>while()<\/strong> loops differ from <strong>for()<\/strong> loops in that they continue to perform a process <span style=\"text-decoration: underline;\"><em>while<\/em><\/span> some condition is met.<\/p>\n<p>For example, suppose we start with a count of 0 observations and keep adding 1 observation for as long as the count is below 10.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\nobservations &lt;- 0\r\n\r\nwhile(observations &lt; 10){\r\n\tobservations &lt;- observations + 
1\r\n\tprint(observations)\r\n} \r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.56.18-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3125\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.56.18-AM.png\" alt=\"\" width=\"400\" height=\"322\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.56.18-AM.png 592w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.56.18-AM-300x241.png 300w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/a><\/p>\n<p>We can also use <strong>while()<\/strong> loops to test logical arguments.<\/p>\n<p>For example, let&#8217;s say we have five coins in our pocket and want to play a game with a friend where we flip a fair coin and every time it lands on heads (<em><strong>coin_flip == 1<\/strong><\/em>) we get a coin and every time it lands on tails we lose a coin. 
We are only willing to continue playing the game as long as we retain between 3 and 10 coins.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n## starting number of coins\r\ncoins &lt;- 5 \r\n\r\n## while loop \r\nwhile(coins &gt;= 3 &amp;&amp; coins &lt;= 10){\r\n\t\r\n  # flip a fair coin (50\/50 chance of heads or tails)\r\n\tcoin_flip &lt;- rbinom(n = 1, size = 1, prob = 0.5)\r\n\t\r\n\t# If the coin lands on heads (1) you win a coin and if it lands on tails (0) you lose a coin\r\n\tif(coin_flip == 1){\r\n\t  \r\n\t\tcoins &lt;- coins + 1\r\n\t\t\r\n\t\t}else{\r\n\t\t\tcoins &lt;- coins - 1\r\n\t\t}\r\n\t\r\n\t## NOTE: we only play while our winnings are between 3 and 10 coins\r\n\t\r\n\t# print the result\r\n\tprint(coins)\r\n}\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.58.27-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-3126\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.58.27-AM-1024x666.png\" alt=\"\" width=\"625\" height=\"406\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.58.27-AM-1024x666.png 1024w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.58.27-AM-300x195.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.58.27-AM-768x499.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.58.27-AM-624x406.png 624w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.58.27-AM.png 1470w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/p>\n<p>You can run the code many times and find out, on average, how many flips you will get!<\/p>\n<p>Finally, we can also use 
<strong>while()<\/strong> loops if we are building models to minimize error. For example, let&#8217;s say we have an <strong>error = 30<\/strong> and we want to continue running the code until the error is no longer above 1. So, the code will run <strong>while(error &gt; 1)<\/strong>.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\nerror &lt;- 30\r\n\r\nwhile(error &gt; 1){\r\n  \r\n error &lt;- error \/ 2\r\n print(error)\r\n}\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.59.46-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3127\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.59.46-AM.png\" alt=\"\" width=\"274\" height=\"245\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.59.46-AM.png 392w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-11.59.46-AM-300x268.png 300w\" sizes=\"auto, (max-width: 274px) 100vw, 274px\" \/><\/a><\/p>\n<p><span style=\"text-decoration: underline;\"><strong>Helper functions for summarizing distributions<\/strong><\/span><\/p>\n<p>There are a number of helper functions in base R that can assist us in summarizing data.<\/p>\n<ul>\n<li><strong>apply()<\/strong> applies a function over the rows or columns of a matrix or data frame and, when the function returns a single value, returns your results in a vector<\/li>\n<li><strong>lapply()<\/strong> will return your results as a list<\/li>\n<li><strong>sapply()<\/strong> can return the results as a vector or a list (if you set the argument <strong>simplify = FALSE<\/strong>)<\/li>\n<li><strong>tapply()<\/strong> will return your results in a named vector based on whichever grouping variable you specify<\/li>\n<\/ul>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n## create fake data\r\nset.seed(1234)\r\nfake_dat &lt;- data.frame(\r\n  group = rep(c(&quot;a&quot;, &quot;b&quot;, &quot;c&quot;), each 
= 5),\r\n  x = rnorm(n = 15, mean = 10, sd = 2),\r\n  y = rnorm(n = 15, mean = 30, sd = 10),\r\n  z = rnorm(n = 15, mean = 75, sd = 20)\r\n)\r\n\r\nfake_dat\r\n\r\n#### apply ####\r\n# get the column averages\r\napply(X = fake_dat&#x5B;,-1], MARGIN = 2, FUN = mean)\r\n\r\n# get the row averages\r\napply(X = fake_dat&#x5B;,-1], MARGIN = 1, FUN = mean)\r\n\r\n#### lapply ####\r\n# Get the 95% quantile interval for each column\r\nlapply(X = fake_dat&#x5B;,-1], FUN = quantile, probs = c(0.025, 0.975))\r\n\r\n#### sapply ####\r\n# Get the standard deviation of each column in a vector\r\nsapply(X = fake_dat&#x5B;,-1], FUN = sd)\r\n\r\n# Get the standard deviation of each column in a list\r\nsapply(X = fake_dat&#x5B;,-1], FUN = sd, simplify = FALSE)\r\n\r\n#### tapply ####\r\n# Get the average of x for each group\r\ntapply(X = fake_dat$x, INDEX = fake_dat$group, FUN = mean)\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-12.03.18-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-3128\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-12.03.18-PM-1024x1021.png\" alt=\"\" width=\"625\" height=\"623\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-12.03.18-PM-1024x1021.png 1024w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-12.03.18-PM-150x150.png 150w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-12.03.18-PM-300x300.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-12.03.18-PM-768x766.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-12.03.18-PM-624x622.png 624w, 
https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-04-at-12.03.18-PM.png 1308w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/p>\n<p>Alternatively, we can do much of this type of data summarizing using the convenient R package {<strong>tidyverse<\/strong>}.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\nlibrary(tidyverse)\r\n\r\n## get the mean of each numeric column\r\nfake_dat %&gt;%\r\n  summarize(across(.cols = x:z,\r\n                   .fns = ~mean(.x)))\r\n\r\n## get the mean across each row for the numeric columns\r\nfake_dat %&gt;%\r\n  rowwise() %&gt;%\r\n  mutate(AVG = mean(c_across(cols = x:z)))\r\n\r\n## Get the mean of x for each group\r\nfake_dat %&gt;%\r\n  group_by(group) %&gt;%\r\n  summarize(avg_x = mean(x),\r\n            .groups = &quot;drop&quot;)\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-05-at-5.48.48-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-3131\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-05-at-5.48.48-AM-593x1024.png\" alt=\"\" width=\"593\" height=\"1024\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-05-at-5.48.48-AM-593x1024.png 593w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-05-at-5.48.48-AM-174x300.png 174w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-05-at-5.48.48-AM-768x1326.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-05-at-5.48.48-AM-624x1077.png 624w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-05-at-5.48.48-AM.png 774w\" sizes=\"auto, (max-width: 593px) 100vw, 593px\" \/><\/a><\/p>\n<p>Finally, 
another handy base R function is <strong>replicate()<\/strong>, which allows us to repeat a task <em><strong>n<\/strong><\/em> times.<\/p>\n<p>For example, let&#8217;s say we want to draw 10 values from a normal distribution using <strong>rnorm()<\/strong> with a <strong>mean = 0<\/strong> and <strong>sd = 1<\/strong>, but we want to run this random simulation 10 times and get 10 different data sets. <strong>replicate()<\/strong> allows us to do this and stores the results in a matrix with 10 columns, each with 10 rows of the random sample.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\nreplicate(n = 10, expr = rnorm(n = 10, mean = 0, sd = 1))\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-05-at-5.50.58-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-3132\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-05-at-5.50.58-AM-1024x243.png\" alt=\"\" width=\"625\" height=\"148\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-05-at-5.50.58-AM-1024x243.png 1024w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-05-at-5.50.58-AM-300x71.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-05-at-5.50.58-AM-768x182.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-05-at-5.50.58-AM-624x148.png 624w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-05-at-5.50.58-AM.png 1776w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/p>\n<p><span style=\"text-decoration: underline;\"><strong>Wrapping Up<\/strong><\/span><\/p>\n<p>In this first part of my simulation and resampling series we went through some of the key functions 
in R that will help us build the scaffolding for our future work. In Part 2, we will dive into bootstrap resampling and simulating bivariate and multivariate normal distributions.<\/p>\n<p>All code is available in both R Markdown and HTML format on my <strong><span style=\"color: #0000ff;\"><a style=\"color: #0000ff;\" href=\"https:\/\/github.com\/pw2\/Constructing-Simulations-Tutorial\">GitHub page<\/a><\/span><\/strong>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Simulating data is something I find myself doing all the time. Not only to explore uncertainty in data but also to explore model assumptions, understand how models behave under different circumstances, or to try and understand how a future analysis might work given some underlying data generating process. Thus, I decided to put together a [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[47,45],"tags":[],"class_list":["post-3118","post","type-post","status-publish","format-standard","hentry","category-model-building-in-r","category-r-tips-tricks"],"_links":{"self":[{"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/posts\/3118","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/comments?post=3118"}],"version-history":[{"count":4,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/posts\/3118\/revisions"}],"predecessor-version":[{"id":3134,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/posts\/3118\/revisio
ns\/3134"}],"wp:attachment":[{"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/media?parent=3118"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/categories?post=3118"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/tags?post=3118"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}