{"id":3135,"date":"2023-07-06T18:35:52","date_gmt":"2023-07-06T18:35:52","guid":{"rendered":"http:\/\/optimumsportsperformance.com\/blog\/?p=3135"},"modified":"2023-07-06T18:35:52","modified_gmt":"2023-07-06T18:35:52","slug":"simulations-in-r-part-2-bootstrapping-simulating-bivariate-and-multivariate-distributions","status":"publish","type":"post","link":"https:\/\/optimumsportsperformance.com\/blog\/simulations-in-r-part-2-bootstrapping-simulating-bivariate-and-multivariate-distributions\/","title":{"rendered":"Simulations in R Part 2: Bootstrapping &#038; Simulating Bivariate and Multivariate Distributions"},"content":{"rendered":"<p>In <strong><span style=\"color: #0000ff;\"><a style=\"color: #0000ff;\" href=\"https:\/\/optimumsportsperformance.com\/blog\/simulations-in-r-part-1-functions-for-simulation-resampling\/\">Part 1<\/a><\/span><\/strong> of this series we covered some of the basic functions that we will need to perform resampling and simulations in R.<\/p>\n<p>In Part 2 we will now move to building bootstrap resamples and simulating distributions for both bivariate and multivariate relationships. Some stuff we will cover:<\/p>\n<ul>\n<li>Coding bootstrap resampling by hand for both a single variable (mean) and regression coefficients.<\/li>\n<li>Using the <strong>boot()<\/strong> function from the <strong>boot<\/strong> package to perform bootstrapping for both a single variable and regression coefficients.<\/li>\n<li>Creating a simulation where two variables are in some way dependent on each other (correlate).<\/li>\n<li>Creating a simulation where multiple variables are correlated with each other.<\/li>\n<li>Finally, to understand what is going on with these simulated distributions we will also work through code that shows us the relationship between variables using both covariance and correlation matrices.<\/li>\n<\/ul>\n<p>As always, all code is freely available in the <strong><span style=\"color: #0000ff;\"><a style=\"color: #0000ff;\" href=\"https:\/\/github.com\/pw2\/Constructing-Simulations-Tutorial\">Github repository<\/a><\/span><\/strong>.<\/p>\n<p><span style=\"text-decoration: underline;\"><strong>Resampling<\/strong><\/span><\/p>\n<p>The resampling approach we will use here is bootstrapping. The general concept of bootstrapping is as follows:<\/p>\n<ul>\n<li>Draw multiple random samples from observed data with replacement.<\/li>\n<li>Draws must be independent and each observation must have an equal chance of being selected.<\/li>\n<li>The bootstrap sample should be the same size as the observed data in order to use as much information from the sample as possible.<\/li>\n<li>Calculate the mean resampled data and store it.<\/li>\n<li>Repeat this process thousands of times and summarize the mean of resampled means and the standard deviation of resampled means to obtain summary statistics of your bootstrapped resamples.<\/li>\n<\/ul>\n<p><em><strong>Write the bootstrap resampling by hand<\/strong><\/em><\/p>\n<p>The code below goes through the process of creating some fake data and then writing a <strong>for()<\/strong> loop that produces 1000 bootstrap resamples. The <strong>for()<\/strong> loop was introduced in <strong><span style=\"color: #0000ff;\"><a style=\"color: #0000ff;\" href=\"https:\/\/optimumsportsperformance.com\/blog\/simulations-in-r-part-1-functions-for-simulation-resampling\/\">Part 1<\/a><\/span><\/strong> of this series. In a nutshell, we are taking a random sample from the fake data (with replacement), calculating the mean of that random sample, and storing it in the element <strong>boot_dat<\/strong>. From there, we calculate the summary statistics or the original sample and the bootstrap resample, which we store in a data frame for comparison purposes, and then produce a histogram of the original sample and the bootstrap resample, which are visualized below the code.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\nlibrary(tidyverse)\r\n\r\n## create fake data\r\ndat &lt;- c(5, 10, 44, 3, 29, 10, 16.7, 22.3, 28, 1.4, 25)\r\n\r\n### Bootstrap Resamples ###\r\n# we want 1000 bootstrap resamples\r\nn_boots &lt;- 1000\r\n\r\n## create an empty vector to store our bootstraps\r\nboot_dat &lt;- rep(NA, n_boots)\r\n\r\n# set seed for reproducibility\r\nset.seed(786)\r\n\r\n# write for() loop for the resampling\r\nfor(i in 1:n_boots){\r\n  # random sample of 1:n number of observations in our data, with replacement\r\n  ind &lt;- sample(1:length(dat), replace = TRUE)\r\n  \r\n  # Use the row indexes to select the given values from the vector and calculate the mean\r\n  boot_dat&#x5B;i] &lt;- mean(dat&#x5B;ind])\r\n}\r\n\r\n# Look at the first 6 bootstrapped means\r\nhead(boot_dat)\r\n\r\n### Compare Bootstrap data to original data ###\r\n## mean and standard deviation of the fake data\r\ndat_mean &lt;- mean(dat)\r\ndat_sd &lt;- sd(dat)\r\n\r\n# standard error of the mean\r\ndat_se &lt;- sd(dat) \/ sqrt(length(dat))\r\n\r\n# 95% confidence interval\r\ndat_ci95 &lt;- paste0(round(dat_mean - 1.96*dat_se, 1), &quot;, &quot;, round(dat_mean + 1.96*dat_se, 1))\r\n\r\n# mean an SD of bootstrapped data\r\nboot_mean &lt;- mean(boot_dat)\r\n\r\n# the vector is the mean of each bootstrap sample, so the standard deviation of these means represents the standard error\r\nboot_se &lt;- sd(boot_dat)\r\n\r\n# to get the standard deviation we can convert the standard error back\r\nboot_sd &lt;- boot_se * sqrt(length(dat))\r\n\r\n# 95% quantile interval\r\nboot_ci95 &lt;- paste0(round(boot_mean - 1.96*boot_se, 1), &quot;, &quot;, round(boot_mean + 1.96*boot_se, 1)) ## Put everything together data.frame(data = c(&quot;fake sample&quot;, &quot;bootstrapped resamples&quot;), N = c(length(dat), length(boot_dat)), mean = c(dat_mean, boot_mean), sd = c(dat_sd, boot_sd), se = c(dat_se, boot_se), ci95 = c(dat_ci95, boot_ci95)) %&amp;gt;%\r\n  knitr::kable()\r\n\r\n# plot the distributions\r\npar(mfrow = c(1, 2))\r\nhist(dat,\r\n     xlab = &quot;Obsevations&quot;,\r\n     main = &quot;Fake Data&quot;)\r\nabline(v = dat_mean,\r\n       col = &quot;red&quot;,\r\n       lwd = 3,\r\n       lty = 2)\r\nhist(boot_dat,\r\n     xlab = &quot;bootstrapped means&quot;,\r\n     main = &quot;1000 bootstrap resamples&quot;)\r\nabline(v = boot_mean,\r\n       col = &quot;red&quot;,\r\n       lwd = 3,\r\n       lty = 2)\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.42.22-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-3136\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.42.22-AM-1024x170.png\" alt=\"\" width=\"625\" height=\"104\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.42.22-AM-1024x170.png 1024w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.42.22-AM-300x50.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.42.22-AM-768x127.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.42.22-AM-624x103.png 624w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.42.22-AM.png 1050w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a> <a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.42.36-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-3137\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.42.36-AM-1024x962.png\" alt=\"\" width=\"625\" height=\"587\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.42.36-AM-1024x962.png 1024w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.42.36-AM-300x282.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.42.36-AM-768x722.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.42.36-AM-624x586.png 624w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.42.36-AM.png 1160w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/p>\n<p>R offers a bootstrap function from the <strong>boot<\/strong> package that allows you to do the same thing without writing out your own <strong>for()<\/strong> loop. Below is an example of coding the same procedure in the <strong>boot<\/strong> package and the outputs the function provides, which are similar to the output we get above, save for slight differences due to random sampling.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n# write a function to calculate the mean of our sample data\r\nsample_mean &lt;- function(x, d){\r\n     return(mean(x&#x5B;d]))\r\n}\r\n\r\n# run the boot() function\r\nlibrary(boot)\r\n\r\n# run the boot function\r\nboot_func_output &lt;- boot(dat, statistic = sample_mean, R = 1000)\r\n\r\n# produce a plot of the output\r\nplot(boot_func_output)\r\n\r\n# get the mean and standard error\r\nboot_func_output\r\n\r\n# get 95% CI around the mean\r\nboot.ci(boot_func_output, type = &quot;basic&quot;, conf = 0.95)\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.45.45-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3138\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.45.45-AM.png\" alt=\"\" width=\"538\" height=\"228\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.45.45-AM.png 962w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.45.45-AM-300x127.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.45.45-AM-768x326.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.45.45-AM-624x265.png 624w\" sizes=\"auto, (max-width: 538px) 100vw, 538px\" \/><\/a> <a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.45.58-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3139\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.45.58-AM-1024x963.png\" alt=\"\" width=\"525\" height=\"494\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.45.58-AM-1024x963.png 1024w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.45.58-AM-300x282.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.45.58-AM-768x723.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.45.58-AM-624x587.png 624w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.45.58-AM.png 1148w\" sizes=\"auto, (max-width: 525px) 100vw, 525px\" \/><\/a><\/p>\n<p>We can bootstrap pretty much anything we want. We don&#8217;t have to limit ourselves to producing the distribution around the mean of a population. For example, let&#8217;s bootstrap regression coefficients to understand the uncertainty in them.<\/p>\n<p>First, let&#8217;s use the <strong>boot()<\/strong> function to conduct our analysis. We will fit a simple linear regression predicting miles per gallon from engine weight using the <strong>mtcars<\/strong> package. We will then write a function for that uses a random sample of rows to create the same linear regression and store those coefficients from 1000 linear regressions so that we can plot a histogram representing the slope coefficient from the resampled models and also summarize the distribution with confidence intervals.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n# load the mtcars data\r\nd &lt;- mtcars d %&gt;%\r\n  head()\r\n\r\n# fit a regression model\r\nfit_mpg &lt;- lm(mpg ~ wt, data = d)\r\nsummary(fit_mpg)\r\ncoef(fit_mpg)\r\nconfint(fit_mpg)\r\n\r\n# Write a function that can perform a bootstrap over the intercept and slope of the model\r\n# bootstrap function\r\nreg_coef_boot &lt;- function(data, row_id){\r\n  # we want to resample the rows\r\n  fit &lt;- lm(mpg ~ wt, data = d&#x5B;row_id, ])\r\n  coef(fit)\r\n}\r\n\r\n# run this once on a small subset of the row ids to see how it works\r\nreg_coef_boot(data = d, row_id = 1:20)\r\n\r\n# run the boot() function 1000 times\r\ncoef_boot &lt;- boot(data = d, reg_coef_boot, 1000) \r\n\r\n# check the output (coefficient and SE) \r\ncoef_boot \r\n\r\n# get the confidence intervals \r\nboot.ci(coef_boot, index= 2) \r\n\r\n# all 1000 of the bootstrap resamples can be called coef_boot$t %&amp;gt;%\r\n  head()\r\n\r\n# plot the first 20 bootstrapped intercepts and slopes over the original data\r\nplot(x = d$wt,\r\n     y = d$mpg,\r\n     pch = 19)\r\nfor(i in 1:20){\r\n  abline(a = coef_boot$t&#x5B;i, 1],\r\n       b = coef_boot$t&#x5B;i, 2],\r\n       lty = 2,\r\n       lwd = 3,\r\n       col = &quot;light grey&quot;)\r\n}\r\n\r\n## histogram of the slope coefficient\r\nhist(coef_boot$t&#x5B;, 2])\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.53.28-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3140\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.53.28-AM-765x1024.png\" alt=\"\" width=\"465\" height=\"623\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.53.28-AM-765x1024.png 765w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.53.28-AM-224x300.png 224w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.53.28-AM-624x836.png 624w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.53.28-AM.png 766w\" sizes=\"auto, (max-width: 465px) 100vw, 465px\" \/><\/a> <a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.53.37-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3141\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.53.37-AM-1024x1017.png\" alt=\"\" width=\"480\" height=\"476\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.53.37-AM-1024x1017.png 1024w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.53.37-AM-150x150.png 150w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.53.37-AM-300x298.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.53.37-AM-768x762.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.53.37-AM-624x619.png 624w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.53.37-AM.png 1096w\" sizes=\"auto, (max-width: 480px) 100vw, 480px\" \/><\/a><\/p>\n<p>We can do this by hand if we don&#8217;t want to use the built in <strong>boot()<\/strong> function. (<strong>SIDE NOTE: <\/strong>I usually prefer to code my own resampling and simulations as it gives me more flexibility with respect to the things I&#8217;d like to add or the values I&#8217;d like to store from each iteration.<\/p>\n<p>Below, instead of producing a histogram of just the slope coefficient, I use both the resampled intercept and slope and add 20 (of the 1000) lines to a scatter plot to show the way in which each of these lines represents a plausible regression line for the data. As you can see, the regression line confidence interval is starting to take shape, even with just 20 out of 1000 resamples, and this gives us a good understanding of not only the variability in our possible regression line fit for the underlying data but also, perhaps should make us less overconfident in our research findings knowing that there are many possible outcomes from the sample data we have obtained.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n## 1000 resamples\r\nn_samples &lt;- 1000\r\n\r\n## N observations\r\nn_obs &lt;- nrow(mtcars)\r\n\r\n## empty storage data frame for the coefficients\r\ncoef_storage &lt;- data.frame(\r\n  intercept = rep(NA, n_samples),\r\n  slope = rep(NA, n_samples)\r\n)\r\n\r\nfor(i in 1:n_samples){\r\n  \r\n  ## sample dependent and independent variables\r\n  row_ids &lt;- sample(1:n_obs, size = n_obs, replace = TRUE)\r\n  new_df &lt;- d&#x5B;row_ids, ]\r\n  \r\n  ## construct model\r\n  model &lt;- lm(mpg ~ wt, data = new_df)\r\n  \r\n  ## store coefficients\r\n  # intercept\r\n  coef_storage&#x5B;i, 1] &lt;- coef(model)&#x5B;1]\r\n  \r\n  # slope\r\n  coef_storage&#x5B;i, 2] &lt;- coef(model)&#x5B;2]\r\n  \r\n}\r\n\r\n## see results\r\nhead(coef_storage)\r\ntail(coef_storage)\r\n\r\n## Compare the results to those of the boot function\r\napply(X = coef_boot$t, MARGIN = 2, FUN = mean)\r\napply(X = coef_storage, MARGIN = 2, FUN = mean)\r\n\r\napply(X = coef_boot$t, MARGIN = 2, FUN = sd)\r\napply(X = coef_storage, MARGIN = 2, FUN = sd)\r\n\r\n## plot first 20 lines\r\nplot(x = d$wt,\r\n     y = d$mpg,\r\n     pch = 19)\r\nfor(i in 1:20){\r\n  abline(a = coef_storage&#x5B;i, 1],\r\n       b = coef_storage&#x5B;i, 2],\r\n       lty = 2,\r\n       lwd = 3,\r\n       col = &quot;light grey&quot;)\r\n}\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.58.35-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-3142\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.58.35-AM-1024x901.png\" alt=\"\" width=\"625\" height=\"550\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.58.35-AM-1024x901.png 1024w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.58.35-AM-300x264.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.58.35-AM-768x676.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.58.35-AM-624x549.png 624w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-10.58.35-AM.png 1134w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/p>\n<p><span style=\"text-decoration: underline;\"><strong>Simulating a relationship between two variables<\/strong><\/span><\/p>\n<p>As discussed in <strong><span style=\"color: #0000ff;\"><a style=\"color: #0000ff;\" href=\"https:\/\/optimumsportsperformance.com\/blog\/simulations-in-r-part-1-functions-for-simulation-resampling\/\">Part 1<\/a><\/span><\/strong>, simulation differs from resampling in that we use the parameters of the observed data to compute a new distribution, versus sampling from the data we have on hand.<\/p>\n<p>For example, using the mean and standard deviation of <strong>mpg<\/strong> from the <strong>mtcars<\/strong> data set, we can simulate 1000 random draws from a normal distribution.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n## load the mtcars data set\r\nd &lt;- mtcars\r\n\r\n## make a random draw from the normal distribution for mph\r\nset.seed(5)\r\nmpg_sim &lt;- rnorm(n = 1000, mean = mean(d$mpg), sd = sd(d$mpg))\r\n\r\n## plot and summarize\r\nhist(mpg_sim)\r\n\r\nmean(mpg_sim)\r\nsd(mpg_sim)\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.01.07-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3143\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.01.07-AM-1024x1018.png\" alt=\"\" width=\"489\" height=\"486\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.01.07-AM-1024x1018.png 1024w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.01.07-AM-150x150.png 150w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.01.07-AM-300x298.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.01.07-AM-768x764.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.01.07-AM-624x621.png 624w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.01.07-AM.png 1092w\" sizes=\"auto, (max-width: 489px) 100vw, 489px\" \/><\/a><\/p>\n<p>Frequently, we are interested in the relationship between two variables (e.g., correlation, regression, etc.). Let&#8217;s simulate two variables, <strong>x<\/strong> and<strong> y<\/strong>, which are linearly related in some way. To do this, we first simulate the variable <strong>x<\/strong> and then simulate <strong>y<\/strong> to be <strong>x<\/strong> plus some level of random noise.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n# simulate x and y\r\nset.seed(1098)\r\nx &lt;- rnorm(n = 10, mean = 50, sd = 10)\r\ny &lt;- x + rnorm(n = length(x), mean = 0, sd = 10)\r\n\r\n# put the results in a data frame\r\ndat &lt;- data.frame(x, y)\r\ndat\r\n\r\n# how correlated are the two variables\r\ncor.test(x, y)\r\n\r\n# fit a regression for the two variables\r\nfit &lt;- lm(y ~ x)\r\nsummary(fit)\r\n\r\n# plot the two variables with the regression line\r\nplot(x, y, pch = 19)\r\nabline(fit, col = &quot;red&quot;, lwd = 2, lty = 2)\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.02.22-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3144\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.02.22-AM.png\" alt=\"\" width=\"478\" height=\"386\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.02.22-AM.png 918w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.02.22-AM-300x242.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.02.22-AM-768x621.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.02.22-AM-624x504.png 624w\" sizes=\"auto, (max-width: 478px) 100vw, 478px\" \/><\/a> <a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.02.31-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3145\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.02.31-AM-1024x908.png\" alt=\"\" width=\"459\" height=\"407\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.02.31-AM-1024x908.png 1024w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.02.31-AM-300x266.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.02.31-AM-768x681.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.02.31-AM-624x553.png 624w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.02.31-AM.png 1132w\" sizes=\"auto, (max-width: 459px) 100vw, 459px\" \/><\/a><\/p>\n<p><span style=\"text-decoration: underline;\"><strong>Simulating a data set with multiple variables<\/strong><\/span><\/p>\n<p>Frequently, we might have a hypothesis regarding how correlated multiple variables are with each other. The example above produced a relationship of two variables with a direct relationship between them along with some noise. We might want to specify this relationship given a correlation coefficient or covariance between them. Additionally, we might have more than two variables that we want to simulate relationships between.<\/p>\n<p>To do this in R we can take advantage of two packages:<\/p>\n<ul>\n<li><strong>MASS <\/strong>via the <strong>mvrnorm()<\/strong><\/li>\n<li><strong>mvtnorm <\/strong>via the <strong>mvrnorm()<\/strong><\/li>\n<\/ul>\n<p>Both packages have a function for simulating multivariate normal distributions. The primary difference is that the <strong>Sigma<\/strong> argument in the <strong>MASS<\/strong> package function, <strong>mvrnorm()<\/strong>, accepts a covariance matrix while the <strong>sigma<\/strong> argument in the <strong>mvtnorm<\/strong> package, <strong>rmvnorm()<\/strong> accepts a correlation matrix. I&#8217;ll show both examples but I tend to stick with the <strong>mtvnorm<\/strong> package because (at least for my brain) it is easier for me to think in terms of correlation coefficients instead of covariances.<\/p>\n<p>First we simulate some data:<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n## create fake data\r\nset.seed(1234)\r\nfake_dat &lt;- data.frame(\r\n  group = rep(c(&quot;a&quot;, &quot;b&quot;, &quot;c&quot;), each = 5),\r\n  x = rnorm(n = 15, mean = 10, sd = 2),\r\n  y = rnorm(n = 15, mean = 30, sd = 10),\r\n  z = rnorm(n = 15, mean = 75, sd = 20)\r\n)\r\n\r\nfake_dat\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.06.50-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3146\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.06.50-AM.png\" alt=\"\" width=\"445\" height=\"443\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.06.50-AM.png 590w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.06.50-AM-150x150.png 150w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.06.50-AM-300x300.png 300w\" sizes=\"auto, (max-width: 445px) 100vw, 445px\" \/><\/a><\/p>\n<p>Look at the correlation and variance between the three numeric variables.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n# correlation\r\nround(cor(fake_dat&#x5B;, -1]), 3)\r\n\r\n# variance\r\nround(var(fake_dat&#x5B;, -1]), 3)\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.08.10-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3147\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.08.10-AM.png\" alt=\"\" width=\"365\" height=\"338\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.08.10-AM.png 486w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.08.10-AM-300x278.png 300w\" sizes=\"auto, (max-width: 365px) 100vw, 365px\" \/><\/a><\/p>\n<p>We can use this information to simulate new x, y, or z variables.<\/p>\n<p><em><strong>Simulating x and y with the MASS package<\/strong><\/em><\/p>\n<p>Remember, for the <strong>MASS<\/strong> package, the <strong>Sigma<\/strong> argument is a matrix of covariances for the variables you are simulating from a multivariate normal distribution.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n## get a vector of the mean for each variable\r\nvariable_means &lt;- apply(X = fake_dat&#x5B;, c(&quot;x&quot;, &quot;y&quot;)], MARGIN = 2, FUN = mean)\r\n\r\n## Get a matrix of the covariance between x and y\r\nvariable_sigmas &lt;- var(fake_dat&#x5B;, c(&quot;x&quot;, &quot;y&quot;)])\r\n\r\n## simulate 1000 new x and y variables using the MASS package\r\nset.seed(98)\r\nnew_sim &lt;- MASS::mvrnorm(n = 1000, mu = variable_means, Sigma = variable_sigmas)\r\nhead(new_sim)\r\n\r\n### look at the results relative to the original x and y\r\n## column means\r\nvariable_means\r\napply(X = new_sim, MARGIN = 2, FUN = mean)\r\n\r\n## covariance\r\nvar(fake_dat&#x5B;, c(&quot;x&quot;,&quot;y&quot;)])\r\nvar(new_sim)\r\n<\/pre>\n<p>&nbsp;<\/p>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.10.00-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3148\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.10.00-AM.png\" alt=\"\" width=\"367\" height=\"264\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.10.00-AM.png 442w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.10.00-AM-300x216.png 300w\" sizes=\"auto, (max-width: 367px) 100vw, 367px\" \/><\/a><\/p>\n<p>Notice that the variance between <strong>x <\/strong>and <strong>y<\/strong> (the off diagonal of the matrix) is very similar between the fake data (the observed sample) and the simulated data (new_sim).<\/p>\n<p><em><strong>Simulating x and y with the mtvnorm package<\/strong><\/em><\/p>\n<p>Different than the <strong>MASS<\/strong> package, The <strong>rmvnorm()<\/strong> function from the <strong>mtvnorm<\/strong> package requires the <strong>sigma<\/strong> argument to be a correlations matrix.<\/p>\n<p>Let&#8217;s repeat the above process with our <strong>fake_dat<\/strong> and simulate a relationship between <strong>x<\/strong> and <strong>y<\/strong>.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n## get a vector of the mean for each variable\r\nvariable_means &lt;- apply(X = fake_dat&#x5B;, c(&quot;x&quot;, &quot;y&quot;)], MARGIN = 2, FUN = mean)\r\n\r\n## Get a matrix of the correlation between x and y\r\nvariable_cor &lt;- cor(fake_dat&#x5B;, c(&quot;x&quot;, &quot;y&quot;)])\r\n\r\n## simulate 1000 new x and y variables using the mvtnorm package\r\nset.seed(98)\r\nnew_sim &lt;- mvtnorm::rmvnorm(n = 1000, mean = variable_means, sigma = variable_cor)\r\nhead(new_sim)\r\n\r\n### look at the results relative to the original x and y\r\n## column means\r\nvariable_means\r\napply(X = new_sim, MARGIN = 2, FUN = mean)\r\n\r\n## correlation\r\ncor(fake_dat&#x5B;, c(&quot;x&quot;,&quot;y&quot;)])\r\ncor(new_sim)\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.14.17-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3150\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.14.17-AM.png\" alt=\"\" width=\"368\" height=\"261\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.14.17-AM.png 432w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.14.17-AM-300x213.png 300w\" sizes=\"auto, (max-width: 368px) 100vw, 368px\" \/><\/a><\/p>\n<p>Similar to the previous example, we notice that the off diagonal correlation coefficient between <strong>x <\/strong>and <strong>y<\/strong> is very similar when comparing the simulated data to the fake data.<\/p>\n<p>So, what is happening here? Both packages produce the same result, one uses a covariance matrix and the other uses a correlation matrix. The kicker here is understanding the relationship between covariance and correlation. Covariance is explaining how two variables vary together, however, its units aren&#8217;t on a scale that is directly interpretable to us. But, we can convert the covariance between two variables to a correlation by dividing their covariance by the product of their individual standard deviations.<\/p>\n<p>For example, here is the covariance matrix between <strong>x<\/strong> and <strong>y<\/strong> in the fake data set.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\ncov(fake_dat&#x5B;, c(&quot;x&quot;, &quot;y&quot;)])\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.16.23-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3151\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.16.23-AM.png\" alt=\"\" width=\"389\" height=\"119\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.16.23-AM.png 464w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.16.23-AM-300x92.png 300w\" sizes=\"auto, (max-width: 389px) 100vw, 389px\" \/><\/a><\/p>\n<p>The covariance between the two variables is on the off diagonal, 2.389. We can store this in its own element.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\ncov_xy &lt;- cov(fake_dat&#x5B;, c(&quot;x&quot;, &quot;y&quot;)])&#x5B;2,1]\r\ncov_xy\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.17.28-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-3152\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.17.28-AM.png\" alt=\"\" width=\"716\" height=\"108\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.17.28-AM.png 716w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.17.28-AM-300x45.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.17.28-AM-624x94.png 624w\" sizes=\"auto, (max-width: 716px) 100vw, 716px\" \/><\/a><\/p>\n<p>Let&#8217;s store the standard deviation of both `x` and `y` in their own elements to make the equation easier to read.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\nsd_x &lt;- sd(fake_dat$x)\r\nsd_y &lt;- sd(fake_dat$y)\r\n<\/pre>\n<p>Finally, we calculate the correlation by dividing the covariance by the product of the two standard deviations and check our results by calling the <strong>cor()<\/strong> function on the two variables.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n## covariance to correlation\r\ncov_to_cor &lt;- cov_xy \/ (sd_x * sd_y)\r\ncov_to_cor\r\n\r\n## check results with the corr() function\r\ncor(fake_dat&#x5B;, c(&quot;x&quot;, &quot;y&quot;)])\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.19.03-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3153\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.19.03-AM.png\" alt=\"\" width=\"536\" height=\"288\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.19.03-AM.png 644w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.19.03-AM-300x161.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.19.03-AM-624x335.png 624w\" sizes=\"auto, (max-width: 536px) 100vw, 536px\" \/><\/a><\/p>\n<p>By dividing the covariance by the product of the standard deviation of the two variables we can see the relationship between covariance and correlation and now understand why the results from the <strong>MASS <\/strong>and <strong>mtvnorm<\/strong> produce similar results. Understanding this relationship becomes valuable when we move onto simulating more complex relationships, for example when simulating mixed models.<\/p>\n<p><span style=\"text-decoration: underline;\"><strong>What about three variables?<\/strong><\/span><\/p>\n<p>What if we want to simulate all three variables &#8212; <strong>x<\/strong>, <strong>y<\/strong>, and <strong>z<\/strong>?<\/p>\n<p>All we need is a larger covariance or correlation matrix, depending on which of the above packages you&#8217;d like to use. Since we usually won&#8217;t be creating these matrices from a data set, as I did above, I&#8217;ll show how to create your own matrix and run the simulation.<\/p>\n<p>First, let&#8217;s store a vector of plausible mean values for <strong>x<\/strong>, <strong>y<\/strong>, and <strong>z<\/strong>.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n## Look at the mean values we had in the fake data\r\napply(X = fake_dat&#x5B;, c(&quot;x&quot;, &quot;y&quot;, &quot;z&quot;)], MARGIN = 2, FUN = mean)\r\n\r\n## create a vector of possible mean values for the simulation\r\nmus &lt;- c(9, 26, 63)\r\n\r\n## look at the correlation matrix for the three variables in the fake data\r\ncor(fake_dat&#x5B;, c(&quot;x&quot;, &quot;y&quot;, &quot;z&quot;)])\r\n\r\n## Create a matrix that stores plausible correlations between the variables you want to simulate\r\nr_matrix &lt;- matrix(c(1, 0.14, -0.24,\r\n                    0.14, 1, -0.35,\r\n                    -0.24, -0.35, 1), \r\n                   nrow = 3, ncol = 3,\r\n       dimnames = list(c(&quot;x&quot;, &quot;y&quot;, &quot;z&quot;),\r\n                       c(&quot;x&quot;, &quot;y&quot;, &quot;z&quot;)))\r\n\r\nr_matrix\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.22.10-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-3154\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.22.10-AM-1024x680.png\" alt=\"\" width=\"625\" height=\"415\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.22.10-AM-1024x680.png 1024w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.22.10-AM-300x199.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.22.10-AM-768x510.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.22.10-AM-624x415.png 624w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.22.10-AM.png 1442w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/p>\n<p>Next, we create 1000 simulations of a multivariate normal distribution between <strong>x<\/strong>, <strong>y<\/strong>, and <strong>z<\/strong>. We then compare our correlation coefficients between the fake data, which is our observed sample, and the simulated data<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n## simulate 1000 new x, y, and z variables using the mvtnorm package\r\nset.seed(43)\r\nnew_sim &lt;- mvtnorm::rmvnorm(n = 1000, mean = mus, sigma = r_matrix)\r\nhead(new_sim)\r\n\r\n### look at the results relative to the original x, y, and z\r\n## column means\r\napply(X = fake_dat&#x5B;, c(&quot;x&quot;, &quot;y&quot;, &quot;z&quot;)], MARGIN = 2, FUN = mean)\r\napply(X = new_sim, MARGIN = 2, FUN = mean)\r\n\r\n## correlation\r\ncor(fake_dat&#x5B;, c(&quot;x&quot;, &quot;y&quot;, &quot;z&quot;)])\r\ncor(new_sim)\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.24.24-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-3155\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.24.24-AM-986x1024.png\" alt=\"\" width=\"625\" height=\"649\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.24.24-AM-986x1024.png 986w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.24.24-AM-289x300.png 289w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.24.24-AM-768x798.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.24.24-AM-624x648.png 624w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/07\/Screenshot-2023-07-06-at-11.24.24-AM.png 1026w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/p>\n<p>The results from the simulation are pretty similar to the fake_dat dataset. If you recall from the correlation matrix and the vector of means, we didn&#8217;t use exact values from the observed data, so that, along with the random draws from the multivariate normal distribution, leads to the small amount of differences.<\/p>\n<p><span style=\"text-decoration: underline;\"><strong>Wrapping Up<\/strong><\/span><\/p>\n<p>In this second installment of the Simulations in R series we&#8217;ve walked through how to code our own bootstrap resampling for both means and regression coefficients. We then progressed to building simulations of both bivariate and multivariate normal distributions. This, along with the basic info in <strong><span style=\"color: #0000ff;\"><a style=\"color: #0000ff;\" href=\"https:\/\/optimumsportsperformance.com\/blog\/simulations-in-r-part-1-functions-for-simulation-resampling\/\">Part 1<\/a><\/span><\/strong>, will serve us well as we progress forward in our work and begin to explore using simulations for comparisons between group means (simulated t-tests) and building regression models.<\/p>\n<p>As always, all of the code is available in the <strong><span style=\"color: #0000ff;\"><a style=\"color: #0000ff;\" href=\"https:\/\/github.com\/pw2\/Constructing-Simulations-Tutorial\">Github repository<\/a><\/span><\/strong>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In Part 1 of this series we covered some of the basic functions that we will need to perform resampling and simulations in R. In Part 2 we will now move to building bootstrap resamples and simulating distributions for both bivariate and multivariate relationships. Some stuff we will cover: Coding bootstrap resampling by hand for [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[47,43],"tags":[],"class_list":["post-3135","post","type-post","status-publish","format-standard","hentry","category-model-building-in-r","category-sports-analytics"],"_links":{"self":[{"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/posts\/3135","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/comments?post=3135"}],"version-history":[{"count":1,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/posts\/3135\/revisions"}],"predecessor-version":[{"id":3156,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/posts\/3135\/revisions\/3156"}],"wp:attachment":[{"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/media?parent=3135"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/categories?post=3135"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/tags?post=3135"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}