{"id":3198,"date":"2023-08-12T05:30:59","date_gmt":"2023-08-12T05:30:59","guid":{"rendered":"http:\/\/optimumsportsperformance.com\/blog\/?p=3198"},"modified":"2023-08-12T12:48:10","modified_gmt":"2023-08-12T12:48:10","slug":"simulations-in-r-part-5-homoskedasticity-assumption-in-regression","status":"publish","type":"post","link":"https:\/\/optimumsportsperformance.com\/blog\/simulations-in-r-part-5-homoskedasticity-assumption-in-regression\/","title":{"rendered":"Simulations in R Part 5: Homoskedasticity Assumption in Regression"},"content":{"rendered":"<p>We&#8217;ve worked through a number of tutorials on building simulations and in Part 4 we worked up to building simulations for linear regression. Here are the previous 4 parts:<\/p>\n<ul>\n<li><strong><a href=\"https:\/\/optimumsportsperformance.com\/blog\/simulations-in-r-part-1-functions-for-simulation-resampling\/\"><span style=\"color: #0000ff;\">Part 1<\/span><\/a><\/strong> discussed the basic functions for simulating and sampling data in R.<\/li>\n<li><strong><a href=\"https:\/\/optimumsportsperformance.com\/blog\/simulations-in-r-part-2-bootstrapping-simulating-bivariate-and-multivariate-distributions\/\"><span style=\"color: #0000ff;\">Part 2<\/span><\/a><\/strong> walked us through how to perform bootstrap resampling and then simulate bivariate and multivariate distributions.<\/li>\n<li><span style=\"color: #0000ff;\"><strong><a style=\"color: #0000ff;\" href=\"https:\/\/optimumsportsperformance.com\/blog\/simulations-in-r-part-3-group-comparisons-via-simulation-simulating-t-tests\/\">Part 3<\/a><\/strong> <\/span>we worked making group comparisons by simulating thousands of t-tests<\/li>\n<li><span style=\"color: #0000ff;\"><strong><a style=\"color: #0000ff;\" href=\"https:\/\/optimumsportsperformance.com\/blog\/simulations-in-r-part-4-simulating-regression-models\/\">Part 4<\/a><\/strong><\/span> building simulations for linear regression<\/li>\n<\/ul>\n<p>There are a number of assumptions that underpin linear regression models. Simulation can be a useful way of exploring these assumptions and understanding how violating these assumptions can lead to bias, large variance in the regression coefficients, and\/or poor predictions.<\/p>\n<p>Some typical assumptions include:<\/p>\n<ol>\n<li>Homoskedasticity<\/li>\n<li>Multicollinearity of independent variables<\/li>\n<li>Measurement Error<\/li>\n<li>Serial correlation<\/li>\n<\/ol>\n<p>Today, we will explore the assumption of homoskedasticity.<\/p>\n<p>As always, all code is freely available in the <span style=\"color: #0000ff;\"><strong><a style=\"color: #0000ff;\" href=\"https:\/\/github.com\/pw2\/Constructing-Simulations-Tutorial\">Github repository<\/a><\/strong><\/span>.<\/p>\n<p><span style=\"text-decoration: underline;\"><strong>Creating the baseline simulation<\/strong><\/span><\/p>\n<p>Before exploring how violations of the homoskedasticity assumption influence a regression model, we need a baseline model to compare it against. So, we will begin by simulating a simple linear regression with 1 predictor. Our model will look like this:<\/p>\n<p style=\"text-align: center;\"><em>y = 2 + 5*x + e<\/em><\/p>\n<p>Where <em><strong>e<\/strong><\/em> will be random error from a normal distribution with a mean of 0 and standard deviation of 1.<\/p>\n<p>The code below should look familiar as we&#8217;ve been building up simulations like this in the previous 4 tutorials. We specify the intercept to be 2 and the slope to be 5. The independent variable, <em><strong>x<\/strong><\/em><em>, <\/em>is drawn from a uniform distribution between -1 and 1. With each of the 500 iterations of the <strong>for()<\/strong> loop we store the simulated intercept, slope, and their corresponding standard errors, which we calculate using the variance-covariance matrix (which we discussed in the previous tutorial). Finally, we also store the residual standard error (RSE) of each of the simulated models.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\nlibrary(tidymodels)\r\nlibrary(patchwork)\r\n\r\n## set seed for reproducibility\r\nset.seed(58968)\r\n\r\n## create a data frame to store intercept values, slope values, their standard errors, and the model residual standard error, for each simulation\r\nsim_params &lt;- data.frame(intercept = NA,\r\n                      slope = NA,\r\n                      intercept_se = NA,\r\n                      slope_se = NA,\r\n                      model_rse = NA)\r\n\r\n# true intercept value\r\nintercept &lt;- 2\r\n\r\n# true slope value\r\nslope &lt;- 5\r\n\r\n## Number of simulations to run\r\nn &lt;- 500\r\n\r\n# random draw from a uniform distribution to simulate the independent variable\r\nX &lt;- runif(n = n, min = -1, max = 1)\r\n\r\n## loop for regression model\r\nfor(i in 1:n){\r\n  \r\n  # create dependent variable, Y\r\n  Y &lt;- intercept + slope*X + rnorm(n = n, mean = 0, sd = 1)\r\n  \r\n  # build model\r\n  model &lt;- lm(Y ~ X)\r\n  \r\n  ## store predictions\r\n  fitted_vals &lt;- model$fitted.values\r\n\r\n  ## store residuals\r\n  # output_df&#x5B;i, 2] &amp;lt;- model$residuals\r\n  \r\n  # variance-covariance matrix for the model\r\n  vcv &lt;- vcov(model)\r\n  \r\n  # estimates for the intercept\r\n  sim_params&#x5B;i, 1] &lt;- model$coef&#x5B;1]\r\n  \r\n  # estimates for the slope\r\n  sim_params&#x5B;i, 2] &lt;- model$coef&#x5B;2]\r\n  \r\n  # SE for the intercept\r\n  sim_params&#x5B;i, 3] &lt;- sqrt(diag(vcv)&#x5B;1])\r\n  \r\n  # SE for the slope\r\n  sim_params&#x5B;i, 4] &lt;- sqrt(diag(vcv)&#x5B;2])\r\n  \r\n  # model RSE\r\n  sim_params&#x5B;i, 5] &lt;- summary(model)$sigma\r\n  \r\n}\r\n\r\nhead(sim_params)\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.30.25-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3199\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.30.25-PM.png\" alt=\"\" width=\"521\" height=\"185\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.30.25-PM.png 804w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.30.25-PM-300x107.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.30.25-PM-768x273.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.30.25-PM-624x222.png 624w\" sizes=\"auto, (max-width: 521px) 100vw, 521px\" \/><\/a><\/p>\n<p>Now we summarize the data to see if we have values close to the specified model parameters.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\nsim_params %&gt;%\r\n  summarize(across(.cols = everything(),\r\n                   ~mean(.x)))\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.36.22-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3200\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.36.22-PM.png\" alt=\"\" width=\"511\" height=\"116\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.36.22-PM.png 812w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.36.22-PM-300x68.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.36.22-PM-768x174.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.36.22-PM-624x141.png 624w\" sizes=\"auto, (max-width: 511px) 100vw, 511px\" \/><\/a><\/p>\n<p>The average intercept and slope of the 500 simulated models are pretty much identical to the specified intercept and slope of 2 and 5, respectively.<\/p>\n<p>The final model of the 500 iterations is also stored from our for loop and we can look directly at it and create plots of the model fit.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n# model summary\r\nsummary(model)\r\n\r\n# model fit plots\r\npar(mfrow = c(2,2))\r\nplot(model)\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.38.36-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3201\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.38.36-PM.png\" alt=\"\" width=\"451\" height=\"337\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.38.36-PM.png 944w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.38.36-PM-300x224.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.38.36-PM-768x574.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.38.36-PM-624x467.png 624w\" sizes=\"auto, (max-width: 451px) 100vw, 451px\" \/><\/a> <a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.38.48-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3202\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.38.48-PM-1024x862.png\" alt=\"\" width=\"503\" height=\"423\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.38.48-PM-1024x862.png 1024w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.38.48-PM-300x253.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.38.48-PM-768x647.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.38.48-PM-624x525.png 624w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.38.48-PM.png 1140w\" sizes=\"auto, (max-width: 503px) 100vw, 503px\" \/><\/a><\/p>\n<p>We can also create a function that lets us evaluate how often the 95% confidence interval of our simulated beta coefficients cover the true beta coefficients that we specified for the simulation. From there, we can get a coverage probability and a 95% probability coverage interval.<\/p>\n<p>&nbsp;<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n### Create a coverage probability function\r\ncoverage_interval95 &lt;- function(beta_coef, se_beta_coef, true_beta_val, model_df){\r\n  \r\n  level95 &lt;- 1 - (1 - 0.95) \/ 2\r\n  \r\n  # lower 95\r\n  lower95 &lt;- beta_coef - qt(level95, df = model_df)*se_beta_coef\r\n  \r\n  # upper 95\r\n  upper95 &lt;- beta_coef + qt(level95, df = model_df)*se_beta_coef\r\n  \r\n  # what rate did we cover the true value (hits and misses)\r\n  hits &lt;- ifelse(true_beta_val &gt;= lower95 &amp;amp; true_beta_val &lt;= upper95, 1, 0)\r\n  prob_cover &lt;- mean(hits)\r\n  \r\n  # create the probability coverage intervals\r\n  low_coverage_interval &lt;- prob_cover - 1.96 * sqrt((prob_cover * (1 - prob_cover)) \/ length(beta_coef))\r\n  \r\n  upper_coverage_interval &lt;- prob_cover + 1.96 * sqrt((prob_cover * (1 - prob_cover)) \/ length(beta_coef))\r\n  \r\n  # results in a list\r\n  return(list('Probability of Covering the True Value' = prob_cover,\r\n              '95% Prob ability Coverage Intervals' = c(low_coverage_interval, upper_coverage_interval)))\r\n  \r\n}\r\n<\/pre>\n<p>Let&#8217;s apply the function to the intercept.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\ncoverage_interval95(beta_coef = sim_params$intercept,\r\n                    se_beta_coef = sim_params$intercept_se,\r\n                    true_beta = intercept,\r\n                    model_df = model$df.residual)\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.42.36-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3203\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.42.36-PM.png\" alt=\"\" width=\"500\" height=\"147\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.42.36-PM.png 612w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.42.36-PM-300x88.png 300w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><\/a><\/p>\n<p>Now apply the function to the slope.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\ncoverage_interval95(beta_coef = sim_params$slope,\r\n                    se_beta_coef = sim_params$slope_se,\r\n                    true_beta = slope,\r\n                    model_df = model$df.residual)\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.43.20-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3204\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.43.20-PM.png\" alt=\"\" width=\"479\" height=\"147\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.43.20-PM.png 612w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.43.20-PM-300x92.png 300w\" sizes=\"auto, (max-width: 479px) 100vw, 479px\" \/><\/a><\/p>\n<p>In both cases we are covering the true betas around 95% of the time, with relatively small intervals.<\/p>\n<p><span style=\"text-decoration: underline;\"><strong>Homoskedasticity<\/strong><\/span><\/p>\n<p>Linear models make an assumption that the variance of the residuals remain constant across the predicted values (homoskedastic). We can see what this looks like by plotting the fitted values relative to the residuals, which was the first plot in the model check plots we created for the 500th simulation above. We can see that the residuals exhibit relatively the same amount of variance across the fitted values.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\nplot(model, which = 1)\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.47.13-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3205\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.47.13-PM-1024x870.png\" alt=\"\" width=\"451\" height=\"383\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.47.13-PM-1024x870.png 1024w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.47.13-PM-300x255.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.47.13-PM-768x652.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.47.13-PM-624x530.png 624w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.47.13-PM.png 1130w\" sizes=\"auto, (max-width: 451px) 100vw, 451px\" \/><\/a><\/p>\n<p>Let&#8217;s simulate a model with heteroskedastic residuals and see what it looks like. We will keep the same intercept and slope parameters as above. The only thing will we do is add an exponential parameter to the error term\u00a0 of the model to create a heteroskedastic outcome in the residuals.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n## parameter for heteroskedasticity \r\nheteroskedasticity_param &lt;- 2\r\n\r\n## set seed for reproducibility\r\nset.seed(22)\r\n\r\n## data frame for results\r\nheteroskedastic_sim_params &lt;- data.frame(intercept = NA,\r\n                      slope = NA,\r\n                      intercept_se = NA,\r\n                      slope_se = NA,\r\n                      model_rse = NA)\r\n\r\n## for loop\r\nfor(i in 1:n ){\r\n  \r\n  # the error variance of Y is a function of X plus some random noise\r\n  Y &lt;- intercept + slope*X + rnorm(n = n, mean = 0, sd = exp(X*heteroskedasticity_param))\r\n  \r\n  # model\r\n  heteroskedastic_model &lt;- lm(Y ~ X)\r\n  \r\n  \r\n  # variance-covariance matrix\r\n  vcv &lt;- vcov(heteroskedastic_model)\r\n  \r\n  # estimates for the intercept\r\n  heteroskedastic_sim_params&#x5B;i, 1] &lt;- heteroskedastic_model$coef&#x5B;1]\r\n  \r\n  # estimates for the slope\r\n  heteroskedastic_sim_params&#x5B;i, 2] &lt;- heteroskedastic_model$coef&#x5B;2]\r\n  \r\n  # SE for the intercept\r\n  heteroskedastic_sim_params&#x5B;i, 3] &lt;- sqrt(diag(vcv)&#x5B;1])\r\n  \r\n  # SE for the slope\r\n  heteroskedastic_sim_params&#x5B;i, 4] &lt;- sqrt(diag(vcv)&#x5B;2])\r\n  \r\n  # model RSE\r\n  heteroskedastic_sim_params&#x5B;i, 5] &lt;- summary(heteroskedastic_model)$sigma\r\n  \r\n}\r\n\r\nhead(heteroskedastic_sim_params)\r\n\r\n\r\nplot(X, Y, pch = 19)\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.49.54-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3206\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.49.54-PM-1024x825.png\" alt=\"\" width=\"446\" height=\"360\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.49.54-PM-1024x825.png 1024w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.49.54-PM-300x242.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.49.54-PM-768x619.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.49.54-PM-624x503.png 624w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.49.54-PM.png 1134w\" sizes=\"auto, (max-width: 446px) 100vw, 446px\" \/><\/a><\/p>\n<p>The relationship between X and Y certainly looks weird given how it starts very tightly on the left side and then fans out on the right side.<\/p>\n<p>Let&#8217;s take the average across all 500 simulations for each coefficient and their corresponding standard errors.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\nheteroskedastic_sim_params %&gt;%\r\n  summarize(across(.cols = everything(),\r\n                   ~mean(.x)))\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.50.53-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3207\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.50.53-PM.png\" alt=\"\" width=\"472\" height=\"126\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.50.53-PM.png 792w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.50.53-PM-300x80.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.50.53-PM-768x206.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.50.53-PM-624x167.png 624w\" sizes=\"auto, (max-width: 472px) 100vw, 472px\" \/><\/a><\/p>\n<p>The coefficients of 2.0 for the intercept and 5 for the slope are exactly what we set them as for the simulation. However, notice how much larger the standard errors are for the intercept and slope compared to the original model above. Additionally, notice that the model residual standard error has increased substantially compared to the previous model.<\/p>\n<p>Let&#8217;s get the 500th model again and check out the fitted vs residual plot.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n# fitted vs residuals\r\nplot(heteroskedastic_model, which = 1)\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.52.16-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3208\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.52.16-PM-1024x868.png\" alt=\"\" width=\"430\" height=\"365\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.52.16-PM-1024x868.png 1024w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.52.16-PM-300x254.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.52.16-PM-768x651.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.52.16-PM-624x529.png 624w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.52.16-PM.png 1114w\" sizes=\"auto, (max-width: 430px) 100vw, 430px\" \/><\/a><\/p>\n<p>That looks like a large amount of heteroskedasticity as the residual variance is no longer homogenous across the range of fitted values. Notice the large fanning out towards the right side of the plot. As the predictions get larger so two does the variability in residuals, which we noticed when we plotted Y and X above.<\/p>\n<p>What we&#8217;ve learned is that the estimate of intercept and slope is unbiased for both the heteroskedastic and homoskedastic models, as they both are centered on the parameters that we specified for the simulation (intercept = 2, slope = 5). However, the heteroskedastic model creates greater variance in our coefficients. We can visualize how much uncertainty there is under the heteroskedastic model relative to the homoskedastic model by visualizing the density of the coefficient estimates from our two model simulations.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\nplt_intercept &lt;- sim_params %&gt;%\r\n  mutate(model = 'homoskedastic model') %&gt;%\r\n  bind_rows(\r\n    heteroskedastic_sim_params %&gt;%\r\n      mutate(model = 'heteroskedastic model')\r\n  ) %&gt;%\r\n  ggplot(aes(x = intercept, fill = model)) +\r\n  geom_density(alpha = 0.6) +\r\n  theme_classic() +\r\n  theme(legend.position = &quot;top&quot;)\r\n\r\nplt_slope &lt;- sim_params %&gt;%\r\n  mutate(model = 'homoskedastic model') %&gt;%\r\n  bind_rows(\r\n    heteroskedastic_sim_params %&gt;%\r\n      mutate(model = 'heteroskedastic model')\r\n  ) %&gt;%\r\n  ggplot(aes(x = slope, fill = model)) +\r\n  geom_density(alpha = 0.6) +\r\n  theme_classic() +\r\n  theme(legend.position = &quot;none&quot;)\r\n\r\nplt_intercept | plt_slope\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.56.26-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-3209\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.56.26-PM-1024x742.png\" alt=\"\" width=\"542\" height=\"393\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.56.26-PM-1024x742.png 1024w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.56.26-PM-300x217.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.56.26-PM-768x556.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.56.26-PM-624x452.png 624w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.56.26-PM.png 1378w\" sizes=\"auto, (max-width: 542px) 100vw, 542px\" \/><\/a><\/p>\n<p>Finally, let&#8217;s see how often the 95% coverage interval is covering the true intercept and slope in the heteroskedastic model.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\ncoverage_interval95(beta_coef = heteroskedastic_sim_params$intercept,\r\n                    se_beta_coef = heteroskedastic_sim_params$intercept_se,\r\n                    true_beta = intercept,\r\n                    model_df = model$df.residual)\r\n\r\n\r\ncoverage_interval95(beta_coef = heteroskedastic_sim_params$slope,\r\n                    se_beta_coef = heteroskedastic_sim_params$slope_se,\r\n                    true_beta = slope,\r\n                    model_df = model$df.residual)\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.57.17-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-3210\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.57.17-PM-1024x656.png\" alt=\"\" width=\"625\" height=\"400\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.57.17-PM-1024x656.png 1024w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.57.17-PM-300x192.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.57.17-PM-768x492.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.57.17-PM-624x400.png 624w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2023\/08\/Screenshot-2023-08-11-at-7.57.17-PM.png 1146w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/p>\n<p>Notice that we are no longer covering the true model values at the 95% level.<\/p>\n<p><span style=\"text-decoration: underline;\"><strong>Wrapping Up<\/strong><\/span><\/p>\n<p>Simulations can be useful for evaluating model assumptions and seeing how a model may be have under different circumstances. In this tutorial we saw that a model suffering from severe heteroskedasticity is still able to return the true values for the intercept and slope. However, the variance around those values is large, meaning that we have a lot of uncertainty about what those true parameters actually are. In upcoming tutorials we will explore other linear regression assumptions. If this is a topic you enjoy, a book that I recommend (which heavily influenced today&#8217;s tutorial) is <strong><span style=\"color: #0000ff;\"><a style=\"color: #0000ff;\" href=\"https:\/\/www.amazon.com\/Simulation-Resampling-Methods-Social-Science\/dp\/1452288909\">Monte Carlo Simulation and Resampling Methods for Social Sciences by Carsey &amp; Harden<\/a><\/span><\/strong>.<\/p>\n<p>As always, all code is freely available in the <span style=\"color: #0000ff;\"><strong><a style=\"color: #0000ff;\" href=\"https:\/\/github.com\/pw2\/Constructing-Simulations-Tutorial\">Github repository<\/a><\/strong>.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>We&#8217;ve worked through a number of tutorials on building simulations and in Part 4 we worked up to building simulations for linear regression. Here are the previous 4 parts: Part 1 discussed the basic functions for simulating and sampling data in R. Part 2 walked us through how to perform bootstrap resampling and then simulate [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[47],"tags":[],"class_list":["post-3198","post","type-post","status-publish","format-standard","hentry","category-model-building-in-r"],"_links":{"self":[{"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/posts\/3198","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/comments?post=3198"}],"version-history":[{"count":6,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/posts\/3198\/revisions"}],"predecessor-version":[{"id":3217,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/posts\/3198\/revisions\/3217"}],"wp:attachment":[{"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/media?parent=3198"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/categories?post=3198"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/tags?post=3198"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}