{"id":2608,"date":"2022-07-25T01:13:01","date_gmt":"2022-07-25T01:13:01","guid":{"rendered":"http:\/\/optimumsportsperformance.com\/blog\/?p=2608"},"modified":"2022-11-08T03:34:05","modified_gmt":"2022-11-08T03:34:05","slug":"tidymodels-extract-model-coefficients-for-all-cross-validated-folds","status":"publish","type":"post","link":"https:\/\/optimumsportsperformance.com\/blog\/tidymodels-extract-model-coefficients-for-all-cross-validated-folds\/","title":{"rendered":"tidymodels &#8211; Extract model coefficients for all cross validated folds"},"content":{"rendered":"<p>As I&#8217;ve discussed previously, we sometimes don&#8217;t have enough data where doing a train\/test split makes sense. As such, we are better off building our model using cross-validation. In previous blog articles, I&#8217;ve talked about how to build models using cross-validation within the {<strong>tidymodels<\/strong>} framework (see <span style=\"color: #0000ff;\"><strong><a style=\"color: #0000ff;\" href=\"https:\/\/optimumsportsperformance.com\/blog\/fitting-saving-and-deploying-tidymodels-with-cross-validated-data\/\">HERE<\/a><\/strong><\/span> and <strong><span style=\"color: #0000ff;\"><a style=\"color: #0000ff;\" href=\"https:\/\/optimumsportsperformance.com\/blog\/making-predictions-from-cross-validated-workflow-using-tidymodels\/\">HERE<\/a><\/span><\/strong>). In my prior examples, we fit the model over the cross-validation folds and then constructed the final model that we could then use to make predictions with, later on.<\/p>\n<p>Recently, I ran into a situation where I wanted to see what the model coefficients look like across all of the cross-validation folds. 
So, I decided to make a quick blog post on how to do this, in case it is useful to others.<\/p>\n<p><span style=\"text-decoration: underline;\"><strong>Load Packages &amp; Data<\/strong><\/span><\/p>\n<p>We will use the <strong>mtcars<\/strong> data set that ships with R and build a regression model, using several independent variables, to predict miles per gallon (mpg).<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n### Packages -------------------------------------------------------\r\n\r\nlibrary(tidyverse)\r\nlibrary(tidymodels)\r\n\r\n### Data -------------------------------------------------------\r\n\r\ndat &lt;- mtcars\r\n\r\ndat %&gt;%\r\n  head()\r\n<\/pre>\n<p>&nbsp;<\/p>\n<p><span style=\"text-decoration: underline;\"><strong>Create Cross-Validation Folds of the Data<\/strong><\/span><\/p>\n<p>I&#8217;ll use 10-fold cross validation.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n### Modelling -------------------------------------------------------\r\n## Create 10 Cross Validation Folds\r\n\r\nset.seed(1)\r\ncv_folds &lt;- vfold_cv(dat, v = 10)\r\ncv_folds\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/07\/Screen-Shot-2022-07-24-at-5.59.07-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-2609\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/07\/Screen-Shot-2022-07-24-at-5.59.07-PM.png\" alt=\"\" width=\"280\" height=\"378\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/07\/Screen-Shot-2022-07-24-at-5.59.07-PM.png 382w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/07\/Screen-Shot-2022-07-24-at-5.59.07-PM-222x300.png 222w\" sizes=\"auto, (max-width: 280px) 100vw, 280px\" \/><\/a><\/p>\n<p><span style=\"text-decoration: underline;\"><strong>Specify a linear model and set up the model formula<\/strong><\/span><\/p>\n<pre class=\"brush: r; title: ; notranslate\" 
title=\"\">\r\n## Specify the linear regression engine\r\n## model specs\r\nlm_spec &lt;- linear_reg() %&gt;%\r\n  set_engine(&quot;lm&quot;)\r\n\r\n\r\n## Model formula\r\nmpg_formula &lt;- mpg ~ cyl + disp + wt + drat\r\n<\/pre>\n<p><span style=\"text-decoration: underline;\"><strong>Set up the model workflow and fit the model to the cross-validated folds<\/strong><\/span><\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n## Set up workflow\r\nlm_wf &lt;- workflow() %&gt;%\r\n  add_formula(mpg_formula) %&gt;%\r\n  add_model(lm_spec)\r\n\r\n## Fit the model to the cross validation folds\r\n## (extract_fit_engine() replaces the deprecated extract_model())\r\nlm_fit &lt;- lm_wf %&gt;%\r\n  fit_resamples(\r\n    resamples = cv_folds,\r\n    control = control_resamples(extract = extract_fit_engine, save_pred = TRUE)\r\n  )\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/07\/Screen-Shot-2022-07-24-at-6.02.17-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-2610\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/07\/Screen-Shot-2022-07-24-at-6.02.17-PM-1024x423.png\" alt=\"\" width=\"625\" height=\"258\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/07\/Screen-Shot-2022-07-24-at-6.02.17-PM-1024x423.png 1024w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/07\/Screen-Shot-2022-07-24-at-6.02.17-PM-300x124.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/07\/Screen-Shot-2022-07-24-at-6.02.17-PM-768x317.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/07\/Screen-Shot-2022-07-24-at-6.02.17-PM-624x258.png 624w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/07\/Screen-Shot-2022-07-24-at-6.02.17-PM.png 1356w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/p>\n<p><span style=\"text-decoration: underline;\"><strong>Extract the model 
coefficients for each of the 10 folds (this is the fun part!)<\/strong><\/span><\/p>\n<p>Looking at the <strong>lm_fit<\/strong> output above, we see that it is a tibble with several list columns. The <strong>id<\/strong> column indicates which cross-validation fold each row pertains to. The fitted model for each fold, and with it the model coefficients, is stored in the <strong>.extracts<\/strong> list column. Instead of printing out all 10, let&#8217;s just have a look at the first 3 folds to see what they look like.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\nlm_fit$.extracts %&gt;% \r\n  .&#x5B;1:3]\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/07\/Screen-Shot-2022-07-24-at-6.04.35-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-2611\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/07\/Screen-Shot-2022-07-24-at-6.04.35-PM.png\" alt=\"\" width=\"296\" height=\"380\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/07\/Screen-Shot-2022-07-24-at-6.04.35-PM.png 510w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/07\/Screen-Shot-2022-07-24-at-6.04.35-PM-234x300.png 234w\" sizes=\"auto, (max-width: 296px) 100vw, 296px\" \/><\/a><\/p>\n<p>In the <strong>.extracts<\/strong> column we see <strong>&lt;lm&gt;<\/strong>, indicating the fitted linear model for each fold. With a series of unnesting steps we can pull out the model coefficients and then put them into a tidy format using the {<strong>broom<\/strong>} package. 
I&#8217;ve commented each line of code below so that you know exactly what is happening.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n# Let's unnest this and get the coefficients out\r\nmodel_coefs &lt;- lm_fit %&gt;% \r\n  select(id, .extracts) %&gt;%                    # get the id and .extracts columns\r\n  unnest(cols = .extracts) %&gt;%                 # unnest .extracts, which produces the model in a list\r\n  mutate(coefs = map(.extracts, tidy)) %&gt;%     # use map() to apply the tidy function and get the coefficients in their own column\r\n  unnest(coefs)                                # unnest the coefs column you just made to get the coefficients for each fold\r\n\r\nmodel_coefs\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/07\/Screen-Shot-2022-07-24-at-6.06.40-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-2612\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/07\/Screen-Shot-2022-07-24-at-6.06.40-PM-1024x396.png\" alt=\"\" width=\"625\" height=\"242\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/07\/Screen-Shot-2022-07-24-at-6.06.40-PM-1024x396.png 1024w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/07\/Screen-Shot-2022-07-24-at-6.06.40-PM-300x116.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/07\/Screen-Shot-2022-07-24-at-6.06.40-PM-768x297.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/07\/Screen-Shot-2022-07-24-at-6.06.40-PM-624x241.png 624w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/07\/Screen-Shot-2022-07-24-at-6.06.40-PM.png 1350w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/p>\n<p>Now that we have a table of estimates, we can plot the coefficient estimates and their approximate 95% confidence intervals (estimate \u00b1 2 standard errors). 
The <strong>term<\/strong> column indicates each variable. We will remove the <strong>(Intercept)<\/strong> for plotting purposes.<\/p>\n<p><span style=\"text-decoration: underline;\"><strong>Plot the Coefficients<\/strong><\/span><\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n## Plot the model coefficients and 2*SE across all folds\r\nmodel_coefs %&gt;%\r\n  filter(term != &quot;(Intercept)&quot;) %&gt;%\r\n  select(id, term, estimate, std.error) %&gt;%\r\n  group_by(term) %&gt;%\r\n  mutate(avg_estimate = mean(estimate)) %&gt;%\r\n  ggplot(aes(x = id, y = estimate)) +\r\n  geom_hline(aes(yintercept = avg_estimate),\r\n             size = 1.2,\r\n             linetype = &quot;dashed&quot;) +\r\n  geom_point(size = 4) +\r\n  geom_errorbar(aes(ymin = estimate - 2*std.error, ymax = estimate + 2*std.error),\r\n                width = 0.1,\r\n                size = 1.2) +\r\n  facet_wrap(~term, scales = &quot;free_y&quot;) +\r\n  labs(x = &quot;CV Folds&quot;,\r\n       y = &quot;Estimate \u00b1 95% CI&quot;,\r\n       title = &quot;Regression Coefficients \u00b1 95% CI for 10-fold CV&quot;,\r\n       subtitle = &quot;Dashed Line = Average Coefficient Estimate over 10 CV Folds per Independent Variable&quot;) +\r\n  theme_classic() +\r\n  theme(strip.background = element_rect(fill = &quot;black&quot;),\r\n        strip.text = element_text(face = &quot;bold&quot;, size = 12, color = &quot;white&quot;),\r\n        axis.title = element_text(size = 14, face = &quot;bold&quot;),\r\n        axis.text.x = element_text(angle = 60, hjust = 1, face = &quot;bold&quot;, size = 12),\r\n        axis.text.y = element_text(face = &quot;bold&quot;, size = 12),\r\n        plot.title = element_text(size = 18),\r\n        plot.subtitle = element_text(size = 16))\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/07\/Screen-Shot-2022-07-24-at-5.49.09-PM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter 
size-large wp-image-2613\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/07\/Screen-Shot-2022-07-24-at-5.49.09-PM-1024x707.png\" alt=\"\" width=\"625\" height=\"432\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/07\/Screen-Shot-2022-07-24-at-5.49.09-PM-1024x707.png 1024w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/07\/Screen-Shot-2022-07-24-at-5.49.09-PM-300x207.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/07\/Screen-Shot-2022-07-24-at-5.49.09-PM-768x531.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/07\/Screen-Shot-2022-07-24-at-5.49.09-PM-624x431.png 624w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/07\/Screen-Shot-2022-07-24-at-5.49.09-PM.png 2012w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/p>\n<p>Now we can clearly see the model coefficients and confidence intervals for each of the 10 cross-validated folds.<\/p>\n<p><span style=\"text-decoration: underline;\"><strong>Wrapping Up<\/strong><\/span><\/p>\n<p>This was just a quick and easy way of fitting a model using cross-validation and extracting the model coefficients for each fold. Often, this is probably not necessary as you will fit your model, evaluate your model, and be off and running. 
However, there may be times when more specific interrogation of the model is required, or you might want to dig a little deeper into the various outputs of the cross-validated folds.<\/p>\n<p>All of the code is available on my <strong><span style=\"color: #0000ff;\"><a style=\"color: #0000ff;\" href=\"https:\/\/github.com\/pw2\/tidymodels_template\/blob\/main\/tidymodels%20with%20crossvalidation%20-%20extract%20model%20coefficients%20across%20all%20folds.R\">GitHub page<\/a><\/span><\/strong>.<\/p>\n<p>If you notice any errors in code, please reach out!<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>As I&#8217;ve discussed previously, we sometimes don&#8217;t have enough data where doing a train\/test split makes sense. As such, we are better off building our model using cross-validation. In previous blog articles, I&#8217;ve talked about how to build models using cross-validation within the {tidymodels} framework (see HERE and HERE). In my prior examples, we fit [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[47,45,43],"tags":[],"class_list":["post-2608","post","type-post","status-publish","format-standard","hentry","category-model-building-in-r","category-r-tips-tricks","category-sports-analytics"],"_links":{"self":[{"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/posts\/2608","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/comments?post=2608"}],"version-history":[{"count":3,"href":"
https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/posts\/2608\/revisions"}],"predecessor-version":[{"id":2616,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/posts\/2608\/revisions\/2616"}],"wp:attachment":[{"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/media?parent=2608"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/categories?post=2608"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/tags?post=2608"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}