{"id":2264,"date":"2022-03-16T15:32:34","date_gmt":"2022-03-16T15:32:34","guid":{"rendered":"http:\/\/optimumsportsperformance.com\/blog\/?p=2264"},"modified":"2022-11-08T03:38:06","modified_gmt":"2022-11-08T03:38:06","slug":"fitting-saving-and-deploying-tidymodels-with-cross-validated-data","status":"publish","type":"post","link":"https:\/\/optimumsportsperformance.com\/blog\/fitting-saving-and-deploying-tidymodels-with-cross-validated-data\/","title":{"rendered":"Fitting, Saving, and Deploying tidymodels with Cross Validated Data"},"content":{"rendered":"<p>I&#8217;ve talked about {tidymodels} previously when I laid out a <span style=\"color: #0000ff;\"><strong><a style=\"color: #0000ff;\" href=\"https:\/\/optimumsportsperformance.com\/blog\/tidymodels-model-fitting-template\/\">{tidymodels} model fitting template<\/a>, <\/strong><span style=\"color: #000000;\">which serves as a framework to wrap up the 10 series screen cast we did on {tidymodels} for <span style=\"color: #0000ff;\"><strong><a style=\"color: #0000ff;\" href=\"https:\/\/optimumsportsperformance.com\/blog\/tidyx-77-intro-to-tidymodels\/\">Tidy Explained<\/a><\/strong><\/span>.<\/span><\/span><\/p>\n<p>During all 10 of our episodes, within my model fitting template, and pretty much every single tutorial I&#8217;ve seen online, people follow the same initial steps, which are to split the data into a training and testing set and then split the training data into cross validation sets.<\/p>\n<p>This approach is fine when you have enough data to actually perform a training and testing split. 
But there are times when we don&#8217;t have enough data to do this, meaning we would be fitting a model to a small training set and then <em>hoping<\/em> it picks up all of the necessary information in order to generalize well to external data.<\/p>\n<p>In these instances, we may prefer to use all of our available data, split it into cross validation sets, fit and test the model, and then save the model workflow so that it can be deployed later on and used in production.<\/p>\n<p>To cover this issue, I&#8217;ve put together a template for taking a data set, creating cross validation folds, fitting the model, and then saving the model. The code includes both a linear regression model and a random forest classification model fit to the <strong><em>mtcars<\/em><\/strong> data set. I&#8217;ll only show the regression example below, but all code is available on my <span style=\"color: #0000ff;\"><strong><a style=\"color: #0000ff;\" href=\"https:\/\/github.com\/pw2\/tidymodels_template\/blob\/main\/tidymodels%20with%20cross-validation%20only.R\">GITHUB<\/a><\/strong><\/span> page.<\/p>\n<p><strong>Load Packages &amp; Data<\/strong><\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n### load packages\r\nlibrary(tidymodels)\r\nlibrary(tidyverse)\r\n\r\n############ Regression Example ############\r\n### get data\r\ndf &lt;- mtcars\r\nhead(df)\r\n\r\n<\/pre>\n<p><strong>Create Cross Validation Folds &amp; Specify Linear Model<\/strong><\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n### cross validation folds\r\ndf_cv &lt;- vfold_cv(df, v = 10)\r\ndf_cv\r\n\r\n### specify linear model\r\nlm_spec &lt;- linear_reg() %&gt;%\r\n  set_engine(&quot;lm&quot;) %&gt;%\r\n  set_mode(&quot;regression&quot;)\r\n<\/pre>\n<p><strong>Create the Model Recipe and Workflow<\/strong><\/p>\n<p>To keep things simple, I won&#8217;t do any pre-processing of the data. 
I&#8217;ll just set the recipe with the formula for the regression model I am fitting.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n### recipe\r\nmpg_rec &lt;- recipe(mpg ~ cyl + disp + wt, data = df)\r\nmpg_rec\r\n\r\n### workflow\r\nmpg_wf &lt;- workflow() %&gt;%\r\n  add_recipe(mpg_rec) %&gt;%\r\n  add_model(lm_spec)\r\n<\/pre>\n<p><strong>Control Function to Save Predictions<\/strong><\/p>\n<p>To save the model predictions made on the cross-validated folds, we need to set a control function that will be passed as an argument when we fit our model. Without this argument, we can fit the model using the cross-validated folds but we won&#8217;t be able to extract the predictions.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n### set a control function to save the predictions from the model fit to the CV-folds\r\nctrl &lt;- control_resamples(save_pred = TRUE)\r\n\r\n<\/pre>\n<p><strong>Fit the Model<\/strong><\/p>\n<p><strong><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/03\/Screen-Shot-2022-03-16-at-8.37.41-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-2267\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/03\/Screen-Shot-2022-03-16-at-8.37.41-AM-1024x479.png\" alt=\"\" width=\"625\" height=\"292\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/03\/Screen-Shot-2022-03-16-at-8.37.41-AM-1024x479.png 1024w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/03\/Screen-Shot-2022-03-16-at-8.37.41-AM-300x140.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/03\/Screen-Shot-2022-03-16-at-8.37.41-AM-768x359.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/03\/Screen-Shot-2022-03-16-at-8.37.41-AM-624x292.png 624w, 
https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/03\/Screen-Shot-2022-03-16-at-8.37.41-AM.png 1094w\" sizes=\"auto, (max-width: 625px) 100vw, 625px\" \/><\/a><\/strong><\/p>\n<p><strong>Evaluate model performance<\/strong><\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n### view model metrics\r\ncollect_metrics(mpg_lm)\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/03\/Screen-Shot-2022-03-16-at-8.38.49-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2268\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/03\/Screen-Shot-2022-03-16-at-8.38.49-AM.png\" alt=\"\" width=\"908\" height=\"184\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/03\/Screen-Shot-2022-03-16-at-8.38.49-AM.png 908w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/03\/Screen-Shot-2022-03-16-at-8.38.49-AM-300x61.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/03\/Screen-Shot-2022-03-16-at-8.38.49-AM-768x156.png 768w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/03\/Screen-Shot-2022-03-16-at-8.38.49-AM-624x126.png 624w\" sizes=\"auto, (max-width: 908px) 100vw, 908px\" \/><\/a><\/p>\n<p><strong>Unnest the .predictions column from the model fit and look at the predicted mpg versus actual mpg<br \/>\n<\/strong><\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n### get predictions\r\nmpg_lm %&gt;%\r\n  unnest(cols = .predictions) %&gt;%\r\n  select(.pred, mpg)\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/03\/Screen-Shot-2022-03-16-at-8.40.51-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-2269\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/03\/Screen-Shot-2022-03-16-at-8.40.51-AM.png\" 
alt=\"\" width=\"189\" height=\"289\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/03\/Screen-Shot-2022-03-16-at-8.40.51-AM.png 318w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/03\/Screen-Shot-2022-03-16-at-8.40.51-AM-196x300.png 196w\" sizes=\"auto, (max-width: 189px) 100vw, 189px\" \/><\/a><\/p>\n<p><strong>Fit the final model and extract the model fit<\/strong><\/p>\n<p>If we are happy with our model performance and the workflow that we&#8217;ve built (which contains our pre-processing steps), we can fit the final model to the full data set.<\/p>\n<p>To do this, we pass our data set to <strong>fit()<\/strong> and then use <strong>extract_fit_parsnip()<\/strong> to extract the fitted model from the workflow. Note that <strong>extract_fit_parsnip()<\/strong> returns only the fitted parsnip model, not the recipe, so if your recipe contains pre-processing steps you would save the fitted workflow returned by <strong>fit()<\/strong> instead. Then save the object as an RDA file to be loaded and used at a later time.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n## Fit the final model &amp; extract the fitted parsnip model\r\nmpg_final &lt;- mpg_wf %&gt;% \r\n  fit(df) %&gt;%\r\n  extract_fit_parsnip()\r\n\r\nmpg_final\r\n\r\n## Save model to use later\r\n# save(mpg_final, file = &quot;mpg_final.rda&quot;)\r\n<\/pre>\n<p><a href=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/03\/Screen-Shot-2022-03-16-at-8.51.49-AM.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-2270\" src=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/03\/Screen-Shot-2022-03-16-at-8.51.49-AM.png\" alt=\"\" width=\"481\" height=\"223\" srcset=\"https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/03\/Screen-Shot-2022-03-16-at-8.51.49-AM.png 756w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/03\/Screen-Shot-2022-03-16-at-8.51.49-AM-300x139.png 300w, https:\/\/optimumsportsperformance.com\/blog\/wp-content\/uploads\/2022\/03\/Screen-Shot-2022-03-16-at-8.51.49-AM-624x289.png 624w\" sizes=\"auto, (max-width: 
481px) 100vw, 481px\" \/><\/a><\/p>\n<p>To access all of the code for this template and see an example with a random forest classifier go to my <strong><span style=\"color: #0000ff;\"><a style=\"color: #0000ff;\" href=\"https:\/\/github.com\/pw2\/tidymodels_template\/blob\/main\/tidymodels%20with%20cross-validation%20only.R\">GITHUB<\/a><\/span><\/strong> page.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve talked about {tidymodels} previously when I laid out a {tidymodels} model fitting template, which serves as a framework to wrap up the 10 series screen cast we did on {tidymodels} for Tidy Explained. During all 10 of our episodes, within my model fitting template, and pretty much every single tutorial I&#8217;ve seen online, people [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[47,45],"tags":[],"class_list":["post-2264","post","type-post","status-publish","format-standard","hentry","category-model-building-in-r","category-r-tips-tricks"],"_links":{"self":[{"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/posts\/2264","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/comments?post=2264"}],"version-history":[{"count":4,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/posts\/2264\/revisions"}],"predecessor-version":[{"id":2272,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/posts\/2264\/revisions\/2272"}],"wp:attachment":[{"href":"https:\/\/optimums
portsperformance.com\/blog\/wp-json\/wp\/v2\/media?parent=2264"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/categories?post=2264"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/optimumsportsperformance.com\/blog\/wp-json\/wp\/v2\/tags?post=2264"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}