In Part 3 of this series we are going to look at including
categorical predictors in our STAN model. Unlike the
lm() function or one of the STAN wrappers, such as
rstanarm or brms, STAN gets tricky with
categorical variables because it doesn’t understand character
or factor variable types. To make this work in STAN we need
to convert the categorical variable into an integer group index.
We will continue to use the Palmer Penguins data set.
suppressPackageStartupMessages({
suppressWarnings({
suppressMessages({
library(tidyverse)
library(palmerpenguins)
library(rstanarm)
library(arm)
library(rstan)
library(ggridges)
library(bayesplot)
})
})
})
dat <- penguins %>%
drop_na()
dat %>%
head()
## # A tibble: 6 × 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <fct> <fct> <dbl> <dbl> <int> <int>
## 1 Adelie Torgersen 39.1 18.7 181 3750
## 2 Adelie Torgersen 39.5 17.4 186 3800
## 3 Adelie Torgersen 40.3 18 195 3250
## 4 Adelie Torgersen 36.7 19.3 193 3450
## 5 Adelie Torgersen 39.3 20.6 190 3650
## 6 Adelie Torgersen 38.9 17.8 181 3625
## # ℹ 2 more variables: sex <fct>, year <int>
We will code the model in rstanarm to see how the
categorical variables are handled and then attempt to recreate the same
analysis in STAN.
We will specify some priors on our model features – not because I think these are good priors (they are weakly informative), but rather, to get reps specifying priors within STAN instead of just relying on the default settings.
You’ll notice that when we set the priors for the betas we have
four predictor variables but five priors. This is because the
island variable has three levels. One of those levels is the
reference level, which is absorbed into the intercept, and the other two
levels get beta coefficients that represent their difference from the
reference level (the intercept). Unless specified by the user, the
default in R is that the levels are assigned alphabetically.
Look at the categorical variable levels
levels(dat$island)
## [1] "Biscoe" "Dream" "Torgersen"
str(dat$island)
## Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
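If you want a different level to act as the reference, you can reorder the factor before fitting anything. A quick base R sketch with a toy vector (not the penguins data):

```r
# Toy factor: levels default to alphabetical order
isl <- factor(c("Torgersen", "Biscoe", "Dream"))
levels(isl)   # "Biscoe" "Dream" "Torgersen"

# relevel() moves a chosen level to the front, making it the
# reference level for dummy coding (and index 1 for as.integer())
isl2 <- relevel(isl, ref = "Dream")
levels(isl2)  # "Dream" "Biscoe" "Torgersen"
```

Whichever level sits first becomes the one absorbed into the intercept.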
We can easily convert the categorical variable into its numeric indexes:
dat$island_num <- as.integer(factor(dat$island))
table(dat$island, dat$island_num)
##
## 1 2 3
## Biscoe 163 0 0
## Dream 0 123 0
## Torgersen 0 0 47
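The integer index is just the position of each level, so the labels can always be recovered from the codes. A small self-contained check with a toy vector:

```r
# Toy island vector
isl <- factor(c("Dream", "Biscoe", "Torgersen", "Biscoe"))
idx <- as.integer(isl)   # 2 1 3 1 (alphabetical level order)

# Indexing the levels by the integer codes recovers the labels
stopifnot(all(levels(isl)[idx] == as.character(isl)))
```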
Fit the model
prior_betas <- normal(location = c(0, 0, 0, 0, 0), scale = c(10, 10, 10, 10, 10))
fit_rstanarm <- stan_glm(body_mass_g ~ bill_length_mm + bill_depth_mm + flipper_length_mm + island,
prior_intercept = normal(0, 1000),
prior = prior_betas,
prior_aux = cauchy(0, 3),
iter = 2000,
seed = 2244,
cores = 2,
data = dat)
print(fit_rstanarm, digits = 3)
Now we will create the same model in STAN.
Create the data list
For the island variable we use the island_num variable that we created above,
since STAN can’t handle categorical features. So, island is now taking
on values of 1, 2, and 3 for Biscoe, Dream, and Torgersen,
respectively.
# Store data for the model in a list
data_list <- list(
N = nrow(dat),
body_mass = dat$body_mass_g, # outcome variable
bill_length = dat$bill_length_mm,
bill_depth = dat$bill_depth_mm,
flipper_length = dat$flipper_length_mm,
island = dat$island_num,
K = length(unique(dat$island)) # get the number of island groups
)
Write the STAN model
In the model{} chunk we will specify the same priors
that we did in rstanarm for consistency.
stan_code <- "//Stan Model
data{
int<lower=1> N; // Total number of observations
vector[N] body_mass; // outcome variable
vector[N] bill_length;
vector[N] bill_depth;
vector[N] flipper_length;
int<lower=1> K; // number of islands
int<lower=1, upper = K> island[N]; // island numeric group index for 1:K
}
parameters {
real alpha; // Intercept
real beta_bill_length; // coef for bill_length
real beta_bill_depth; // coef for bill_depth
real beta_flipper_length; // coef for flipper_length
vector[K] beta_island; // coef for each island level (all K, no reference level)
real<lower=0> sigma; // Standard deviation of the error term
}
model {
// Priors
alpha ~ normal(0, 1000); // Weakly informative prior for the intercept
beta_bill_length ~ normal(0, 10); // Weakly informative prior for bill_length
beta_bill_depth ~ normal(0, 10); // Weakly informative prior for bill_depth
beta_flipper_length ~ normal(0, 10); // Weakly informative prior for flipper_length
beta_island ~ normal(0, 10); // Weakly informative prior for island
sigma ~ cauchy(0, 3); // Weakly informative prior for the standard deviation
// Likelihood
for(n in 1:N)
body_mass[n] ~ normal(alpha + beta_bill_length * bill_length[n] + beta_bill_depth * bill_depth[n] + beta_flipper_length * flipper_length[n] + beta_island[island[n]], sigma);
}
"
Compile the model
## Compile & run the model
fit_stan <- stan(model_code = stan_code,
data = data_list,
iter = 2000,
warmup = 500,
chains = 4,
seed = 2244)
## Trying to compile a simple C file
## Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
## using C compiler: ‘Apple clang version 16.0.0 (clang-1600.0.26.6)’
## using SDK: ‘MacOSX15.2.sdk’
## clang -arch x86_64 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -I"/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/Rcpp/include/" -I"/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/RcppEigen/include/" -I"/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/RcppEigen/include/unsupported" -I"/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/StanHeaders/include/src/" -I"/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/StanHeaders/include/" -I"/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/RcppParallel/include/" -I"/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/rstan/include" -DEIGEN_NO_DEBUG -DBOOST_DISABLE_ASSERTS -DBOOST_PENDING_INTEGER_LOG2_HPP -DSTAN_THREADS -DUSE_STANC3 -DSTRICT_R_HEADERS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -D_HAS_AUTO_PTR_ETC=0 -include '/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp' -D_REENTRANT -DRCPP_PARALLEL_USE_TBB=1 -I/opt/R/x86_64/include -fPIC -falign-functions=64 -Wall -g -O2 -c foo.c -o foo.o
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
## In file included from /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/RcppEigen/include/Eigen/Dense:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/RcppEigen/include/Eigen/Core:19:
## /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:679:10: fatal error: 'cmath' file not found
## 679 | #include <cmath>
## | ^~~~~~~
## 1 error generated.
## make: *** [foo.o] Error 1
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0.000371 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 3.71 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1: Iteration: 501 / 2000 [ 25%] (Sampling)
## Chain 1: Iteration: 700 / 2000 [ 35%] (Sampling)
## Chain 1: Iteration: 900 / 2000 [ 45%] (Sampling)
## Chain 1: Iteration: 1100 / 2000 [ 55%] (Sampling)
## Chain 1: Iteration: 1300 / 2000 [ 65%] (Sampling)
## Chain 1: Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 1: Iteration: 1700 / 2000 [ 85%] (Sampling)
## Chain 1: Iteration: 1900 / 2000 [ 95%] (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 9.03 seconds (Warm-up)
## Chain 1: 14.937 seconds (Sampling)
## Chain 1: 23.967 seconds (Total)
## Chain 1:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
## Chain 2:
## Chain 2: Gradient evaluation took 8.6e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.86 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2:
## Chain 2:
## Chain 2: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 2: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 2: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 2: Iteration: 501 / 2000 [ 25%] (Sampling)
## Chain 2: Iteration: 700 / 2000 [ 35%] (Sampling)
## Chain 2: Iteration: 900 / 2000 [ 45%] (Sampling)
## Chain 2: Iteration: 1100 / 2000 [ 55%] (Sampling)
## Chain 2: Iteration: 1300 / 2000 [ 65%] (Sampling)
## Chain 2: Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 2: Iteration: 1700 / 2000 [ 85%] (Sampling)
## Chain 2: Iteration: 1900 / 2000 [ 95%] (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 2:
## Chain 2: Elapsed Time: 8.991 seconds (Warm-up)
## Chain 2: 14.85 seconds (Sampling)
## Chain 2: 23.841 seconds (Total)
## Chain 2:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
## Chain 3:
## Chain 3: Gradient evaluation took 8.3e-05 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.83 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3:
## Chain 3:
## Chain 3: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 3: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 3: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 3: Iteration: 501 / 2000 [ 25%] (Sampling)
## Chain 3: Iteration: 700 / 2000 [ 35%] (Sampling)
## Chain 3: Iteration: 900 / 2000 [ 45%] (Sampling)
## Chain 3: Iteration: 1100 / 2000 [ 55%] (Sampling)
## Chain 3: Iteration: 1300 / 2000 [ 65%] (Sampling)
## Chain 3: Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 3: Iteration: 1700 / 2000 [ 85%] (Sampling)
## Chain 3: Iteration: 1900 / 2000 [ 95%] (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 3:
## Chain 3: Elapsed Time: 24.137 seconds (Warm-up)
## Chain 3: 40.821 seconds (Sampling)
## Chain 3: 64.958 seconds (Total)
## Chain 3:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
## Chain 4:
## Chain 4: Gradient evaluation took 8.2e-05 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.82 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4:
## Chain 4:
## Chain 4: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 4: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 4: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 4: Iteration: 501 / 2000 [ 25%] (Sampling)
## Chain 4: Iteration: 700 / 2000 [ 35%] (Sampling)
## Chain 4: Iteration: 900 / 2000 [ 45%] (Sampling)
## Chain 4: Iteration: 1100 / 2000 [ 55%] (Sampling)
## Chain 4: Iteration: 1300 / 2000 [ 65%] (Sampling)
## Chain 4: Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 4: Iteration: 1700 / 2000 [ 85%] (Sampling)
## Chain 4: Iteration: 1900 / 2000 [ 95%] (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 4:
## Chain 4: Elapsed Time: 10.707 seconds (Warm-up)
## Chain 4: 31.332 seconds (Sampling)
## Chain 4: 42.039 seconds (Total)
## Chain 4:
## get the model coefficients
print(fit_stan, pars = c("alpha", "beta_bill_length", "beta_bill_depth", "beta_flipper_length", "beta_island[1]", "beta_island[2]", "beta_island[3]", "sigma"), probs = c(0.1, 0.5, 0.9))
## Inference for Stan model: anon_model.
## 4 chains, each with iter=2000; warmup=500; thin=1;
## post-warmup draws per chain=1500, total post-warmup draws=6000.
##
## mean se_mean sd 10% 50% 90% n_eff
## alpha -4828.81 6.57 395.39 -5339.72 -4817.96 -4318.81 3624
## beta_bill_length 8.22 0.07 4.66 2.18 8.38 14.14 4798
## beta_bill_depth -6.74 0.12 8.09 -17.21 -6.60 3.51 4467
## beta_flipper_length 43.74 0.03 2.04 41.17 43.71 46.39 3507
## beta_island[1] 10.91 0.12 10.01 -2.09 10.96 23.64 6518
## beta_island[2] -10.56 0.13 9.98 -23.36 -10.47 2.13 6242
## beta_island[3] -1.00 0.12 9.71 -13.53 -0.99 11.48 6770
## sigma 395.23 0.21 16.13 374.58 394.97 416.15 6041
## Rhat
## alpha 1
## beta_bill_length 1
## beta_bill_depth 1
## beta_flipper_length 1
## beta_island[1] 1
## beta_island[2] 1
## beta_island[3] 1
## sigma 1
##
## Samples were drawn using NUTS(diag_e) at Fri May 8 22:32:14 2026.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
Let’s compare the coefficients to those produced from our
rstanarm model.
coef(fit_rstanarm)
## (Intercept) bill_length_mm bill_depth_mm flipper_length_mm
## -5688.1002376 6.5768456 3.4201207 47.6022980
## islandDream islandTorgersen
## -9.2553166 -0.6649035
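As a reminder of how R’s dummy coding works, here is a tiny lm() example with made-up data where the group means are known exactly, showing that each dummy coefficient is the difference from the reference (first) level:

```r
# Made-up data: group means are Biscoe = 1, Dream = 3, Torgersen = 2
g <- factor(rep(c("Biscoe", "Dream", "Torgersen"), each = 4))
y <- rep(c(1, 3, 2), each = 4)

fit <- lm(y ~ g)
coef(fit)
# (Intercept) = 1 (the Biscoe mean), gDream = 2, gTorgersen = 1
```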
In the STAN model the island coefficients are beta_island[1], beta_island[2], and
beta_island[3] for Biscoe, Dream, and Torgersen,
respectively. Because this parameterization estimates a coefficient for every island level alongside the intercept, the individual coefficients are not directly comparable to the rstanarm output. Let’s make a prediction with both the STAN and rstanarm models.
## get the first row of data to make a prediction on
new_x <- dat[1, ]
## STAN prediction
-4828.81 + 8.22*new_x$bill_length_mm - 6.74*new_x$bill_depth_mm + 43.74*new_x$flipper_length_mm + 10.91*0 - 10.56*0 - 1.00*1
## [1] 3282.494
## rstanarm prediction
predict(fit_rstanarm, newdata = new_x)
## 1
## 3232.261
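The hand calculation above is just a dot product between the posterior-mean coefficients and the observation's feature vector, with the island represented by its one-hot indicator. Writing it that way makes it harder to drop a term:

```r
# Posterior means from the STAN fit (printed above)
coefs <- c(alpha = -4828.81, bill_length = 8.22, bill_depth = -6.74,
           flipper_length = 43.74, island1 = 10.91, island2 = -10.56,
           island3 = -1.00)

# First penguin: bill 39.1 mm, depth 18.7 mm, flipper 181 mm, Torgersen (island 3)
x <- c(1, 39.1, 18.7, 181, 0, 0, 1)

sum(coefs * x)
# [1] 3282.494  (matches the hand calculation above)
```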
What if we want to mimic the behavior of rstanarm and
lm, in base R, and have the first level of the categorical
feature (Biscoe) be placed in the intercept? One way to do this is to
directly specify the model matrix.
X <- model.matrix(body_mass_g ~ bill_length_mm + bill_depth_mm + flipper_length_mm + island, data = dat)
head(X)
## (Intercept) bill_length_mm bill_depth_mm flipper_length_mm islandDream
## 1 1 39.1 18.7 181 0
## 2 1 39.5 17.4 186 0
## 3 1 40.3 18.0 195 0
## 4 1 36.7 19.3 193 0
## 5 1 39.3 20.6 190 0
## 6 1 38.9 17.8 181 0
## islandTorgersen
## 1 1
## 2 1
## 3 1
## 4 1
## 5 1
## 6 1
## put the data and model matrix into a list
new_data_list <- list(
N = nrow(dat), # number of observations
X = X, # model matrix
y = dat$body_mass_g, # outcome variable
K = ncol(X) # get the number of model features including the intercept
)
## write the model
new_stan_code <- "//Stan Model
data{
int<lower=1> N; // Total number of observations
int<lower=1> K; // number of features
vector[N] y; // outcome variable
matrix[N, K] X; // the specified model matrix
}
parameters {
vector[K] beta; // All parameters in the model matrix
real<lower=0> sigma; // Standard deviation of the error term
}
model {
// Intercept prior
beta[1] ~ normal(0, 1000);
// Other coefficients
beta[2:K] ~ normal(0, 10);
// prior for the model SD
sigma ~ cauchy(0, 3);
// likelihood
y ~ normal(X * beta, sigma);
}
"
new_fit_stan <- stan(model_code = new_stan_code,
data = new_data_list,
iter = 2000,
warmup = 500,
chains = 4,
seed = 2244)
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0.000105 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.05 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1: Iteration: 501 / 2000 [ 25%] (Sampling)
## Chain 1: Iteration: 700 / 2000 [ 35%] (Sampling)
## Chain 1: Iteration: 900 / 2000 [ 45%] (Sampling)
## Chain 1: Iteration: 1100 / 2000 [ 55%] (Sampling)
## Chain 1: Iteration: 1300 / 2000 [ 65%] (Sampling)
## Chain 1: Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 1: Iteration: 1700 / 2000 [ 85%] (Sampling)
## Chain 1: Iteration: 1900 / 2000 [ 95%] (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 1.634 seconds (Warm-up)
## Chain 1: 3.111 seconds (Sampling)
## Chain 1: 4.745 seconds (Total)
## Chain 1:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
## Chain 2:
## Chain 2: Gradient evaluation took 1.9e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.19 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2:
## Chain 2:
## Chain 2: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 2: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 2: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 2: Iteration: 501 / 2000 [ 25%] (Sampling)
## Chain 2: Iteration: 700 / 2000 [ 35%] (Sampling)
## Chain 2: Iteration: 900 / 2000 [ 45%] (Sampling)
## Chain 2: Iteration: 1100 / 2000 [ 55%] (Sampling)
## Chain 2: Iteration: 1300 / 2000 [ 65%] (Sampling)
## Chain 2: Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 2: Iteration: 1700 / 2000 [ 85%] (Sampling)
## Chain 2: Iteration: 1900 / 2000 [ 95%] (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 2:
## Chain 2: Elapsed Time: 1.382 seconds (Warm-up)
## Chain 2: 2.392 seconds (Sampling)
## Chain 2: 3.774 seconds (Total)
## Chain 2:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
## Chain 3:
## Chain 3: Gradient evaluation took 1.9e-05 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.19 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3:
## Chain 3:
## Chain 3: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 3: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 3: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 3: Iteration: 501 / 2000 [ 25%] (Sampling)
## Chain 3: Iteration: 700 / 2000 [ 35%] (Sampling)
## Chain 3: Iteration: 900 / 2000 [ 45%] (Sampling)
## Chain 3: Iteration: 1100 / 2000 [ 55%] (Sampling)
## Chain 3: Iteration: 1300 / 2000 [ 65%] (Sampling)
## Chain 3: Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 3: Iteration: 1700 / 2000 [ 85%] (Sampling)
## Chain 3: Iteration: 1900 / 2000 [ 95%] (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 3:
## Chain 3: Elapsed Time: 1.632 seconds (Warm-up)
## Chain 3: 2.487 seconds (Sampling)
## Chain 3: 4.119 seconds (Total)
## Chain 3:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
## Chain 4:
## Chain 4: Gradient evaluation took 2e-05 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.2 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4:
## Chain 4:
## Chain 4: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 4: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 4: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 4: Iteration: 501 / 2000 [ 25%] (Sampling)
## Chain 4: Iteration: 700 / 2000 [ 35%] (Sampling)
## Chain 4: Iteration: 900 / 2000 [ 45%] (Sampling)
## Chain 4: Iteration: 1100 / 2000 [ 55%] (Sampling)
## Chain 4: Iteration: 1300 / 2000 [ 65%] (Sampling)
## Chain 4: Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 4: Iteration: 1700 / 2000 [ 85%] (Sampling)
## Chain 4: Iteration: 1900 / 2000 [ 95%] (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 4:
## Chain 4: Elapsed Time: 1.835 seconds (Warm-up)
## Chain 4: 2.401 seconds (Sampling)
## Chain 4: 4.236 seconds (Total)
## Chain 4:
print(new_fit_stan, digits = 3)
## Inference for Stan model: anon_model.
## 4 chains, each with iter=2000; warmup=500; thin=1;
## post-warmup draws per chain=1500, total post-warmup draws=6000.
##
## mean se_mean sd 2.5% 25% 50% 75%
## beta[1] -4856.358 6.713 383.490 -5613.692 -5113.096 -4859.899 -4588.403
## beta[2] 7.865 0.069 4.616 -1.254 4.826 7.871 10.974
## beta[3] -7.099 0.116 7.707 -22.189 -12.288 -7.204 -1.838
## beta[4] 44.016 0.035 1.989 40.082 42.658 44.035 45.369
## beta[5] -10.587 0.129 9.823 -30.222 -17.045 -10.580 -4.040
## beta[6] -1.285 0.131 9.783 -20.339 -8.167 -1.256 5.329
## sigma 396.716 0.215 15.660 367.504 385.843 396.221 406.956
## lp__ -2186.843 0.035 1.824 -2191.154 -2187.904 -2186.533 -2185.501
## 97.5% n_eff Rhat
## beta[1] -4122.143 3263 1.000
## beta[2] 16.897 4512 1.001
## beta[3] 7.765 4418 0.999
## beta[4] 47.909 3271 1.000
## beta[5] 8.746 5783 1.001
## beta[6] 18.227 5567 1.000
## sigma 429.034 5301 1.000
## lp__ -2184.199 2719 1.000
##
## Samples were drawn using NUTS(diag_e) at Fri May 8 22:34:14 2026.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
Again, we compare the coefficients from our STAN to those of the
rstanarm model.
## rstanarm coefficients
coef_rstanarm <- colMeans(as.matrix(fit_rstanarm))
## STAN coefficients
posterior_new_stan <- as.data.frame(new_fit_stan)
coef_new_stan <- colMeans(posterior_new_stan[, 1:7])
tibble(
parameter = c(colnames(X), "sigma"),
rstanarm = coef_rstanarm,
new_stan = coef_new_stan
)
## # A tibble: 7 × 3
## parameter rstanarm new_stan
## <chr> <dbl> <dbl>
## 1 (Intercept) -5690. -4856.
## 2 bill_length_mm 6.52 7.87
## 3 bill_depth_mm 3.44 -7.10
## 4 flipper_length_mm 47.5 44.0
## 5 islandDream -9.20 -10.6
## 6 islandTorgersen -0.755 -1.28
## 7 sigma 393. 397.
Notice that the bill_depth_mm coefficient is negative in the new
STAN model but positive in the rstanarm model, and there is
some difference in the Dream and Torgersen coefficients. These differences in coefficients won’t have a large effect on the predictions from the model, as they come out relatively similar.
## STAN prediction
-4856 + 7.87*new_x$bill_length_mm - 7.1*new_x$bill_depth_mm + 44.02*new_x$flipper_length_mm - 10.59*0 - 1.28*1
## [1] 3285.287
## rstanarm predictions
predict(fit_rstanarm, newdata = new_x)
## 1
## 3232.261
This is because rstanarm automatically centers all of the input variables.
If we center all of the input variables we will get much closer, so
let’s try that!
## Center the variables
dat <- dat %>%
mutate(across(.cols = bill_length_mm:body_mass_g, ~scale(.x, center = TRUE, scale = FALSE), .names = "{col}_c"))
## Create the model matrix with centered values
X <- model.matrix(body_mass_g_c ~ bill_length_mm_c + bill_depth_mm_c + flipper_length_mm_c + island, data = dat)
head(X)
## (Intercept) bill_length_mm_c bill_depth_mm_c flipper_length_mm_c islandDream
## 1 1 -4.892793 1.5351351 -19.966967 0
## 2 1 -4.492793 0.2351351 -14.966967 0
## 3 1 -3.692793 0.8351351 -5.966967 0
## 4 1 -7.292793 2.1351351 -7.966967 0
## 5 1 -4.692793 3.4351351 -10.966967 0
## 6 1 -5.092793 0.6351351 -19.966967 0
## islandTorgersen
## 1 1
## 2 1
## 3 1
## 4 1
## 5 1
## 6 1
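scale() with center = TRUE and scale = FALSE simply subtracts the column mean; a quick base R check:

```r
x <- c(39.1, 39.5, 40.3, 36.7, 39.3)

centered <- scale(x, center = TRUE, scale = FALSE)[, 1]

# Equivalent to subtracting the mean by hand
stopifnot(isTRUE(all.equal(centered, x - mean(x), check.attributes = FALSE)))
```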
Create a new data list with the centered variables
## put the data and model matrix into a list
new_data_list <- list(
N = nrow(dat), # number of observations
X = X, # model matrix
y = as.numeric(dat$body_mass_g), # outcome variable (left uncentered)
K = ncol(X) # get the number of model features including the intercept
)
## write the model
new_stan_code <- "//Stan Model
data{
int<lower=1> N; // Total number of observations
int<lower=1> K; // number of features
vector[N] y; // outcome variable
matrix[N, K] X; // the specified model matrix
}
parameters {
vector[K] beta; // All parameters in the model matrix
real<lower=0> sigma; // Standard deviation of the error term
}
model {
// Intercept prior
beta[1] ~ normal(0, 1000);
// Other coefficients
beta[2:K] ~ normal(0, 10);
// prior for the model SD
sigma ~ cauchy(0, 3);
// likelihood
y ~ normal(X * beta, sigma);
}
"
## fit the model
new_fit_stan <- stan(model_code = new_stan_code,
data = new_data_list,
iter = 2000,
warmup = 500,
chains = 4,
seed = 2244)
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 4.8e-05 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.48 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1: Iteration: 501 / 2000 [ 25%] (Sampling)
## Chain 1: Iteration: 700 / 2000 [ 35%] (Sampling)
## Chain 1: Iteration: 900 / 2000 [ 45%] (Sampling)
## Chain 1: Iteration: 1100 / 2000 [ 55%] (Sampling)
## Chain 1: Iteration: 1300 / 2000 [ 65%] (Sampling)
## Chain 1: Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 1: Iteration: 1700 / 2000 [ 85%] (Sampling)
## Chain 1: Iteration: 1900 / 2000 [ 95%] (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 1.341 seconds (Warm-up)
## Chain 1: 0.3 seconds (Sampling)
## Chain 1: 1.641 seconds (Total)
## Chain 1:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
## Chain 2:
## Chain 2: Gradient evaluation took 1.9e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.19 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2:
## Chain 2:
## Chain 2: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 2: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 2: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 2: Iteration: 501 / 2000 [ 25%] (Sampling)
## Chain 2: Iteration: 700 / 2000 [ 35%] (Sampling)
## Chain 2: Iteration: 900 / 2000 [ 45%] (Sampling)
## Chain 2: Iteration: 1100 / 2000 [ 55%] (Sampling)
## Chain 2: Iteration: 1300 / 2000 [ 65%] (Sampling)
## Chain 2: Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 2: Iteration: 1700 / 2000 [ 85%] (Sampling)
## Chain 2: Iteration: 1900 / 2000 [ 95%] (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 2:
## Chain 2: Elapsed Time: 1.148 seconds (Warm-up)
## Chain 2: 0.182 seconds (Sampling)
## Chain 2: 1.33 seconds (Total)
## Chain 2:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
## Chain 3:
## Chain 3: Gradient evaluation took 1.6e-05 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.16 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3:
## Chain 3:
## Chain 3: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 3: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 3: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 3: Iteration: 501 / 2000 [ 25%] (Sampling)
## Chain 3: Iteration: 700 / 2000 [ 35%] (Sampling)
## Chain 3: Iteration: 900 / 2000 [ 45%] (Sampling)
## Chain 3: Iteration: 1100 / 2000 [ 55%] (Sampling)
## Chain 3: Iteration: 1300 / 2000 [ 65%] (Sampling)
## Chain 3: Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 3: Iteration: 1700 / 2000 [ 85%] (Sampling)
## Chain 3: Iteration: 1900 / 2000 [ 95%] (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 3:
## Chain 3: Elapsed Time: 1.091 seconds (Warm-up)
## Chain 3: 0.175 seconds (Sampling)
## Chain 3: 1.266 seconds (Total)
## Chain 3:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
## Chain 4:
## Chain 4: Gradient evaluation took 1.7e-05 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.17 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4:
## Chain 4:
## Chain 4: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 4: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 4: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 4: Iteration: 501 / 2000 [ 25%] (Sampling)
## Chain 4: Iteration: 700 / 2000 [ 35%] (Sampling)
## Chain 4: Iteration: 900 / 2000 [ 45%] (Sampling)
## Chain 4: Iteration: 1100 / 2000 [ 55%] (Sampling)
## Chain 4: Iteration: 1300 / 2000 [ 65%] (Sampling)
## Chain 4: Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 4: Iteration: 1700 / 2000 [ 85%] (Sampling)
## Chain 4: Iteration: 1900 / 2000 [ 95%] (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 4:
## Chain 4: Elapsed Time: 0.976 seconds (Warm-up)
## Chain 4: 0.168 seconds (Sampling)
## Chain 4: 1.144 seconds (Total)
## Chain 4:
print(new_fit_stan, digits = 3)
## Inference for Stan model: anon_model.
## 4 chains, each with iter=2000; warmup=500; thin=1;
## post-warmup draws per chain=1500, total post-warmup draws=6000.
##
## mean se_mean sd 2.5% 25% 50% 75%
## beta[1] 4208.839 0.229 21.727 4166.262 4193.931 4208.724 4224.214
## beta[2] 6.456 0.060 4.558 -2.592 3.343 6.479 9.578
## beta[3] 3.533 0.093 7.829 -11.675 -1.876 3.581 8.947
## beta[4] 47.526 0.027 2.068 43.521 46.142 47.503 48.923
## beta[5] -9.209 0.101 9.795 -28.299 -15.946 -9.148 -2.543
## beta[6] -0.768 0.103 9.843 -20.024 -7.463 -0.796 5.940
## sigma 392.775 0.150 15.191 364.337 382.302 392.137 402.763
## lp__ -2181.873 0.034 1.815 -2186.273 -2182.872 -2181.581 -2180.520
## 97.5% n_eff Rhat
## beta[1] 4250.511 8990 1.000
## beta[2] 15.369 5791 1.000
## beta[3] 18.691 7120 1.000
## beta[4] 51.610 5675 1.000
## beta[5] 9.755 9421 1.000
## beta[6] 18.615 9211 1.000
## sigma 423.653 10200 1.000
## lp__ -2179.281 2809 1.001
##
## Samples were drawn using NUTS(diag_e) at Fri May 8 22:34:20 2026.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
Compare the coefficients to the rstanarm model
# rstanarm
coef_rstanarm <- colMeans(as.matrix(fit_rstanarm))
# stan
posterior_new_stan <- as.data.frame(new_fit_stan)
coef_new_stan <- colMeans(posterior_new_stan[, 1:7])
tibble(
parameter = c(colnames(X), "sigma"),
rstanarm = coef_rstanarm,
new_stan = coef_new_stan
)
## # A tibble: 7 × 3
## parameter rstanarm new_stan
## <chr> <dbl> <dbl>
## 1 (Intercept) -5690. 4209.
## 2 bill_length_mm_c 6.52 6.46
## 3 bill_depth_mm_c 3.44 3.53
## 4 flipper_length_mm_c 47.5 47.5
## 5 islandDream -9.20 -9.21
## 6 islandTorgersen -0.755 -0.768
## 7 sigma 393. 393.
Notice that the intercept is very different from the
rstanarm model.
The coefficients, however, are relatively the same and we draw
similar inferences about the difference between the Dream and Torgersen
islands and Biscoe (represented in the intercept).
## Get the first row of the data again so that you have the centered variables
new_x <- dat[1, ]
## STAN prediction
4209 + 6.46*as.numeric(new_x$bill_length_mm_c) + 3.53*as.numeric(new_x$bill_depth_mm_c) + 47.5*as.numeric(new_x$flipper_length_mm_c) - 9.21*0 - 0.768*1
## [1] 3233.613
## rstanarm prediction
predict(fit_rstanarm, newdata = new_x)
## 1
## 3232.261
Nearly identical!
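As before, the hand calculation is a dot product of the posterior-mean coefficients with the observation's (now centered) feature vector; as a check, using the centered values of the first row printed earlier:

```r
coefs <- c(intercept = 4209, bill_length_c = 6.46, bill_depth_c = 3.53,
           flipper_length_c = 47.5, islandDream = -9.21, islandTorgersen = -0.768)

# Centered values for the first penguin (from the model matrix above), Torgersen
x <- c(1, -4.892793, 1.5351351, -19.966967, 0, 1)

sum(coefs * x)
# [1] 3233.613
```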
One short tangent here before we move on to an alternative way of handling categorical features in STAN.
The model matrix approach works really well if you are building fixed effects models, as it is easy to specify the model matrix and enter it into the data list rather than remembering to specify every variable and its corresponding type. That said, with this simplicity comes the limitation I alluded to above: if we are building more complex models or a hierarchical model, this approach is going to struggle. So getting comfortable coding each aspect of the model (which we do below) will be important too.
In the model{} chunk above we placed the same prior
on all of our slope features, beta[2:K] ~ normal(0, 10);.
This is fine because that is what we specified in rstanarm.
However, if you are fitting a fixed effects model and you want to place
a different prior on different coefficients, you can do this by simply
indexing them like so:
beta[1] ~ normal(0, 1000); // intercept
beta[2:4] ~ normal(0, 2); // continuous predictors
beta[5:6] ~ normal(0, 20); // island dummy vars
The final approach to handling categorical features is to use feature
indexing. Instead of specifying the model matrix, we are going to
specify our 3 island levels and then in the model{} chunk,
before setting priors, we are going to set the first island level,
Biscoe, to 0 so that it is the reference level in the intercept. We
again use the centered input features to be consistent with
rstanarm.
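As a reminder, the integer group index can be built directly from the factor. This is a sketch in case island_num wasn't created while following along; since the levels are alphabetical, Biscoe is 1, Dream is 2, and Torgersen is 3.

```r
## Build the integer island index from the factor levels
## (alphabetical, so 1 = Biscoe, 2 = Dream, 3 = Torgersen)
dat <- dat %>%
  mutate(island_num = as.integer(island))

## Sanity check the mapping
table(dat$island, dat$island_num)
```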
data_list <- list(
N = nrow(dat),
body_mass = dat$body_mass_g, # outcome variable
bill_length = as.numeric(dat$bill_length_mm_c),
bill_depth = as.numeric(dat$bill_depth_mm_c),
flipper_length = as.numeric(dat$flipper_length_mm_c),
island = dat$island_num,
K = length(unique(dat$island)) # get the number of island groups
)
stan_code2 <- "//Stan Model
data{
int<lower=1> N; // Total number of observations
vector[N] body_mass; // outcome variable
vector[N] bill_length;
vector[N] bill_depth;
vector[N] flipper_length;
int<lower=1> K; // number of islands
int<lower=1, upper = K> island[N]; // island numeric group index for 1:K
}
parameters {
real alpha; // Intercept
real beta_bill_length; // coef for bill_length
real beta_bill_depth; // coef for bill_depth
real beta_flipper_length; // coef for flipper_length
vector[K-1] beta_island; // Coef for island (no intercept)
real<lower=0> sigma; // Standard deviation of the error term
}
model {
// create a new vector for the categorical variable
vector[K] beta_ref;
// set the first level of the new categorical variable to 0
beta_ref[1] = 0; // reference level set in the intercept
// map the K-1 estimated island coefficients into positions 2:K of beta_ref
for (k in 2:K) {
beta_ref[k] = beta_island[k - 1];
}
// Priors
alpha ~ normal(0, 1000); // Weakly informative prior for the intercept
beta_bill_length ~ normal(0, 10); // Weakly informative prior for bill_length
beta_bill_depth ~ normal(0, 10); // Weakly informative prior for bill_depth
beta_flipper_length ~ normal(0, 10); // Weakly informative prior for flipper_length
beta_island ~ normal(0, 10); // Weakly informative prior for island
sigma ~ cauchy(0, 3); // Weakly informative prior for the standard deviation
// Likelihood
for(n in 1:N)
body_mass[n] ~ normal(alpha + beta_bill_length * bill_length[n] + beta_bill_depth * bill_depth[n] + beta_flipper_length * flipper_length[n] + beta_ref[island[n]], sigma);
}
"
## Compile & run the model
fit_stan2 <- stan(model_code = stan_code2,
data = data_list,
iter = 2000,
warmup = 500,
chains = 4,
seed = 2244)
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0.000169 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.69 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1: Iteration: 501 / 2000 [ 25%] (Sampling)
## Chain 1: Iteration: 700 / 2000 [ 35%] (Sampling)
## Chain 1: Iteration: 900 / 2000 [ 45%] (Sampling)
## Chain 1: Iteration: 1100 / 2000 [ 55%] (Sampling)
## Chain 1: Iteration: 1300 / 2000 [ 65%] (Sampling)
## Chain 1: Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 1: Iteration: 1700 / 2000 [ 85%] (Sampling)
## Chain 1: Iteration: 1900 / 2000 [ 95%] (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 5.882 seconds (Warm-up)
## Chain 1: 0.875 seconds (Sampling)
## Chain 1: 6.757 seconds (Total)
## Chain 1:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
## Chain 2:
## Chain 2: Gradient evaluation took 6.6e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.66 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2:
## Chain 2:
## Chain 2: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 2: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 2: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 2: Iteration: 501 / 2000 [ 25%] (Sampling)
## Chain 2: Iteration: 700 / 2000 [ 35%] (Sampling)
## Chain 2: Iteration: 900 / 2000 [ 45%] (Sampling)
## Chain 2: Iteration: 1100 / 2000 [ 55%] (Sampling)
## Chain 2: Iteration: 1300 / 2000 [ 65%] (Sampling)
## Chain 2: Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 2: Iteration: 1700 / 2000 [ 85%] (Sampling)
## Chain 2: Iteration: 1900 / 2000 [ 95%] (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 2:
## Chain 2: Elapsed Time: 5.054 seconds (Warm-up)
## Chain 2: 1.051 seconds (Sampling)
## Chain 2: 6.105 seconds (Total)
## Chain 2:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
## Chain 3:
## Chain 3: Gradient evaluation took 7.4e-05 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.74 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3:
## Chain 3:
## Chain 3: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 3: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 3: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 3: Iteration: 501 / 2000 [ 25%] (Sampling)
## Chain 3: Iteration: 700 / 2000 [ 35%] (Sampling)
## Chain 3: Iteration: 900 / 2000 [ 45%] (Sampling)
## Chain 3: Iteration: 1100 / 2000 [ 55%] (Sampling)
## Chain 3: Iteration: 1300 / 2000 [ 65%] (Sampling)
## Chain 3: Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 3: Iteration: 1700 / 2000 [ 85%] (Sampling)
## Chain 3: Iteration: 1900 / 2000 [ 95%] (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 3:
## Chain 3: Elapsed Time: 5.403 seconds (Warm-up)
## Chain 3: 0.984 seconds (Sampling)
## Chain 3: 6.387 seconds (Total)
## Chain 3:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
## Chain 4:
## Chain 4: Gradient evaluation took 7.1e-05 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.71 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4:
## Chain 4:
## Chain 4: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 4: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 4: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 4: Iteration: 501 / 2000 [ 25%] (Sampling)
## Chain 4: Iteration: 700 / 2000 [ 35%] (Sampling)
## Chain 4: Iteration: 900 / 2000 [ 45%] (Sampling)
## Chain 4: Iteration: 1100 / 2000 [ 55%] (Sampling)
## Chain 4: Iteration: 1300 / 2000 [ 65%] (Sampling)
## Chain 4: Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 4: Iteration: 1700 / 2000 [ 85%] (Sampling)
## Chain 4: Iteration: 1900 / 2000 [ 95%] (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 4:
## Chain 4: Elapsed Time: 8.67 seconds (Warm-up)
## Chain 4: 1.194 seconds (Sampling)
## Chain 4: 9.864 seconds (Total)
## Chain 4:
print(fit_stan2, probs = c(0.1, 0.5, 0.9))
## Inference for Stan model: anon_model.
## 4 chains, each with iter=2000; warmup=500; thin=1;
## post-warmup draws per chain=1500, total post-warmup draws=6000.
##
## mean se_mean sd 10% 50% 90% n_eff
## alpha 4208.22 0.24 22.10 4179.34 4208.17 4236.20 8373
## beta_bill_length 6.49 0.05 4.61 0.65 6.46 12.49 7111
## beta_bill_depth 3.50 0.09 8.00 -6.84 3.62 13.66 7597
## beta_flipper_length 47.50 0.03 2.05 44.89 47.52 50.15 6163
## beta_island[1] -9.24 0.11 9.83 -21.80 -9.22 3.40 8673
## beta_island[2] -0.73 0.10 9.87 -13.40 -0.77 11.98 10145
## sigma 392.99 0.17 15.23 373.91 392.66 412.39 7901
## lp__ -2181.94 0.04 1.84 -2184.42 -2181.67 -2179.88 2688
## Rhat
## alpha 1
## beta_bill_length 1
## beta_bill_depth 1
## beta_flipper_length 1
## beta_island[1] 1
## beta_island[2] 1
## sigma 1
## lp__ 1
##
## Samples were drawn using NUTS(diag_e) at Fri May 8 22:36:16 2026.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
Again, we compare the STAN coefficients to the rstanarm coefficients.
posterior_stan2 <- as.data.frame(fit_stan2)
coef_stan2 <- colMeans(posterior_stan2[, 1:7])
tibble(
parameter = c(colnames(X), "sigma"),
rstanarm = coef_rstanarm,
stan2 = coef_stan2
)
## # A tibble: 7 × 3
## parameter rstanarm stan2
## <chr> <dbl> <dbl>
## 1 (Intercept) -5690. 4208.
## 2 bill_length_mm_c 6.52 6.49
## 3 bill_depth_mm_c 3.44 3.50
## 4 flipper_length_mm_c 47.5 47.5
## 5 islandDream -9.20 -9.24
## 6 islandTorgersen -0.755 -0.726
## 7 sigma 393. 393.
The coefficients are nearly identical to those from rstanarm. However, like the model matrix approach above, the intercept is
positive. It simply represents the average body mass of a penguin
in the population, which is directly interpretable (unlike the negative
intercept in the rstanarm model).
mean(dat$body_mass_g)
## [1] 4207.057
This was a long one! We covered a lot of ground coding categorical variables into our STAN models. We also discussed a few different approaches to achieving the same task, which gives us a number of options going forward when we work with more complicated data structures.