Intro

In Part 3 of this series we are going to look at including categorical predictors in our STAN model. Unlike the lm() function or one of the STAN wrappers, such as rstanarm or brms, STAN gets tricky with categorical variables because it doesn’t understand character or factor types. To make this work in STAN we need to convert the categorical variable into an integer group index.

Load Data

We will continue to use the Palmer Penguins data set.

suppressPackageStartupMessages({
  suppressWarnings({
    suppressMessages({
      library(tidyverse)
      library(palmerpenguins)
      library(rstanarm)
      library(arm)
      library(rstan)
      library(ggridges)
      library(bayesplot)
    })
  })
})


dat <- penguins %>%
  drop_na()

dat %>%
  head()
## # A tibble: 6 × 8
##   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
## 1 Adelie  Torgersen           39.1          18.7               181        3750
## 2 Adelie  Torgersen           39.5          17.4               186        3800
## 3 Adelie  Torgersen           40.3          18                 195        3250
## 4 Adelie  Torgersen           36.7          19.3               193        3450
## 5 Adelie  Torgersen           39.3          20.6               190        3650
## 6 Adelie  Torgersen           38.9          17.8               181        3625
## # ℹ 2 more variables: sex <fct>, year <int>

Fitting the model in rstanarm

We will code the model in rstanarm to see how the categorical variables are handled and then attempt to recreate the same analysis in STAN.

We will specify some priors on our model features, not because I think these are good priors (they are only weakly informative), but to get some reps specifying priors in STAN rather than just relying on the default settings.

You’ll notice that when we set the priors for the betas we have four predictor variables but five priors. This is because the island variable has three levels. One of those levels is the reference level, which is absorbed into the intercept, and the other two levels get beta coefficients that represent their difference from the reference level (the intercept). Unless specified by the user, R assigns the levels alphabetically by default.
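R’s default treatment (dummy) coding is easy to inspect directly. A quick standalone sketch, rebuilding the island factor by hand rather than using the data set:

```r
# Inspect R's default treatment (dummy) coding for a 3-level factor.
# Standalone sketch: the island factor is recreated by hand here.
island <- factor(c("Biscoe", "Dream", "Torgersen"))

contrasts(island)
##           Dream Torgersen
## Biscoe        0         0
## Dream         1         0
## Torgersen     0         1
```

The all-zero Biscoe row is the reference level: it gets no column of its own, so its effect lives in the intercept.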

Look at the categorical variable levels

levels(dat$island)
## [1] "Biscoe"    "Dream"     "Torgersen"
str(dat$island)
##  Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...

We can easily convert the categorical variable into its numeric indexes:

dat$island_num <- as.integer(factor(dat$island))

table(dat$island, dat$island_num)
##            
##               1   2   3
##   Biscoe    163   0   0
##   Dream       0 123   0
##   Torgersen   0   0  47

Fit the model

prior_betas <- normal(location = c(0, 0, 0, 0, 0), scale = c(10, 10, 10, 10, 10))

fit_rstanarm <- stan_glm(body_mass_g ~ bill_length_mm + bill_depth_mm + flipper_length_mm + island, 
                         prior_intercept = normal(0, 1000),
                         prior = prior_betas,
                         prior_aux = cauchy(0, 3),
                         iter = 2000,
                         seed = 2244,
                         cores = 2,
                         data = dat)
print(fit_rstanarm, digits = 3)

Fitting the model in STAN

Now we will create the same model in STAN.

Create the data list

  • Recall that STAN accepts data in list format
  • We use the island_num variable that we created above since STAN can’t handle categorical features. So, island now takes on the values 1, 2, and 3 for Biscoe, Dream, and Torgersen, respectively.
# Store data for the model in a list
data_list <- list(
  N = nrow(dat),
  body_mass = dat$body_mass_g, # outcome variable
  bill_length = dat$bill_length_mm,
  bill_depth = dat$bill_depth_mm,
  flipper_length = dat$flipper_length_mm,
  island = dat$island_num,
  K = length(unique(dat$island)) # get the number of island groups
)

Write the STAN model

  • In the model{} chunk we will specify the same priors that we did in rstanarm for consistency.
stan_code <- "//Stan Model

data{
  int<lower=1> N;        // Total number of observations
  vector[N] body_mass;    // outcome variable
  vector[N] bill_length;
  vector[N] bill_depth;
  vector[N] flipper_length;
  int<lower=1> K;   // number of islands
  int<lower=1, upper = K> island[N];  // island numeric group index for 1:K
}

parameters {
  real alpha;                        // Intercept
  real beta_bill_length;            // coef for bill_length
  real beta_bill_depth;            // coef for bill_depth
  real beta_flipper_length;       // coef for flipper_length
  vector[K] beta_island;         // Coef for island (no intercept)
  real<lower=0> sigma;          // Standard deviation of the error term
}


model {
  // Priors
  alpha ~ normal(0, 1000);                     // Weakly informative prior for the intercept
  beta_bill_length ~ normal(0, 10);          // Weakly informative prior for bill_length
  beta_bill_depth ~ normal(0, 10);          // Weakly informative prior for bill_depth
  beta_flipper_length ~ normal(0, 10);     // Weakly informative prior for flipper_length
  beta_island ~ normal(0, 10);            // Weakly informative prior for island
  sigma ~ cauchy(0, 3);                 // Weakly informative prior for the standard deviation

  // Likelihood
  for(n in 1:N)
   body_mass[n] ~ normal(alpha + beta_bill_length * bill_length[n] + beta_bill_depth * bill_depth[n] + beta_flipper_length * flipper_length[n] + beta_island[island[n]], sigma);
  }
"

Compile the model

## Compile & run the model
fit_stan <- stan(model_code = stan_code,
            data = data_list,
            iter = 2000,
            warmup = 500,
            chains = 4,
            seed = 2244)
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0.000371 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 3.71 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  501 / 2000 [ 25%]  (Sampling)
## Chain 1: Iteration:  700 / 2000 [ 35%]  (Sampling)
## Chain 1: Iteration:  900 / 2000 [ 45%]  (Sampling)
## Chain 1: Iteration: 1100 / 2000 [ 55%]  (Sampling)
## Chain 1: Iteration: 1300 / 2000 [ 65%]  (Sampling)
## Chain 1: Iteration: 1500 / 2000 [ 75%]  (Sampling)
## Chain 1: Iteration: 1700 / 2000 [ 85%]  (Sampling)
## Chain 1: Iteration: 1900 / 2000 [ 95%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 9.03 seconds (Warm-up)
## Chain 1:                14.937 seconds (Sampling)
## Chain 1:                23.967 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 8.6e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.86 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  501 / 2000 [ 25%]  (Sampling)
## Chain 2: Iteration:  700 / 2000 [ 35%]  (Sampling)
## Chain 2: Iteration:  900 / 2000 [ 45%]  (Sampling)
## Chain 2: Iteration: 1100 / 2000 [ 55%]  (Sampling)
## Chain 2: Iteration: 1300 / 2000 [ 65%]  (Sampling)
## Chain 2: Iteration: 1500 / 2000 [ 75%]  (Sampling)
## Chain 2: Iteration: 1700 / 2000 [ 85%]  (Sampling)
## Chain 2: Iteration: 1900 / 2000 [ 95%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 8.991 seconds (Warm-up)
## Chain 2:                14.85 seconds (Sampling)
## Chain 2:                23.841 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 8.3e-05 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.83 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 3: Iteration:  501 / 2000 [ 25%]  (Sampling)
## Chain 3: Iteration:  700 / 2000 [ 35%]  (Sampling)
## Chain 3: Iteration:  900 / 2000 [ 45%]  (Sampling)
## Chain 3: Iteration: 1100 / 2000 [ 55%]  (Sampling)
## Chain 3: Iteration: 1300 / 2000 [ 65%]  (Sampling)
## Chain 3: Iteration: 1500 / 2000 [ 75%]  (Sampling)
## Chain 3: Iteration: 1700 / 2000 [ 85%]  (Sampling)
## Chain 3: Iteration: 1900 / 2000 [ 95%]  (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 24.137 seconds (Warm-up)
## Chain 3:                40.821 seconds (Sampling)
## Chain 3:                64.958 seconds (Total)
## Chain 3: 
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
## Chain 4: 
## Chain 4: Gradient evaluation took 8.2e-05 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.82 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4: 
## Chain 4: 
## Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 4: Iteration:  501 / 2000 [ 25%]  (Sampling)
## Chain 4: Iteration:  700 / 2000 [ 35%]  (Sampling)
## Chain 4: Iteration:  900 / 2000 [ 45%]  (Sampling)
## Chain 4: Iteration: 1100 / 2000 [ 55%]  (Sampling)
## Chain 4: Iteration: 1300 / 2000 [ 65%]  (Sampling)
## Chain 4: Iteration: 1500 / 2000 [ 75%]  (Sampling)
## Chain 4: Iteration: 1700 / 2000 [ 85%]  (Sampling)
## Chain 4: Iteration: 1900 / 2000 [ 95%]  (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 4: 
## Chain 4:  Elapsed Time: 10.707 seconds (Warm-up)
## Chain 4:                31.332 seconds (Sampling)
## Chain 4:                42.039 seconds (Total)
## Chain 4:
## get the model coefficients
print(fit_stan, pars = c("alpha", "beta_bill_length", "beta_bill_depth", "beta_flipper_length", "beta_island[1]", "beta_island[2]", "beta_island[3]", "sigma"), probs = c(0.1, 0.5, 0.9))
## Inference for Stan model: anon_model.
## 4 chains, each with iter=2000; warmup=500; thin=1; 
## post-warmup draws per chain=1500, total post-warmup draws=6000.
## 
##                         mean se_mean     sd      10%      50%      90% n_eff
## alpha               -4828.81    6.57 395.39 -5339.72 -4817.96 -4318.81  3624
## beta_bill_length        8.22    0.07   4.66     2.18     8.38    14.14  4798
## beta_bill_depth        -6.74    0.12   8.09   -17.21    -6.60     3.51  4467
## beta_flipper_length    43.74    0.03   2.04    41.17    43.71    46.39  3507
## beta_island[1]         10.91    0.12  10.01    -2.09    10.96    23.64  6518
## beta_island[2]        -10.56    0.13   9.98   -23.36   -10.47     2.13  6242
## beta_island[3]         -1.00    0.12   9.71   -13.53    -0.99    11.48  6770
## sigma                 395.23    0.21  16.13   374.58   394.97   416.15  6041
##                     Rhat
## alpha                  1
## beta_bill_length       1
## beta_bill_depth        1
## beta_flipper_length    1
## beta_island[1]         1
## beta_island[2]         1
## beta_island[3]         1
## sigma                  1
## 
## Samples were drawn using NUTS(diag_e) at Fri May  8 22:32:14 2026.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

Let’s compare the coefficients to those produced by our rstanarm model.

coef(fit_rstanarm)
##       (Intercept)    bill_length_mm     bill_depth_mm flipper_length_mm 
##     -5688.1002376         6.5768456         3.4201207        47.6022980 
##       islandDream   islandTorgersen 
##        -9.2553166        -0.6649035
  • Hmm…something looks weird. The issue is that STAN does not dummy code categorical features, so the first island level is not absorbed into the intercept. Instead, we get a coefficient for each of the three islands: beta_island[1], beta_island[2], and beta_island[3] for Biscoe, Dream, and Torgersen, respectively.
  • Despite this, the two models will still produce similar predictions. Let’s take the first row of data and make a prediction on it using both the STAN and rstanarm models.
## get the first row of data to make a prediction on
new_x <- dat[1, ]

## STAN prediction
-4828.81 + 8.22*new_x$bill_length_mm - 6.74*new_x$bill_depth_mm + 43.74*new_x$flipper_length_mm + 10.91*0 - 10.56*0 - 1.00*1
## [1] 3282.494
## rstanarm prediction
predict(fit_rstanarm, newdata = new_x)
##        1 
## 3232.261
  • The values are relatively close but not exact. Some of the difference comes from Markov chain sampling variation and from the rounded coefficients we typed in, but we are in the ballpark.
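To see how the two parameterizations map onto each other, we can fold the Biscoe coefficient into the intercept and express the other islands as differences from Biscoe. A sketch using the rounded posterior means printed above (the results won’t match rstanarm exactly, since the priors act on different parameterizations):

```r
## Re-express the STAN coefficients on rstanarm's reference-level scale
## (posterior means copied from the print() output above, so approximate)
alpha       <- -4828.81
beta_island <- c(10.91, -10.56, -1.00)  # Biscoe, Dream, Torgersen

## effective intercept: alpha plus the Biscoe coefficient
alpha + beta_island[1]
## [1] -4817.9

## differences from Biscoe, comparable to islandDream and islandTorgersen
beta_island[2:3] - beta_island[1]
## [1] -21.47 -11.91
```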

Specifying the Model Matrix

  • What if we want to mimic the behavior of rstanarm and base R’s lm() and have the first level of the categorical feature (Biscoe) placed in the intercept? One way to do this is to specify the model matrix directly.
  • Now, instead of putting each feature into our data list, we simply pass the model matrix! This makes model specification very clean and easy. There is a limitation to this approach that we will talk about later, but for simple models it is a nice way of doing business.
X <- model.matrix(body_mass_g ~ bill_length_mm + bill_depth_mm + flipper_length_mm + island, data = dat)
head(X)
##   (Intercept) bill_length_mm bill_depth_mm flipper_length_mm islandDream
## 1           1           39.1          18.7               181           0
## 2           1           39.5          17.4               186           0
## 3           1           40.3          18.0               195           0
## 4           1           36.7          19.3               193           0
## 5           1           39.3          20.6               190           0
## 6           1           38.9          17.8               181           0
##   islandTorgersen
## 1               1
## 2               1
## 3               1
## 4               1
## 5               1
## 6               1
## put the data and model matrix into a list
new_data_list <- list(
  N = nrow(dat),          # number of observations
  X = X,                  # model matrix
  y = dat$body_mass_g,    # outcome variable
  K = ncol(X)             # get the number of model features including the intercept
)

## write the model
new_stan_code <- "//Stan Model

data{
  int<lower=1> N;        // Total number of observations
  int<lower=1> K;       // number of features
  vector[N] y;    // outcome variable
  matrix[N, K] X;        // the specified model matrix
}

parameters {
  vector[K] beta;                // All parameters in the model matrix
  real<lower=0> sigma;          // Standard deviation of the error term
}


model {
  // Intercept prior
  beta[1] ~ normal(0, 1000);

  // Other coefficients
  beta[2:K] ~ normal(0, 10);
  
  // prior for the model SD
  sigma ~ cauchy(0, 3);
  
  // likelihood
  y ~ normal(X * beta, sigma);
  }
"

new_fit_stan <- stan(model_code = new_stan_code,
            data = new_data_list,
            iter = 2000,
            warmup = 500,
            chains = 4,
            seed = 2244)
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0.000105 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.05 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  501 / 2000 [ 25%]  (Sampling)
## Chain 1: Iteration:  700 / 2000 [ 35%]  (Sampling)
## Chain 1: Iteration:  900 / 2000 [ 45%]  (Sampling)
## Chain 1: Iteration: 1100 / 2000 [ 55%]  (Sampling)
## Chain 1: Iteration: 1300 / 2000 [ 65%]  (Sampling)
## Chain 1: Iteration: 1500 / 2000 [ 75%]  (Sampling)
## Chain 1: Iteration: 1700 / 2000 [ 85%]  (Sampling)
## Chain 1: Iteration: 1900 / 2000 [ 95%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 1.634 seconds (Warm-up)
## Chain 1:                3.111 seconds (Sampling)
## Chain 1:                4.745 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 1.9e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.19 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  501 / 2000 [ 25%]  (Sampling)
## Chain 2: Iteration:  700 / 2000 [ 35%]  (Sampling)
## Chain 2: Iteration:  900 / 2000 [ 45%]  (Sampling)
## Chain 2: Iteration: 1100 / 2000 [ 55%]  (Sampling)
## Chain 2: Iteration: 1300 / 2000 [ 65%]  (Sampling)
## Chain 2: Iteration: 1500 / 2000 [ 75%]  (Sampling)
## Chain 2: Iteration: 1700 / 2000 [ 85%]  (Sampling)
## Chain 2: Iteration: 1900 / 2000 [ 95%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 1.382 seconds (Warm-up)
## Chain 2:                2.392 seconds (Sampling)
## Chain 2:                3.774 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 1.9e-05 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.19 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 3: Iteration:  501 / 2000 [ 25%]  (Sampling)
## Chain 3: Iteration:  700 / 2000 [ 35%]  (Sampling)
## Chain 3: Iteration:  900 / 2000 [ 45%]  (Sampling)
## Chain 3: Iteration: 1100 / 2000 [ 55%]  (Sampling)
## Chain 3: Iteration: 1300 / 2000 [ 65%]  (Sampling)
## Chain 3: Iteration: 1500 / 2000 [ 75%]  (Sampling)
## Chain 3: Iteration: 1700 / 2000 [ 85%]  (Sampling)
## Chain 3: Iteration: 1900 / 2000 [ 95%]  (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 1.632 seconds (Warm-up)
## Chain 3:                2.487 seconds (Sampling)
## Chain 3:                4.119 seconds (Total)
## Chain 3: 
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
## Chain 4: 
## Chain 4: Gradient evaluation took 2e-05 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.2 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4: 
## Chain 4: 
## Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 4: Iteration:  501 / 2000 [ 25%]  (Sampling)
## Chain 4: Iteration:  700 / 2000 [ 35%]  (Sampling)
## Chain 4: Iteration:  900 / 2000 [ 45%]  (Sampling)
## Chain 4: Iteration: 1100 / 2000 [ 55%]  (Sampling)
## Chain 4: Iteration: 1300 / 2000 [ 65%]  (Sampling)
## Chain 4: Iteration: 1500 / 2000 [ 75%]  (Sampling)
## Chain 4: Iteration: 1700 / 2000 [ 85%]  (Sampling)
## Chain 4: Iteration: 1900 / 2000 [ 95%]  (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 4: 
## Chain 4:  Elapsed Time: 1.835 seconds (Warm-up)
## Chain 4:                2.401 seconds (Sampling)
## Chain 4:                4.236 seconds (Total)
## Chain 4:
print(new_fit_stan, digits = 3)
## Inference for Stan model: anon_model.
## 4 chains, each with iter=2000; warmup=500; thin=1; 
## post-warmup draws per chain=1500, total post-warmup draws=6000.
## 
##              mean se_mean      sd      2.5%       25%       50%       75%
## beta[1] -4856.358   6.713 383.490 -5613.692 -5113.096 -4859.899 -4588.403
## beta[2]     7.865   0.069   4.616    -1.254     4.826     7.871    10.974
## beta[3]    -7.099   0.116   7.707   -22.189   -12.288    -7.204    -1.838
## beta[4]    44.016   0.035   1.989    40.082    42.658    44.035    45.369
## beta[5]   -10.587   0.129   9.823   -30.222   -17.045   -10.580    -4.040
## beta[6]    -1.285   0.131   9.783   -20.339    -8.167    -1.256     5.329
## sigma     396.716   0.215  15.660   367.504   385.843   396.221   406.956
## lp__    -2186.843   0.035   1.824 -2191.154 -2187.904 -2186.533 -2185.501
##             97.5% n_eff  Rhat
## beta[1] -4122.143  3263 1.000
## beta[2]    16.897  4512 1.001
## beta[3]     7.765  4418 0.999
## beta[4]    47.909  3271 1.000
## beta[5]     8.746  5783 1.001
## beta[6]    18.227  5567 1.000
## sigma     429.034  5301 1.000
## lp__    -2184.199  2719 1.000
## 
## Samples were drawn using NUTS(diag_e) at Fri May  8 22:34:14 2026.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

Again, we compare the coefficients from our STAN model to those of the rstanarm model.

## rstanarm coefficients
coef_rstanarm <- colMeans(as.matrix(fit_rstanarm))

## STAN coefficients
posterior_new_stan <- as.data.frame(new_fit_stan)
coef_new_stan <- colMeans(posterior_new_stan[, 1:7])

tibble(
  parameter = c(colnames(X), "sigma"),
  rstanarm = coef_rstanarm,
  new_stan = coef_new_stan
)
## # A tibble: 7 × 3
##   parameter          rstanarm new_stan
##   <chr>                 <dbl>    <dbl>
## 1 (Intercept)       -5690.    -4856.  
## 2 bill_length_mm        6.52      7.87
## 3 bill_depth_mm         3.44     -7.10
## 4 flipper_length_mm    47.5      44.0 
## 5 islandDream          -9.20    -10.6 
## 6 islandTorgersen      -0.755    -1.28
## 7 sigma               393.      397.
  • The coefficients look much closer now that Biscoe island is contained in the intercept.
  • The bill_depth_mm coefficient is negative in the new STAN model but positive in the rstanarm model, and there are some differences in the Dream and Torgersen coefficients.

These differences in coefficients won’t have a large effect on the predictions from the model, which come out relatively similar.

## STAN prediction
-4856 + 7.87*new_x$bill_length_mm - 7.1*new_x$bill_depth_mm + 44.02*new_x$flipper_length_mm - 10.59*0 - 1.28*1
## [1] 3285.287
## rstanarm predictions
predict(fit_rstanarm, newdata = new_x)
##        1 
## 3232.261
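Rather than hand-typing rounded posterior means, we can also predict from the model matrix and the posterior directly. A sketch assuming the new_fit_stan fit and model matrix X from above are still in the workspace:

```r
## Predict for the first observation: posterior-mean betas times the
## first row of the model matrix (assumes new_fit_stan and X from above)
betas <- colMeans(as.matrix(new_fit_stan, pars = "beta"))
as.numeric(X[1, ] %*% betas)
```

This avoids the rounding error we introduce when typing coefficients by hand.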
  • The reason for the coefficient differences is that rstanarm automatically centers all of the input variables. If we center the input variables ourselves we should get much closer results, so let’s try that!
## Center the variables
dat <- dat %>%
  mutate(across(.cols = bill_length_mm:body_mass_g, ~scale(.x, center = TRUE, scale = FALSE), .names = "{col}_c"))

## Create the model matrix with centered values
X <- model.matrix(body_mass_g_c ~ bill_length_mm_c + bill_depth_mm_c + flipper_length_mm_c + island, data = dat)
head(X)
##   (Intercept) bill_length_mm_c bill_depth_mm_c flipper_length_mm_c islandDream
## 1           1        -4.892793       1.5351351          -19.966967           0
## 2           1        -4.492793       0.2351351          -14.966967           0
## 3           1        -3.692793       0.8351351           -5.966967           0
## 4           1        -7.292793       2.1351351           -7.966967           0
## 5           1        -4.692793       3.4351351          -10.966967           0
## 6           1        -5.092793       0.6351351          -19.966967           0
##   islandTorgersen
## 1               1
## 2               1
## 3               1
## 4               1
## 5               1
## 6               1

Create a new data list with the centered variables

## put the data and model matrix into a list
new_data_list <- list(
  N = nrow(dat),          # number of observations
  X = X,                  # model matrix
  y = as.numeric(dat$body_mass_g),  # outcome variable (note: not centered)
  K = ncol(X)             # get the number of model features including the intercept
)

## write the model
new_stan_code <- "//Stan Model

data{
  int<lower=1> N;        // Total number of observations
  int<lower=1> K;       // number of features
  vector[N] y;    // outcome variable
  matrix[N, K] X;        // the specified model matrix
}

parameters {
  vector[K] beta;                // All parameters in the model matrix
  real<lower=0> sigma;          // Standard deviation of the error term
}


model {
  // Intercept prior
  beta[1] ~ normal(0, 1000);

  // Other coefficients
  beta[2:K] ~ normal(0, 10);
  
  // prior for the model SD
  sigma ~ cauchy(0, 3);
  
  // likelihood
  y ~ normal(X * beta, sigma);
  }
"

## fit the model
new_fit_stan <- stan(model_code = new_stan_code,
            data = new_data_list,
            iter = 2000,
            warmup = 500,
            chains = 4,
            seed = 2244)
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 4.8e-05 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.48 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  501 / 2000 [ 25%]  (Sampling)
## Chain 1: Iteration:  700 / 2000 [ 35%]  (Sampling)
## Chain 1: Iteration:  900 / 2000 [ 45%]  (Sampling)
## Chain 1: Iteration: 1100 / 2000 [ 55%]  (Sampling)
## Chain 1: Iteration: 1300 / 2000 [ 65%]  (Sampling)
## Chain 1: Iteration: 1500 / 2000 [ 75%]  (Sampling)
## Chain 1: Iteration: 1700 / 2000 [ 85%]  (Sampling)
## Chain 1: Iteration: 1900 / 2000 [ 95%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 1.341 seconds (Warm-up)
## Chain 1:                0.3 seconds (Sampling)
## Chain 1:                1.641 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 1.9e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.19 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  501 / 2000 [ 25%]  (Sampling)
## Chain 2: Iteration:  700 / 2000 [ 35%]  (Sampling)
## Chain 2: Iteration:  900 / 2000 [ 45%]  (Sampling)
## Chain 2: Iteration: 1100 / 2000 [ 55%]  (Sampling)
## Chain 2: Iteration: 1300 / 2000 [ 65%]  (Sampling)
## Chain 2: Iteration: 1500 / 2000 [ 75%]  (Sampling)
## Chain 2: Iteration: 1700 / 2000 [ 85%]  (Sampling)
## Chain 2: Iteration: 1900 / 2000 [ 95%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 1.148 seconds (Warm-up)
## Chain 2:                0.182 seconds (Sampling)
## Chain 2:                1.33 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 1.6e-05 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.16 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 3: Iteration:  501 / 2000 [ 25%]  (Sampling)
## Chain 3: Iteration:  700 / 2000 [ 35%]  (Sampling)
## Chain 3: Iteration:  900 / 2000 [ 45%]  (Sampling)
## Chain 3: Iteration: 1100 / 2000 [ 55%]  (Sampling)
## Chain 3: Iteration: 1300 / 2000 [ 65%]  (Sampling)
## Chain 3: Iteration: 1500 / 2000 [ 75%]  (Sampling)
## Chain 3: Iteration: 1700 / 2000 [ 85%]  (Sampling)
## Chain 3: Iteration: 1900 / 2000 [ 95%]  (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 1.091 seconds (Warm-up)
## Chain 3:                0.175 seconds (Sampling)
## Chain 3:                1.266 seconds (Total)
## Chain 3: 
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
## Chain 4: 
## Chain 4: Gradient evaluation took 1.7e-05 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.17 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4: 
## Chain 4: 
## Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 4: Iteration:  501 / 2000 [ 25%]  (Sampling)
## Chain 4: Iteration:  700 / 2000 [ 35%]  (Sampling)
## Chain 4: Iteration:  900 / 2000 [ 45%]  (Sampling)
## Chain 4: Iteration: 1100 / 2000 [ 55%]  (Sampling)
## Chain 4: Iteration: 1300 / 2000 [ 65%]  (Sampling)
## Chain 4: Iteration: 1500 / 2000 [ 75%]  (Sampling)
## Chain 4: Iteration: 1700 / 2000 [ 85%]  (Sampling)
## Chain 4: Iteration: 1900 / 2000 [ 95%]  (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 4: 
## Chain 4:  Elapsed Time: 0.976 seconds (Warm-up)
## Chain 4:                0.168 seconds (Sampling)
## Chain 4:                1.144 seconds (Total)
## Chain 4:
print(new_fit_stan, digits = 3)
## Inference for Stan model: anon_model.
## 4 chains, each with iter=2000; warmup=500; thin=1; 
## post-warmup draws per chain=1500, total post-warmup draws=6000.
## 
##              mean se_mean     sd      2.5%       25%       50%       75%
## beta[1]  4208.839   0.229 21.727  4166.262  4193.931  4208.724  4224.214
## beta[2]     6.456   0.060  4.558    -2.592     3.343     6.479     9.578
## beta[3]     3.533   0.093  7.829   -11.675    -1.876     3.581     8.947
## beta[4]    47.526   0.027  2.068    43.521    46.142    47.503    48.923
## beta[5]    -9.209   0.101  9.795   -28.299   -15.946    -9.148    -2.543
## beta[6]    -0.768   0.103  9.843   -20.024    -7.463    -0.796     5.940
## sigma     392.775   0.150 15.191   364.337   382.302   392.137   402.763
## lp__    -2181.873   0.034  1.815 -2186.273 -2182.872 -2181.581 -2180.520
##             97.5% n_eff  Rhat
## beta[1]  4250.511  8990 1.000
## beta[2]    15.369  5791 1.000
## beta[3]    18.691  7120 1.000
## beta[4]    51.610  5675 1.000
## beta[5]     9.755  9421 1.000
## beta[6]    18.615  9211 1.000
## sigma     423.653 10200 1.000
## lp__    -2179.281  2809 1.001
## 
## Samples were drawn using NUTS(diag_e) at Fri May  8 22:34:20 2026.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

Compare the coefficients to the rstanarm model

# rstanarm
coef_rstanarm <- colMeans(as.matrix(fit_rstanarm))

# stan
posterior_new_stan <- as.data.frame(new_fit_stan)
coef_new_stan <- colMeans(posterior_new_stan[, 1:7])

tibble(
  parameter = c(colnames(X), "sigma"),
  rstanarm = coef_rstanarm,
  new_stan = coef_new_stan
)
## # A tibble: 7 × 3
##   parameter            rstanarm new_stan
##   <chr>                   <dbl>    <dbl>
## 1 (Intercept)         -5690.    4209.   
## 2 bill_length_mm_c        6.52     6.46 
## 3 bill_depth_mm_c         3.44     3.53 
## 4 flipper_length_mm_c    47.5     47.5  
## 5 islandDream            -9.20    -9.21 
## 6 islandTorgersen        -0.755   -0.768
## 7 sigma                 393.     393.
  • Hmm… this looks interesting! The intercept is positive for the new STAN model while it is negative for the rstanarm model. The coefficients, however, are relatively the same, and we draw similar inferences about the differences between the Dream and Torgersen islands and Biscoe (represented in the intercept).
  • The reason the intercept differs is the centering of the variables in the STAN model. You’ll notice below that this has no impact on model predictions (NOTE: you need to use the centered values when predicting from the STAN model!).
## Get the first row of the data again so that you have the centered variables
new_x <- dat[1, ]

## STAN prediction (the first penguin is from Torgersen, so islandDream = 0 and islandTorgersen = 1)
4209 + 6.46*as.numeric(new_x$bill_length_mm_c)  + 3.53*as.numeric(new_x$bill_depth_mm_c) + 47.5*as.numeric(new_x$flipper_length_mm_c) - 9.21*0 - 0.768*1
## [1] 3233.613
## rstanarm prediction
predict(fit_rstanarm, newdata = new_x)
##        1 
## 3232.261

Nearly identical!
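If you want the two intercepts on the same scale, you can back out the uncentered intercept by subtracting each slope times its predictor’s mean from the centered intercept. Here is a toy illustration of that identity with lm() on synthetic data (not the penguins model): centering shifts the intercept but leaves the fitted values unchanged.

```r
## Synthetic example: centering a predictor shifts the intercept by
## beta * mean(x) but does not change the model's predictions
set.seed(1)
x <- rnorm(50, mean = 100, sd = 10)
y <- 5 + 2 * x + rnorm(50)

fit_raw      <- lm(y ~ x)
fit_centered <- lm(y ~ I(x - mean(x)))

## the centered intercept equals the raw intercept plus beta * mean(x)
coef(fit_raw)[1] + coef(fit_raw)[2] * mean(x) - coef(fit_centered)[1]  # ~ 0

## ...but the fitted values are identical
all.equal(fitted(fit_raw), fitted(fit_centered))  # TRUE
```

The same arithmetic explains why the STAN intercept (~4209) and the rstanarm intercept (~-5690) describe the same model.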

Side Tangent

One short tangent here before we move on to an alternative way of handling categorical features in STAN.

  1. The model matrix approach works really well if you are building fixed effects models, as it is really easy to specify the model matrix and enter it into the data list rather than remembering to specify every variable and its corresponding variable type. That said, with this simplicity comes the limitation I alluded to above. If we are building more complex models or a hierarchical model, this approach is going to struggle. So, getting comfortable coding each aspect of the model (which we do below) will be important too.

  2. In the model{} chunk above we placed the same prior on all of our slope features, beta[2:K] ~ normal(0, 10);. This is fine because that is what we specified in rstanarm. However, if you are fitting a fixed effects model and you want to place a different prior on different coefficients, you can do this by simply indexing them like so:

beta[1] ~ normal(0, 1000);  // intercept
beta[2:4] ~ normal(0, 2);   // continuous predictors
beta[5:6] ~ normal(0, 20);  // island dummy vars
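As a quick reminder of what the model matrix approach hands to STAN, model.matrix() expands a factor into dummy columns, with the first (alphabetical) level absorbed into the intercept. A small standalone illustration:

```r
## model.matrix() turns a factor into dummy columns; the first level
## (alphabetically, Biscoe) has no column because it lives in the intercept
island <- factor(c("Biscoe", "Dream", "Torgersen", "Dream"))
model.matrix(~ island)
## columns: (Intercept), islandDream, islandTorgersen
```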

Using Feature Indexing

The final approach to handling categorical features is to use feature indexing. Instead of specifying the model matrix, we are going to specify our 3 island levels and then in the model{} chunk, before setting priors, we are going to set the first island level, Biscoe, to 0 so that it is the reference level in the intercept. We again use the centered input features to be consistent with rstanarm.

data_list <- list(
  N = nrow(dat),
  body_mass = dat$body_mass_g, # outcome variable
  bill_length = as.numeric(dat$bill_length_mm_c),
  bill_depth = as.numeric(dat$bill_depth_mm_c),
  flipper_length = as.numeric(dat$flipper_length_mm_c),
  island = dat$island_num,
  K = length(unique(dat$island)) # get the number of island groups
)
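The island_num column was presumably created earlier in this post; if you are following along fresh, coercing the factor with as.integer() yields exactly the 1-based group index that STAN expects (levels are alphabetical by default, so Biscoe = 1, the reference level):

```r
## as.integer() on a factor returns the 1-based level index STAN needs;
## factor levels default to alphabetical order, so Biscoe = 1
island <- factor(c("Torgersen", "Biscoe", "Dream", "Biscoe"))
as.integer(island)
## [1] 3 1 2 1
```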


stan_code2 <- "//Stan Model

data{
  int<lower=1> N;        // Total number of observations
  vector[N] body_mass;    // outcome variable
  vector[N] bill_length;
  vector[N] bill_depth;
  vector[N] flipper_length;
  int<lower=1> K;   // number of islands
  int<lower=1, upper = K> island[N];  // island numeric group index for 1:K
}

parameters {
  real alpha;                        // Intercept
  real beta_bill_length;            // coef for bill_length
  real beta_bill_depth;            // coef for bill_depth
  real beta_flipper_length;       // coef for flipper_length
  vector[K-1] beta_island;         // Coef for island (no intercept)
  real<lower=0> sigma;          // Standard deviation of the error term
}


model {
  
  // create a full-length coefficient vector for the island levels
  vector[K] beta_ref;
  
  // set the first level (Biscoe) to 0 so it serves as the reference level
  beta_ref[1] = 0;  // reference level absorbed into the intercept
  
  // map the remaining levels onto the estimated island coefficients
  for (k in 2:K) {
    beta_ref[k] = beta_island[k - 1];
  }
  
  // Priors
  alpha ~ normal(0, 1000);                     // Weakly informative prior for the intercept
  beta_bill_length ~ normal(0, 10);          // Weakly informative prior for bill_length
  beta_bill_depth ~ normal(0, 10);          // Weakly informative prior for bill_depth
  beta_flipper_length ~ normal(0, 10);     // Weakly informative prior for flipper_length
  beta_island ~ normal(0, 10);            // Weakly informative prior for island
  sigma ~ cauchy(0, 3);                 // Weakly informative prior for the standard deviation

  // Likelihood
  for(n in 1:N)
   body_mass[n] ~ normal(alpha + beta_bill_length * bill_length[n] + beta_bill_depth * bill_depth[n] + beta_flipper_length * flipper_length[n] + beta_ref[island[n]], sigma);
  }
"


## Compile & run the model
fit_stan2 <- stan(model_code = stan_code2,
            data = data_list,
            iter = 2000,
            warmup = 500,
            chains = 4,
            seed = 2244)
## Trying to compile a simple C file
## Running /Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB foo.c
## using C compiler: ‘Apple clang version 16.0.0 (clang-1600.0.26.6)’
## using SDK: ‘MacOSX15.2.sdk’
## clang -arch x86_64 -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG   -I"/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/Rcpp/include/"  -I"/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/RcppEigen/include/"  -I"/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/RcppEigen/include/unsupported"  -I"/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/BH/include" -I"/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/StanHeaders/include/src/"  -I"/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/StanHeaders/include/"  -I"/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/RcppParallel/include/"  -I"/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/rstan/include" -DEIGEN_NO_DEBUG  -DBOOST_DISABLE_ASSERTS  -DBOOST_PENDING_INTEGER_LOG2_HPP  -DSTAN_THREADS  -DUSE_STANC3 -DSTRICT_R_HEADERS  -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION  -D_HAS_AUTO_PTR_ETC=0  -include '/Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp'  -D_REENTRANT -DRCPP_PARALLEL_USE_TBB=1   -I/opt/R/x86_64/include    -fPIC  -falign-functions=64 -Wall -g -O2  -c foo.c -o foo.o
## In file included from <built-in>:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/StanHeaders/include/stan/math/prim/fun/Eigen.hpp:22:
## In file included from /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/RcppEigen/include/Eigen/Dense:1:
## In file included from /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/RcppEigen/include/Eigen/Core:19:
## /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/library/RcppEigen/include/Eigen/src/Core/util/Macros.h:679:10: fatal error: 'cmath' file not found
##   679 | #include <cmath>
##       |          ^~~~~~~
## 1 error generated.
## make: *** [foo.o] Error 1
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0.000169 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.69 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  501 / 2000 [ 25%]  (Sampling)
## Chain 1: Iteration:  700 / 2000 [ 35%]  (Sampling)
## Chain 1: Iteration:  900 / 2000 [ 45%]  (Sampling)
## Chain 1: Iteration: 1100 / 2000 [ 55%]  (Sampling)
## Chain 1: Iteration: 1300 / 2000 [ 65%]  (Sampling)
## Chain 1: Iteration: 1500 / 2000 [ 75%]  (Sampling)
## Chain 1: Iteration: 1700 / 2000 [ 85%]  (Sampling)
## Chain 1: Iteration: 1900 / 2000 [ 95%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 5.882 seconds (Warm-up)
## Chain 1:                0.875 seconds (Sampling)
## Chain 1:                6.757 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 6.6e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.66 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  501 / 2000 [ 25%]  (Sampling)
## Chain 2: Iteration:  700 / 2000 [ 35%]  (Sampling)
## Chain 2: Iteration:  900 / 2000 [ 45%]  (Sampling)
## Chain 2: Iteration: 1100 / 2000 [ 55%]  (Sampling)
## Chain 2: Iteration: 1300 / 2000 [ 65%]  (Sampling)
## Chain 2: Iteration: 1500 / 2000 [ 75%]  (Sampling)
## Chain 2: Iteration: 1700 / 2000 [ 85%]  (Sampling)
## Chain 2: Iteration: 1900 / 2000 [ 95%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 5.054 seconds (Warm-up)
## Chain 2:                1.051 seconds (Sampling)
## Chain 2:                6.105 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 7.4e-05 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.74 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 3: Iteration:  501 / 2000 [ 25%]  (Sampling)
## Chain 3: Iteration:  700 / 2000 [ 35%]  (Sampling)
## Chain 3: Iteration:  900 / 2000 [ 45%]  (Sampling)
## Chain 3: Iteration: 1100 / 2000 [ 55%]  (Sampling)
## Chain 3: Iteration: 1300 / 2000 [ 65%]  (Sampling)
## Chain 3: Iteration: 1500 / 2000 [ 75%]  (Sampling)
## Chain 3: Iteration: 1700 / 2000 [ 85%]  (Sampling)
## Chain 3: Iteration: 1900 / 2000 [ 95%]  (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 5.403 seconds (Warm-up)
## Chain 3:                0.984 seconds (Sampling)
## Chain 3:                6.387 seconds (Total)
## Chain 3: 
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
## Chain 4: 
## Chain 4: Gradient evaluation took 7.1e-05 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.71 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4: 
## Chain 4: 
## Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 4: Iteration:  501 / 2000 [ 25%]  (Sampling)
## Chain 4: Iteration:  700 / 2000 [ 35%]  (Sampling)
## Chain 4: Iteration:  900 / 2000 [ 45%]  (Sampling)
## Chain 4: Iteration: 1100 / 2000 [ 55%]  (Sampling)
## Chain 4: Iteration: 1300 / 2000 [ 65%]  (Sampling)
## Chain 4: Iteration: 1500 / 2000 [ 75%]  (Sampling)
## Chain 4: Iteration: 1700 / 2000 [ 85%]  (Sampling)
## Chain 4: Iteration: 1900 / 2000 [ 95%]  (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 4: 
## Chain 4:  Elapsed Time: 8.67 seconds (Warm-up)
## Chain 4:                1.194 seconds (Sampling)
## Chain 4:                9.864 seconds (Total)
## Chain 4:
print(fit_stan2, probs = c(0.1, 0.5, 0.9))
## Inference for Stan model: anon_model.
## 4 chains, each with iter=2000; warmup=500; thin=1; 
## post-warmup draws per chain=1500, total post-warmup draws=6000.
## 
##                         mean se_mean    sd      10%      50%      90% n_eff
## alpha                4208.22    0.24 22.10  4179.34  4208.17  4236.20  8373
## beta_bill_length        6.49    0.05  4.61     0.65     6.46    12.49  7111
## beta_bill_depth         3.50    0.09  8.00    -6.84     3.62    13.66  7597
## beta_flipper_length    47.50    0.03  2.05    44.89    47.52    50.15  6163
## beta_island[1]         -9.24    0.11  9.83   -21.80    -9.22     3.40  8673
## beta_island[2]         -0.73    0.10  9.87   -13.40    -0.77    11.98 10145
## sigma                 392.99    0.17 15.23   373.91   392.66   412.39  7901
## lp__                -2181.94    0.04  1.84 -2184.42 -2181.67 -2179.88  2688
##                     Rhat
## alpha                  1
## beta_bill_length       1
## beta_bill_depth        1
## beta_flipper_length    1
## beta_island[1]         1
## beta_island[2]         1
## sigma                  1
## lp__                   1
## 
## Samples were drawn using NUTS(diag_e) at Fri May  8 22:36:16 2026.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

Again, we compare the STAN coefficients to the rstanarm coefficients

posterior_stan2 <- as.data.frame(fit_stan2)
coef_stan2 <- colMeans(posterior_stan2[, 1:7])

tibble(
  parameter = c(colnames(X), "sigma"),
  rstanarm = coef_rstanarm,
  stan2 = coef_stan2
)
## # A tibble: 7 × 3
##   parameter            rstanarm    stan2
##   <chr>                   <dbl>    <dbl>
## 1 (Intercept)         -5690.    4208.   
## 2 bill_length_mm_c        6.52     6.49 
## 3 bill_depth_mm_c         3.44     3.50 
## 4 flipper_length_mm_c    47.5     47.5  
## 5 islandDream            -9.20    -9.24 
## 6 islandTorgersen        -0.755   -0.726
## 7 sigma                 393.     393.
  • The results are nearly identical to the centered model matrix approach above and to those of rstanarm. As with the model matrix approach, the intercept is positive. Because the predictors are centered, it represents roughly the average body mass of a penguin in the population (more precisely, a penguin from the reference island, Biscoe, with average predictor values), which is directly interpretable (unlike the negative intercept in the rstanarm model).
mean(dat$body_mass_g)
## [1] 4207.057

Wrapping up

This was a long one! We covered a lot of ground on coding categorical variables into our STAN models. We also discussed a few different approaches to achieving the same task, which gives us a number of options going forward as we work with more complicated data structures.