Intro

Stan is a probabilistic programming language written in C++. If you don’t know C++, that’s okay! People often write their Stan models in a different language and then use an interface to compile the model to C++ and run it in Stan. For example, we can run it from R using {rstan} or {cmdstanr}, or from Python using {cmdstanpy}.

In prior blog articles, we built Bayesian models using packages such as {rstanarm} and {brms}, which both run Stan code under the hood but provide us with an easy-to-use syntax that is familiar in the R language. The analog to {rstanarm} and {brms} in Python is {bambi}.

So, why bother learning to code Stan models when we have these convenient packages that already take the syntax we know and love? Well, sometimes with these packages we end up hitting a wall: the model we require is more complex than what the packages can fit. Additionally, writing out and specifying the full model, as we must when coding Stan directly, helps us really understand what is in the model and how the parameters are influenced by their priors, and it offers us a lot of flexibility to adjust things as we see fit.

The Basics

Three non-negotiable blocks of code are required to run a Stan model. There are additional blocks that we will add on later, but these three always need to be there:

  1. data {}
  2. parameters {}
  3. model {}

data{}

  • The data block defines the inputs of the model
  • The data needs to be stored as a list, because that is what is required to pass the data to C++
  • Within this block you need to define the number of observations, each variable, the limits of each variable, and the type of each variable
  • You have to end each line/argument with a ; (see the sketch after this list)
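
For example, a minimal sketch of a data block for a simple regression with outcome y and predictor x (the full example later in this post follows the same pattern):

data {
  int<lower=0> N;   // number of observations
  vector[N] y;      // outcome variable
  vector[N] x;      // predictor variable
}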

parameters{}

  • The parameters block is where we specify our model parameters (e.g., intercept, slope, and model sigma)
  • Within this block we also need to define the type of each parameter, such as real for a real number (see the sketch after this list)
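
Continuing that sketch, a parameters block for a simple linear regression might look like this:

parameters {
  real alpha;            // intercept
  real beta;             // slope
  real<lower=0> sigma;   // model standard deviation, bounded below at 0
}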

model{}

  • In the model block, we define the actual model, combining the data from the data block with the parameters from the parameters block
  • This block is where we specify our priors and our likelihood
  • If you don’t define priors, Stan will place flat (uniform) priors over each parameter’s declared support by default (a sketch follows this list)
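
Putting the pieces together, a model block for the same sketched regression might look like this (the priors here are placeholders, not recommendations):

model {
  alpha ~ normal(0, 10);                // prior on the intercept
  beta ~ normal(0, 1);                  // prior on the slope
  sigma ~ exponential(1);               // prior on the model standard deviation
  y ~ normal(alpha + beta * x, sigma);  // likelihood
}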

Example

Load the palmerpenguins data

suppressWarnings(suppressMessages({
  library(tidyverse)
  library(palmerpenguins)
  library(rstan)
  library(ggridges)
  library(bayesplot)
}))


dat <- penguins %>%
  drop_na()

dat %>%
  head()
## # A tibble: 6 × 8
##   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
## 1 Adelie  Torgersen           39.1          18.7               181        3750
## 2 Adelie  Torgersen           39.5          17.4               186        3800
## 3 Adelie  Torgersen           40.3          18                 195        3250
## 4 Adelie  Torgersen           36.7          19.3               193        3450
## 5 Adelie  Torgersen           39.3          20.6               190        3650
## 6 Adelie  Torgersen           38.9          17.8               181        3625
## # ℹ 2 more variables: sex <fct>, year <int>

Store the data as a list for Stan

  • We will construct a simple linear model estimating Bill Length, bill_length_mm, from Flipper Length, flipper_length_mm
# Store data for the model in a list
data_list <- list(
  N = nrow(dat),
  bill = dat$bill_length_mm,
  flipper = dat$flipper_length_mm
)

# Look at our data list
data_list
## $N
## [1] 333
## 
## $bill
##   [1] 39.1 39.5 40.3 36.7 39.3 38.9 39.2 41.1 38.6 34.6 36.6 38.7 42.5 34.4 46.0
##  [16] 37.8 37.7 35.9 38.2 38.8 35.3 40.6 40.5 37.9 40.5 39.5 37.2 39.5 40.9 36.4
##  [31] 39.2 38.8 42.2 37.6 39.8 36.5 40.8 36.0 44.1 37.0 39.6 41.1 36.0 42.3 39.6
##  [46] 40.1 35.0 42.0 34.5 41.4 39.0 40.6 36.5 37.6 35.7 41.3 37.6 41.1 36.4 41.6
##  [61] 35.5 41.1 35.9 41.8 33.5 39.7 39.6 45.8 35.5 42.8 40.9 37.2 36.2 42.1 34.6
##  [76] 42.9 36.7 35.1 37.3 41.3 36.3 36.9 38.3 38.9 35.7 41.1 34.0 39.6 36.2 40.8
##  [91] 38.1 40.3 33.1 43.2 35.0 41.0 37.7 37.8 37.9 39.7 38.6 38.2 38.1 43.2 38.1
## [106] 45.6 39.7 42.2 39.6 42.7 38.6 37.3 35.7 41.1 36.2 37.7 40.2 41.4 35.2 40.6
## [121] 38.8 41.5 39.0 44.1 38.5 43.1 36.8 37.5 38.1 41.1 35.6 40.2 37.0 39.7 40.2
## [136] 40.6 32.1 40.7 37.3 39.0 39.2 36.6 36.0 37.8 36.0 41.5 46.1 50.0 48.7 50.0
## [151] 47.6 46.5 45.4 46.7 43.3 46.8 40.9 49.0 45.5 48.4 45.8 49.3 42.0 49.2 46.2
## [166] 48.7 50.2 45.1 46.5 46.3 42.9 46.1 47.8 48.2 50.0 47.3 42.8 45.1 59.6 49.1
## [181] 48.4 42.6 44.4 44.0 48.7 42.7 49.6 45.3 49.6 50.5 43.6 45.5 50.5 44.9 45.2
## [196] 46.6 48.5 45.1 50.1 46.5 45.0 43.8 45.5 43.2 50.4 45.3 46.2 45.7 54.3 45.8
## [211] 49.8 49.5 43.5 50.7 47.7 46.4 48.2 46.5 46.4 48.6 47.5 51.1 45.2 45.2 49.1
## [226] 52.5 47.4 50.0 44.9 50.8 43.4 51.3 47.5 52.1 47.5 52.2 45.5 49.5 44.5 50.8
## [241] 49.4 46.9 48.4 51.1 48.5 55.9 47.2 49.1 46.8 41.7 53.4 43.3 48.1 50.5 49.8
## [256] 43.5 51.5 46.2 55.1 48.8 47.2 46.8 50.4 45.2 49.9 46.5 50.0 51.3 45.4 52.7
## [271] 45.2 46.1 51.3 46.0 51.3 46.6 51.7 47.0 52.0 45.9 50.5 50.3 58.0 46.4 49.2
## [286] 42.4 48.5 43.2 50.6 46.7 52.0 50.5 49.5 46.4 52.8 40.9 54.2 42.5 51.0 49.7
## [301] 47.5 47.6 52.0 46.9 53.5 49.0 46.2 50.9 45.5 50.9 50.8 50.1 49.0 51.5 49.8
## [316] 48.1 51.4 45.7 50.7 42.5 52.2 45.2 49.3 50.2 45.6 51.9 46.8 45.7 55.8 43.5
## [331] 49.6 50.8 50.2
## 
## $flipper
##   [1] 181 186 195 193 190 181 195 182 191 198 185 195 197 184 194 174 180 189
##  [19] 185 180 187 183 187 172 180 178 178 188 184 195 196 190 180 181 184 182
##  [37] 195 186 196 185 190 182 190 191 186 188 190 200 187 191 186 193 181 194
##  [55] 185 195 185 192 184 192 195 188 190 198 190 190 196 197 190 195 191 184
##  [73] 187 195 189 196 187 193 191 194 190 189 189 190 202 205 185 186 187 208
##  [91] 190 196 178 192 192 203 183 190 193 184 199 190 181 197 198 191 193 197
## [109] 191 196 188 199 189 189 187 198 176 202 186 199 191 195 191 210 190 197
## [127] 193 199 187 190 191 200 185 193 193 187 188 190 192 185 190 184 195 193
## [145] 187 201 211 230 210 218 215 210 211 219 209 215 214 216 214 213 210 217
## [163] 210 221 209 222 218 215 213 215 215 215 215 210 220 222 209 207 230 220
## [181] 220 213 219 208 208 208 225 210 216 222 217 210 225 213 215 210 220 210
## [199] 225 217 220 208 220 208 224 208 221 214 231 219 230 229 220 223 216 221
## [217] 221 217 216 230 209 220 215 223 212 221 212 224 212 228 218 218 212 230
## [235] 218 228 212 224 214 226 216 222 203 225 219 228 215 228 215 210 219 208
## [253] 209 216 229 213 230 217 230 222 214 215 222 212 213 192 196 193 188 197
## [271] 198 178 197 195 198 193 194 185 201 190 201 197 181 190 195 181 191 187
## [289] 193 195 197 200 200 191 205 187 201 187 203 195 199 195 210 192 205 210
## [307] 187 196 196 196 201 190 212 187 198 199 201 193 203 187 197 191 203 202
## [325] 194 206 189 195 207 202 193 210 198

Write the Stan model

  • Again, we need the mandatory data{}, parameters{}, and model{} blocks to make Stan go. We will talk about the other blocks that can be filled out in future blog articles.

  • Remember, each line needs to end with a semicolon, ;

  • In R we use # for comments. In Stan we use two slashes, //, to type our text comments and notes

  • Instantiation of the Stan model can be done a few ways:

    1. We can type it directly into R and place quotes around it
    2. We can create a separate .stan file that we then call using {rstan}
    3. We can create a separate .stan file that we then call using {cmdstanr}
    4. We can create a separate .stan file that we then call using {cmdstanpy}

For the purposes of today, I’ll run everything directly in the code chunk below so that we can see how things work, which is usually the way I tend to work. However, there are times when storing the model in a separate file can be useful. In particular, a model saved in its own .stan file can be used by either R or Python, making your model flexible across coding languages. So, in future blogs I will cover how to set up a .stan file and then call it with {rstan}, {cmdstanr}, and {cmdstanpy} (I’ll make a few Jupyter notebooks to show how this works in Python).
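
As a preview of the file-based workflow, here is a minimal sketch in R. It assumes stan_code holds the model string we define in the next chunk, and bill_length.stan is just a placeholder file name:

# Save the model string to its own .stan file
writeLines(stan_code, "bill_length.stan")

# Compile and sample from the file instead of the inline string
fit_from_file <- stan(file = "bill_length.stan",
                      data = data_list,
                      seed = 2026)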

# Write the Stan Model
stan_code <- "
// Define the data
data{
  int<lower=0> N;       // Total number of observations
  vector[N] bill;       // Outcome Variable = Bill Length
  vector[N] flipper;    // Independent Variable = Flipper Length
}

// Specify the parameters and their data types (these will all be real numbers)
parameters {
  real alpha;              // Intercept
  real beta;               // Slope
  real<lower=0> sigma;     // Standard deviation of the error term
}

// Specify the model and priors (NOTE: if no priors are specified, Stan defaults to flat, uniform priors over each parameter's declared support)
model {

  // Priors
  alpha ~ normal(0, 10);   // Weakly informative prior for the intercept
  beta ~ normal(0, 1);     // Weakly informative prior for the slope
  sigma ~ exponential(1);  // Weakly informative prior for the standard deviation

  // Likelihood
  bill ~ normal(alpha + beta * flipper, sigma);
}"

  • We fit the model with a normal distribution where mu is the deterministic part of the linear model, \(\mu = \alpha + \beta \cdot x\), and sigma represents the stochastic part of the model (the model error)
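
Note that the likelihood above is vectorized. If you are more comfortable thinking in loops, it is equivalent to (though faster than) the following loop form, sketched here in Stan:

// Equivalent loop form of the vectorized likelihood
for (n in 1:N) {
  bill[n] ~ normal(alpha + beta * flipper[n], sigma);
}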

Fit the Stan model

There are several arguments in the stan() function. I’ll talk about them more in future posts as we discuss what adjusting them does. The arguments will look familiar to anyone who has fit Stan models using {rstanarm} or {brms}. With iter = 2000, warmup = 500, and chains = 4, we keep 4 × (2000 − 500) = 6000 post-warmup draws, which you will see reflected in the output below.

fit_stan <- stan(model_code = stan_code,
            data = data_list,
            iter = 2000,
            warmup = 500,
            chains = 4,
            seed = 2026)
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 8e-05 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.8 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 1: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 1: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 1: Iteration:  501 / 2000 [ 25%]  (Sampling)
## Chain 1: Iteration:  700 / 2000 [ 35%]  (Sampling)
## Chain 1: Iteration:  900 / 2000 [ 45%]  (Sampling)
## Chain 1: Iteration: 1100 / 2000 [ 55%]  (Sampling)
## Chain 1: Iteration: 1300 / 2000 [ 65%]  (Sampling)
## Chain 1: Iteration: 1500 / 2000 [ 75%]  (Sampling)
## Chain 1: Iteration: 1700 / 2000 [ 85%]  (Sampling)
## Chain 1: Iteration: 1900 / 2000 [ 95%]  (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.411 seconds (Warm-up)
## Chain 1:                0.881 seconds (Sampling)
## Chain 1:                1.292 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 1.8e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.18 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 2: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 2: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 2: Iteration:  501 / 2000 [ 25%]  (Sampling)
## Chain 2: Iteration:  700 / 2000 [ 35%]  (Sampling)
## Chain 2: Iteration:  900 / 2000 [ 45%]  (Sampling)
## Chain 2: Iteration: 1100 / 2000 [ 55%]  (Sampling)
## Chain 2: Iteration: 1300 / 2000 [ 65%]  (Sampling)
## Chain 2: Iteration: 1500 / 2000 [ 75%]  (Sampling)
## Chain 2: Iteration: 1700 / 2000 [ 85%]  (Sampling)
## Chain 2: Iteration: 1900 / 2000 [ 95%]  (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 0.574 seconds (Warm-up)
## Chain 2:                0.894 seconds (Sampling)
## Chain 2:                1.468 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 1.6e-05 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.16 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 3: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 3: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 3: Iteration:  501 / 2000 [ 25%]  (Sampling)
## Chain 3: Iteration:  700 / 2000 [ 35%]  (Sampling)
## Chain 3: Iteration:  900 / 2000 [ 45%]  (Sampling)
## Chain 3: Iteration: 1100 / 2000 [ 55%]  (Sampling)
## Chain 3: Iteration: 1300 / 2000 [ 65%]  (Sampling)
## Chain 3: Iteration: 1500 / 2000 [ 75%]  (Sampling)
## Chain 3: Iteration: 1700 / 2000 [ 85%]  (Sampling)
## Chain 3: Iteration: 1900 / 2000 [ 95%]  (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 0.526 seconds (Warm-up)
## Chain 3:                0.887 seconds (Sampling)
## Chain 3:                1.413 seconds (Total)
## Chain 3: 
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
## Chain 4: 
## Chain 4: Gradient evaluation took 1.6e-05 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.16 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4: 
## Chain 4: 
## Chain 4: Iteration:    1 / 2000 [  0%]  (Warmup)
## Chain 4: Iteration:  200 / 2000 [ 10%]  (Warmup)
## Chain 4: Iteration:  400 / 2000 [ 20%]  (Warmup)
## Chain 4: Iteration:  501 / 2000 [ 25%]  (Sampling)
## Chain 4: Iteration:  700 / 2000 [ 35%]  (Sampling)
## Chain 4: Iteration:  900 / 2000 [ 45%]  (Sampling)
## Chain 4: Iteration: 1100 / 2000 [ 55%]  (Sampling)
## Chain 4: Iteration: 1300 / 2000 [ 65%]  (Sampling)
## Chain 4: Iteration: 1500 / 2000 [ 75%]  (Sampling)
## Chain 4: Iteration: 1700 / 2000 [ 85%]  (Sampling)
## Chain 4: Iteration: 1900 / 2000 [ 95%]  (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
## Chain 4: 
## Chain 4:  Elapsed Time: 0.793 seconds (Warm-up)
## Chain 4:                0.964 seconds (Sampling)
## Chain 4:                1.757 seconds (Total)
## Chain 4:

Take a look at the model output

print(fit_stan)
## Inference for Stan model: anon_model.
## 4 chains, each with iter=2000; warmup=500; thin=1; 
## post-warmup draws per chain=1500, total post-warmup draws=6000.
## 
##          mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
## alpha   -6.61    0.08 3.13  -12.65   -8.76   -6.66   -4.50   -0.44  1482    1
## beta     0.25    0.00 0.02    0.22    0.24    0.25    0.26    0.28  1488    1
## sigma    4.13    0.00 0.16    3.84    4.02    4.13    4.24    4.47  2421    1
## lp__  -643.69    0.03 1.26 -646.89 -644.26 -643.38 -642.77 -642.28  1795    1
## 
## Samples were drawn using NUTS(diag_e) at Sat Mar  7 13:11:41 2026.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).
  • There is a lot going on here. In future blog articles, when we discuss model diagnostics, we will talk more about what this output is saying and how we can interpret it
  • If you want to look at only the model parameters and thin this output down, you can call them specifically
print(fit_stan, pars = c("alpha", "beta", "sigma"), probs = c(0.1, 0.5, 0.9), digits = 4)
## Inference for Stan model: anon_model.
## 4 chains, each with iter=2000; warmup=500; thin=1; 
## post-warmup draws per chain=1500, total post-warmup draws=6000.
## 
##          mean se_mean     sd      10%     50%     90% n_eff   Rhat
## alpha -6.6120  0.0813 3.1288 -10.5958 -6.6642 -2.5838  1482 1.0032
## beta   0.2518  0.0004 0.0155   0.2319  0.2520  0.2715  1488 1.0032
## sigma  4.1342  0.0033 0.1620   3.9315  4.1309  4.3415  2421 1.0000
## 
## Samples were drawn using NUTS(diag_e) at Sat Mar  7 13:11:41 2026.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).
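
If you need these values programmatically rather than printed to the console, summary() on the fitted object returns them as a matrix; a quick sketch:

# Matrix of posterior summaries for the parameters of interest
summary(fit_stan, pars = c("alpha", "beta", "sigma"))$summary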

Extract and plot posterior samples for the slope coefficient

post_slope <- extract(fit_stan,
        pars = "beta") %>%
  as.data.frame() %>%
  pull(beta) 

post_slope %>%
  hist()
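
Since {tidyverse} is already loaded, we could also draw this histogram with ggplot2 instead of base hist(); a quick sketch:

# Histogram of the posterior draws for the slope
data.frame(beta = post_slope) %>%
  ggplot(aes(x = beta)) +
  geom_histogram(bins = 30, color = "black", fill = "grey")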

quantile(post_slope, probs = c(0.05, 0.25, 0.5, 0.75, 0.95))
##        5%       25%       50%       75%       95% 
## 0.2257220 0.2412302 0.2520162 0.2624716 0.2776138

Get the posterior draws for all parameters

posterior_draws <- as.data.frame(fit_stan)
head(posterior_draws)
##       alpha      beta    sigma      lp__
## 1 -4.946153 0.2436300 4.163618 -642.3401
## 2 -8.169144 0.2600534 4.248282 -642.7445
## 3 -6.109667 0.2497968 4.022110 -642.4536
## 4 -5.072701 0.2432961 4.112863 -642.6053
## 5 -3.081222 0.2350112 4.167247 -642.9985
## 6 -7.130912 0.2539639 4.150874 -642.2852
colMeans(posterior_draws)
##        alpha         beta        sigma         lp__ 
##   -6.6120060    0.2518128    4.1342369 -643.6944364
apply(X = posterior_draws, MARGIN = 2, FUN = sd)
##      alpha       beta      sigma       lp__ 
## 3.12879034 0.01552891 0.16203310 1.25605985
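
As a small taste of what these draws let us do (we will cover prediction properly in a future post), here is a sketch of the posterior distribution of the expected bill length at a hypothetical flipper length of 200 mm:

# Expected bill length at flipper = 200 mm, one value per posterior draw
mu_at_200 <- posterior_draws$alpha + posterior_draws$beta * 200
quantile(mu_at_200, probs = c(0.05, 0.5, 0.95))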

Trace Plots of the MCMC Chains

mcmc_trace(as.array(fit_stan), pars = c("alpha", "beta", "sigma"))

Density Plots of the Model Parameters

mcmc_dens_overlay(as.array(fit_stan), pars = c("alpha", "beta", "sigma"))
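
{bayesplot} offers several other views of the posterior as well. For example, mcmc_areas() shows a parameter’s density with a shaded credible interval; a sketch for the slope with a 90% interval:

# Posterior density of the slope with a 90% interval
mcmc_areas(as.matrix(fit_stan), pars = "beta", prob = 0.9)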

Wrapping Up

That’s a quick walkthrough of setting up a Stan model and getting it running. In future blog posts we will extend this model to more complex structures, talk about how to handle categorical predictors within Stan, discuss model diagnostics, discuss making predictions with a Stan model, and show how easily we can transition between R and Python with our Bayesian models.