library(cmdstanr)
options(mc.cores = parallel::detectCores()) #Make it faster
<- cmdstan_model(file_stan) #Compiling
m_rcmdstan = m_rcmdstan$sample(data = data) #Sampling
s_rcmdstan = s_rcmdstan$draws(format = "df") #Extracting Samples as data.frame
df #shinystan::launch_shinystan(s_rcmdstan) #Shiny Stan
Stan primer
MCMC with Stan / Simple diagnostics
Quick summary
A quick summary of commands (for a more details see below).
Cmdstan
Tidy Bayes
Works with rstan and cmdstanr
library(tidybayes)
spread_draws(s, c(a,b)) #Extracts a and b from s (stan or cmdstan samples)
spread_draws(s, f[i]) #Extracts vector f and calls components i
Bayes plot
Works with rstan and cmdstanr
= s_rcmdstan$draws() #We need to call draws() 2023
s ::mcmc_trace(s)
bayesplot::ess_bulk(s) bayesplot
Rendering Quarto for exercises
Printing the model in markdown files (only important for rendering exercise sheets), see e.g. around here
Not showing the many details in MCMC simulation with Stan. | ```{r, echo=lsg, eval=lsg, message=FALSE, warning=FALSE, results=“hide”}
Stan Syntax
See https://mc-stan.org/docs/stan-users-guide/index.html
Note that in newer version, we need array[N] int<lower=0, upper=1> y;
instead of int<lower=0,upper=1> y[N];
, see https://mc-stan.org/docs/reference-manual/removals.html#postfix-brackets-array-syntax
Stan
library(rstan)
options(mc.cores = parallel::detectCores())
rstan_options(javascript=FALSE) #Prevents freezes of RStudio (Jan 2023)
= stan_model(model_code = stan_code) #Compiling from string
m_rstan = stan_model(file='mymodel.stan') #Compiling from file
m_rstan = sampling(m_rstan, data=data) #Sampling from file
s_rstan
#No MCMC just evaluating parameters
sampling(model_1, data=dat, algorithm="Fixed_param", chain=1, iter=1)
In a more detail
In the following some notes on Stan. You might find the following resources helpful.
Other Cheat Sheets
Data used
Some Data for linear regression
= 4
N = c(-2.,-0.66666, 0.666, 2.)
x = c(-6.25027354, -2.50213382, -6.07525495, 7.92081243)
y = list(N=N, x=x, y=y) data
Getting samples from the posterior
There are currently (2023) two interfaces to stan from R. RStan (https://mc-stan.org/users/interfaces/rstan) which is a bit slower and does not use the latest stan compiler and rcmdstan (https://mc-stan.org/cmdstanr/articles/cmdstanr.html).
The technical steps
There are 3 steps:
- Defining the model
- Compiling the model. In this step C code is generated.
- Running the simulation / sampling from the posterior
- Extracting the samples from the posterior
Definition
To define a model, you can add a string or create a .stan
file. Another option is to use a Stan
markdown chunk and output.var=my_model
to the name of the model. Code completion and highlighting is working for files and code chunks.
= "data{
stan_code int<lower=0> N;
vector[N] y;
vector[N] x;
}
parameters{
real a; //Instead of using e.g. half Gaussian
real b;
real<lower=0> sigma;
}
model{
//y ~ normal(mu, sigma);
y ~ normal(a * x + b, sigma);
a ~ normal(3, 10);
b ~ normal(0, 10);
sigma ~ normal(0,10);
}"
Compiling
Compiling works with:
Compiling rstan
stan_model(model_code = stan_code)
orstan_model(file='mymodel.stan')
Compiling cmdstan
cmdstan_model(file_stan)
Compiling with rstan
library(rstan)
= stan_model(model_code = stan_code) m_rstan
Compiling with rcmdstan
There is no possibility to use a string for cmd_stan.
<- cmdstan_model(stan_file='stan/simple_lr.stan')
m_rcmdstan $print() #Displays the used model m_rcmdstan
Sampling / running the chains
For rstan: sampling(m_rstan, data=data)
For cmdstan: mod$sample(data = data_file, seed=123)
= m_rcmdstan$sample(data = data) s_rcmdstan
= sampling(m_rstan) s_rstan
Diagnostics of the chains
The package bayesplot can handle both interfaces
Trace
#traceplot(s_rstan, 'a')
#bayesplot::mcmc_trace(s_rstan)
::mcmc_trace(s_rcmdstan$draws()) #similar result
bayesplot
s_rstan# Rhat close to one and n_eff lager than half the number of draws; look fine
Key Numbers
Rhat
is something like the ratio of variation between the chains to withing the chainsn_eff
number of effective samples taking the autocorrelation into account (in cmd_stan the output is ess_bulk and ess_tail)
Shiny Stan
#Shiny Stan, quite overwhelming
library(shinystan)
launch_shinystan(s_rcmdstan)
#If not working
#See https://discourse.mc-stan.org/t/will-launch-shinystan-work-soon-for-cmdstanr/17269
= rstan::read_stan_csv(s_rcmdstan$output_files())
stanfit launch_shinystan(stanfit)
Posteriors (of the parameters)
Getting the samples can be done as
Rstan
extract(s_rstan)
cmdstan
s_rcmdstan$draws()
Another handy package is tidybayes which can handle the output of rstan
and rcmdstan
Tidybayes
library(tidybayes)
head(spread_draws(s_rcmdstan, c(a,b))) #Non-tidy a and b in one row
# A tibble: 6 × 5
.chain .iteration .draw a b
<int> <int> <int> <dbl> <dbl>
1 1 1 1 1.99 -1.76
2 1 2 2 3.04 2.89
3 1 3 3 4.95 -3.39
4 1 4 4 0.934 -2.30
5 1 5 5 2.36 0.303
6 1 6 6 5.31 -2.45
head(gather_draws(s_rcmdstan, c(a,b))) #The ggplot like syntax
# A tibble: 6 × 5
# Groups: .variable [1]
.chain .iteration .draw .variable .value
<int> <int> <int> <chr> <dbl>
1 1 1 1 a 1.99
2 1 2 2 a 3.04
3 1 3 3 a 4.95
4 1 4 4 a 0.934
5 1 5 5 a 2.36
6 1 6 6 a 5.31
#spread_draws(model_weight_sex, a[sex]) for multilevel models
Command Stan Functions
For more see: https://mc-stan.org/cmdstanr/articles/cmdstanr.html
<- s_rcmdstan$draws(format = "df")
df %>% select(-c(.chain, .iteration, .draw)) %>% cor() df
Warning: Dropping 'draws_df' class as required metadata was removed.
lp__ a b sigma
lp__ 1.0000000 0.12991773 -0.10625738 -0.6907346
a 0.1299177 1.00000000 -0.03857131 -0.1040964
b -0.1062574 -0.03857131 1.00000000 0.1078610
sigma -0.6907346 -0.10409641 0.10786104 1.0000000
Stan functions
#plot(samples)
#samples = extract(s_rstan)
stan_dens(s_rstan)
#Note that these are marginals!
“Manually” visualize the posterior
We extract samples from the posterior via extract
(in some installation the wrong extract function is taken in that case use rstan::extract
to use the right one). Visualize the posterior distribution of \(a\) from the samples.
# Extract samples
= rstan::extract(s_rstan)
post = length(post$a)
T hist(post$a,100, freq=F)
lines(density(post$a),col='red')
Pairs plot
Using the pairs plot, correlations in the variables can be found.
= bayesplot::nuts_params(s_rstan)
np_cp ::mcmc_pairs(s_rstan, np = np_cp,pars = c("a","b")) bayesplot
Posterior Predictive Plots
Task: Use the samples to create the following posterior predictive plots
Some background first: posterior predictive distribution: \[ p(y|x, D) = \int p(y|x,\theta) p(\theta|D) \; d\theta \] Instead of integration, we sample in two turns
- \(\theta_i \sim p(\theta|D)\)
- \(y_{ix} \sim p(y|x,\theta_i)\) #We do this for many x in practice
Creation of the posterior predictive samples by hand
You can either do this part, or use stan to create the posterior predictive samples \(y_{ix}\) from the samples \(\theta_i\) by hand.
Tip: Create two matrices yix
and muix
from the posterior samples of \(a,b,\sigma\) with dimension (rows = number of posterior samples and cols = number of x positions).
= -10:15 # The x-range 17 values from -1 to 15
xs = length(xs)
M = matrix(nrow=T, ncol = M) #Matrix from samples (number of posterior draws vs number of xs)
yix = matrix(nrow=T, ncol = M) #Matrix from mu (number of posterior draws vs number of xs)
muix for (i in 1:T){ #Samples from the posterior
= post$a[i] #Corresponds to samples from theta
a = post$b[i]
b = post$sigma[i]
sigma for (j in 1:M){ #Different values of X
= a * xs[j] + b
mu = a * xs[j] + b
muix[i,j] = rnorm(1, mu, sigma) # Single number drawn
yix[i,j]
}
}
if (FALSE){
plot(x, y, xlim=c(-10,15), ylim=c(-25,25), ylab='mu=a*x+b')
for (i in 1:100){
lines(xs, muix[i,],lwd=0.25,col='blue')
}
plot(x, y, xlim=c(-10,15), ylim=c(-25,25), ylab='ys')
for (i in 1:100){
points(xs, yix[i,], pch='.',col='red')
} }
After you created the matrices yix
and muix
you can use the following function to draw the lines for the quantiles.
plot(x, y, xlim=c(-10,15), ylim=c(-25,25), ylab='quantiles (y and mu)')
= function(x2, y_pred, col='blue'){
quant_lines = apply(y_pred, 2,quantile, probs=c(0.50))
m lines(x2, m,col=col)
= apply(y_pred, 2, quantile, probs=c(0.25))
q05 = apply(y_pred, 2, quantile, probs=c(0.75))
q95 lines(x2, q05,col=col)
lines(x2, q95,col=col)
}
quant_lines(xs,yix, col='red')
quant_lines(xs,muix, col='blue')
Creation of the posterior predictive samples with Stan
It’s also possible to draw posterior predictive samples. One can use the generated quantities
code block for that.
data{
int<lower=0> N;
vector[N] y;
vector[N] x;
//For the prediced distribution (new)
int<lower=0> N2;
vector[N2] x2;
}
generated quantities {
real Y_predict[N2];
for (i in 1:N2){
Y_predict = normal_rng(a * x2 + b, sigma);
}
}
= -10:15
x2 = length(x2)
N2 = rstan::stan(file = '~/Dropbox/__HTWG/DataAnalytics/_Current/lab/06_03_Bayes_3/Stan_Primer_model_pred.stan',
fit2 data=list(N=N,x=x, y=y, N2=N2,x2=x2),iter=10000)
= rstan::extract(fit2)
d2 = d2$Y_predict
y_pred dim(y_pred)
plot(x, y, xlim=c(-10,15), ylim=c(-25,25), ylab='quantiles (y and mu)')
quant_lines(x2,y_pred, col='red')
Notes on creating excercises
Inclusion of Stan code
In case of cmdrstan
the Stan code should be in an extra file to make the workflow easier and reuse compiled models. In case of rstan
the Stan code can be included in the Rmd file as:
Attic
RStudio bug 2023
Sometimes R gets slow when rendering with stan code. At least in my case
rstan_options(javascript=FALSE)
Not observed anymore (2024)