# Doing Bayesian Data Analysis

## Monday, October 23, 2017

## Tuesday, July 4, 2017

### DBDA2E Scripts in Stan

Many more scripts from DBDA2E have been re-written for

**Stan**, by

**Prof. Joseph Houpt**at Wright State University. Joe has posted his scripts at GitHub:

**https://github.com/jhoupt/DBDA2Estan**

To run Joe's Stan scripts you will need the usual other supporting scripts and data files from DBDA2E, available at the book's web site (see Step 5 of

**Software Installation**). Just copy Joe's scripts into your folder of scripts from the book.

Joe's GitHub site includes only the new scripts he modified from the book's JAGS scripts. The book had several Stan scripts already, and Joe has not re-written those. The book describes Stan in Chapter 14, and has several Stan scripts accompanying applications in later chapters. (Regarding discrete parameters: The book points out that Stan does not directly implement discrete parameters like JAGS, but the book does not discuss how to marginalize over discrete parameters instead. Some of Joe's scripts implement marginalization over discrete parameters.)

Big thanks to

**Joe Houpt**for writing these scripts and making them available!

*Notes regarding the figure at the top of this post:*

1. Yes, the "<-" operator is deprecated in Stan, and I avoid using it in R, but it still looks like an arrow, and programmers who are familiar with R and JAGS will recognize it.

2. The Stan icon comes from the Stan web page. If the Stan folks don't like my use of the icon here, please

**contact me**and I'll remove it or modify it as required.

**Reminders of some recent posts:**

**Looking for great teachers of Bayesian statistics****New article: Bayesian analysis for newcomers.****New article: The Bayesian New Statistics.**

## Thursday, June 29, 2017

### Difference of means for paired data: Model the mean of the differences or the joint distribution

I'm regularly asked about how to analyze the difference of means for paired data, such as pre-treatment and post-treatment scores (for the same subjects), or, say, blood pressures of spouses, etc. The usual approach is merely to consider the difference scores (e.g., post minus pre for each subject) and examine the mean of the differences. But we could also consider the joint distribution of the paired scores and describe the joint distribution as, say, a bivariate normal. In this post I'll show you how to do both. I'll illustrate with both positively correlated pairs and negatively correlated pairs.

The red crosses (above) mark the arithmetic means in the data.

The script for generating the plots above is appended at the end of this post. It is merely a variation of the

Notice that the posterior of the mean of the difference scores (from BEST) is essentially the same as the posterior of the difference of means from the previous analysis. The small discrepancies between the two analyses stem from two sources: First, there is MCMC wobble. Second, the BEST package models the data with a

We can describe the joint distribution as a bivariate normal, and then consider the posterior distribution of the difference of means, as shown below:

Notice that these negatively-correlated data have

Analyzing the mean of the differences scores yields the same result:

The R script for generating the plots is appended below, but first,

*(e.g., if the pre-treatment score is higher than average, then the post-treatment score tends to be higher than average; or, if one person's relationship satisfaction is higher than average, then his/her spouse's relationship satisfaction tends to be higher than average).***Suppose we have positively correlated pairs of scores****as shown below:***We can describe the joint distribution as a bivariate normal, and then consider the posterior distribution of the difference of means,*The red crosses (above) mark the arithmetic means in the data.

The script for generating the plots above is appended at the end of this post. It is merely a variation of the

**script for multivariate normals posted on this blog**a few days ago.**Here I'll use the***Alternatively, we could consider the single group of difference scores.***BEST package**, but we could also use the script Jags-Ymet-Xnom1grp-Mnormal-Example.R or Jags-Ymet-Xnom1grp-Mrobust-Example.R from**DBDA2E**.The result:Notice that the posterior of the mean of the difference scores (from BEST) is essentially the same as the posterior of the difference of means from the previous analysis. The small discrepancies between the two analyses stem from two sources: First, there is MCMC wobble. Second, the BEST package models the data with a

*t*distribution not a normal. In this case the simulated data are generated from a normal so the normality parameter is estimated to be large anyway.*(e.g., if one person's job satisfaction is***Suppose instead we have negatively correlated pairs of scores***higher*than average, then his/her spouse's job satisfaction tends to be*lower*than average).We can describe the joint distribution as a bivariate normal, and then consider the posterior distribution of the difference of means, as shown below:

Notice that these negatively-correlated data have

*exactly the same arithmetic difference of means and exactly the same variances*as the data in the previous example. The only difference is the correlation. Now, for negatively correlated pairs, the estimated difference of means is essentially the same as for the previous data (only MCMC wobble makes the mode discrepant), but the estimate is much less certain, with a much wider HDI.Analyzing the mean of the differences scores yields the same result:

The R script for generating the plots is appended below, but first,

**Reminders of some recent posts:****Looking for great teachers of Bayesian statistics****New article: Bayesian analysis for newcomers.****New article: The Bayesian New Statistics.**

**Appendix: R script used for this post:**

**#----------------------------------------------------------------------------****# Jags-DifferenceOfPairedScores.R**

# John Kruschke, June 2017.

# For further info, see:

# Kruschke, J. K. (2015). Doing Bayesian Data Analysis, Second Edition:

# A Tutorial with R, JAGS, and Stan. Academic Press / Elsevier.

# Optional generic preliminaries:

graphics.off() # This closes all of R's graphics windows.

rm(list=ls()) # Careful! This clears all of R's memory!

#--------------------------------------------------------------------------

# Load the data:

#--------------------------------------------------------------------------

# For real research, you would read in your data file here. This script expects

# the data to have two columns for the paired scores, one row per pair. No

# missing data.

#

# myData = read.csv("YourDataFileName.csv")

# y = myData[,c("ColumnNameOfPreScore","ColumnNameOfPostScore")]

# Generate simulated data:

library(MASS)

mu1 = 100.0

sigma1 = 15.0

mu2 = 108.0

sigma2 = 20.0

rho = 0.9

Sigma <- matrix(c( sigma1^2 , rho*sigma1*sigma2 ,

rho*sigma2*sigma1 , sigma2^2 ) , ncol=2 , byrow=TRUE )

set.seed(47405)

N = 100

yMat = mvrnorm(n=N, mu=c(mu1,mu2), Sigma=Sigma, empirical=TRUE)

colnames(yMat) = c("Y1","Y2")

write.csv( data.frame(yMat) , file="Jags-DifferenceOfPairedScores-Data.csv" )

# John Kruschke, June 2017.

# For further info, see:

# Kruschke, J. K. (2015). Doing Bayesian Data Analysis, Second Edition:

# A Tutorial with R, JAGS, and Stan. Academic Press / Elsevier.

# Optional generic preliminaries:

graphics.off() # This closes all of R's graphics windows.

rm(list=ls()) # Careful! This clears all of R's memory!

#--------------------------------------------------------------------------

# Load the data:

#--------------------------------------------------------------------------

# For real research, you would read in your data file here. This script expects

# the data to have two columns for the paired scores, one row per pair. No

# missing data.

#

# myData = read.csv("YourDataFileName.csv")

# y = myData[,c("ColumnNameOfPreScore","ColumnNameOfPostScore")]

# Generate simulated data:

library(MASS)

mu1 = 100.0

sigma1 = 15.0

mu2 = 108.0

sigma2 = 20.0

rho = 0.9

Sigma <- matrix(c( sigma1^2 , rho*sigma1*sigma2 ,

rho*sigma2*sigma1 , sigma2^2 ) , ncol=2 , byrow=TRUE )

set.seed(47405)

N = 100

yMat = mvrnorm(n=N, mu=c(mu1,mu2), Sigma=Sigma, empirical=TRUE)

colnames(yMat) = c("Y1","Y2")

write.csv( data.frame(yMat) , file="Jags-DifferenceOfPairedScores-Data.csv" )

**# Now read in simulated data as if it were a data file:**

myData = read.csv("Jags-DifferenceOfPairedScores-Data.csv")

y = myData[,c("Y1","Y2")]

#----------------------------------------------------------------------------

# The rest can remain unchanged, except for the specification of difference of

# means at the very end.

#

# The script below is a modification of the script Jags-MultivariateNormal.R,

# and there are some vestigial and unnecessary complexities. The previous script

# involved data with possibly more than two "y" columns, whereas the present

# script is only concerned with two "y" columns.

#----------------------------------------------------------------------------

# Load some functions used below:

source("DBDA2E-utilities.R") # Must be in R's current working directory.

# Install the ellipse package if not already:

want = c("ellipse")

have = want %in% rownames(installed.packages())

if ( any(!have) ) { install.packages( want[!have] ) }

# Standardize the data:

sdOrig = apply(y,2,sd)

meanOrig = apply(y,2,mean)

zy = apply(y,2,function(yVec){(yVec-mean(yVec))/sd(yVec)})

# Assemble data for sending to JAGS:

dataList = list(

zy = zy ,

Ntotal = nrow(zy) ,

Nvar = ncol(zy) ,

# Include original data info for transforming to original scale:

sdOrig = sdOrig ,

meanOrig = meanOrig ,

# For wishart (dwish) prior on inverse covariance matrix:

zRscal = ncol(zy) , # for dwish prior

zRmat = diag(x=1,nrow=ncol(zy)) # Rmat = diag(apply(y,2,var))

)

# Define the model:

modelString = "

model {

for ( i in 1:Ntotal ) {

zy[i,1:Nvar] ~ dmnorm( zMu[1:Nvar] , zInvCovMat[1:Nvar,1:Nvar] )

}

for ( varIdx in 1:Nvar ) { zMu[varIdx] ~ dnorm( 0 , 1/2^2 ) }

zInvCovMat ~ dwish( zRmat[1:Nvar,1:Nvar] , zRscal )

# Convert invCovMat to sd and correlation:

zCovMat <- inverse( zInvCovMat )

for ( varIdx in 1:Nvar ) { zSigma[varIdx] <- sqrt(zCovMat[varIdx,varIdx]) }

for ( varIdx1 in 1:Nvar ) { for ( varIdx2 in 1:Nvar ) {

zRho[varIdx1,varIdx2] <- ( zCovMat[varIdx1,varIdx2]

/ (zSigma[varIdx1]*zSigma[varIdx2]) )

} }

# Convert to original scale:

for ( varIdx in 1:Nvar ) {

sigma[varIdx] <- zSigma[varIdx] * sdOrig[varIdx]

mu[varIdx] <- zMu[varIdx] * sdOrig[varIdx] + meanOrig[varIdx]

}

for ( varIdx1 in 1:Nvar ) { for ( varIdx2 in 1:Nvar ) {

rho[varIdx1,varIdx2] <- zRho[varIdx1,varIdx2]

} }

}

" # close quote for modelString

writeLines( modelString , con="Jags-MultivariateNormal-model.txt" )

# Run the chains:

nChain = 3

nAdapt = 500

nBurnIn = 500

nThin = 10

nStepToSave = 20000

require(rjags)

jagsModel = jags.model( file="Jags-MultivariateNormal-model.txt" ,

data=dataList , n.chains=nChain , n.adapt=nAdapt )

update( jagsModel , n.iter=nBurnIn )

codaSamples = coda.samples( jagsModel ,

variable.names=c("mu","sigma","rho") ,

n.iter=nStepToSave/nChain*nThin , thin=nThin )

# Convergence diagnostics:

parameterNames = varnames(codaSamples) # get all parameter names

for ( parName in parameterNames ) {

diagMCMC( codaObject=codaSamples , parName=parName )

}

# Examine the posterior distribution:

mcmcMat = as.matrix(codaSamples)

chainLength = nrow(mcmcMat)

Nvar = ncol(y)

# Create subsequence of steps through chain for plotting:

stepVec = floor(seq(1,chainLength,length=20))

# Make plots of posterior distribution:

# Preparation -- define useful functions:

library(ellipse)

expandRange = function( x , exMult=0.2 ) {

lowVal = min(x)

highVal = max(x)

wid = max(x)-min(x)

return( c( lowVal - exMult*wid , highVal + exMult*wid ) )

}

for ( varIdx in 1:Nvar ) {

openGraph(width=7,height=3.5)

par( mar=c(3.5,3,2,1) , mgp=c(2.0,0.7,0) )

layout(matrix(1:2,nrow=1))

# Marginal posterior on means:

plotPost( mcmcMat[ , paste0("mu[",varIdx,"]") ] ,

xlab=paste0("mu[",varIdx,"]") ,

main=paste( "Mean of" , colnames(y)[varIdx] ) )

# Marginal posterior on standard deviations:

plotPost( mcmcMat[ , paste0("sigma[",varIdx,"]") ] ,

xlab=paste0("sigma[",varIdx,"]") ,

main=paste( "SD of" , colnames(y)[varIdx] ) )

}

for ( varIdx1 in 1:(Nvar-1) ) {

for ( varIdx2 in (varIdx1+1):Nvar ) {

openGraph(width=7,height=3.5)

par( mar=c(3.5,3,2,1) , mgp=c(2.0,0.7,0) )

layout(matrix(1:2,nrow=1))

# Marginal posterior on correlation coefficient

plotPost( mcmcMat[ , paste0("rho[",varIdx1,",",varIdx2,"]") ] ,

xlab=paste0("rho[",varIdx1,",",varIdx2,"]") ,

main=paste( "Corr. of" , colnames(y)[varIdx1] ,

"and" , colnames(y)[varIdx2] ) )

# Data with posterior ellipse

ellipseLevel = 0.90

plot( y[,c(varIdx1,varIdx2)] , asp=1 , # pch=19 ,

xlim=expandRange(y[,varIdx1],0.1) ,

ylim=expandRange(y[,varIdx2],0.1) ,

xlab=colnames(y)[varIdx1] , ylab=colnames(y)[varIdx2] ,

main=bquote("Data with posterior "*.(ellipseLevel)*" level contour") )

abline(0,1,lty="dashed")

# Posterior ellipses:

for ( stepIdx in stepVec ) {

points( ellipse( mcmcMat[ stepIdx ,

paste0("rho[",varIdx1,",",varIdx2,"]") ] ,

scale=mcmcMat[ stepIdx ,

c( paste0("sigma[",varIdx1,"]") ,

paste0("sigma[",varIdx2,"]") ) ] ,

centre=mcmcMat[ stepIdx ,

c( paste0("mu[",varIdx1,"]") ,

paste0("mu[",varIdx2,"]") ) ] ,

level=ellipseLevel ) ,

type="l" , col="skyblue" , lwd=1 )

}

# replot data:

points( y[,c(varIdx1,varIdx2)] )

points( mean(y[,varIdx1]) , mean(y[,varIdx2]) , pch="+" , col="red" , cex=2 )

}

}

# Show data descriptives on console:

cor( y )

apply(y,2,mean)

apply(y,2,sd)

#-----------------------------------------------------------------------------

# Difference of means. N.B.: THIS ONLY MAKES SENSE IF THE MEANS ARE ON THE SAME

# SCALE, E.G., BEFORE-TREATMENT AND AFTER-TREATMENT SCORES ON THE SAME MEASURE.

# Change the specification of indices as appropriate to your variable names and

# interests.

#-----------------------------------------------------------------------------

# Specify indices of desired means:

MuIdx1 = which(colnames(y)=="Y2")

MuIdx2 = which(colnames(y)=="Y1")

# Make plot of posterior difference:

openGraph(height=3.5,width=4)

par( mar=c(3.5,1,2,1) , mgp=c(2.0,0.7,0) )

plotPost( mcmcMat[ , paste0("mu[",MuIdx1,"]") ]

- mcmcMat[ , paste0("mu[",MuIdx2,"]") ],

xlab=paste0( "mu[",MuIdx1,"] - mu[",MuIdx2,"]" ) ,

main=bquote( mu[ .(colnames(y)[MuIdx1]) ]

- mu[ .(colnames(y)[MuIdx2]) ] ) ,

cex.main=1.5 , ROPE=c(-1.5,1.5) )

points( mean(y[,MuIdx1])-mean(y[,MuIdx2]) , 0 , pch="+" , col="red" , cex=2 )

#---------------------------------------------------------------------------

# Re-do using BEST package:

library(BEST)

BESTout = BESTmcmc( y[,MuIdx1] - y[,MuIdx2] ,

numSavedSteps=nStepToSave , thinSteps=nThin )

openGraph(height=3.5,width=4)

par( mar=c(3.5,1,2,1) , mgp=c(2.0,0.7,0) )

plot( BESTout , compVal=0.0 , ROPE=c(-1.5,1.5) ,

xlab=paste0( "mu[",MuIdx1,"] - mu[",MuIdx2,"]" ) ,

main=bquote( mu[ .(colnames(y)[MuIdx1]) ]

- mu[ .(colnames(y)[MuIdx2]) ] ) )

#---------------------------------------------------------------------------myData = read.csv("Jags-DifferenceOfPairedScores-Data.csv")

y = myData[,c("Y1","Y2")]

#----------------------------------------------------------------------------

# The rest can remain unchanged, except for the specification of difference of

# means at the very end.

#

# The script below is a modification of the script Jags-MultivariateNormal.R,

# and there are some vestigial and unnecessary complexities. The previous script

# involved data with possibly more than two "y" columns, whereas the present

# script is only concerned with two "y" columns.

#----------------------------------------------------------------------------

# Load some functions used below:

source("DBDA2E-utilities.R") # Must be in R's current working directory.

# Install the ellipse package if not already:

want = c("ellipse")

have = want %in% rownames(installed.packages())

if ( any(!have) ) { install.packages( want[!have] ) }

# Standardize the data:

sdOrig = apply(y,2,sd)

meanOrig = apply(y,2,mean)

zy = apply(y,2,function(yVec){(yVec-mean(yVec))/sd(yVec)})

# Assemble data for sending to JAGS:

dataList = list(

zy = zy ,

Ntotal = nrow(zy) ,

Nvar = ncol(zy) ,

# Include original data info for transforming to original scale:

sdOrig = sdOrig ,

meanOrig = meanOrig ,

# For wishart (dwish) prior on inverse covariance matrix:

zRscal = ncol(zy) , # for dwish prior

zRmat = diag(x=1,nrow=ncol(zy)) # Rmat = diag(apply(y,2,var))

)

# Define the model:

modelString = "

model {

for ( i in 1:Ntotal ) {

zy[i,1:Nvar] ~ dmnorm( zMu[1:Nvar] , zInvCovMat[1:Nvar,1:Nvar] )

}

for ( varIdx in 1:Nvar ) { zMu[varIdx] ~ dnorm( 0 , 1/2^2 ) }

zInvCovMat ~ dwish( zRmat[1:Nvar,1:Nvar] , zRscal )

# Convert invCovMat to sd and correlation:

zCovMat <- inverse( zInvCovMat )

for ( varIdx in 1:Nvar ) { zSigma[varIdx] <- sqrt(zCovMat[varIdx,varIdx]) }

for ( varIdx1 in 1:Nvar ) { for ( varIdx2 in 1:Nvar ) {

zRho[varIdx1,varIdx2] <- ( zCovMat[varIdx1,varIdx2]

/ (zSigma[varIdx1]*zSigma[varIdx2]) )

} }

# Convert to original scale:

for ( varIdx in 1:Nvar ) {

sigma[varIdx] <- zSigma[varIdx] * sdOrig[varIdx]

mu[varIdx] <- zMu[varIdx] * sdOrig[varIdx] + meanOrig[varIdx]

}

for ( varIdx1 in 1:Nvar ) { for ( varIdx2 in 1:Nvar ) {

rho[varIdx1,varIdx2] <- zRho[varIdx1,varIdx2]

} }

}

" # close quote for modelString

writeLines( modelString , con="Jags-MultivariateNormal-model.txt" )

# Run the chains:

nChain = 3

nAdapt = 500

nBurnIn = 500

nThin = 10

nStepToSave = 20000

require(rjags)

jagsModel = jags.model( file="Jags-MultivariateNormal-model.txt" ,

data=dataList , n.chains=nChain , n.adapt=nAdapt )

update( jagsModel , n.iter=nBurnIn )

codaSamples = coda.samples( jagsModel ,

variable.names=c("mu","sigma","rho") ,

n.iter=nStepToSave/nChain*nThin , thin=nThin )

# Convergence diagnostics:

parameterNames = varnames(codaSamples) # get all parameter names

for ( parName in parameterNames ) {

diagMCMC( codaObject=codaSamples , parName=parName )

}

# Examine the posterior distribution:

mcmcMat = as.matrix(codaSamples)

chainLength = nrow(mcmcMat)

Nvar = ncol(y)

# Create subsequence of steps through chain for plotting:

stepVec = floor(seq(1,chainLength,length=20))

# Make plots of posterior distribution:

# Preparation -- define useful functions:

library(ellipse)

expandRange = function( x , exMult=0.2 ) {

lowVal = min(x)

highVal = max(x)

wid = max(x)-min(x)

return( c( lowVal - exMult*wid , highVal + exMult*wid ) )

}

for ( varIdx in 1:Nvar ) {

openGraph(width=7,height=3.5)

par( mar=c(3.5,3,2,1) , mgp=c(2.0,0.7,0) )

layout(matrix(1:2,nrow=1))

# Marginal posterior on means:

plotPost( mcmcMat[ , paste0("mu[",varIdx,"]") ] ,

xlab=paste0("mu[",varIdx,"]") ,

main=paste( "Mean of" , colnames(y)[varIdx] ) )

# Marginal posterior on standard deviations:

plotPost( mcmcMat[ , paste0("sigma[",varIdx,"]") ] ,

xlab=paste0("sigma[",varIdx,"]") ,

main=paste( "SD of" , colnames(y)[varIdx] ) )

}

for ( varIdx1 in 1:(Nvar-1) ) {

for ( varIdx2 in (varIdx1+1):Nvar ) {

openGraph(width=7,height=3.5)

par( mar=c(3.5,3,2,1) , mgp=c(2.0,0.7,0) )

layout(matrix(1:2,nrow=1))

# Marginal posterior on correlation coefficient

plotPost( mcmcMat[ , paste0("rho[",varIdx1,",",varIdx2,"]") ] ,

xlab=paste0("rho[",varIdx1,",",varIdx2,"]") ,

main=paste( "Corr. of" , colnames(y)[varIdx1] ,

"and" , colnames(y)[varIdx2] ) )

# Data with posterior ellipse

ellipseLevel = 0.90

plot( y[,c(varIdx1,varIdx2)] , asp=1 , # pch=19 ,

xlim=expandRange(y[,varIdx1],0.1) ,

ylim=expandRange(y[,varIdx2],0.1) ,

xlab=colnames(y)[varIdx1] , ylab=colnames(y)[varIdx2] ,

main=bquote("Data with posterior "*.(ellipseLevel)*" level contour") )

abline(0,1,lty="dashed")

# Posterior ellipses:

for ( stepIdx in stepVec ) {

points( ellipse( mcmcMat[ stepIdx ,

paste0("rho[",varIdx1,",",varIdx2,"]") ] ,

scale=mcmcMat[ stepIdx ,

c( paste0("sigma[",varIdx1,"]") ,

paste0("sigma[",varIdx2,"]") ) ] ,

centre=mcmcMat[ stepIdx ,

c( paste0("mu[",varIdx1,"]") ,

paste0("mu[",varIdx2,"]") ) ] ,

level=ellipseLevel ) ,

type="l" , col="skyblue" , lwd=1 )

}

# replot data:

points( y[,c(varIdx1,varIdx2)] )

points( mean(y[,varIdx1]) , mean(y[,varIdx2]) , pch="+" , col="red" , cex=2 )

}

}

# Show data descriptives on console:

cor( y )

apply(y,2,mean)

apply(y,2,sd)

#-----------------------------------------------------------------------------

# Difference of means. N.B.: THIS ONLY MAKES SENSE IF THE MEANS ARE ON THE SAME

# SCALE, E.G., BEFORE-TREATMENT AND AFTER-TREATMENT SCORES ON THE SAME MEASURE.

# Change the specification of indices as appropriate to your variable names and

# interests.

#-----------------------------------------------------------------------------

# Specify indices of desired means:

MuIdx1 = which(colnames(y)=="Y2")

MuIdx2 = which(colnames(y)=="Y1")

# Make plot of posterior difference:

openGraph(height=3.5,width=4)

par( mar=c(3.5,1,2,1) , mgp=c(2.0,0.7,0) )

plotPost( mcmcMat[ , paste0("mu[",MuIdx1,"]") ]

- mcmcMat[ , paste0("mu[",MuIdx2,"]") ],

xlab=paste0( "mu[",MuIdx1,"] - mu[",MuIdx2,"]" ) ,

main=bquote( mu[ .(colnames(y)[MuIdx1]) ]

- mu[ .(colnames(y)[MuIdx2]) ] ) ,

cex.main=1.5 , ROPE=c(-1.5,1.5) )

points( mean(y[,MuIdx1])-mean(y[,MuIdx2]) , 0 , pch="+" , col="red" , cex=2 )

#---------------------------------------------------------------------------

# Re-do using BEST package:

library(BEST)

BESTout = BESTmcmc( y[,MuIdx1] - y[,MuIdx2] ,

numSavedSteps=nStepToSave , thinSteps=nThin )

openGraph(height=3.5,width=4)

par( mar=c(3.5,1,2,1) , mgp=c(2.0,0.7,0) )

plot( BESTout , compVal=0.0 , ROPE=c(-1.5,1.5) ,

xlab=paste0( "mu[",MuIdx1,"] - mu[",MuIdx2,"]" ) ,

main=bquote( mu[ .(colnames(y)[MuIdx1]) ]

- mu[ .(colnames(y)[MuIdx2]) ] ) )

#---------------------------------------------------------------------------

## Monday, June 26, 2017

### Bayesian estimation of correlations and differences of correlations with a multivariate normal

Within days of each other I received two emails asking about Bayesian estimation of correlations and differences of correlations. Hence this post, which presents an implementation in JAGS and R.

Traditionally, a joint distribution of several continuous/metric variables is modeled as a multivariate normal. We estimate the mean vector and covariance matrix of the multivariate normal. When standardized, the covariance matrix is the correlation matrix, so pairwise correlations can be inspected. Moreover,

As an example, I'll use the data regarding Scholastic Aptitude Test (SAT) scores from Guber (1999), explained in Chapter 18 of DBDA2E. We'll consider three data variables: SAT total score (SATT), spending per student (Spend), and percentage of students taking the exam (PrcntTake). Here's a plot of the data from different perspectives:

where the "z" prefixes of the variable names indicate that the data have been standardized inside JAGS. In the code above, zy[i,:] is the

The prior on each mean is a broad normal. The prior on the inverse covariance matrices is a generic Wishart distribution, which is a multidimensional generalization of a gamma distribution. The Wishart is a distribution over matrices. The Wishart is built into JAGS.

The R code appended below provides the complete script, including the model specification and the graphics commands. The script needs the utilities script from DBDA2E for opening graphics windows.

Here is some selected output from the program. In particular, the program produces plots of the posterior distributions of the pairwise correlations, presented with the data and level contours from the multivariate normal:

Finally, here is a plot of the posterior of the difference of two correlations:

You should consider differences of correlations only if it's really meaningful to do so. Just as with taking differences of standardized regression coefficients, taking differences of correlations is assuming that the correlations are "on the same scale" which only makes sense to the extent that one standard deviation of SAT is "the same" as one standard deviation of Spending and is "the same" as one standard deviation of Percentage Taking the Exam.

The full R script follows. Add a comment to this post if this script is useful to you. Enjoy!

Traditionally, a joint distribution of several continuous/metric variables is modeled as a multivariate normal. We estimate the mean vector and covariance matrix of the multivariate normal. When standardized, the covariance matrix is the correlation matrix, so pairwise correlations can be inspected. Moreover,

*it's meaningful to compare correlations, then we can also examine the posterior difference of pairwise correlations.***if**As an example, I'll use the data regarding Scholastic Aptitude Test (SAT) scores from Guber (1999), explained in Chapter 18 of DBDA2E. We'll consider three data variables: SAT total score (SATT), spending per student (Spend), and percentage of students taking the exam (PrcntTake). Here's a plot of the data from different perspectives:

*This is straightforward in JAGS, because JAGS has a multivariate normal distribution built in. The model specification includes the likelihood function:***Instead of doing linear regression of SAT on Spend and PrcntTake, we'll describe the joint distribution as a multivariate normal.****zy[i,1:Nvar] ~ dmnorm( zMu[1:Nvar] , zInvCovMat[1:Nvar,1:Nvar] )**where the "z" prefixes of the variable names indicate that the data have been standardized inside JAGS. In the code above, zy[i,:] is the

*i*^{th}data point (a vector), zMu[:] is the vector of estimated means, and zInvCovMat[:,:] is the*inverse*covariance matrix. Just as JAGS parameterizes the normal distribution by the*inverse*variance (i.e., the precision), JAGS parameterizes the multivariate normal by the inverse covariance.The prior on each mean is a broad normal. The prior on the inverse covariance matrices is a generic Wishart distribution, which is a multidimensional generalization of a gamma distribution. The Wishart is a distribution over matrices. The Wishart is built into JAGS.

The R code appended below provides the complete script, including the model specification and the graphics commands. The script needs the utilities script from DBDA2E for opening graphics windows.

Here is some selected output from the program. In particular, the program produces plots of the posterior distributions of the pairwise correlations, presented with the data and level contours from the multivariate normal:

Finally, here is a plot of the posterior of the difference of two correlations:

You should consider differences of correlations only if it's really meaningful to do so. Just as with taking differences of standardized regression coefficients, taking differences of correlations is assuming that the correlations are "on the same scale" which only makes sense to the extent that one standard deviation of SAT is "the same" as one standard deviation of Spending and is "the same" as one standard deviation of Percentage Taking the Exam.

The full R script follows. Add a comment to this post if this script is useful to you. Enjoy!

**# Jags-MultivariateNormal.R**

# John Kruschke, November 2015 - June 2017.

# For further info, see:

# Kruschke, J. K. (2015). Doing Bayesian Data Analysis, Second Edition:

# A Tutorial with R, JAGS, and Stan. Academic Press / Elsevier.

# Optional generic preliminaries:

graphics.off() # This closes all of R's graphics windows.

rm(list=ls()) # Careful! This clears all of R's memory!

#--------------------------------------------------------------------------

# Load the data:

#--------------------------------------------------------------------------

myData = read.csv("Guber1999data.csv") # must have file in curr. work. dir.

# y must have named columns, with no missing values!

y = myData[,c("SATT","Spend","PrcntTake")]

#----------------------------------------------------------------------------

# The rest can remain unchanged, except for the specification of difference of

# correlations at the very end.

#----------------------------------------------------------------------------

# Load some functions used below:

source("DBDA2E-utilities.R") # Must be in R's current working directory.

# Install the ellipse package if not already:

want = c("ellipse")

have = want %in% rownames(installed.packages())

if ( any(!have) ) { install.packages( want[!have] ) }

# Standardize the data:

sdOrig = apply(y,2,sd)

meanOrig = apply(y,2,mean)

zy = apply(y,2,function(yVec){(yVec-mean(yVec))/sd(yVec)})

# Assemble data for sending to JAGS:

dataList = list(

zy = zy ,

Ntotal = nrow(zy) ,

Nvar = ncol(zy) ,

# Include original data info for transforming to original scale:

sdOrig = sdOrig ,

meanOrig = meanOrig ,

# For wishart (dwish) prior on inverse covariance matrix:

zRscal = ncol(zy) , # for dwish prior

zRmat = diag(x=1,nrow=ncol(zy)) # Rmat = diag(apply(y,2,var))

)

# Define the model:

modelString = "

model {

for ( i in 1:Ntotal ) {

zy[i,1:Nvar] ~ dmnorm( zMu[1:Nvar] , zInvCovMat[1:Nvar,1:Nvar] )

}

for ( varIdx in 1:Nvar ) { zMu[varIdx] ~ dnorm( 0 , 1/2^2 ) }

zInvCovMat ~ dwish( zRmat[1:Nvar,1:Nvar] , zRscal )

# Convert invCovMat to sd and correlation:

zCovMat <- inverse( zInvCovMat )

for ( varIdx in 1:Nvar ) { zSigma[varIdx] <- sqrt(zCovMat[varIdx,varIdx]) }

for ( varIdx1 in 1:Nvar ) { for ( varIdx2 in 1:Nvar ) {

zRho[varIdx1,varIdx2] <- ( zCovMat[varIdx1,varIdx2]

/ (zSigma[varIdx1]*zSigma[varIdx2]) )

} }

# Convert to original scale:

for ( varIdx in 1:Nvar ) {

sigma[varIdx] <- zSigma[varIdx] * sdOrig[varIdx]

mu[varIdx] <- zMu[varIdx] * sdOrig[varIdx] + meanOrig[varIdx]

}

for ( varIdx1 in 1:Nvar ) { for ( varIdx2 in 1:Nvar ) {

rho[varIdx1,varIdx2] <- zRho[varIdx1,varIdx2]

} }

}

" # close quote for modelString

writeLines( modelString , con="Jags-MultivariateNormal-model.txt" )

# Run the chains:

nChain = 3

nAdapt = 500

nBurnIn = 500

nThin = 10

nStepToSave = 20000

require(rjags)

jagsModel = jags.model( file="Jags-MultivariateNormal-model.txt" ,

data=dataList , n.chains=nChain , n.adapt=nAdapt )

update( jagsModel , n.iter=nBurnIn )

codaSamples = coda.samples( jagsModel ,

variable.names=c("mu","sigma","rho") ,

n.iter=nStepToSave/nChain*nThin , thin=nThin )

# Convergence diagnostics:

parameterNames = varnames(codaSamples) # get all parameter names

for ( parName in parameterNames ) {

diagMCMC( codaObject=codaSamples , parName=parName )

}

# Examine the posterior distribution:

mcmcMat = as.matrix(codaSamples)

chainLength = nrow(mcmcMat)

Nvar = ncol(y)

# Create subsequence of steps through chain for plotting:

stepVec = floor(seq(1,chainLength,length=20))

# Make plots of posterior distribution:

# Preparation -- define useful functions:

library(ellipse)

expandRange = function( x , exMult=0.2 ) {

lowVal = min(x)

highVal = max(x)

wid = max(x)-min(x)

return( c( lowVal - exMult*wid , highVal + exMult*wid ) )

}

for ( varIdx in 1:Nvar ) {

openGraph(width=7,height=3.5)

par( mar=c(3.5,3,2,1) , mgp=c(2.0,0.7,0) )

layout(matrix(1:2,nrow=1))

# Marginal posterior on means:

plotPost( mcmcMat[ , paste0("mu[",varIdx,"]") ] ,

xlab=paste0("mu[",varIdx,"]") ,

main=paste( "Mean of" , colnames(y)[varIdx] ) )

# Marginal posterior on standard deviations:

plotPost( mcmcMat[ , paste0("sigma[",varIdx,"]") ] ,

xlab=paste0("sigma[",varIdx,"]") ,

main=paste( "SD of" , colnames(y)[varIdx] ) )

}

for ( varIdx1 in 1:(Nvar-1) ) {

for ( varIdx2 in (varIdx1+1):Nvar ) {

openGraph(width=7,height=3.5)

par( mar=c(3.5,3,2,1) , mgp=c(2.0,0.7,0) )

layout(matrix(1:2,nrow=1))

# Marginal posterior on correlation coefficient

plotPost( mcmcMat[ , paste0("rho[",varIdx1,",",varIdx2,"]") ] ,

xlab=paste0("rho[",varIdx1,",",varIdx2,"]") ,

main=paste( "Corr. of" , colnames(y)[varIdx1] ,

"and" , colnames(y)[varIdx2] ) )

# Data with posterior ellipse

ellipseLevel = 0.90

plot( y[,c(varIdx1,varIdx2)] , # pch=19 ,

xlim=expandRange(y[,varIdx1]) , ylim=expandRange(y[,varIdx2]) ,

xlab=colnames(y)[varIdx1] , ylab=colnames(y)[varIdx2] ,

main=bquote("Data with posterior "*.(ellipseLevel)*" level contour") )

# Posterior ellipses:

for ( stepIdx in stepVec ) {

points( ellipse( mcmcMat[ stepIdx ,

paste0("rho[",varIdx1,",",varIdx2,"]") ] ,

scale=mcmcMat[ stepIdx ,

c( paste0("sigma[",varIdx1,"]") ,

paste0("sigma[",varIdx2,"]") ) ] ,

centre=mcmcMat[ stepIdx ,

c( paste0("mu[",varIdx1,"]") ,

paste0("mu[",varIdx2,"]") ) ] ,

level=ellipseLevel ) ,

type="l" , col="skyblue" , lwd=1 )

}

# replot data:

points( y[,c(varIdx1,varIdx2)] )

}

}

# Show data descriptives on console:

cor( y )

apply(y,2,mean)

apply(y,2,sd)

#-----------------------------------------------------------------------------

# Difference of correlations. Change the specification of indices as

# appropriate to your variable names and interests.

#-----------------------------------------------------------------------------

# Specify indices of desired correlations:

rhoAidx1 = which(colnames(y)=="SATT")

rhoAidx2 = which(colnames(y)=="Spend")

rhoBidx1 = which(colnames(y)=="SATT")

rhoBidx2 = which(colnames(y)=="PrcntTake")

# Make plot of posterior difference:

openGraph(height=3.5,width=4)

par( mar=c(3.5,1,2,1) , mgp=c(2.0,0.7,0) )

plotPost( mcmcMat[ , paste0("rho[",rhoAidx1,",",rhoAidx2,"]") ]

- mcmcMat[ , paste0("rho[",rhoBidx1,",",rhoBidx2,"]") ],

xlab=paste0(

"rho[",rhoAidx1,",",rhoAidx2,"] - rho[",rhoBidx1,",",rhoBidx2,"]" ) ,

main=bquote( rho[ list( .(colnames(y)[rhoAidx1]) ,

.(colnames(y)[rhoAidx2]) ) ]

- rho[ list( .(colnames(y)[rhoBidx1]) ,

.(colnames(y)[rhoBidx2]) ) ] ) ,

cex.main=1.5 , ROPE=c(-0.1,0.1) )

#---------------------------------------------------------------------------# John Kruschke, November 2015 - June 2017.

# For further info, see:

# Kruschke, J. K. (2015). Doing Bayesian Data Analysis, Second Edition:

# A Tutorial with R, JAGS, and Stan. Academic Press / Elsevier.

# Optional generic preliminaries:

graphics.off() # This closes all of R's graphics windows.

rm(list=ls()) # Careful! This clears all of R's memory!

#--------------------------------------------------------------------------

# Load the data:

#--------------------------------------------------------------------------

myData = read.csv("Guber1999data.csv") # must have file in curr. work. dir.

# y must have named columns, with no missing values!

y = myData[,c("SATT","Spend","PrcntTake")]

#----------------------------------------------------------------------------

# The rest can remain unchanged, except for the specification of difference of

# correlations at the very end.

#----------------------------------------------------------------------------

# Load some functions used below:

source("DBDA2E-utilities.R") # Must be in R's current working directory.

# Install the ellipse package if not already:

want = c("ellipse")

have = want %in% rownames(installed.packages())

if ( any(!have) ) { install.packages( want[!have] ) }

# Standardize the data:

sdOrig = apply(y,2,sd)

meanOrig = apply(y,2,mean)

zy = apply(y,2,function(yVec){(yVec-mean(yVec))/sd(yVec)})

# Assemble data for sending to JAGS:

dataList = list(

zy = zy ,

Ntotal = nrow(zy) ,

Nvar = ncol(zy) ,

# Include original data info for transforming to original scale:

sdOrig = sdOrig ,

meanOrig = meanOrig ,

# For wishart (dwish) prior on inverse covariance matrix:

zRscal = ncol(zy) , # for dwish prior

zRmat = diag(x=1,nrow=ncol(zy)) # Rmat = diag(apply(y,2,var))

)

# Define the model:

modelString = "

model {

for ( i in 1:Ntotal ) {

zy[i,1:Nvar] ~ dmnorm( zMu[1:Nvar] , zInvCovMat[1:Nvar,1:Nvar] )

}

for ( varIdx in 1:Nvar ) { zMu[varIdx] ~ dnorm( 0 , 1/2^2 ) }

zInvCovMat ~ dwish( zRmat[1:Nvar,1:Nvar] , zRscal )

# Convert invCovMat to sd and correlation:

zCovMat <- inverse( zInvCovMat )

for ( varIdx in 1:Nvar ) { zSigma[varIdx] <- sqrt(zCovMat[varIdx,varIdx]) }

for ( varIdx1 in 1:Nvar ) { for ( varIdx2 in 1:Nvar ) {

zRho[varIdx1,varIdx2] <- ( zCovMat[varIdx1,varIdx2]

/ (zSigma[varIdx1]*zSigma[varIdx2]) )

} }

# Convert to original scale:

for ( varIdx in 1:Nvar ) {

sigma[varIdx] <- zSigma[varIdx] * sdOrig[varIdx]

mu[varIdx] <- zMu[varIdx] * sdOrig[varIdx] + meanOrig[varIdx]

}

for ( varIdx1 in 1:Nvar ) { for ( varIdx2 in 1:Nvar ) {

rho[varIdx1,varIdx2] <- zRho[varIdx1,varIdx2]

} }

}

" # close quote for modelString

writeLines( modelString , con="Jags-MultivariateNormal-model.txt" )

# Run the chains:

nChain = 3

nAdapt = 500

nBurnIn = 500

nThin = 10

nStepToSave = 20000

require(rjags)

jagsModel = jags.model( file="Jags-MultivariateNormal-model.txt" ,

data=dataList , n.chains=nChain , n.adapt=nAdapt )

update( jagsModel , n.iter=nBurnIn )

codaSamples = coda.samples( jagsModel ,

variable.names=c("mu","sigma","rho") ,

n.iter=nStepToSave/nChain*nThin , thin=nThin )

# Convergence diagnostics:

parameterNames = varnames(codaSamples) # get all parameter names

for ( parName in parameterNames ) {

diagMCMC( codaObject=codaSamples , parName=parName )

}

# Examine the posterior distribution:

mcmcMat = as.matrix(codaSamples)

chainLength = nrow(mcmcMat)

Nvar = ncol(y)

# Create subsequence of steps through chain for plotting:

stepVec = floor(seq(1,chainLength,length=20))

# Make plots of posterior distribution:

# Preparation -- define useful functions:

library(ellipse)

expandRange = function( x , exMult=0.2 ) {

lowVal = min(x)

highVal = max(x)

wid = max(x)-min(x)

return( c( lowVal - exMult*wid , highVal + exMult*wid ) )

}

for ( varIdx in 1:Nvar ) {

openGraph(width=7,height=3.5)

par( mar=c(3.5,3,2,1) , mgp=c(2.0,0.7,0) )

layout(matrix(1:2,nrow=1))

# Marginal posterior on means:

plotPost( mcmcMat[ , paste0("mu[",varIdx,"]") ] ,

xlab=paste0("mu[",varIdx,"]") ,

main=paste( "Mean of" , colnames(y)[varIdx] ) )

# Marginal posterior on standard deviations:

plotPost( mcmcMat[ , paste0("sigma[",varIdx,"]") ] ,

xlab=paste0("sigma[",varIdx,"]") ,

main=paste( "SD of" , colnames(y)[varIdx] ) )

}

for ( varIdx1 in 1:(Nvar-1) ) {

for ( varIdx2 in (varIdx1+1):Nvar ) {

openGraph(width=7,height=3.5)

par( mar=c(3.5,3,2,1) , mgp=c(2.0,0.7,0) )

layout(matrix(1:2,nrow=1))

# Marginal posterior on correlation coefficient

plotPost( mcmcMat[ , paste0("rho[",varIdx1,",",varIdx2,"]") ] ,

xlab=paste0("rho[",varIdx1,",",varIdx2,"]") ,

main=paste( "Corr. of" , colnames(y)[varIdx1] ,

"and" , colnames(y)[varIdx2] ) )

# Data with posterior ellipse

ellipseLevel = 0.90

plot( y[,c(varIdx1,varIdx2)] , # pch=19 ,

xlim=expandRange(y[,varIdx1]) , ylim=expandRange(y[,varIdx2]) ,

xlab=colnames(y)[varIdx1] , ylab=colnames(y)[varIdx2] ,

main=bquote("Data with posterior "*.(ellipseLevel)*" level contour") )

# Posterior ellipses:

for ( stepIdx in stepVec ) {

points( ellipse( mcmcMat[ stepIdx ,

paste0("rho[",varIdx1,",",varIdx2,"]") ] ,

scale=mcmcMat[ stepIdx ,

c( paste0("sigma[",varIdx1,"]") ,

paste0("sigma[",varIdx2,"]") ) ] ,

centre=mcmcMat[ stepIdx ,

c( paste0("mu[",varIdx1,"]") ,

paste0("mu[",varIdx2,"]") ) ] ,

level=ellipseLevel ) ,

type="l" , col="skyblue" , lwd=1 )

}

# replot data:

points( y[,c(varIdx1,varIdx2)] )

}

}

# Show data descriptives on console:

cor( y )

apply(y,2,mean)

apply(y,2,sd)

#-----------------------------------------------------------------------------

# Difference of correlations. Change the specification of indices as

# appropriate to your variable names and interests.

#-----------------------------------------------------------------------------

# Specify indices of desired correlations:

rhoAidx1 = which(colnames(y)=="SATT")

rhoAidx2 = which(colnames(y)=="Spend")

rhoBidx1 = which(colnames(y)=="SATT")

rhoBidx2 = which(colnames(y)=="PrcntTake")

# Make plot of posterior difference:

openGraph(height=3.5,width=4)

par( mar=c(3.5,1,2,1) , mgp=c(2.0,0.7,0) )

plotPost( mcmcMat[ , paste0("rho[",rhoAidx1,",",rhoAidx2,"]") ]

- mcmcMat[ , paste0("rho[",rhoBidx1,",",rhoBidx2,"]") ],

xlab=paste0(

"rho[",rhoAidx1,",",rhoAidx2,"] - rho[",rhoBidx1,",",rhoBidx2,"]" ) ,

main=bquote( rho[ list( .(colnames(y)[rhoAidx1]) ,

.(colnames(y)[rhoAidx2]) ) ]

- rho[ list( .(colnames(y)[rhoBidx1]) ,

.(colnames(y)[rhoBidx2]) ) ] ) ,

cex.main=1.5 , ROPE=c(-0.1,0.1) )

#---------------------------------------------------------------------------

## Sunday, June 11, 2017

### Posterior distribution of predictions in multiple linear regression

In

There are trade-offs doing it in JAGS or afterwards in R. If you do it in JAGS, then you know that the model you're using to generate predictions is exactly the model you're using to describe the data, because you only specify the model once. If you do it after JAGS in R, then you have to code the model all over again in R, and

Please see the

The code produces output plots such as the following (along with numerical values displayed on the command line):

N.B.: The plots only display the first three significant digits; see the numerical output at the command line for more digits.

Enjoy!

**a previous post**I showed how to compute posterior predictions from multiple linear regression*inside JAGS*. In this post I show how to do it after JAGS, in R.There are trade-offs doing it in JAGS or afterwards in R. If you do it in JAGS, then you know that the model you're using to generate predictions is exactly the model you're using to describe the data, because you only specify the model once. If you do it after JAGS in R, then you have to code the model all over again in R, and

**face the peril**of making mistakes because JAGS and R parameterize distributions differently (e.g., in dnorm and dt are parameterized differently in JAGS than in R). On the other hand, if you generate the predictions in JAGS, then you have to specify all the to-be-probed*x*values in advance, along with the data to be fit. If instead you generate predictions in R (after JAGS), then you can re-use the already-generated MCMC chain for to-be-probed*x*values as much as you like, without having to run MCMC all over again. In this post, we face the peril of specifying the model in R.Please see the

**previous post**for background information about this specific application to predicting SAT scores from*spending per student*and*percentage of students taking the exam*. To run the R code below, please open the high-level script Jags-Ymet-XmetMulti-Mrobust-Example.R. Run the script "out of the box" so it uses the SAT data (Guber1999data.csv) and the two predictors mentioned in the previous sentence. Then, at the end of the script, append the following R code:**# Remind self of the predictors:**

show( xName )

# Specify probe values of the predictors. Columns must match predictors actually

# used, in order! Specify as many probe rows as you like:

xProbe = matrix( c( 4.0 , 10 , # Spend , PrcntTake

9.0 , 10 , # Spend , PrcntTake

4.0 , 90 , # Spend , PrcntTake

9.0 , 10 ) , # Spend , PrcntTake

ncol=length(xName) , byrow=TRUE )

# Convert mcmc coda object to matrix:

mcmcMat = as.matrix( mcmcCoda )

# Show column names of mcmcMat. Notice they include beta0, beta[1], beta[2],

# ..., which can be extracted using grep("^beta",colnames(mcmcMat)):

colnames( mcmcMat )

grep("^beta",colnames(mcmcMat),value=TRUE)

# Now, go through rows of xProbe matrix and display predicted mu and y:

for ( xProbeIdx in 1:nrow(xProbe) ) {

# Predicted mu is b0+b1*x1+b2*x2+... at every step in chain:

muPred = ( mcmcMat[,grep("^beta",colnames(mcmcMat))]

%*% matrix(c(1,xProbe[xProbeIdx,]),ncol=1) )

# Predicted y is is pred mu plus t-distrib noise at every step in chain:

yPred = muPred + mcmcMat[,"sigma"] * rt(nrow(mcmcMat),df=mcmcMat[,"nu"])

# Make plot of posterior distribution of prediction. Notice that plotPost()

# also returns numerical values to console.

openGraph(width=3.5,height=5)

layout(matrix(1:3,nrow=3),heights=c(1.5,2,2))

par( mar=c(4,1,1,1) , mgp=c(2.0,0.7,0) , xpd=NA )

plot(0,0,xaxt='n',yaxt='n',bty='n',pch='',ylab='',xlab='')

text(0,0, adj=c(0.5,0.7),

paste0(xName,"=",xProbe[xProbeIdx,],collapse="\n"), cex=1.25 )

xLim = quantile( yPred , probs=c(0.005,0.995) )

muPredInfo = plotPost( muPred ,xlab="Predicted mu" , xlim=xLim ,

main="" )

show( muPredInfo )

yPredInfo = plotPost( yPred , xlab="Predicted y" , xlim=xLim ,

main="" )

show( yPredInfo )

}show( xName )

# Specify probe values of the predictors. Columns must match predictors actually

# used, in order! Specify as many probe rows as you like:

xProbe = matrix( c( 4.0 , 10 , # Spend , PrcntTake

9.0 , 10 , # Spend , PrcntTake

4.0 , 90 , # Spend , PrcntTake

9.0 , 10 ) , # Spend , PrcntTake

ncol=length(xName) , byrow=TRUE )

# Convert mcmc coda object to matrix:

mcmcMat = as.matrix( mcmcCoda )

# Show column names of mcmcMat. Notice they include beta0, beta[1], beta[2],

# ..., which can be extracted using grep("^beta",colnames(mcmcMat)):

colnames( mcmcMat )

grep("^beta",colnames(mcmcMat),value=TRUE)

# Now, go through rows of xProbe matrix and display predicted mu and y:

for ( xProbeIdx in 1:nrow(xProbe) ) {

# Predicted mu is b0+b1*x1+b2*x2+... at every step in chain:

muPred = ( mcmcMat[,grep("^beta",colnames(mcmcMat))]

%*% matrix(c(1,xProbe[xProbeIdx,]),ncol=1) )

# Predicted y is is pred mu plus t-distrib noise at every step in chain:

yPred = muPred + mcmcMat[,"sigma"] * rt(nrow(mcmcMat),df=mcmcMat[,"nu"])

# Make plot of posterior distribution of prediction. Notice that plotPost()

# also returns numerical values to console.

openGraph(width=3.5,height=5)

layout(matrix(1:3,nrow=3),heights=c(1.5,2,2))

par( mar=c(4,1,1,1) , mgp=c(2.0,0.7,0) , xpd=NA )

plot(0,0,xaxt='n',yaxt='n',bty='n',pch='',ylab='',xlab='')

text(0,0, adj=c(0.5,0.7),

paste0(xName,"=",xProbe[xProbeIdx,],collapse="\n"), cex=1.25 )

xLim = quantile( yPred , probs=c(0.005,0.995) )

muPredInfo = plotPost( muPred ,xlab="Predicted mu" , xlim=xLim ,

main="" )

show( muPredInfo )

yPredInfo = plotPost( yPred , xlab="Predicted y" , xlim=xLim ,

main="" )

show( yPredInfo )

}

The code produces output plots such as the following (along with numerical values displayed on the command line):

N.B.: The plots only display the first three significant digits; see the numerical output at the command line for more digits.

Enjoy!

**Reminders of some recent posts:**## Monday, May 29, 2017

### New version of BEST package (Bayesian estimation of two groups)

**Mike Meredith**has updated the

**BEST package**again!

```
CHANGES in v.0.5.0 (2017-05-28)
* 'BESTmcmc' uses 'rjags' directly, instead of 'jagsUI' wrappers. This resolves 'set.seed' issues, but values returned will not be the same as with previous versions.
* Function 'hdi' removed; imports HDInterval::hdi instead.
```

Mike created and updates the BEST package entirely on his own. He has implemented some wonderful functionality and protections in the BEST package that are not in the original R code I created to accompany the

**BEST article**.

*Thanks Mike!*

**Reminders of some recent posts:**

## Thursday, May 25, 2017

### Looking for great teachers of Bayesian statistics

I'm looking for great teachers of Bayesian statistics!

Thanks!

P.S. This applies to instructors anywhere on the planet; just need to be fluent in English for teaching.

- Have you taught a course in Bayesian statistics?
- Have you used DBDA2E?
- Are you an enthusiastic teacher with very high evaluations?
- Do you have an abiding interest in a range of applied statistical methods?

*(johnkruschke@gmail.com). Even if you think I should already know about you, please send me an email now so I can be sure to put you on the list. If you know of someone else who has those qualities -- maybe a colleague, or maybe a teacher in a class you've taken -- please tell me about that person.***please send me an email**Thanks!

P.S. This applies to instructors anywhere on the planet; just need to be fluent in English for teaching.

Subscribe to:
Posts (Atom)