Multilevel Models with Compositional Predictors
Source:vignettes/BcompositionMLM.Rmd
BcompositionMLM.Rmd
In this vignette, we discuss how to use multilevelcoda
to specify multilevel models where compositional data are used as
predictors.
The following table outlines the packages used and a brief description of their purpose.
Package  Purpose 

multilevelcoda 
calculate between and within composition variables, calculate substitutions and plots 
brms 
fit Bayesian multilevel models using Stan as a backend 
bayestestR 
compute Bayes factors used to compare models 
doFuture 
parallel processing to speed up run times 
library(multilevelcoda)
library(brms)
library(bayestestR)
library(doFuture)
options(digits = 3) # reduce number of digits shown
For the examples, we make use of three built in datasets:
Dataset  Purpose 

mcompd 
compositional sleep and wake variables and additional predictors/outcomes (simulated) 
sbp 
a prespecified sequential binary partition, used in calculating compositional predictors 
psub 
all possible pairwise substitutions between compositional variables, used for substitution analyses 
The following table shows a few rows of data from
mcompd
.
ID  Time  Stress  TST  WAKE  MVPA  LPA  SB  Age  Female 

185  1  3.67  542  99.0  297.4  460  41.4  29.7  0 
185  2  7.21  458  49.4  117.3  653  162.3  29.7  0 
185  3  2.84  271  41.1  488.7  625  14.5  29.7  0 
184  12  2.36  286  52.7  106.9  906  89.2  22.3  1 
184  13  1.18  281  18.8  403.0  611  126.3  22.3  1 
184  14  0.00  397  26.5  39.9  587  389.8  22.3  1 
The following table shows the sequential binary partition being used
in sbp
. Columns correspond to the composition variables
(TST, WAKE, MVPA, LPA, SB). Rows correspond to distinct ILR
coordinates.
TST  WAKE  MVPA  LPA  SB 

1  1  1  1  1 
1  1  0  0  0 
0  0  1  1  1 
0  0  0  1  1 
The following table shows how all the possible binary substitutions contrasts are setup. Time substitutions work by taking time from the 1 variable and adding time to the +1 variable.
TST  WAKE  MVPA  LPA  SB 

1  1  0  0  0 
1  0  1  0  0 
1  0  0  1  0 
1  0  0  0  1 
1  1  0  0  0 
0  1  1  0  0 
0  1  0  1  0 
0  1  0  0  1 
1  0  1  0  0 
0  1  1  0  0 
0  0  1  1  0 
0  0  1  0  1 
1  0  0  1  0 
0  1  0  1  0 
0  0  1  1  0 
0  0  0  1  1 
1  0  0  0  1 
0  1  0  0  1 
0  0  1  0  1 
0  0  0  1  1 
Multilevel model with compositional predictors
Compositions and isometric log ratio (ILR) coordinates.
Compositional data are often expressed as a set of isometric log
ratio (ILR) coordinates in regression models. We can use the
compilr()
function to calculate both between and
withinlevel ILR coordinates for use in subsequent models as
predictors.
Notes: compilr()
also calculates total ILR
coordinates to be used as outcomes (or predictors) in models, if the
decomposition into a between and withinlevel ILR coordinates was not
desired.
The compilr()
function for multilevel data requires four
arguments:
Argument  Description 

data 
A long data set containing all variables needed to fit the multilevel models, 
including the repeated measure compositional predictors and outcomes, along with any additional covariates.  
sbp 
A Sequential Binary Partition to calculate \(ilr\) coordinates. 
parts 
The name of the compositional components in data . 
idvar 
The grouping factor on data to compute the
betweenperson and withinperson composition and \(ilr\) coordinates. 
total 
Optional argument to specify the amount to which the compositions should be closed. 
Fitting model
We now will use output from the compilr()
to fit our
brms
model, using the brmcoda()
. Here is a
model predicting Stress
from between and withinperson
sleepwake behaviours (expressed as ILR coordinates).
Notes: make sure you pass the correct names of the ILR
coordinates to brms
model.
m < brmcoda(compilr = cilr,
formula = Stress ~ bilr1 + bilr2 + bilr3 + bilr4 +
wilr1 + wilr2 + wilr3 + wilr4 + (1  ID),
cores = 8, seed = 123, backend = "cmdstanr")
Here is a summary()
of the model results.
summary(m)
#> Family: gaussian
#> Links: mu = identity; sigma = identity
#> Formula: Stress ~ bilr1 + bilr2 + bilr3 + bilr4 + wilr1 + wilr2 + wilr3 + wilr4 + (1  ID)
#> Data: tmp (Number of observations: 3540)
#> Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#> total postwarmup draws = 4000
#>
#> GroupLevel Effects:
#> ~ID (Number of levels: 266)
#> Estimate Est.Error l95% CI u95% CI Rhat Bulk_ESS Tail_ESS
#> sd(Intercept) 1.00 0.06 0.88 1.13 1.00 1333 2488
#>
#> PopulationLevel Effects:
#> Estimate Est.Error l95% CI u95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept 2.57 0.48 1.63 3.48 1.00 1303 1965
#> bilr1 0.17 0.32 0.46 0.77 1.01 1013 1876
#> bilr2 0.41 0.34 0.25 1.07 1.01 1090 1781
#> bilr3 0.13 0.22 0.28 0.55 1.00 1177 1885
#> bilr4 0.04 0.28 0.58 0.49 1.00 1319 2018
#> wilr1 0.34 0.12 0.58 0.10 1.00 3182 3218
#> wilr2 0.05 0.14 0.21 0.31 1.00 3750 3215
#> wilr3 0.11 0.08 0.25 0.05 1.00 3444 3129
#> wilr4 0.24 0.10 0.04 0.43 1.00 4051 3219
#>
#> Family Specific Parameters:
#> Estimate Est.Error l95% CI u95% CI Rhat Bulk_ESS Tail_ESS
#> sigma 2.37 0.03 2.31 2.43 1.00 4599 2577
#>
#> Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
Results show that the first and forth withinperson ILR coordinate
was associated with stress. The interpretation of these outputs depends
on how you construct your sequential binary partition. For the builtin
sequential binary partition sbp
(shown previously), the
resulting interpretation would be as follows:
ILR  Interpretation 

bilr1 
Betweenperson sleep (TST & WAKE ) vs
wake (MVPA , LPA , & SB )
behaviours 
bilr2 
Betweenperson TST vs WAKE

bilr3 
Betweenperson MVPA vs (LPA and
SB ) 
bilr4 
Betweenperson LPA vs SB

wilr1 
Withinperson Sleep (TST &
WAKE ) vs wake (MVPA , LPA , &
SB ) behaviours 
wilr2 
Withinperson TST vs WAKE

wilr3 
Withinperson MVPA vs (LPA and
SB ) 
wilr4 
Withinperson LPA vs SB

Due to the nature of withinperson ILR coordinates, it is often
challenging to interpret these results in great details. For example,
the significant coefficient for wilr1
shows that the
withinperson change in sleep behaviours (sleep duration and time awake
in bed combined), relative to wake behaviours (moderate to vigorous
physical activity, light physical activity, and sedentary behaviour) on
a given day, was associated with stress. However, as there are several
behaviours involved in this coordinate, we don’t know the withinperson
change in which of them drives the association. It could be the change
in sleep, such that people sleep more than their own average on a given
day, but it could also be the change in time awake. Further, we don’t
know about the specific changes in time spent across behaviours. That
is, if people slept more, what behaviour did they spend less time
in?
One approach to gain further insights into these relationships, and the changes in outcomes associated with changes in specific time across compositionl components is the substitution model. We will discuss the substitution model later in this vignette.
Bayes Factor for significance testing
In the frequentist approach, we usually compare the fits of models
using anova()
. In Bayesian, this can be done by comparing
the marginal likelihoods of two models. Bayes Factors (BFs) are indices
of relative evidence of one model over another. In the context of
compositional multilevel modelling, Bayes Factors provide two main
useful functions:
 Testing single parameters within a model
 Comparing models
We may utilize Bayes factors to answer the following question: “Which model (i.e., set of ILR predictors) is more likely to have produced the observed data?”
Let’s fit a series of model with brmcoda()
to predict
Stress
from sleepwake composition. For precise Bayes
factors, we will use 40,000 posterior draws for each model.
Notes : To use Bayes factors, brmsfit
models
must be fitted with an additional nondefault argument
save_pars = save_pars(all = TRUE)
.
# intercept only model
m0 < brmcoda(compilr = cilr,
formula = Stress ~ 1 + (1  ID),
iter = 6000, chains = 8, cores = 8, seed = 123, warmup = 1000,
backend = "cmdstanr", save_pars = save_pars(all = TRUE))
# betweenperson composition only model
m1 < brmcoda(compilr = cilr,
formula = Stress ~ bilr1 + bilr2 + bilr3 + bilr4 + (1  ID),
iter = 6000, chains = 8, cores = 8, seed = 123, warmup = 1000,
backend = "cmdstanr", save_pars = save_pars(all = TRUE))
# withinperson composition only model
m2 < brmcoda(compilr = cilr,
formula = Stress ~ wilr1 + wilr2 + wilr3 + wilr4 + (1  ID),
iter = 6000, chains = 8, cores = 8, seed = 123, warmup = 1000,
backend = "cmdstanr", save_pars = save_pars(all = TRUE))
# full model
m < brmcoda(compilr = cilr,
formula = Stress ~ bilr1 + bilr2 + bilr3 + bilr4 +
wilr1 + wilr2 + wilr3 + wilr4 + (1  ID),
iter = 6000, chains = 8, cores = 8, seed = 123, warmup = 1000,
backend = "cmdstanr", save_pars = save_pars(all = TRUE))
We can now compare these models with the
bayesfactor_models()
function, using the interceptonly
model as reference.
comparison < bayesfactor_models(m$Model, m1$Model, m2$Model, denominator = m0$Model)
comparison
#> Bayes Factors for Model Comparison
#>
#> Model BF
#> [1] bilr1 + bilr2 + bilr3 + bilr4 + wilr1 + wilr2 + wilr3 + wilr4 + (1  ID) 3.95
#> [2] bilr1 + bilr2 + bilr3 + bilr4 + (1  ID) 0.325
#> [3] wilr1 + wilr2 + wilr3 + wilr4 + (1  ID) 11.00
#>
#> * Against Denominator: [4] 1 + (1  ID)
#> * Bayes Factor Type: marginal likelihoods (bridgesampling)
We can see that model with only withinperson composition is the best model  with \(BF\) = 11.00 compared to the null (intercept only).
Let’s compare these models against the full model.
update(comparison, reference = 1)
#> Bayes Factors for Model Comparison
#>
#> Model BF
#> [2] bilr1 + bilr2 + bilr3 + bilr4 + (1  ID) 0.082
#> [3] wilr1 + wilr2 + wilr3 + wilr4 + (1  ID) 2.79
#> [4] 1 + (1  ID) 0.253
#>
#> * Against Denominator: [1] bilr1 + bilr2 + bilr3 + bilr4 + wilr1 + wilr2 + wilr3 + wilr4 + (1  ID)
#> * Bayes Factor Type: marginal likelihoods (bridgesampling)
Again, our data favours the withinperson composition only model over the full model, giving 2.79 times more support.
Substitution model
When examining the relationships between compositional data and an
outcome, we often are also interested in the changes in an outcomes when
a fixed duration of time is reallocated from one compositional component
to another, while the other components remain constant. These changes
can be examined using the compositional isotemporal substitution model.
In multilevelcoda
, we extend this model to multilevel
approach to test both betweenperson and withinperson changes. All
substitution models can be computed using the
substitution()
function, with the following arguments:
Argument  Description 

object 
A fitted brmcoda object 
base 
A data.frame or data.table of possible
substitution of variables. 
This data set can be computed using function
possub


delta 
A integer, numeric value or vector indicating the amount of change in compositional parts for substitution 
level 
A character value or vector to specify whether the change in
composition should be at between person and/or
within person levels 
type 
A character value or vector to specify whether the estimated change
in outcome should be conditional or
marginal

regrid 
Optional reference grid consisting of combinations of covariates over which predictions are made. If not provided, the default reference grid is used. 
summary 
A logical value to indicate whether the prediction at each level of the reference grid or an average of them should be returned. 
... 
Additional arguments to be passed to
describe_posterior

Betweenperson substitution model
The below example examines the changes in stress for different pairwise substitution of sleepwake behaviours for 5 minutes, at betweenperson level.
bsubm < substitution(object = m, delta = 5,
level = "between", ref = "grandmean")
The output contains multiple data sets of results for all compositional components. Here are the results for changes in stress when sleep (TST) is substituted for 5 minutes, averaged across levels of covariates.
Mean  CI_low  CI_high  Delta  From  To  Level  Reference 

0.02  0.01  0.05  5  WAKE  TST  between  grandmean 
0.00  0.01  0.02  5  MVPA  TST  between  grandmean 
0.01  0.01  0.02  5  LPA  TST  between  grandmean 
0.01  0.01  0.02  5  SB  TST  between  grandmean 
None of the results are significant, given that the credible intervals did not cross 0, showing that increasing sleep (TST) at the expense of any other behaviours was not associated in changes in stress. Notice there is no column indicating the levels of convariates, indicating that these results have been averaged.
Withinperson substitution model
Let’s now take a look at how stress changes when different pairwise of sleepwake behaviours are substituted for 5 minutes, at withinperson level.
# Withinperson substitution
wsubm < substitution(object = m, delta = 5,
level = "within", ref = "grandmean")
Results for 5 minute substitution.
Mean  CI_low  CI_high  Delta  From  To  Level  Reference 

0.02  0.00  0.03  5  WAKE  TST  within  grandmean 
0.00  0.01  0.00  5  MVPA  TST  within  grandmean 
0.00  0.01  0.00  5  LPA  TST  within  grandmean 
0.00  0.01  0.00  5  SB  TST  within  grandmean 
At withinperson level, there were significant results for substitution of sleep (TST) and time awake in bed (WAKE) for 5 minutes, but not other behaviours. Increasing sleep at the expense of time spent awake in bed predicted 0.02 higher stress [95% CI 0.00, 0.03], on a given day.
More interesting substitution models
You can learn more about different types of substitution models
at
Compositional
Multilevel Substitution Models.