Hierarchical Random Effects in Meta-Analyses: Do they change stuff?

Apr 23, 2025

Introduction
Base Model
- Two Level Model: Paper
- Three Level Model: Paper and Experiment
Moderator Model from the Paper
- Two Level Model: Paper
- Three Level Model: Paper and Experiment

Introduction

Each effect size is nested within an experiment which is in turn nested within a paper (this includes unpublished reports, theses, and the likes). It can be assumed that effect sizes within these nested structures are not independent. Here we explore whether and how accounting for this possible correlation affects both a random effects base model and a moderator analysis. As example we chose InWordDB.

Base Model

Standard random effects model, no moderators. First we run the model without accounting for any hierarchical structure (as reported in the publication by Bergmann & Cristia 2015; note that differences in effect size estimation are due to an updated dataset here that also includes nonnative studies, as compared to the paper).

library(here)
load(here("shinyapps", "site_data", "Rdata", "metalab.Rdata"))
inworddb <- droplevels(metalab_data[metalab_data$short_name=="inworddb", ])

StandardMod <- rma(g_calc, g_var_calc, data = inworddb)
summary(StandardMod)

## 
## Random-Effects Model (k = 299; tau^2 estimator: REML)
## 
##    logLik   deviance        AIC        BIC       AICc   
## -144.6900   289.3799   293.3799   300.7741   293.4206   
## 
## tau^2 (estimated amount of total heterogeneity): 0.1023 (SE = 0.0115)
## tau (square root of estimated tau^2 value):      0.3199
## I^2 (total heterogeneity / total variability):   76.99%
## H^2 (total variability / sampling variability):  4.35
## 
## Test for Heterogeneity:
## Q(df = 298) = 1143.3988, p-val < .0001
## 
## Model Results:
## 
## estimate      se    zval    pval   ci.lb   ci.ub      
##   0.2099  0.0218  9.6275  <.0001  0.1672  0.2526  *** 
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Two Level Model: Paper

We first add a level for the paper a given effect size was reported in. These effect sizes presumably stem from a batch of studies that were conducted in the same lab in a very similar fashion and by the same set of experimenters, introducing possible correlations.

PerPaperMod <- rma.mv(g_calc, g_var_calc, random = ~ 1 | short_cite, data = inworddb)
summary(PerPaperMod)

## 
## Multivariate Meta-Analysis Model (k = 299; method: REML)
## 
##    logLik   Deviance        AIC        BIC       AICc   
## -268.1654   536.3308   540.3308   547.7250   540.3715   
## 
## Variance Components:
## 
##             estim    sqrt  nlvls  fixed      factor 
## sigma^2    0.0367  0.1916     68     no  short_cite 
## 
## Test for Heterogeneity:
## Q(df = 298) = 1143.3988, p-val < .0001
## 
## Model Results:
## 
## estimate      se    zval    pval   ci.lb   ci.ub      
##   0.1832  0.0264  6.9494  <.0001  0.1315  0.2349  *** 
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Three Level Model: Paper and Experiment

Nested within paper, we introduce a level for experiment number. Experiments can report several effect sizes, for example when infants are run in conditions; slight variations of the same study which are presumed to be even more similar than effect sizes within a paper. A caveat is that conventions on what counts as experiment and what counts as conditions within an experiment might differ across papers.

PerExpPaperMod <- rma.mv(g_calc, g_var_calc, random = ~ factor(expt_num) |  short_cite, data = inworddb)
summary(PerExpPaperMod)

## 
## Multivariate Meta-Analysis Model (k = 299; method: REML)
## 
##    logLik   Deviance        AIC        BIC       AICc   
## -211.2785   422.5570   428.5570   439.6483   428.6386   
## 
## Variance Components:
## 
## outer factor: short_cite       (nlvls = 68)
## inner factor: factor(expt_num) (nlvls = 15)
## 
##             estim    sqrt  fixed 
## tau^2      0.0635  0.2521     no 
## rho        0.2483             no 
## 
## Test for Heterogeneity:
## Q(df = 298) = 1143.3988, p-val < .0001
## 
## Model Results:
## 
## estimate      se    zval    pval   ci.lb   ci.ub      
##   0.1837  0.0265  6.9291  <.0001  0.1318  0.2357  *** 
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

To summarize, all these models differ in their effect size estimates, but do not change the statistical outcome. The effect remains small but significantly above 0. Adding the level of experiment number did not dramatically change the result.

Moderator Model from the Paper

#Centering mean age
inworddb$ageC <- inworddb$mean_age-mean(inworddb$mean_age)

StandardMod <- rma(g_calc, g_var_calc, mod = ageC, data = inworddb)
summary(StandardMod)

## 
## Mixed-Effects Model (k = 299; tau^2 estimator: REML)
## 
##    logLik   deviance        AIC        BIC       AICc   
## -144.5660   289.1320   295.1320   306.2132   295.2139   
## 
## tau^2 (estimated amount of residual heterogeneity):     0.1027 (SE = 0.0116)
## tau (square root of estimated tau^2 value):             0.3205
## I^2 (residual heterogeneity / unaccounted variability): 77.04%
## H^2 (unaccounted variability / sampling variability):   4.36
## R^2 (amount of heterogeneity accounted for):            0.00%
## 
## Test for Residual Heterogeneity:
## QE(df = 297) = 1141.6702, p-val < .0001
## 
## Test of Moderators (coefficient 2):
## QM(df = 1) = 0.3885, p-val = 0.5331
## 
## Model Results:
## 
##          estimate      se    zval    pval    ci.lb   ci.ub      
## intrcpt    0.2099  0.0218  9.6125  <.0001   0.1671  0.2527  *** 
## mods       0.0002  0.0003  0.6233  0.5331  -0.0004  0.0007      
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Two Level Model: Paper

PerPaperMod <- rma.mv(g_calc, g_var_calc, mod = ageC, random = ~ 1 | short_cite, data = inworddb)
summary(PerPaperMod)

## 
## Multivariate Meta-Analysis Model (k = 299; method: REML)
## 
##    logLik   Deviance        AIC        BIC       AICc   
## -263.7595   527.5190   533.5190   544.6002   533.6009   
## 
## Variance Components:
## 
##             estim    sqrt  nlvls  fixed      factor 
## sigma^2    0.0414  0.2035     68     no  short_cite 
## 
## Test for Residual Heterogeneity:
## QE(df = 297) = 1141.6702, p-val < .0001
## 
## Test of Moderators (coefficient 2):
## QM(df = 1) = 9.6131, p-val = 0.0019
## 
## Model Results:
## 
##          estimate      se    zval    pval   ci.lb   ci.ub      
## intrcpt    0.1806  0.0277  6.5200  <.0001  0.1263  0.2349  *** 
## mods       0.0007  0.0002  3.1005  0.0019  0.0002  0.0011   ** 
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Three Level Model: Paper and Experiment

PerExpPaperMod <- rma.mv(g_calc, g_var_calc, mod = ageC, random = ~ factor(expt_num) |  short_cite, data = inworddb)
summary(PerExpPaperMod)

## 
## Multivariate Meta-Analysis Model (k = 299; method: REML)
## 
##    logLik   Deviance        AIC        BIC       AICc   
## -209.1634   418.3268   426.3268   441.1017   426.4638   
## 
## Variance Components:
## 
## outer factor: short_cite       (nlvls = 68)
## inner factor: factor(expt_num) (nlvls = 15)
## 
##             estim    sqrt  fixed 
## tau^2      0.0663  0.2575     no 
## rho        0.2952             no 
## 
## Test for Residual Heterogeneity:
## QE(df = 297) = 1141.6702, p-val < .0001
## 
## Test of Moderators (coefficient 2):
## QM(df = 1) = 4.6265, p-val = 0.0315
## 
## Model Results:
## 
##          estimate      se    zval    pval   ci.lb   ci.ub      
## intrcpt    0.1819  0.0276  6.5924  <.0001  0.1279  0.2360  *** 
## mods       0.0005  0.0003  2.1509  0.0315  0.0000  0.0010    * 
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

To summarize, introducing hierarchical structure changed the outcome for the moderator test. Age (centered) now has a small, but significant, effect on effect sizes. This is not the case when ignoring the nested structure of effect sizes. This result mirrors the reported analyses in Bergmann & Cristia (2015) that there is a small, positive effect of age when only considering papers which test at least two age groups in the same set-up.