MAIC_anchored.Rmd
The first section of this document goes over an anchored example of matching-adjusted indirect comparison (MAIC) analysis using the MAIC package in R where the endpoint of interest is time-to-event (e.g. overall survival). An “anchored” indirect comparison is when the two studies in comparison share a common treatment arm (i.e. placebo). For more details regarding MAIC, we ask the readers to refer to the unanchored example.
We use the method described in Bender et al. to simulate survival times. [1] The following R code generates a dataset with a treatment indicator, a prognostic variable (\(x_1\)), and two effect modifiers (\(x_2\) and \(x_3\)). We assume that \(x_1\) is not effect modifying, \(x_2\) and \(x_3\) are also prognostic, and all three covariates are binary for simplicity. We simulate baseline hazard from a Weibull distribution and censoring times from an exponential distribution.
# Code is adapted from https://stats.stackexchange.com/questions/135124/how-to-create-a-toy-survival-time-to-event-data-with-right-censoring
# N: sample size
# lambda: scale parameter in baseline hazard function
# rho: shape parameter in baseline hazard function
# rateC: rate parameter of the exponential distribution of C
# beta: vector of coefficients for prognostic factors
# gamma: vector of coefficients for effect modifiers
# d0: coefficient for the treatment effect when X = 0
# xcutoff: vector of points where the continuous variables are cut to make them to binary predictors.
# The bigger the number less frequent the covariates are equal to 1.
simulWeib <- function(N, lambda, rho, rateC, beta, gamma, d0, xcutoff, seed = 1)
{
set.seed(seed) #seed number that is used to make simulation replicable
# generate 3 covariates with some correlation
Sigma <- outer(1:3, 1:3, function(x,y) 0.5^abs(x-y)) #variance covariance
x <- mvrnorm(n = N, rep(0, 3), Sigma)
for(i in 1:3){
x[,i] <- ifelse(x[,i] > xcutoff[i], 1, 0) # make it a binary predictor
}
colnames(x) <- paste0("x", 1:3)
treat <- sample(x=c(0, 1), size=N, replace=TRUE, prob=c(0.5, 0.5))
# linear predictor
lp <- x %*% beta + x %*% gamma * treat + d0 * treat
# Weibull latent event times
v <- runif(n=N)
Tlat <- (-log(v) / (lambda * exp(lp)))^(1 / rho)
# censoring times
C <- rexp(n=N, rate=rateC)
# follow-up times and event indicators
Time <- pmin(Tlat, C)
Event <- as.numeric(Tlat <= C)
# data set
data.frame(USUBJID=1:N, Time=Time, Event=Event,
x1=x[,1], x2 = x[,2], x3 = x[,3], treat = treat)
}
We use notation from National Institute for Health and Care Excellence (NICE) Decision Support Unit (DSU) Technical Support Document (TSD) 18 to denote treatment effects for different population. [2] We denote treatment effect of \(B\) vs \(A\) in target population \(P\) when \(X = 0\) as \(d_{AB(P)}(0)\). Let us generate the first study with only treatments \(A\) and \(B\) and we will denote this study population as \(P=AB\). In this study, we assume that \(d_{AB(AB)}(0)=-0.4\)
# Generate first study
N <- 10000
beta <- c(0.3, 0.2, 0.1)
gamma <- c(0, 0.2, 0.1) # first variable has no effect modification
d0 <- -0.4
xcutoff <- c(0, 0.5, 0.5)
dataAB <- simulWeib(N = N, lambda = 0.1, rho = 1, rateC = 0.01, beta = beta, gamma = gamma, d0 = d0, xcutoff = xcutoff)
dataAB$ARM <- factor(ifelse(dataAB$treat == 1, "B", "A"), levels = c("A", "B"))
In the second study, we slightly change how the covariates (i.e. effect modifiers) are distributed. We use different cutoffs for the covariates to make the baseline characteristics different from the \(AB\) study. In this study, we assume that \(d_{AC(AC)}(0)=-0.7\).
# Generate second study
beta <- c(0.2, 0.2, 0.2)
gamma <- c(0, 0.2, 0.2)
d0 <- -0.7
xcutoff <- c(0.5, 0, -0.5)
dataAC <- simulWeib(N = N, lambda = 0.1, rho = 1, rateC = 0.01, beta = beta, gamma = gamma, d0 = d0, xcutoff = xcutoff)
dataAC$ARM <- factor(ifelse(dataAC$treat == 1, "C", "A"), levels = c("A", "C"))
What is the effect of average treatment effect of B vs A when the target population \(P=AC\)? We denote this by \(d_{AB(AC)}\) and generate it by the data generating mechanism. We use the same effect estimates (beta, gamma, d0) for the treatment effect of B vs A, but use the covariate distribution from AC trial (xcutoff). Then we use cox regression to estimate \(d_{AB(AC)}\).
beta <- c(0.3, 0.2, 0.1)
gamma <- c(0, 0.2, 0.1) # first variable has no effect modification
d0 <- -0.4
xcutoff <- c(0.5, 0, -0.5)
dataAB_AC <- simulWeib(N = N, lambda = 0.1, rho = 1, rateC = 0.01, beta = beta, gamma = gamma, d0 = d0, xcutoff = xcutoff)
dataAB_AC$ARM <- factor(ifelse(dataAB_AC$treat == 1, "B", "A"), levels = c("A", "B"))
unweighted_cox <- coxph(Surv(Time, Event==1) ~ treat, data = dataAB_AC)
d_AB_AC <- summary(unweighted_cox)
d_AB_AC
#> Call:
#> coxph(formula = Surv(Time, Event == 1) ~ treat, data = dataAB_AC)
#>
#> n= 10000, number of events= 9193
#>
#> coef exp(coef) se(coef) z Pr(>|z|)
#> treat -0.22588 0.79782 0.02101 -10.75 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> exp(coef) exp(-coef) lower .95 upper .95
#> treat 0.7978 1.253 0.7656 0.8313
#>
#> Concordance= 0.527 (se = 0.003 )
#> Likelihood ratio test= 115.6 on 1 df, p=<2e-16
#> Wald test = 115.6 on 1 df, p=<2e-16
#> Score (logrank) test = 116.1 on 1 df, p=<2e-16
By using the data generating mechanism, we find that the estimate \(\hat{d}_{AB(AC)}=\) -0.2259.
Although we have full data for the \(AC\) dataset, let us assume that we only have aggregate level data for the \(AC\) study. Our task is to estimate \(d_{AB(AC)}\) with individual patient data of \(AB\) study and aggregate level summary of \(AC\) study through utilizing MAIC. Below, we calculate aggregate average of covariates for each population.
colMeans(dataAB[,c("x1", "x2", "x3")])
#> x1 x2 x3
#> 0.5046 0.3064 0.3113
colMeans(dataAC[,c("x1", "x2", "x3")])
#> x1 x2 x3
#> 0.3117 0.5042 0.6907
We apply the same steps as the unanchored example to find the weights to adjust for differences in effect modifiers in \(AB\) and \(AC\) trials.
intervention_input <- as_tibble(dataAB)
target_pop <- tibble(N = N, Treatment = "Comparator",
x1 = mean(dataAC$x1), x2 = mean(dataAC$x2),
x3 = mean(dataAC$x3))
all_cov <- c("x1", "x2", "x3")
for(i in all_cov){
intervention_input[[paste0(i, "_centered")]] <- intervention_input[[i]] - as.numeric(target_pop[,i])
}
# Need to update match_cov parameter if we change covariates we are using to match
match_cov <- c("x2", "x3")
cent_match_cov <- paste0(match_cov, "_centered")
select <- dplyr::select #deal with select function clashes between dplyr and MASS packages
intervention_data <- intervention_input %>%
select(USUBJID, ARM, Time, Event, all_of(match_cov), all_of(cent_match_cov))
# remove any NA; or deal with missing data problem via multiple imputation if simply removing NA is problematic
#intervention_data <- intervention_data[complete.cases(intervention_data),]
est_weights <- estimate_weights(intervention_data = intervention_data,
matching_vars = cent_match_cov)
intervention_data$wt <- est_weights$analysis_data$wt
wt_diagnostics(as.data.frame(est_weights$analysis_data), vars = match_cov)
#> $ESS
#> [1] 5813.605
#>
#> $Summary_of_weights
#> type mean sd median min max
#> 1 Weights 0.7285302 0.6182536 0.3008227 0.3008227 1.882276
#> 2 Rescaled weights 1.0000000 0.8486314 0.4129173 0.4129173 2.583662
#>
#> $Weight_profiles
#> x2 x3 wt wt_rs
#> 1 0 0 0.3008227 0.4129173
#> 2 1 0 0.4269752 0.5860776
#> 3 0 1 1.3261454 1.8203027
#> 4 1 1 1.8822756 2.5836618
check_weights(est_weights$analysis_data, match_cov, target_pop)
#> ARM ESS x2 x3
#> 1 Intervention 10000 0.3064000 0.3113000
#> 2 Intervention_weighted 5814 0.5041816 0.6907061
#> 3 Comparator 10000 0.5042000 0.6907000
Now that weights has been calculated, we can calculate the weighted hazard ratios.
unweighted_cox <- coxph(Surv(Time, Event==1) ~ ARM, data = intervention_data)
weighted_cox <- coxph(Surv(Time, Event==1) ~ ARM, data = intervention_data, weights = wt)
HR_AB <- find_RR(unweighted_cox)
HR_AB_MAIC <- find_RR(weighted_cox)
We find that the unweighted hazard ratio is -0.3004 and the weighted hazard ratio adjusting for the imbalance of effect modifiers to match the AC trial is -0.2257. Using the weights, hazard ratio is closer to the estimated hazard ratio from data generating mechanism, which was -0.2259.
If AC trial was the comparator trial and only aggregate level summary was reported, we might only have information about the hazard ratio and confidence interval from the publication.
unweighted_cox <- coxph(Surv(Time, Event==1) ~ treat, data = dataAC)
RR_AC <- find_RR(unweighted_cox)
RR_AC
#> $HR
#> treat
#> 0.6377071
#>
#> $HR_SE
#> treat
#> 0.01365808
#>
#> $HR_CI
#> lower upper
#> 0.6114918 0.6650463
#>
#> $logHR
#> treat
#> -0.4498762
#>
#> $logHR_SE
#> [1] 0.02141749
For instance, we would only know that for AC trial, hazard ratio was 0.64 and 95% confidence interval was [0.61, 0.67]. However, what we need for the indirect treatment comparison is log hazard ratio and its standard error. We use the following function to calculate these estimates.
HR_AC <- RR_AC$HR
HR_CI_AC <- RR_AC$HR_CI
HR_AC <- find_SE_fromCI(Estimate = HR_AC, CI_lower = HR_CI_AC[1], CI_upper = HR_CI_AC[2], CI_perc = 0.95)
HR_AC
#> $HR
#> treat
#> 0.6377071
#>
#> $HR_SE
#> upper
#> 0.01365808
#>
#> $HR_CI
#> lower upper
#> 0.6114918 0.6650463
#>
#> $logHR
#> treat
#> -0.4498762
#>
#> $logHR_SE
#> upper
#> 0.02141749
In the previous section, what we have estimated was \(d_{AB(AC)}\), but what we are more interested is the indirect treatment comparison of C vs B. We can calculate \(\hat{d}_{BC(AC)}\) from \(\hat{d}_{BC(AC)}\) = \(\hat{d}_{AC(AC)}\) - \(\hat{d}_{AB(AC)}\). We can calculate this indirect effect of C vs B through the function below.
# Naive indirect effect of C vs B
find_ITC(AB = HR_AB, AC = HR_AC)
#> $HR
#> treat
#> 0.8611307
#>
#> $HR_SE
#> upper
#> 0.02590627
#>
#> $HR_CI
#> lower upper
#> 0.8118233 0.9134329
#>
#> $logHR
#> treat
#> -0.1495089
#>
#> $logHR_SE
#> upper
#> 0.03008401
# Matching adjusted indirect comparison of C vs B
find_ITC(AB = HR_AB_MAIC, AC = HR_AC)
#> $HR
#> treat
#> 0.7991365
#>
#> $HR_SE
#> upper
#> 0.02604207
#>
#> $HR_CI
#> lower upper
#> 0.7496909 0.8518433
#>
#> $logHR
#> treat
#> -0.2242235
#>
#> $logHR_SE
#> upper
#> 0.03258776
Similarly as in the unanchored example, we can estimate a 95% confidence interval (CI) from the bootstrap samples. We will use Percentile CIs, but we could have also used Bias-corrected and accelerated CIs.
library(boot)
R <- 100
HR_bootstraps <- boot(data = intervention_data,
statistic = bootstrap_HR_anchored,
R=R, # number of bootstrap samples
matching = cent_match_cov, # matching variables
ref_treatment = "A", # reference treatment name
model = Surv(Time, Event==1) ~ ARM # model to fit
)
RR_AB_MAIC_BOOT <- list(logHR = median(HR_bootstraps$t), logHR_SE = sd(HR_bootstraps$t))
RR_BC_MAIC_BOOT <- find_ITC(AB = RR_AB_MAIC_BOOT, AC = RR_AC)
RR_BC_MAIC_BOOT
#> $HR
#> treat
#> 0.8000753
#>
#> $HR_SE
#> treat
#> 0.02780164
#>
#> $HR_CI
#> lower upper
#> 0.7473992 0.8564639
#>
#> $logHR
#> treat
#> -0.2230495
#>
#> $logHR_SE
#> [1] 0.03474878