Title: | Microbiome Regression-Based Kernel Association Tests |
---|---|
Description: | Test for overall association between microbiome composition data and phenotypes via phylogenetic kernels. The phenotype can be univariate continuous or binary (Zhao et al. (2015) <doi:10.1016/j.ajhg.2015.04.003>), survival outcomes (Plantinga et al. (2017) <doi:10.1186/s40168-017-0239-9>), multivariate (Zhan et al. (2017) <doi:10.1002/gepi.22030>) and structured phenotypes (Zhan et al. (2017) <doi:10.1111/biom.12684>). The package can also use robust regression (unpublished work) and integrated quantile regression (Wang et al. (2021) <doi:10.1093/bioinformatics/btab668>). In each case, the microbiome community effect is modeled nonparametrically through a kernel function, which can incorporate phylogenetic tree information. |
Authors: | Anna Plantinga [aut, cre], Nehemiah Wilson [aut, ctb], Haotian Zheng [aut, ctb], Tianying Wang [aut, ctb], Xiang Zhan [aut, ctb], Michael Wu [aut], Ni Zhao [aut, ctb], Jun Chen [aut] |
Maintainer: | Anna Plantinga <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.2.3 |
Built: | 2024-10-29 04:32:55 UTC |
Source: | https://github.com/cran/MiRKAT |
Simulated DEPENDENT data with BINOMIAL traits for correlated regression-based analysis (i.e. CSKAT, GLMMMiRKAT)
data(bindata)
data(bindata)
A list containing three data objects for correlated microbiome data with binary response variable (described below).
Simulated OTU data for correlated regression-based analysis; 59 rows and 730 columns, rows being patients and columns being OTUs
Simulated metadata for correlated regression-based analysis; 59 rows and 4 columns, rows being patients and columns being the outcome variable, subject identifier, and covariates to possibly account for in any regression modeling
Simulated rooted phylogenetic tree with 730 tips and 729 nodes
Compute the adjusted score statistic and p-value
CSKAT(lmer.obj, Ks)
CSKAT(lmer.obj, Ks)
lmer.obj |
A fitted lme4 object (model under H0) |
Ks |
A kernel matrix or list of kernels, quantifying the similarities between samples. |
Association p-values
Adjusted score statistics
Nehemiah Wilson, Anna Plantinga, Xiang Zhan, Jun Chen.
Zhan X, et al. (2018) A small-sample kernel association test for correlated data with application to microbiome association studies. Genet Epidemiol.
Construct kernel matrix from distance matrix.
D2K(D)
D2K(D)
D |
An n by n matrix giving pairwise distances or dissimilarites, where n is sample size. |
Converts a distance matrix (matrix of pairwise distances) into a kernel matrix for microbiome data. The kernel matrix is constructed as , where D is the pairwise distance matrix, I is the identity
matrix, and 1 is a vector of ones.
represents element-wise square.
To ensure that is positive semi-definite, a positive semi-definiteness correction is conducted
An n by n kernel or similarity matrix corresponding to the distance matrix given.
Ni Zhao
Zhao, Ni, et al. "Testing in microbiome-profiling studies with MiRKAT, the microbiome regression-based kernel association test
library(GUniFrac) #Load in data and create a distance matrix data(throat.tree) data(throat.otu.tab) unifracs <- GUniFrac(throat.otu.tab, throat.tree, alpha=c(1))$unifracs D1 <- unifracs[,,"d_1"] #Function call K <- D2K(D1)
library(GUniFrac) #Load in data and create a distance matrix data(throat.tree) data(throat.otu.tab) unifracs <- GUniFrac(throat.otu.tab, throat.tree, alpha=c(1))$unifracs D1 <- unifracs[,,"d_1"] #Function call K <- D2K(D1)
GLMMMiRKAT utilizes a generalized linear mixed model to allow dependence among samples.
GLMMMiRKAT( y, X = NULL, Ks, id = NULL, time.pt = NULL, model, method = "perm", slope = FALSE, omnibus = "perm", nperm = 5000 )
GLMMMiRKAT( y, X = NULL, Ks, id = NULL, time.pt = NULL, model, method = "perm", slope = FALSE, omnibus = "perm", nperm = 5000 )
y |
A numeric vector of Gaussian (e.g., body mass index), Binomial (e.g., disease status, treatment/placebo) or Poisson (e.g., number of tumors/treatments) traits. |
X |
A vector or matrix of numeric covariates, if applicable (default = NULL). |
Ks |
A list of n-by-n OTU kernel matrices or one singular n-by-n OTU kernel matrix, where n is sample size. |
id |
A vector of cluster (e.g., family or subject including repeated measurements) IDs. Defaults to NULL since it is unnecessary for the CSKAT call. |
time.pt |
A vector of time points for the longitudinal studies. 'time.pt' is not required (i.e., 'time.pt = NULL') for the random intercept model. Default is time.pt = NULL. |
model |
A string declaring which model ("gaussian", "binomial" or "poisson") is to be used; should align with whether a Gaussian, Binomial, or Poisson trait is being inputted for the y argument. |
method |
A string declaring which method ("perm" or "davies) will be used to calculate the p-value. Davies is only available for Gaussian traits. Defaults to "perm". |
slope |
An indicator to include random slopes in the model (slope = TRUE) or not (slope = FALSE). 'slope = FALSE' is for the random intercept model. 'slope = TRUE' is for the random slope model. For the random slope model (slope = TRUE), 'time.pt' is required. |
omnibus |
A string equal to either "Cauchy" or "permutation" (or nonambiguous abbreviations thereof), specifying whether to use the Cauchy combination test or residual permutation to generate the omnibus p-value. |
nperm |
The number of permutations used to calculate the p-values and omnibus p-value. Defaults to 5000. |
Missing data is not permitted. Please remove all individuals with missing y, X, and Ks prior to input for analysis.
y and X (if not NULL) should be numerical matrices or vectors with the same number of rows.
Ks should either be a list of n by n kernel matrices (where n is sample size) or a single kernel matrix. If you have distance matrices from metagenomic data, each kernel can be constructed through function D2K. Each kernel can also be constructed through other mathematical approaches.
If model="gaussian" and method="davies", CSKAT is called. CSKAT utilizes the same omnibus test as GLMMMiRKAT. See ?CSKAT for more details.
The "method" argument only determines kernel-specific p-values are generated. When Ks is a list of multiple kernels, an omnibus p-value is computed via permutation.
Returns a p-value for each inputted kernel matrix, as well as an overall omnibus p-value if more than one kernel matrix is inputted
p_values |
p-value for each individual kernel matrix |
omnibus_p |
overall omnibus p-value calculated by permutation for the adaptive GLMMMiRKAT analysis |
Hyunwook Koh
Koh H, Li Y, Zhan X, Chen J, Zhao N. (2019) A distance-based kernel association test based on the generalized linear mixed model for correlated microbiome studies. Front. Genet. 458(10), 1-14.
## Example with Gaussian (e.g., body mass index) traits ## For non-Gaussian traits, see vignette. # Import example microbiome data with Gaussian traits data(nordata) otu.tab <- nordata$nor.otu.tab meta <- nordata$nor.meta # Create kernel matrices and run analysis if (requireNamespace("vegan")) { library(vegan) D_BC = as.matrix(vegdist(otu.tab, 'bray')) K_BC = D2K(D_BC) GLMMMiRKAT(y = meta$y, X = cbind(meta$x1, meta$x2), id = meta$id, Ks = K_BC, model = "gaussian", nperm = 500) } else { # Computation time is longer for phylogenetic kernels tree <- nordata$nor.tree unifracs <- GUniFrac::GUniFrac(otu.tab, tree, alpha=c(1))$unifracs D_W <- unifracs[,,"d_1"] K_W = D2K(D_W) GLMMMiRKAT(y = meta$y, X = cbind(meta$x1, meta$x2), id = meta$id, Ks = K_W, model = "gaussian", nperm = 500) }
## Example with Gaussian (e.g., body mass index) traits ## For non-Gaussian traits, see vignette. # Import example microbiome data with Gaussian traits data(nordata) otu.tab <- nordata$nor.otu.tab meta <- nordata$nor.meta # Create kernel matrices and run analysis if (requireNamespace("vegan")) { library(vegan) D_BC = as.matrix(vegdist(otu.tab, 'bray')) K_BC = D2K(D_BC) GLMMMiRKAT(y = meta$y, X = cbind(meta$x1, meta$x2), id = meta$id, Ks = K_BC, model = "gaussian", nperm = 500) } else { # Computation time is longer for phylogenetic kernels tree <- nordata$nor.tree unifracs <- GUniFrac::GUniFrac(otu.tab, tree, alpha=c(1))$unifracs D_W <- unifracs[,,"d_1"] K_W = D2K(D_W) GLMMMiRKAT(y = meta$y, X = cbind(meta$x1, meta$x2), id = meta$id, Ks = K_W, model = "gaussian", nperm = 500) }
Small-sample SKAT for correlated (continuous) data ('c' stands for 'correlated'). Computes the adjusted score statistic and p-value.
inner.CSKAT(lmer.obj, K)
inner.CSKAT(lmer.obj, K)
lmer.obj |
A fitted lme4 object (model under H0) |
K |
the kernel matrix, which quantifies the similarities between samples |
association p-value
adjusted score statistic
Zhan X, et al. (2018) A small-sample kernel association test for correlated data with application to microbiome association studies. Genet Epidemiol., submitted.
Function called when user calls function KRV. For each kernel matrix inputted into KRV, KRV runs inner.KRV on that kernel with the inputted kernel.y outcome matrix.
inner.KRV( y = NULL, X = NULL, adjust.type, kernel.otu, kernel.y, returnKRV = FALSE, returnR2 = FALSE )
inner.KRV( y = NULL, X = NULL, adjust.type, kernel.otu, kernel.y, returnKRV = FALSE, returnR2 = FALSE )
y |
A numeric n by p matrix of p continuous phenotype variables and sample size n (default = NULL). If it is NULL, a phenotype kernel matrix must be entered for "kernel.y". Defaults to NULL. |
X |
A numeric n by q matrix, containing q additional covariates (default = NULL). If NULL, an intercept only model is used. If the first column of X is not uniformly 1, then an intercept column will be added. |
adjust.type |
Possible values are "none" (default if X is null), "phenotype" to adjust only the y variable (only possible if y is a numeric phenotype matrix rather than a pre-computed kernel), or "both" to adjust both the X and Y kernels. |
kernel.otu |
A numeric OTU n by n kernel matrix or a list of matrices, where n is the sample size. It can be constructed from microbiome data, such as by transforming from a distance metric. |
kernel.y |
Either a numerical n by n kernel matrix for phenotypes or a method to compute the kernel of phenotype. Methods are "Gaussian" or "linear". A Gaussian kernel (kernel.y="Gaussian") can capture the general relationship between microbiome and phenotypes; a linear kernel (kernel.y="linear") may be preferred if the underlying relationship is close to linear. |
returnKRV |
A logical indicating whether to return the KRV statistic. Defaults to FALSE. |
returnR2 |
A logical indicating whether to return the R-squared coefficient. Defaults to FALSE. |
y and X (if not NULL) should all be numerical matrices or vectors with the same number of rows.
Ks should be a list of n by n matrices or a single matrix. If you have distance metric from metagenomic data, each kernel can be constructed through function D2K. Each kernel can also be constructed through other mathematical approaches.
Missing data is not permitted. Please remove all individuals with missing y, X, Ks prior to analysis
Parameter "method" only concerns how kernel specific p-values are generated. When Ks is a list of multiple kernels, omnibus p-value is computed through permutation from each individual p-value, which are calculated through method of choice.
Returns a p-value for the candidate kernel matrix
pv |
p-value for the candidate kernel matrix |
KRV |
KRV statistic for the candidate kernel matrix. Only returned if returnKRV = TRUE. |
R2 |
R-squared for the candidate kernel matrix. Only returned if returnR2 = TRUE. |
Haotian Zheng, Xiang Zhan, Ni Zhao
Zhan, X., Plantinga, A., Zhao, N., and Wu, M.C. A Fast Small-Sample Kernel Independence Test for Microbiome Community-Level Association Analysis. Biometrics. 2017 Mar 10. doi: 10.1111/biom.12684.
Kernel RV coefficient test to evaluate the overall association between microbiome composition and high-dimensional or structured phenotype or genotype.
KRV( y = NULL, X = NULL, adjust.type = NULL, kernels.otu, kernel.y, omnibus = "kernel_om", returnKRV = FALSE, returnR2 = FALSE )
KRV( y = NULL, X = NULL, adjust.type = NULL, kernels.otu, kernel.y, omnibus = "kernel_om", returnKRV = FALSE, returnR2 = FALSE )
y |
A numeric n by p matrix of p continuous phenotype variables and sample size n (default = NULL). If it is NULL, a phenotype kernel matrix must be entered for "kernel.y". Defaults to NULL. |
X |
A numeric n by q matrix, containing q additional covariates (default = NULL). If NULL, an intercept only model is used. If the first column of X is not uniformly 1, then an intercept column will be added. |
adjust.type |
Possible values are "none" (default if X is null), "phenotype" to adjust only the y variable (only possible if y is a numeric phenotype matrix rather than a pre-computed kernel), or "both" to adjust both the X and Y kernels. |
kernels.otu |
A numeric OTU n by n kernel matrix or a list of matrices, where n is the sample size. It can be constructed from microbiome data, such as by transforming from a distance metric. |
kernel.y |
Either a numerical n by n kernel matrix for phenotypes or a method to compute the kernel of phenotype. Methods are "Gaussian" or "linear". A Gaussian kernel (kernel.y="Gaussian") can capture the general relationship between microbiome and phenotypes; a linear kernel (kernel.y="linear") may be preferred if the underlying relationship is close to linear. |
omnibus |
A string equal to either "Cauchy" or "kernel_om" (or unambiguous abbreviations thereof), specifying whether to use the Cauchy combination test or an omnibus kernel to generate the omnibus p-value. |
returnKRV |
A logical indicating whether to return the KRV statistic. Defaults to FALSE. |
returnR2 |
A logical indicating whether to return the R-squared coefficient. Defaults to FALSE. |
kernels.otu should be a list of numerical n by n kernel matrices, or a single n by n kernel matrix, where n is sample size.
When kernel.y is a method ("Gaussian" or "linear") to compute the kernel of phenotype, y should be a numerical phenotype matrix, and X (if not NULL) should be a numeric matrix of covariates. Both y and X should have n rows.
When kernel.y is a kernel matrix for the phenotype, there is no need to provide X and y, and they will be ignored if provided. In this case, kernel.y and kernel.otu should both be numeric matrices with the same number of rows and columns.
Missing data is not permitted. Please remove all individuals with missing kernel.otu, y (if not NULL), X (if not NULL), and kernel.y (if a matrix is entered) prior to analysis.
If only one candidate kernel matrix is considered, returns a list containing the p-value for the candidate kernel matrix. If more than one candidate kernel matrix is considered, returns a list of two elements:
p_values |
P-value for each candidate kernel matrix |
omnibus_p |
Omnibus p-value |
KRV |
A vector of kernel RV statistics (a measure of effect size), one for each candidate kernel matrix. Only returned if returnKRV = TRUE |
R2 |
A vector of R-squared statistics, one for each candidate kernel matrix. Only returned if returnR2 = TRUE |
Nehemiah Wilson, Haotian Zheng, Xiang Zhan, Ni Zhao
Zheng, Haotian, Zhan, X., Plantinga, A., Zhao, N., and Wu, M.C. A Fast Small-Sample Kernel Independence Test for Microbiome Community-Level Association Analysis. Biometrics. 2017 Mar 10. doi: 10.1111/biom.12684.
Liu, Hongjiao, Ling, W., Hua, X., Moon, J.Y., Williams-Nguyen, J., Zhan, X., Plantinga, A.M., Zhao, N., Zhang, A., Durazo-Arzivu, R.A., Knight, R., Qi, Q., Burk, R.D., Kaplan, R.C., and Wu, M.C. Kernel-based genetic association analysis for microbiome phenotypes identifies host genetic drivers of beta-diversity. 2021+
library(GUniFrac) library(MASS) data(throat.tree) data(throat.otu.tab) data(throat.meta) ## Simulate covariate data set.seed(123) n = nrow(throat.otu.tab) Sex <- throat.meta$Sex Smoker <- throat.meta$SmokingStatus anti <- throat.meta$AntibioticUsePast3Months_TimeFromAntibioticUsage Male = (Sex == "Male")**2 Smoker = (Smoker == "Smoker") **2 Anti = (anti != "None")^2 cova = cbind(1, Male, Smoker, Anti) ## Simulate microbiome data otu.tab.rff <- Rarefy(throat.otu.tab)$otu.tab.rff unifracs <- GUniFrac(otu.tab.rff, throat.tree, alpha=c(0, 0.5, 1))$unifracs # Distance matrices D.weighted = unifracs[,,"d_1"] D.unweighted = unifracs[,,"d_UW"] # Kernel matrices K.weighted = D2K(D.weighted) K.unweighted = D2K(D.unweighted) if (requireNamespace("vegan")) { library(vegan) D.BC = as.matrix(vegdist(otu.tab.rff, method="bray")) K.BC = D2K(D.BC) } # Simulate phenotype data rho = 0.2 Va = matrix(rep(rho, (2*n)^2), 2*n, 2*n)+diag(1-rho, 2*n) Phe = mvrnorm(n, rep(0, 2*n), Va) K.y = Phe %*% t(Phe) # phenotype kernel # Simulate genotype data G = matrix(rbinom(n*10, 2, 0.1), n, 10) K.g = G %*% t(G) # genotype kernel ## Unadjusted analysis (microbiome and phenotype) KRV(y = Phe, kernels.otu = K.weighted, kernel.y = "Gaussian") # numeric y KRV(kernels.otu = K.weighted, kernel.y = K.y) # kernel y ## Adjusted analysis (phenotype only) KRV(kernels.otu = K.weighted, y = Phe, kernel.y = "linear", X = cova, adjust.type = "phenotype") if (requireNamespace("vegan")) { ## Adjusted analysis (adjust both kernels; microbiome and phenotype) KRV(kernels.otu = K.BC, kernel.y = K.y, X = cova, adjust.type='both') ## Adjusted analysis (adjust both kernels; microbiome and genotype) KRV(kernels.otu = K.BC, kernel.y = K.g, X = cova, adjust.type='both') }
library(GUniFrac) library(MASS) data(throat.tree) data(throat.otu.tab) data(throat.meta) ## Simulate covariate data set.seed(123) n = nrow(throat.otu.tab) Sex <- throat.meta$Sex Smoker <- throat.meta$SmokingStatus anti <- throat.meta$AntibioticUsePast3Months_TimeFromAntibioticUsage Male = (Sex == "Male")**2 Smoker = (Smoker == "Smoker") **2 Anti = (anti != "None")^2 cova = cbind(1, Male, Smoker, Anti) ## Simulate microbiome data otu.tab.rff <- Rarefy(throat.otu.tab)$otu.tab.rff unifracs <- GUniFrac(otu.tab.rff, throat.tree, alpha=c(0, 0.5, 1))$unifracs # Distance matrices D.weighted = unifracs[,,"d_1"] D.unweighted = unifracs[,,"d_UW"] # Kernel matrices K.weighted = D2K(D.weighted) K.unweighted = D2K(D.unweighted) if (requireNamespace("vegan")) { library(vegan) D.BC = as.matrix(vegdist(otu.tab.rff, method="bray")) K.BC = D2K(D.BC) } # Simulate phenotype data rho = 0.2 Va = matrix(rep(rho, (2*n)^2), 2*n, 2*n)+diag(1-rho, 2*n) Phe = mvrnorm(n, rep(0, 2*n), Va) K.y = Phe %*% t(Phe) # phenotype kernel # Simulate genotype data G = matrix(rbinom(n*10, 2, 0.1), n, 10) K.g = G %*% t(G) # genotype kernel ## Unadjusted analysis (microbiome and phenotype) KRV(y = Phe, kernels.otu = K.weighted, kernel.y = "Gaussian") # numeric y KRV(kernels.otu = K.weighted, kernel.y = K.y) # kernel y ## Adjusted analysis (phenotype only) KRV(kernels.otu = K.weighted, y = Phe, kernel.y = "linear", X = cova, adjust.type = "phenotype") if (requireNamespace("vegan")) { ## Adjusted analysis (adjust both kernels; microbiome and phenotype) KRV(kernels.otu = K.BC, kernel.y = K.y, X = cova, adjust.type='both') ## Adjusted analysis (adjust both kernels; microbiome and genotype) KRV(kernels.otu = K.BC, kernel.y = K.g, X = cova, adjust.type='both') }
Test for association between microbiome composition and a continuous or dichotomous outcome by incorporating phylogenetic or nonphylogenetic distance between different microbiomes.
MiRKAT( y, X = NULL, Ks, out_type = "C", method = "davies", omnibus = "permutation", nperm = 999, returnKRV = FALSE, returnR2 = FALSE )
MiRKAT( y, X = NULL, Ks, out_type = "C", method = "davies", omnibus = "permutation", nperm = 999, returnKRV = FALSE, returnR2 = FALSE )
y |
A numeric vector of the a continuous or dichotomous outcome variable. |
X |
A numeric matrix or data frame, containing additional covariates that you want to adjust for. If NULL, a intercept only model is used. Defaults to NULL. |
Ks |
A list of n by n kernel matrices or a single n by n kernel matrix, where n is the sample size. It can be constructed from microbiome data through distance metric or other approaches, such as linear kernels or Gaussian kernels. |
out_type |
An indicator of the outcome type ("C" for continuous, "D" for dichotomous). |
method |
Method used to compute the kernel specific p-value. "davies" represents an exact method that computes the p-value by inverting the characteristic function of the mixture chisq. We adopt an exact variance component tests because most of the studies concerning microbiome compositions have modest sample size. "moment" represents an approximation method that matches the first two moments. "permutation" represents a permutation approach for p-value calculation. Defaults to "davies". |
omnibus |
A string equal to either "cauchy" or "permutation" (or nonambiguous abbreviations thereof), specifying whether to use the Cauchy combination test or residual permutation to generate the omnibus p-value. |
nperm |
The number of permutations if method = "permutation" or when multiple kernels are considered. If method = "davies" or "moment", nperm is ignored. Defaults to 999. |
returnKRV |
A logical indicating whether to return the KRV statistic (a measure of effect size). Defaults to FALSE. |
returnR2 |
A logical indicating whether to return R-squared. Defaults to FALSE. |
y and X (if not NULL) should all be numeric matrices or vectors with the same number of rows.
Ks should be a list of n by n matrices or a single matrix. If you have distance metric(s) from metagenomic data, each kernel can be constructed through function D2K. Each kernel can also be constructed through other mathematical approaches.
Missing data is not permitted. Please remove all individuals with missing y, X, Ks prior to analysis
Parameter "method" only concerns with how kernel specific p-values are generated. When Ks is a list of multiple kernels, omnibus p-value is computed through permutation from each individual p-values, which are calculated through method of choice.
Returns a list containing the following elements:
p_values |
P-value for each candidate kernel matrix |
omnibus_p |
Omnibus p-value considering multiple candidate kernel matrices |
KRV |
Kernel RV statistic (a measure of effect size). Only returned if returnKRV = TRUE. |
R2 |
R-squared. Only returned if returnR2 = TRUE. |
Ni Zhao
Zhao, N., Chen, J.,Carroll, I. M., Ringel-Kulka, T., Epstein, M.P., Zhou, H., Zhou, J. J., Ringel, Y., Li, H. and Wu, M.C. (2015)). Microbiome Regression-based Kernel Association Test (MiRKAT). American Journal of Human Genetics, 96(5):797-807
Chen, J., Chen, W., Zhao, N., Wu, M~C.and Schaid, D~J. (2016) Small Sample Kernel Association Tests for Human Genetic and Microbiome Association Studies. 40: 5-19. doi: 10.1002/gepi.21934
Davies R.B. (1980) Algorithm AS 155: The Distribution of a Linear Combination of chi-2 Random Variables, Journal of the Royal Statistical Society. Series C , 29, 323-333.
Satterthwaite, F. (1946). An approximate distribution of estimates of variance components. Biom. Bull. 2, 110-114.
Lee S, Emond MJ, Bamshad MJ, Barnes KC, Rieder MJ, Nickerson DA; NHLBI GO Exome Sequencing Project-ESP Lung Project Team, Christiani DC, Wurfel MM, Lin X. (2012) Optimal unified approach for rare variant association testing with application to small sample case-control whole-exome sequencing studies. American Journal of Human Genetics, 91, 224-237.
Zhou, J. J. and Zhou, H.(2015) Powerful Exact Variance Component Tests for the Small Sample Next Generation Sequencing Studies (eVCTest), in submission.
library(GUniFrac) data(throat.tree) data(throat.otu.tab) data(throat.meta) unifracs = GUniFrac(throat.otu.tab, throat.tree, alpha = c(1))$unifracs if (requireNamespace("vegan")) { library(vegan) BC= as.matrix(vegdist(throat.otu.tab, method="bray")) Ds = list(w = unifracs[,,"d_1"], uw = unifracs[,,"d_UW"], BC = BC) } else { Ds = list(w = unifracs[,,"d_1"], uw = unifracs[,,"d_UW"]) } Ks = lapply(Ds, FUN = function(d) D2K(d)) covar = cbind(throat.meta$Age, as.numeric(throat.meta$Sex == "Male")) # Continuous phenotype n = nrow(throat.meta) y = rnorm(n) MiRKAT(y, X = covar, Ks = Ks, out_type="C", method = "davies") # Binary phenotype y = as.numeric(runif(n) < 0.5) MiRKAT(y, X = covar, Ks = Ks, out_type="D")
library(GUniFrac) data(throat.tree) data(throat.otu.tab) data(throat.meta) unifracs = GUniFrac(throat.otu.tab, throat.tree, alpha = c(1))$unifracs if (requireNamespace("vegan")) { library(vegan) BC= as.matrix(vegdist(throat.otu.tab, method="bray")) Ds = list(w = unifracs[,,"d_1"], uw = unifracs[,,"d_UW"], BC = BC) } else { Ds = list(w = unifracs[,,"d_1"], uw = unifracs[,,"d_UW"]) } Ks = lapply(Ds, FUN = function(d) D2K(d)) covar = cbind(throat.meta$Age, as.numeric(throat.meta$Sex == "Male")) # Continuous phenotype n = nrow(throat.meta) y = rnorm(n) MiRKAT(y, X = covar, Ks = Ks, out_type="C", method = "davies") # Binary phenotype y = as.numeric(runif(n) < 0.5) MiRKAT(y, X = covar, Ks = Ks, out_type="D")
Called by MiRKAT if the outcome variable is dichotomous (out_type="D")
This function is called by the exported function MiRKAT if the argument "out_type" of MiRKAT is equal to "D" (for dichotomous).
Each argument of MiRKAT_continuous is given the value of the corresponding argument given by the user to MiRKAT.
Function not exported
MiRKAT_binary( y, X = NULL, Ks, method = "davies", family = "binomial", omnibus = "permutation", nperm = 999, returnKRV = FALSE, returnR2 = FALSE )
MiRKAT_binary( y, X = NULL, Ks, method = "davies", family = "binomial", omnibus = "permutation", nperm = 999, returnKRV = FALSE, returnR2 = FALSE )
y |
A numeric vector of the dichotomous outcome variable |
X |
A numerical matrix or data frame, containing additional covariates that you want to adjust for (Default = NULL). If it is NULL, a intercept only model was fit. |
Ks |
A list of n by n kernel matrices (or a single n by n kernel matrix), where n is the sample size. It can be constructed from microbiome data through distance metric or other approaches, such as linear kernels or Gaussian kernels. |
method |
A string telling R which method to use to compute the kernel specific p-value (default = "davies"). "davies" represents an exact method that computes the p-value by inverting the characteristic function of the mixture chisq. We adopt an exact variance component tests because most of the studies concerning microbiome compositions have modest sample size. "moment" represents an approximation method that matches the first two moments. "permutation" represents a permutation approach for p-value calculation. |
family |
A string describing the error distribution and link function to be used in the linear model. |
omnibus |
A string equal to either "Cauchy" or "permutation" (or nonambiguous abbreviations thereof), specifying whether to use the Cauchy combination test or residual permutation to generate the omnibus p-value. |
nperm |
the number of permutations if method = "permutation" or when multiple kernels are considered. if method = "davies" or "moment", nperm is ignored. |
returnKRV |
A logical indicating whether to return the KRV statistic. Defaults to FALSE. |
returnR2 |
A logical indicating whether to return the R-squared coefficient. Defaults to FALSE. |
If only one candidate kernel matrix is considered, returns a list containing the p-value for the candidate kernel matrix. If more than one candidate kernel matrix is considered, returns a list with two elements: the individual p-values for each candidate kernel matrix, and the omnibus p-value.
p_values |
p-value for each candidate kernel matrix |
omnibus_p |
omnibus p-value if multiple kernel matrices are considered |
KRV |
A vector of kernel RV statistics (a measure of effect size), one for each candidate kernel matrix. Only returned if returnKRV = TRUE |
R2 |
A vector of R-squared statistics, one for each candidate kernel matrix. Only returned if returnR2 = TRUE |
Ni Zhao
Zhao, N., Chen, J.,Carroll, I. M., Ringel-Kulka, T., Epstein, M.P., Zhou, H., Zhou, J. J., Ringel, Y., Li, H. and Wu, M.C. (2015)). Microbiome Regression-based Kernel Association Test (MiRKAT). American Journal of Human Genetics, 96(5):797-807
Chen, J., Chen, W., Zhao, N., Wu, M~C.and Schaid, D~J. (2016) Small Sample Kernel Association Tests for Human Genetic and Microbiome Association Studies. 40: 5-19. doi: 10.1002/gepi.21934
Davies R.B. (1980) Algorithm AS 155: The Distribution of a Linear Combination of chi-2 Random Variables, Journal of the Royal Statistical Society. Series C , 29, 323-333.
Satterthwaite, F. (1946). An approximate distribution of estimates of variance components. Biom. Bull. 2, 110-114.
Lee S, Emond MJ, Bamshad MJ, Barnes KC, Rieder MJ, Nickerson DA; NHLBI GO Exome Sequencing Project-ESP Lung Project Team, Christiani DC, Wurfel MM, Lin X. (2012) Optimal unified approach for rare variant association testing with application to small sample case-control whole-exome sequencing studies. American Journal of Human Genetics, 91, 224-237.
Zhou, J. J. and Zhou, H.(2015) Powerful Exact Variance Component Tests for the Small Sample Next Generation Sequencing Studies (eVCTest), in submission.
Inner function for MiRKAT; computes MiRKAT for continuous outcomes. Called by MiRKAT if out_type="C"
MiRKAT_continuous( y, X = NULL, Ks, method, omnibus, nperm = 999, returnKRV = FALSE, returnR2 = FALSE )
MiRKAT_continuous( y, X = NULL, Ks, method, omnibus, nperm = 999, returnKRV = FALSE, returnR2 = FALSE )
y |
A numeric vector of the continuous outcome variable |
X |
A numeric matrix or data frame containing additional covariates (default = NULL). If NULL, an intercept only model is used. |
Ks |
A list of n by n kernel matrices (or a single n by n kernel matrix), where n is the sample size. It can be constructed from microbiome data through distance metric or other approaches, such as linear kernels or Gaussian kernels. |
method |
A method to compute the kernel specific p-value (Default= "davies"). "davies" represents an exact method that computes the p-value by inverting the characteristic function of the mixture chisq. We adopt an exact variance component tests because most of the studies concerning microbiome compositions have modest sample size. "moment" represents an approximation method that matches the first two moments. "permutation" represents a permutation approach for p-value calculation. |
omnibus |
A string equal to either "Cauchy" or "permutation" (or nonambiguous abbreviations thereof), specifying whether to use the Cauchy combination test or residual permutation to generate the omnibus p-value. |
nperm |
the number of permutations if method = "permutation" or when multiple kernels are considered. If method = "davies" or "moment", nperm is ignored. Defaults to 999. |
returnKRV |
A logical indicating whether to return the KRV statistic. Defaults to FALSE. |
returnR2 |
A logical indicating whether to return the R-squared coefficient. Defaults to FALSE. |
This function is called by the exported function "MiRKAT" when the argument of MiRKAT, out_type, is set equal to "C".
Each argument of MiRKAT_continuous is given the value of the corresponding argument given by the user to MiRKAT.
Function not exported
If only one candidate kernel matrix is considered, returns a list containing the p-value for the candidate kernel matrix. If more than one candidate kernel matrix is considered, returns a list of two elements: the individual p-values for each candidate kernel matrix, and the omnibus p-value
p_values |
p-value for each candidate kernel matrix |
omnibus_p |
omnibus p-value considering multiple candidate kernel matrices |
KRV |
A vector of kernel RV statistics (a measure of effect size), one for each candidate kernel matrix. Only returned if returnKRV = TRUE |
R2 |
A vector of R-squared statistics, one for each candidate kernel matrix. Only returned if returnR2 = TRUE |
Ni Zhao
Zhao, N., Chen, J.,Carroll, I. M., Ringel-Kulka, T., Epstein, M.P., Zhou, H., Zhou, J. J., Ringel, Y., Li, H. and Wu, M.C. (2015)). Microbiome Regression-based Kernel Association Test (MiRKAT). American Journal of Human Genetics, 96(5):797-807
Chen, J., Chen, W., Zhao, N., Wu, M~C.and Schaid, D~J. (2016) Small Sample Kernel Association Tests for Human Genetic and Microbiome Association Studies. 40: 5-19. doi: 10.1002/gepi.21934
Davies R.B. (1980) Algorithm AS 155: The Distribution of a Linear Combination of chi-2 Random Variables, Journal of the Royal Statistical Society. Series C , 29, 323-333.
Satterthwaite, F. (1946). An approximate distribution of estimates of variance components. Biom. Bull. 2, 110-114.
Lee S, Emond MJ, Bamshad MJ, Barnes KC, Rieder MJ, Nickerson DA; NHLBI GO Exome Sequencing Project-ESP Lung Project Team, Christiani DC, Wurfel MM, Lin X. (2012) Optimal unified approach for rare variant association testing with application to small sample case-control whole-exome sequencing studies. American Journal of Human Genetics, 91, 224-237.
Zhou, J. J. and Zhou, H.(2015) Powerful Exact Variance Component Tests for the Small Sample Next Generation Sequencing Studies (eVCTest), in submission.
Integrated quantile regression-based kernel association test.
MiRKAT.iQ(Y, X, K, weight = c(0.25, 0.25, 0.25, 0.25))
MiRKAT.iQ(Y, X, K, weight = c(0.25, 0.25, 0.25, 0.25))
Y |
A numeric vector of the continuous outcome variable. |
X |
A numeric matrix for additional covariates that you want to adjust for. |
K |
A list of n by n kernel matrices at a single n by n kernel matrix, where n is the sample size. |
weight |
A length 4 vector specifying the weight for Cauchy combination, corresponding to wilcoxon/normal/inverselehmann/lehmann functions. The sum of the weight should be 1. |
Returns a list containing the p values for single kernels, or the omnibus p-value if multiple candidate kernel matrices are provided.
Tianying Wang, Xiang Zhan.
Wang T, et al. (2021) Testing microbiome association using integrated quantile regression models. Bioinformatics (to appear).
library(GUniFrac) library(quantreg) library(PearsonDS) library(MiRKAT) data(throat.tree) data(throat.otu.tab) ## Create UniFrac and Bray-Curtis distance matrices unifracs = GUniFrac(throat.otu.tab, throat.tree, alpha = c(1))$unifracs if (requireNamespace("vegan")) { library(vegan) BC= as.matrix(vegdist(throat.otu.tab, method="bray")) Ds = list(w = unifracs[,,"d_1"], uw = unifracs[,,"d_UW"], BC = BC) } else { Ds = list(w = unifracs[,,"d_1"], uw = unifracs[,,"d_UW"]) } ## Convert to kernels Ks = lapply(Ds, FUN = function(d) D2K(d)) covar = cbind(throat.meta$Age, as.numeric(throat.meta$Sex == "Male")) n = nrow(throat.meta) y = rnorm(n) result = MiRKAT.iQ(y, X = covar, K = Ks)
library(GUniFrac) library(quantreg) library(PearsonDS) library(MiRKAT) data(throat.tree) data(throat.otu.tab) ## Create UniFrac and Bray-Curtis distance matrices unifracs = GUniFrac(throat.otu.tab, throat.tree, alpha = c(1))$unifracs if (requireNamespace("vegan")) { library(vegan) BC= as.matrix(vegdist(throat.otu.tab, method="bray")) Ds = list(w = unifracs[,,"d_1"], uw = unifracs[,,"d_UW"], BC = BC) } else { Ds = list(w = unifracs[,,"d_1"], uw = unifracs[,,"d_UW"]) } ## Convert to kernels Ks = lapply(Ds, FUN = function(d) D2K(d)) covar = cbind(throat.meta$Age, as.numeric(throat.meta$Sex == "Male")) n = nrow(throat.meta) y = rnorm(n) result = MiRKAT.iQ(y, X = covar, K = Ks)
A more robust version of MiRKAT utlizing a linear model that uses quantile regression.
MiRKAT.Q(y, X, Ks, omnibus = "kernel_om", returnKRV = FALSE, returnR2 = FALSE)
MiRKAT.Q(y, X, Ks, omnibus = "kernel_om", returnKRV = FALSE, returnR2 = FALSE)
y |
A numeric vector of the a continuous or dichotomous outcome variable. |
X |
A numerical matrix or data frame, containing additional covariates that you want to adjust for. Mustn't be NULL. |
Ks |
A list of n by n kernel matrices (or a single n by n kernel matrix), where n is the sample size. If you have distance metric from metagenomic data, each kernel can be constructed through function D2K. Each kernel can also be constructed through other mathematical approaches, such as linear or Gaussian kernels. |
omnibus |
A string equal to either "Cauchy" or "kernel_om" (or nonambiguous abbreviations thereof), specifying whether to use the Cauchy combination test or an omnibus kernel to generate the omnibus p-value. |
returnKRV |
A logical indicating whether to return the KRV statistic. Defaults to FALSE. |
returnR2 |
A logical indicating whether to return the R-squared coefficient. Defaults to FALSE. |
MiRKAT.Q creates a kernel matrix using the linear model created with the function rq, a quantile regression function, then does the KRV analysis on Ks and the newly formed kernel matrix representing the outome traits.
Missing data is not permitted. Please remove all individuals with missing y, X, Ks prior to analysis
Returns p-values for each individual kernel matrix, an omnibus p-value if multiple kernels were provided, and measures of effect size KRV and R2.
p_values |
labeled individual p-values for each kernel |
omnibus_p |
omnibus p_value, calculated as for the KRV test |
KRV |
A vector of kernel RV statistics (a measure of effect size), one for each candidate kernel matrix. Only returned if returnKRV = TRUE |
R2 |
A vector of R-squared statistics, one for each candidate kernel matrix. Only returned if returnR2 = TRUE |
Weija Fu
library(GUniFrac) library(quantreg) # Generate data data(throat.tree) data(throat.otu.tab) data(throat.meta) unifracs = GUniFrac(throat.otu.tab, throat.tree, alpha = c(1))$unifracs if (requireNamespace("vegan")) { library(vegan) BC= as.matrix(vegdist(throat.otu.tab, method="bray")) Ds = list(w = unifracs[,,"d_1"], uw = unifracs[,,"d_UW"], BC = BC) } else { Ds = list(w = unifracs[,,"d_1"], uw = unifracs[,,"d_UW"]) } Ks = lapply(Ds, FUN = function(d) D2K(d)) covar = scale(cbind(throat.meta$Age, as.numeric(throat.meta$Sex == "Male"))) # Continuous phenotype n = nrow(throat.meta) y = rchisq(n, 2) + apply(covar, 1, sum) MiRKAT.Q(y, X = covar, Ks = Ks)
library(GUniFrac) library(quantreg) # Generate data data(throat.tree) data(throat.otu.tab) data(throat.meta) unifracs = GUniFrac(throat.otu.tab, throat.tree, alpha = c(1))$unifracs if (requireNamespace("vegan")) { library(vegan) BC= as.matrix(vegdist(throat.otu.tab, method="bray")) Ds = list(w = unifracs[,,"d_1"], uw = unifracs[,,"d_UW"], BC = BC) } else { Ds = list(w = unifracs[,,"d_1"], uw = unifracs[,,"d_UW"]) } Ks = lapply(Ds, FUN = function(d) D2K(d)) covar = scale(cbind(throat.meta$Age, as.numeric(throat.meta$Sex == "Male"))) # Continuous phenotype n = nrow(throat.meta) y = rchisq(n, 2) + apply(covar, 1, sum) MiRKAT.Q(y, X = covar, Ks = Ks)
A more robust version of MiRKAT utilizing a linear model by robust regression using an M estimator.
MiRKAT.R(y, X, Ks, omnibus = "kernel_om", returnKRV = FALSE, returnR2 = FALSE)
MiRKAT.R(y, X, Ks, omnibus = "kernel_om", returnKRV = FALSE, returnR2 = FALSE)
y |
A numeric vector of the a continuous or dichotomous outcome variable. |
X |
A numerical matrix or data frame, containing additional covariates that you want to adjust for Mustn't be NULL |
Ks |
list of n by n kernel matrices (or a single n by n kernel matrix), where n is the sample size. It can be constructed from microbiome data through distance metric or other approaches, such as linear kernels or Gaussian kernels. |
omnibus |
A string equal to either "Cauchy" or "kernel_om" (or nonambiguous abbreviations thereof), specifying whether to use the Cauchy combination test or an omnibus kernel to generate the omnibus p-value. |
returnKRV |
A logical indicating whether to return the KRV statistic. Defaults to FALSE. |
returnR2 |
A logical indicating whether to return the R-squared coefficient. Defaults to FALSE. |
MiRKAT.R creates a kernel matrix using the linear model created with the function rlm, a robust regression function, then does the KRV analysis on Ks and the newly formed kernel matrix representing the outome traits.
y and X should all be numerical matrices or vectors with the same number of rows, and mustn't be NULL.
Ks should be a list of n by n matrices or a single matrix. If you have distance metric from metagenomic data, each kernel can be constructed through function D2K. Each kernel may also be constructed through other mathematical approaches.
Missing data is not permitted. Please remove all individuals with missing y, X, Ks prior to analysis
Returns p-values for each individual kernel matrix, an omnibus p-value if multiple kernels were provided, and measures of effect size KRV and R2.
p_values |
labeled individual p-values for each kernel |
omnibus_p |
omnibus p_value, calculated as for the KRV test |
KRV |
A vector of kernel RV statistics (a measure of effect size), one for each candidate kernel matrix. Only returned if returnKRV = TRUE |
R2 |
A vector of R-squared statistics, one for each candidate kernel matrix. Only returned if returnR2 = TRUE |
Weijia Fu
# Generate data library(GUniFrac) data(throat.tree) data(throat.otu.tab) data(throat.meta) unifracs = GUniFrac(throat.otu.tab, throat.tree, alpha = c(1))$unifracs if (requireNamespace("vegan")) { library(vegan) BC= as.matrix(vegdist(throat.otu.tab, method="bray")) Ds = list(w = unifracs[,,"d_1"], uw = unifracs[,,"d_UW"], BC = BC) } else { Ds = list(w = unifracs[,,"d_1"], uw = unifracs[,,"d_UW"]) } Ks = lapply(Ds, FUN = function(d) D2K(d)) covar = cbind(throat.meta$Age, as.numeric(throat.meta$Sex == "Male")) # Continuous phenotype n = nrow(throat.meta) y = rchisq(n, 2) MiRKAT.R(y, X = covar, Ks = Ks)
# Generate data library(GUniFrac) data(throat.tree) data(throat.otu.tab) data(throat.meta) unifracs = GUniFrac(throat.otu.tab, throat.tree, alpha = c(1))$unifracs if (requireNamespace("vegan")) { library(vegan) BC= as.matrix(vegdist(throat.otu.tab, method="bray")) Ds = list(w = unifracs[,,"d_1"], uw = unifracs[,,"d_UW"], BC = BC) } else { Ds = list(w = unifracs[,,"d_1"], uw = unifracs[,,"d_UW"]) } Ks = lapply(Ds, FUN = function(d) D2K(d)) covar = cbind(throat.meta$Age, as.numeric(throat.meta$Sex == "Male")) # Continuous phenotype n = nrow(throat.meta) y = rchisq(n, 2) MiRKAT.R(y, X = covar, Ks = Ks)
Community level test for association between microbiome composition and survival outcomes (right-censored time-to-event data) using kernel matrices to compare similarity between microbiome profiles with similarity in survival times.
MiRKATS( obstime, delta, X = NULL, Ks, beta = NULL, perm = FALSE, omnibus = "permutation", nperm = 999, returnKRV = FALSE, returnR2 = FALSE )
MiRKATS( obstime, delta, X = NULL, Ks, beta = NULL, perm = FALSE, omnibus = "permutation", nperm = 999, returnKRV = FALSE, returnR2 = FALSE )
obstime |
A numeric vector of follow-up (survival/censoring) times. |
delta |
Event indicator: a vector of 0/1, where 1 indicates that the event was observed for a subject (so "obstime" is survival time), and 0 indicates that the subject was censored. |
X |
A vector or matrix of numeric covariates, if applicable (default = NULL). |
Ks |
A list of or a single numeric n by n kernel matrices or matrix (where n is the sample size). |
beta |
A vector of coefficients associated with covariates. If beta is NULL and covariates are present, coxph is used to calculate coefficients (default = NULL). |
perm |
Logical, indicating whether permutation should be used instead of analytic p-value calculation (default=FALSE). Not recommended for sample sizes of 100 or more. |
omnibus |
A string equal to either "Cauchy" or "permutation" (or nonambiguous abbreviations thereof), specifying whether to use the Cauchy combination test or residual permutation to generate the omnibus p-value. |
nperm |
Integer, number of permutations used to calculate p-value if perm==TRUE (default=1000) and to calculate omnibus p-value if omnibus=="permutation". |
returnKRV |
A logical indicating whether to return the KRV statistic. Defaults to FALSE. |
returnR2 |
A logical indicating whether to return the R-squared coefficient. Defaults to FALSE. |
obstime, delta, and X should all have n rows, and the kernel or distance matrix should be a single n by n matrix. If a distance matrix is entered (distance=TRUE), a kernel matrix will be constructed from the distance matrix.
Update in v1.1.0: MiRKATS also utilizes the OMiRKATS omnibus test if more than one kernel matrix is provided by the user. The OMiRKATS omnibus test calculates an overall p-value for the test via permutation.
Missing data is not permitted. Please remove individuals with missing data on y, X or in the kernel or distance matrix prior to using the function.
The Efron approximation is used for tied survival times.
Return value depends on the number of kernel matrices inputted. If more than one kernel matrix is given, MiRKATS returns two items; a vector of the labeled individual p-values for each kernel matrix, as well as an omnibus p-value from the Optimal-MiRKATS omnibus test. If only one kernel matrix is given, then only its p-value will be given, as no omnibus test will be needed.
p_values |
individual p-values for each inputted kernel matrix |
omnibus_p |
overall omnibus p-value |
KRV |
A vector of kernel RV statistics (a measure of effect size), one for each candidate kernel matrix. Only returned if returnKRV = TRUE |
R2 |
A vector of R-squared statistics, one for each candidate kernel matrix. Only returned if returnR2 = TRUE |
Nehemiah Wilson, Anna Plantinga
Plantinga, A., Zhan, X., Zhao, N., Chen, J., Jenq, R., and Wu, M.C. MiRKAT-S: a distance-based test of association between microbiome composition and survival times. Microbiome, 2017:5-17. doi: 10.1186/s40168-017-0239-9
Zhao, N., Chen, J.,Carroll, I. M., Ringel-Kulka, T., Epstein, M.P., Zhou, H., Zhou, J. J., Ringel, Y., Li, H. and Wu, M.C. (2015)). Microbiome Regression-based Kernel Association Test (MiRKAT). American Journal of Human Genetics, 96(5):797-807
Chen, J., Chen, W., Zhao, N., Wu, M~C.and Schaid, D~J. (2016) Small Sample Kernel Association Tests for Human Genetic and Microbiome Association Studies. 40:5-19. doi: 10.1002/gepi.21934
Efron, B. (1977) "The efficiency of Cox's likelihood function for censored data." Journal of the American statistical Association 72(359):557-565.
Davies R.B. (1980) Algorithm AS 155: The Distribution of a Linear Combination of chi-2 Random Variables, Journal of the Royal Statistical Society Series C, 29:323-333
################################### # Generate data library(GUniFrac) # Throat microbiome data data(throat.tree) data(throat.otu.tab) unifracs = GUniFrac(throat.otu.tab, throat.tree, alpha = c(1))$unifracs if (requireNamespace("vegan")) { library(vegan) BC= as.matrix(vegdist(throat.otu.tab, method="bray")) Ds = list(w = unifracs[,,"d_1"], uw = unifracs[,,"d_UW"], BC = BC) } else { Ds = list(w = unifracs[,,"d_1"], uw = unifracs[,,"d_UW"]) } Ks = lapply(Ds, FUN = function(d) D2K(d)) # Covariates and outcomes covar <- matrix(rnorm(120), nrow=60) S <- rexp(60, 3) # survival time C <- rexp(60, 1) # censoring time D <- (S<=C) # event indicator U <- pmin(S, C) # observed follow-up time MiRKATS(obstime = U, delta = D, X = covar, Ks = Ks, beta = NULL)
################################### # Generate data library(GUniFrac) # Throat microbiome data data(throat.tree) data(throat.otu.tab) unifracs = GUniFrac(throat.otu.tab, throat.tree, alpha = c(1))$unifracs if (requireNamespace("vegan")) { library(vegan) BC= as.matrix(vegdist(throat.otu.tab, method="bray")) Ds = list(w = unifracs[,,"d_1"], uw = unifracs[,,"d_UW"], BC = BC) } else { Ds = list(w = unifracs[,,"d_1"], uw = unifracs[,,"d_UW"]) } Ks = lapply(Ds, FUN = function(d) D2K(d)) # Covariates and outcomes covar <- matrix(rnorm(120), nrow=60) S <- rexp(60, 3) # survival time C <- rexp(60, 1) # censoring time D <- (S<=C) # event indicator U <- pmin(S, C) # observed follow-up time MiRKATS(obstime = U, delta = D, X = covar, Ks = Ks, beta = NULL)
Test for association between overall microbiome composition and multiple continuous outcomes.
MMiRKAT(Y, X = NULL, Ks, returnKRV = FALSE, returnR2 = FALSE)
MMiRKAT(Y, X = NULL, Ks, returnKRV = FALSE, returnR2 = FALSE)
Y |
A numerical n by p matrix of p continuous outcome variables, n being sample size. |
X |
A numerical n by q matrix or data frame, containing q additional covariates that you want to adjust for (Default = NULL). If it is NULL, an intercept only model is fit. |
Ks |
A list of numerical n by n kernel matrices, or a single n by n kernel matrix, where n is the sample size. Kernels can be constructed from distance matrices (such as Bray-Curtis or UniFrac distances) using the function D2K, or through other mathematical approaches. |
returnKRV |
A logical indicating whether to return the KRV statistic. Defaults to FALSE. |
returnR2 |
A logical indicating whether to return the R-squared coefficient. Defaults to FALSE. |
Missing data is not permitted. Please remove all individuals with missing Y, X, K prior to analysis
The method of generating kernel specific p-values is "davies", which represents an exact method that computes the p-value by inverting the characteristic function of the mixture chisq.
Returns a list of the MMiRKAT p-values for each inputted kernel matrix, labeled with the names of the kernels, if given.
p_values |
list of the p-values for each individual kernel matrix inputted |
KRV |
A vector of kernel RV statistics (a measure of effect size), one for each candidate kernel matrix. Only returned if returnKRV = TRUE |
R2 |
A vector of R-squared statistics, one for each candidate kernel matrix. Only returned if returnR2 = TRUE |
Nehemiah Wilson, Haotian Zheng, Xiang Zhan, Ni Zhao
Zheng, H., Zhan, X., Tong, X., Zhao, N., Maity,A., Wu, M.C., and Chen,J. A small-sample multivariate kernel machine test for microbiome association studies. Genetic Epidemiology, 41(3), 210-220. DOI: 10.1002/gepi.22030
library(GUniFrac) if(requireNamespace("vegan")) { library(vegan) } data(throat.tree) data(throat.otu.tab) data(throat.meta) unifracs <- GUniFrac(throat.otu.tab, throat.tree, alpha=c(0, 0.5, 1))$unifracs if(requireNamespace("vegan")) { BC= as.matrix(vegdist(throat.otu.tab , method="bray")) Ds = list(w = unifracs[,,"d_1"], u = unifracs[,,"d_UW"], BC = BC) } else { Ds = list(w = unifracs[,,"d_1"], u = unifracs[,,"d_UW"]) } Ks = lapply(Ds, FUN = function(d) D2K(d)) n = nrow(throat.otu.tab) Y = matrix(rnorm(n*3, 0, 1), n, 3) covar = cbind(as.numeric(throat.meta$Sex == "Male"), as.numeric(throat.meta$PackYears)) MMiRKAT(Y = Y, X = covar, Ks = Ks)
library(GUniFrac) if(requireNamespace("vegan")) { library(vegan) } data(throat.tree) data(throat.otu.tab) data(throat.meta) unifracs <- GUniFrac(throat.otu.tab, throat.tree, alpha=c(0, 0.5, 1))$unifracs if(requireNamespace("vegan")) { BC= as.matrix(vegdist(throat.otu.tab , method="bray")) Ds = list(w = unifracs[,,"d_1"], u = unifracs[,,"d_UW"], BC = BC) } else { Ds = list(w = unifracs[,,"d_1"], u = unifracs[,,"d_UW"]) } Ks = lapply(Ds, FUN = function(d) D2K(d)) n = nrow(throat.otu.tab) Y = matrix(rnorm(n*3, 0, 1), n, 3) covar = cbind(as.numeric(throat.meta$Sex == "Male"), as.numeric(throat.meta$PackYears)) MMiRKAT(Y = Y, X = covar, Ks = Ks)
Simulated DEPENDENT data with GAUSSIAN traits for correlated regression-based analysis (i.e. CSKAT, GLMMMiRKAT)
data(nordata)
data(nordata)
A list containing three data objects for correlated microbiome data with continuous response variable (described below).
Simulated OTU data for correlated regression-based analysis; 59 rows and 730 columns, rows being patients and columns being OTUs
Simulated metadata for correlated regression-based analysis; 59 rows and 4 columns, rows being patients and columns being the outcome variable, subject identifier, and covariates to possibly account for in any regression modeling
Simulated rooted phylogenetic tree with 730 tips and 729 nodes
Simulated DEPENDENT data with POISSON (count) traits for correlated regression-based analysis (i.e. CSKAT, GLMMMiRKAT)
data(poisdata)
data(poisdata)
A list containing three data objects for correlated microbiome data with binary response variable (described below).
Simulated OTU data for correlated regression-based analysis; 59 rows and 730 columns, rows being patients and columns being OTUs
Simulated metadata for correlated regression-based analysis; 59 rows and 4 columns, rows being patients and columns being the outcome variable, subject identifier, and covariates to possibly account for in any regression modeling
Simulated rooted phylogenetic tree with 730 tips and 729 nodes
Simulation code can be seen in ?KRV Corresponding OTU matrix is stored in "throat.otu.tab"
data(throat.meta)
data(throat.meta)
A data frame with 59 rows and 16 columns, rows being participants and columns being different covariates to possibly be accounted for in any utilized linear models.
throat.meta is part of a microbiome data set for studying the effect of smoking on the upper respiratory tract microbiome. This data set comes from the throat microbiome of left body side. It contains 60 subjects consisting of 32 nonsmokers and 28 smokers.
Sequence of DNA that allows for the identification of the specific species of bacteria. See GUniFrac for more details
Sequence of DNA that aids in locating the Barcode Sequence. See GUniFrac for more details
whether or not each patient is a "Smoker" or a "NonSmoker"
Identifying integer label given to each patient
Labels each patient as being from this particular sample, so as possibly be able to use multiple samples at once
Part of body where our samples were taken from from in each participant
Which side of the body the samples were taken from
What kind of sample each one is; should all be patientsamples
Whether or not the patient has had a respiratory disease, and if so which one_severity of said disease_whether or not that disease is still active. If there has been no such disease in the patient's medical history, the patient's value is "healthy" in this column
Whether or not the patient has used antibiotics in the past month_if so, how long ago it was. If not antibiotics have been used in the past month, the patient's value is "None" in this column
Age of the patient
The sex of the patient
Unit of measurement measuring the intensity of smoking; average number of packs per day times the number of years the patient has been smoking. If patient has never smoked, their value is 0 for this column
Minutes since the patient's last cigarette
Minutes since the patient's last meal
See Charleston paper and other sources
Simulated code can be seen in ?KRV Corresponding metadata is stored in "throat.meta"
data(throat.otu.tab)
data(throat.otu.tab)
60 rows and 856 columns, where rows are patients and columns are OTUs
Simulation code can be seen in ?KRV
data(throat.tree)
data(throat.tree)
Phylogenetic tree with 856 tips and 855 internal nodes
Corresponding OTU matrix stored in "throat.otu.tab" See the GUniFrac package for more details