
MaAsLin2 User Manual

MaAsLin2 is the next generation of MaAsLin.

MaAsLin is a multivariate statistical framework that finds associations between clinical metadata and potentially high-dimensional experimental data.

If you use the MaAsLin2 software, please cite our manuscript: Himel Mallick, Timothy L. Tickle, Lauren J. McIver, Gholamali Rahnavard, George Weingart, Joseph N. Paulson, Siyuan Ma, Boyu Ren, Emma Schwager, Ayshwarya Subramanian, Eric A. Franzosa, Hector Corrada Bravo, Curtis Huttenhower. "Multivariable Association in Population-scale Meta'omic Surveys" (In Preparation).

If you have questions, please email the MaAsLin Users Google Group.



Description

MaAsLin2 was developed to find associations between microbiome multi'omics features and complex metadata in population-scale epidemiological studies. The software offers multiple analysis methods, normalization methods, and transforms so you can tailor the analysis to your specific study.

Requirements

MaAsLin2 is an R package that can be run on the command line or as an R function. It requires several R packages from Bioconductor and CRAN (Comprehensive R Archive Network), which are listed in the installation steps below. Please install these packages before running MaAsLin2.

Installation

MaAsLin2 can be run from the command line or as an R function. If only running from the command line, you do not need to install the MaAsLin2 package but you will need to install the MaAsLin2 dependencies.

From command line

  1. Download the source: MaAsLin2.tar.gz
  2. Decompress the download:
    • $ tar xzvf maaslin2.tar.gz
  3. Install the Bioconductor dependencies:
    • $ R -q -e "source('https://bioconductor.org/biocLite.R'); biocLite('edgeR'); biocLite('metagenomeSeq')"
  4. Install the CRAN dependencies:
    • $ R -q -e "install.packages(c('lmerTest','pscl','pbapply','car','dplyr','vegan','chemometrics','ggplot2','pheatmap','cplm','hash','logging','data.table','MASS','MuMIn'), repos='http://cran.r-project.org')"
  5. Install the MaAsLin2 package (only required if running as an R function):
    • $ R CMD INSTALL maaslin2

From R

  1. Install devtools:
    • > install.packages('devtools')
  2. Install the Bioconductor dependencies:
    • > source('https://bioconductor.org/biocLite.R'); biocLite('edgeR'); biocLite('metagenomeSeq')
  3. Install MaAsLin2 (and also all dependencies from CRAN):
    • > devtools::install_bitbucket("biobakery/maaslin2@default", ref="0.2")

How to Run

MaAsLin2 can be run from the command line or as an R function. Both methods require the same arguments, have the same options, and use the same default settings.

To run from the command line: $ Maaslin2.R $DATA $METADATA $OUTPUT

  • Provide the full path to the MaAsLin2 executable (i.e., ./R/Maaslin2.R if you are in the source folder).
  • Replace $DATA with the path to your data (or features) file.
  • Replace $METADATA with the path to your metadata file.
  • Replace $OUTPUT with the path to the folder to write the output.
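
As a rough sketch of the first bullet above, the shell function below looks for Maaslin2.R on the PATH and falls back to the R folder of a source checkout. The function name find_maaslin2 and its argument are illustrative assumptions and are not part of MaAsLin2.

```shell
# Print a usable path to Maaslin2.R, or return 1 if none is found.
# $1 = MaAsLin2 source folder (defaults to the current directory).
# find_maaslin2 is a hypothetical helper, not something MaAsLin2 ships.
find_maaslin2() {
    source_dir=${1:-.}
    if command -v Maaslin2.R >/dev/null 2>&1; then
        # The executable is already on the PATH.
        command -v Maaslin2.R
    elif [ -f "$source_dir/R/Maaslin2.R" ]; then
        # Fall back to the script inside the source folder.
        echo "$source_dir/R/Maaslin2.R"
    else
        echo "Maaslin2.R not found on PATH or under $source_dir/R" >&2
        return 1
    fi
}

# Usage (the checkout path is a placeholder):
#   "$(find_maaslin2 /path/to/maaslin2)" "$DATA" "$METADATA" "$OUTPUT"
```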

To run from R as a function:

$ R
> library(Maaslin2)
> fit_data <- Maaslin2(data, metadata, output)

Input Files

MaAsLin2 requires two input files.

  1. Data (or features) file
    • This file is tab-delimited, with features as columns and samples as rows (the transpose is also accepted).
    • Possible features in this file include data like taxonomic or gene abundances.
  2. Metadata file
    • This file is tab-delimited, with metadata as columns and samples as rows (the transpose is also accepted).
    • Possible metadata in this file include gender or age.

The data file can contain samples that are missing from the metadata file, and the reverse is also allowed. In both cases, samples that do not appear in both files are removed from the analysis. The samples do not need to be in the same order in the two files.
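
To preview which samples would be kept or dropped, you can compare the sample IDs of the two files before running MaAsLin2. The sketch below is not how MaAsLin2 does the matching internally; it assumes samples are rows, the sample ID is in the first column, and the first row is a header. The file contents are made-up examples.

```shell
# Hypothetical inputs: samples as rows, sample ID in the first column.
workdir=$(mktemp -d)
printf 'sample\tfeatureA\tfeatureB\nS1\t0.1\t0.9\nS2\t0.4\t0.6\nS3\t0.5\t0.5\n' > "$workdir/data.tsv"
printf 'sample\tage\nS3\t41\nS2\t35\nS4\t52\n' > "$workdir/metadata.tsv"

# Print the sorted sample IDs of a tab-delimited file, skipping the header row.
sample_ids() {
    tail -n +2 "$1" | cut -f 1 | sort
}

sample_ids "$workdir/data.tsv" > "$workdir/data_ids.txt"
sample_ids "$workdir/metadata.tsv" > "$workdir/metadata_ids.txt"

# comm compares two sorted lists: -12 keeps lines found in both,
# -23 keeps lines only in the first, -13 keeps lines only in the second.
shared=$(comm -12 "$workdir/data_ids.txt" "$workdir/metadata_ids.txt")
data_only=$(comm -23 "$workdir/data_ids.txt" "$workdir/metadata_ids.txt")
metadata_only=$(comm -13 "$workdir/data_ids.txt" "$workdir/metadata_ids.txt")

echo "Samples in both files:" $shared
echo "Only in the data file:" $data_only
echo "Only in the metadata file:" $metadata_only
```

Only the samples listed on the first output line would take part in the analysis; the order of rows in either file does not matter because both ID lists are sorted before comparison.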

NOTE: If running MaAsLin2 as a function, the data and metadata inputs can be of type data.frame instead of a path to a file.

Output Files

MaAsLin2 generates two types of output files: data and visualization.

  1. Data output files
    • all_results.tsv : This file contains all of the association results ordered by increasing q-value.
    • significant_results.tsv : This file is a subset of the data in the first file. It only includes those associations with q-values less than or equal to the significance threshold.
    • residuals.rds : This file contains a data frame with residuals for each feature analyzed from the model selected.
    • maaslin2.log : This file contains all of the debug information for the run. It includes all settings, warnings, errors, and steps run.
  2. Visualization output files
    • heatmap.pdf : This file contains a heatmap of the significant associations.
    • [0-9]+.pdf : These files are scatter plots with one generated for each significant association.
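
The relationship between all_results.tsv and significant_results.tsv can be illustrated with a small filter. This is only a sketch: the column name qval and the table contents below are assumptions made for the example, since this manual does not list the actual column names, so check the header of your own all_results.tsv first.

```shell
workdir=$(mktemp -d)
# Made-up results table; "qval" is an assumed column name.
printf 'feature\tmetadata\tqval\nf1\tage\t0.01\nf2\tage\t0.20\nf3\tsex\t0.30\nf4\tsex\t0.25\n' > "$workdir/all_results.tsv"

# Keep the header plus every row whose q-value is <= the threshold.
# $1 = results file, $2 = name of the q-value column, $3 = threshold
filter_significant() {
    awk -F '\t' -v col="$2" -v max="$3" '
        NR == 1 {
            # Locate the q-value column by its header name.
            for (i = 1; i <= NF; i++) if ($i == col) q = i
            print
            next
        }
        q && ($q + 0) <= (max + 0)
    ' "$1"
}

threshold=0.25   # the default --max_significance value
filter_significant "$workdir/all_results.tsv" qval "$threshold" > "$workdir/significant.tsv"

# Count the rows that passed, excluding the header.
n_significant=$(tail -n +2 "$workdir/significant.tsv" | wc -l | tr -d ' ')
echo "Associations at or below q = $threshold: $n_significant"
```

The same function can be reused with a stricter threshold to explore the results without rerunning the full analysis.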

Run a Demo

Example input files can be found in the tests folder of the MaAsLin2 source.

To run: $ Maaslin2.R maaslin2/tests/example1_data.txt maaslin2/tests/example1_metadata.txt demo_output

When running this command, all output files will be written to a folder named demo_output.
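
After the demo finishes, a quick check that the data output files described above were written can look like the sketch below. The helper name missing_outputs is an assumption for illustration. It checks only the data output files, because the visualization files depend on which associations are significant. A temporary folder stands in for demo_output so the example runs on its own.

```shell
# Print the name of each expected data output file that is absent from folder $1.
# missing_outputs is a hypothetical helper, not part of MaAsLin2.
missing_outputs() {
    for name in all_results.tsv significant_results.tsv residuals.rds maaslin2.log; do
        [ -f "$1/$name" ] || echo "$name"
    done
}

# Stand-in for demo_output: a folder holding only two of the four files.
demo_dir=$(mktemp -d)
touch "$demo_dir/all_results.tsv" "$demo_dir/maaslin2.log"

missing=$(missing_outputs "$demo_dir")
echo "Missing files:" $missing
```

For a real run, call missing_outputs demo_output; empty output means all four data files are present, and maaslin2.log is the place to look if any are missing.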

Options

Run MaAsLin2 help to print a list of the options and the default settings.

$ Maaslin2.R --help
Usage: ./R/Maaslin2.R [options] <data.tsv> <metadata.tsv> <output_folder>


Options:
    -h, --help
        Show this help message and exit

    -a MIN_ABUNDANCE, --min_abundance=MIN_ABUNDANCE
        The minimum abundance for each feature [ Default: 0 ]

    -p MIN_PREVALENCE, --min_prevalence=MIN_PREVALENCE
        The minimum percent of samples for which a feature is detected at minimum abundance [ Default: 0.1 ]

    -s MAX_SIGNIFICANCE, --max_significance=MAX_SIGNIFICANCE
        The q-value threshold for significance [ Default: 0.25 ]

    -n NORMALIZATION, --normalization=NORMALIZATION
        The normalization method to apply [ Default: TSS ] [ Choices: TSS, CLR, CSS, NONE, TMM ]

    -t TRANSFORM, --transform=TRANSFORM
        The transform to apply [ Default: LOG ] [ Choices: LOG, LOGIT, AST, NONE ]

    -m ANALYSIS_METHOD, --analysis_method=ANALYSIS_METHOD
        The analysis method to apply [ Default: LM ] [ Choices: LM, CPLM, ZICP, NEGBIN, ZINB ]

    -r RANDOM_EFFECTS, --random_effects=RANDOM_EFFECTS
        The random effects for the model, comma-delimited for multiple effects [ Default: none ]

    -f FIXED_EFFECTS, --fixed_effects=FIXED_EFFECTS
        The fixed effects for the model, comma-delimited for multiple effects [ Default: all ]

    -c CORRECTION, --correction=CORRECTION
        The correction method for computing the q-value [ Default: BH ]

    -z STANDARDIZE, --standardize=STANDARDIZE
        Apply z-score so continuous metadata are on the same scale [ Default: TRUE ]

    -e CORES, --cores=CORES
        The number of R processes to run in parallel [ Default: 1 ]
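
As an illustration of combining several of these options, the sketch below assembles one command line and prints it instead of running it. The option names come from the help text above; the metadata column names (age, diagnosis, subject), the input file names, and the run_or_print helper are assumptions made for the example.

```shell
MAASLIN2=./R/Maaslin2.R   # path from the usage line above; adjust to your checkout
DRY_RUN=1                 # set to 0 to execute the command instead of printing it

# Print the command when DRY_RUN is 1; otherwise execute it.
run_or_print() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# age, diagnosis, and subject are hypothetical metadata column names.
preview=$(run_or_print "$MAASLIN2" \
    --min_prevalence=0.2 \
    --max_significance=0.1 \
    --normalization=TSS \
    --transform=AST \
    --fixed_effects=age,diagnosis \
    --random_effects=subject \
    --cores=2 \
    data.tsv metadata.tsv output_folder)
echo "$preview"
```

Options go before the three positional arguments, matching the usage line printed by --help. Any option left out keeps the default shown in the help text.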

Visualization

MaAsLin2 includes two functions that visualize the outputs and provide ggplot2 plots that can be used to generate manuscript- or report-quality figures.

  • maaslin2_heatmap: this function generates an overview of all associations reported by MaAsLin2 and has the following parameters:
    • output_path: the path to the MaAsLin2 output
    • title: a title for the plot
    • cell_value: default 'Q.value'
    • data_label: default 'Data'
    • metadata_label: default 'Metadata'
    • border_color: default "grey93"
    • color: default colorRampPalette(c("blue","grey90", "red"))(500)

  • maaslin2_association_plots: this function produces a ggplot2 plot for each association; depending on the data types, the plot is a scatter plot or a boxplot. The function returns a vector of ggplot2 plots. Its parameters are as follows:
    • metadata_path: '/path-to-metadata-file/'
    • features_path: '/path-to-features-file/'
    • output_path: 'the path to the MaAsLin2 output'
    • write_to_file: default True
    • write_to: '~/path-to-output/'

Troubleshooting

  1. Question: When I run from the command line I see the error Maaslin2.R: command not found. How do I fix this?
    • Answer: Provide the full path to the executable when running Maaslin2.R.
  2. Question: When I run as a function I see the error Error in library(Maaslin2): there is no package called 'Maaslin2'. How do I fix this?
    • Answer: Install the R package and then try loading the library again.
  3. Question: When I try to install the R package I see errors about dependencies not being installed. Why is this?
    • Answer: Installing the R package will not automatically install the packages MaAsLin2 requires. Please install the dependencies and then install the MaAsLin2 R package.