coexpp -- Large-scale Weighted Gene Coexpression Network Analysis

coexpp provides a focused coexpression network analysis workflow optimized for very large numbers of genes. Particular attention has been paid to minimizing the overall memory footprint. coexpp has an O(n^2) memory footprint with a constant factor very close to 1, and as such typically consumes one third of the memory of other WGCNA implementations.
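To get a feel for what an O(n^2) footprint means in practice, here is a rough back-of-envelope sketch. It assumes one dense n x n matrix of 8-byte doubles; the gene counts and the "three matrices" column are illustrative assumptions, not coexpp measurements.

```r
# Approximate memory for one dense n x n matrix.
# Assumption: values are stored as 8-byte doubles.
matrix_gib <- function(n_genes, bytes_per_value = 8) {
  n_genes^2 * bytes_per_value / 2^30
}

# Hypothetical gene counts, chosen only for illustration.
n_genes <- c(10000, 20000, 50000)

footprint <- data.frame(
  genes = n_genes,
  one_matrix_gib = round(matrix_gib(n_genes), 1),
  # What a workflow holding three such matrices at once would need.
  three_matrices_gib = round(3 * matrix_gib(n_genes), 1)
)
print(footprint)
```

Doubling the number of genes quadruples the footprint, which is why the constant factor matters so much at this scale.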

coexpp wraps around the WGCNA package, replacing key memory- and performance-intensive operations with C++ implementations (using RcppEigen and Rclusterpp). Specifically, coexpp maintains large matrices (those larger than R's maximum matrix size of ~46,000 x 46,000) entirely on the C++ side, where they are not subject to R's size limits and copy-by-value semantics.
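A figure of roughly 46,000 is what you get if the element count of a square matrix must fit in a signed 32-bit integer. The sketch below only does that arithmetic; that this is the limit in question is an assumption, and the exact limit depends on your R version.

```r
# Largest value of R's integer type (2^31 - 1).
max_elements <- .Machine$integer.max

# Side length of the largest square matrix with at most that many elements.
max_side <- floor(sqrt(max_elements))
print(max_side)

# One more row and column would exceed the element limit.
print((max_side + 1)^2 > max_elements)
```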

Note that coexpp is not a complete re-implementation of WGCNA. Instead, it is an optimization of the specific workflow in use at the sponsoring research organizations. coexpp was seeded from the SageBionetworksCoex package, and uses code developed at Sage Bionetworks by Bruce Hoff and others.

Installation

Minerva

coexpp is already installed on Minerva. Simply run module load R in the shell, and then library(coexpp) within R.

Mac OS X

The following procedure is designed for R version ≥3.2.2 and Mac OS X ≥10.10.5.

  1. You need to install WGCNA-R and its dependencies first.

  2. To enable multithreading, you need a compiler that supports OpenMP, like gcc 4.9 without multilib:

    • First, brew reinstall gcc --without-multilib; take a coffee break.
    • Put the following in ~/.R/Makevars

      CC = gcc-4.9
      CXX = g++-4.9
      PKG_CXXFLAGS += -fopenmp
      PKG_LIBS += -fopenmp
      SHLIB_OPENMP_CXXFLAGS = -fopenmp
      
  3. Install Rcpp and RcppEigen from CRAN source packages with install.packages(c("Rcpp", "RcppEigen"), type = "source"). You need to install from source because the binaries on CRAN don't have OpenMP enabled.

  4. Install flashClust from CRAN (the binary package is OK) with install.packages("flashClust").

  5. CRAN has Rclusterpp 0.2.3 (as of 2015-11-11), but you need ≥0.2.4; otherwise coexpp will have linking problems. Therefore, you need to install it straight from GitHub, for instance:

    > install.packages("devtools")
    > devtools::install_github("nolanlab/Rclusterpp")
    
  6. Clone this repository, compress it as a .tar.gz with tar zcfv, and then within R: install.packages("path/to/.tar.gz", type="source").

Basic usage

coexpp optimizes functionality from WGCNA, so you need to read the WGCNA tutorials first for this package to make sense. The following is the most typical way to get started:

  1. Load coexpp on top of vanilla WGCNA-R: library(WGCNA); library(coexpp)

  2. coexppSetThreads(NULL) will enable multithreading, with as many threads as available cores.

  3. Get gene expression data into a matrix (let's call it geneExpr) with samples as rows and probes as columns.

  4. Run results <- coexpressionAnalysis(geneExpr) to kick off the standard workflow. results will be a list with a CoexppClusters object in $clusters (use ?CoexppClusters to read about its contents), the gene module color assignments in $geneModules, and a clustering of the modules in $genePCTree.
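The steps above can be sketched end to end as follows. The simulated matrix, its dimensions, and the row and column names are assumptions made only so the example is self-contained; random noise has no real module structure, so the output is not meaningful and the analysis may not even complete on it. The coexpp calls are the ones named in the steps above and only run if both packages are installed.

```r
# Simulated expression data: samples as rows, probes as columns.
# Sizes and names are hypothetical placeholders for your own data.
set.seed(1)
n_samples <- 20
n_probes <- 200
geneExpr <- matrix(
  rnorm(n_samples * n_probes),
  nrow = n_samples, ncol = n_probes,
  dimnames = list(paste0("sample", seq_len(n_samples)),
                  paste0("probe", seq_len(n_probes)))
)

# Only attempt the analysis when both packages are available.
if (requireNamespace("WGCNA", quietly = TRUE) &&
    requireNamespace("coexpp", quietly = TRUE)) {
  library(WGCNA)
  library(coexpp)
  coexppSetThreads(NULL)  # as many threads as available cores

  # Noise data may not suit the workflow, so report errors instead of stopping.
  results <- tryCatch(
    coexpressionAnalysis(geneExpr),
    error = function(e) {
      message("coexpressionAnalysis failed: ", conditionMessage(e))
      NULL
    }
  )
  if (!is.null(results)) {
    print(names(results))     # the steps above mention clusters, geneModules, genePCTree
    str(results$geneModules)  # gene module color assignments
  }
}
```

With real data, replace the simulated geneExpr with your own matrix, keeping the samples-by-probes orientation.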

Extending/modifying/contributing to coexpp

A few notes if you are going to be modifying coexpp:

  1. coexpp uses roxygen2 to generate documentation. Do not modify the *.Rd files directly. Instead, update the roxygen2 comments at the function implementation and regenerate the documentation.
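As an illustration of the roxygen2 convention, here is a minimal sketch of a documented function. moduleSizes is a hypothetical helper invented for this example and is not part of coexpp; the point is the #' comment block, which roxygen2 turns into the corresponding .Rd file.

```r
#' Count genes per module
#'
#' Hypothetical helper used only to illustrate roxygen2 comments;
#' it is not part of coexpp.
#'
#' @param geneModules Character vector of module labels, one per gene.
#' @return A named integer vector of module sizes, largest first.
#' @export
moduleSizes <- function(geneModules) {
  sizes <- sort(table(geneModules), decreasing = TRUE)
  setNames(as.integer(sizes), names(sizes))
}

# Quick check with made-up labels.
print(moduleSizes(c("blue", "blue", "grey", "blue", "grey", "red")))

# To regenerate the *.Rd files, run from the package root (requires roxygen2):
# roxygen2::roxygenise()
```

After editing the comments, rerun the regeneration step and commit the updated *.Rd files along with the code change.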