# ELSA: Extended Local Similarity Analysis

Finding Time-Dependent Associations in Time Series Datasets

## INTRODUCTION

In recent years, advances in molecular technologies have empowered researchers with the ability to spatially and temporally characterize natural microbial communities without lab cultivation. Mining and analyzing co-occurrence patterns in these new datasets are fundamental to revealing existing symbiosis relations and microbe-environment interactions. Time series data, in particular, are receiving more and more attention, since not only undirected but also directed associations can be inferred from these datasets.

Researchers typically use techniques such as principal component analysis (PCA), multidimensional scaling (MDS), discriminant function analysis (DFA) and canonical correlation analysis (CCA) to analyze microbial community data under various conditions. Unlike these methods, the Local Similarity Analysis (LSA) technique can capture time-dependent (possibly time-shifted) associations between microbes and between microbes and environmental factors (Ruan et al., 2006). Significant LSA associations can be interpreted as a partially directed association network for further network-based analysis. A similar approach called Local Trend Analysis (LTA) has been developed for state-change series, where a relative change threshold is applied to convert the original time series into series of up-change, no-change and down-change states (Xia et al., 2015). Many advanced network analysis tools, including ELSA, were evaluated in a benchmark paper published in The ISME Journal (Weiss et al., 2016).
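The two ideas in the paragraph above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the ELSA package's API: the function names `local_similarity` and `trend_series`, the `max_delay` and `threshold` parameters, and the exact form of the relative change are assumptions made for the example. The score is computed with a dynamic-programming recurrence in the spirit of Ruan et al. (2006), on series that are assumed to be already normalized.

```python
def local_similarity(x, y, max_delay=0):
    """Signed local similarity score and delay for two equal-length series.

    Hypothetical helper for illustration; not part of the ELSA package API.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("series must have the same length")
    # pos[i][j] / neg[i][j]: best running sum of a positively / negatively
    # associated local alignment that ends at x[i-1] and y[j-1].
    pos = [[0.0] * (n + 1) for _ in range(n + 1)]
    neg = [[0.0] * (n + 1) for _ in range(n + 1)]
    best, best_delay = 0.0, 0
    for i in range(1, n + 1):
        # Only consider pairs whose time shift |i - j| is within max_delay.
        for j in range(max(1, i - max_delay), min(n, i + max_delay) + 1):
            prod = x[i - 1] * y[j - 1]
            pos[i][j] = max(0.0, pos[i - 1][j - 1] + prod)
            neg[i][j] = max(0.0, neg[i - 1][j - 1] - prod)
            if pos[i][j] > abs(best):
                best, best_delay = pos[i][j], i - j
            if neg[i][j] > abs(best):
                best, best_delay = -neg[i][j], i - j
    return best / n, best_delay


def trend_series(x, threshold=0.0):
    """Convert a series into up (1), no-change (0) and down (-1) states.

    The relative change used here, (cur - prev) / (|cur| + |prev|), is an
    assumption for this sketch.
    """
    states = []
    for prev, cur in zip(x, x[1:]):
        denom = abs(prev) + abs(cur)
        change = 0.0 if denom == 0 else (cur - prev) / denom
        if change > threshold:
            states.append(1)
        elif change < -threshold:
            states.append(-1)
        else:
            states.append(0)
    return states
```

With `max_delay=0` only co-occurring time points are compared; a larger `max_delay` lets the score pick up associations in which one series leads the other, which is what makes the resulting network partially directed. In this sketch a positive delay means the matching segment of `x` occurs later than that of `y`.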

Permutation-based LSA and LTA techniques have led to interesting and novel discoveries about microbial communities (Steele et al., 2011). However, at the scale of current datasets, permutation-based analysis has become cost-prohibitive. To improve computational efficiency, incorporate new features such as support for time series data with replicates, and make the analysis more accessible to users, we re-implemented the LSA and LTA algorithms as this ELSA package, a C++ extension to Python (Xia et al., 2011). Theoretical approximations have since been developed for statistical testing of both LSA and LTA scores (Xia et al., 2013, 2015), and these updates are also included in the ELSA package. Users can choose among three testing methods: theoretical approximation (fastest), as described in the later papers (Xia et al., 2013, 2015); permutation (slow), as described in the original publication (Ruan et al., 2006); or a hybrid approach, in which theoretical approximation is first used for loose testing and screening, and permutation is then applied to promising pairs to refine the P-values (see the -p options in the Manual).
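The hybrid testing strategy described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ELSA implementation: `permutation_p_value`, `hybrid_p_value`, `screen_alpha` and `n_perm` are hypothetical names, the association statistic and the theoretical approximation are passed in as callables rather than implemented here, and the add-one P-value estimator is a common convention that may differ from what the package uses.

```python
import random


def permutation_p_value(x, y, statistic, n_perm=1000, seed=0):
    """Estimate a two-sided P-value by shuffling the time order of x.

    Hypothetical helper; `statistic` is any callable taking two series.
    """
    rng = random.Random(seed)
    observed = abs(statistic(x, y))
    shuffled = list(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(statistic(shuffled, y)) >= observed:
            hits += 1
    # Add-one estimator so the estimate is never exactly zero (assumption).
    return (hits + 1) / (n_perm + 1)


def hybrid_p_value(x, y, statistic, approx_p_value,
                   screen_alpha=0.05, n_perm=1000, seed=0):
    """Screen with a fast approximate P-value, then refine by permutation.

    `approx_p_value` stands in for a theoretical approximation and is
    supplied by the caller; it is not derived in this sketch.
    """
    p_theory = approx_p_value(x, y)
    if p_theory > screen_alpha:
        # Not promising: keep the cheap approximate P-value.
        return p_theory, "theory"
    # Promising pair: spend the permutation budget to refine the P-value.
    p_perm = permutation_p_value(x, y, statistic, n_perm=n_perm, seed=seed)
    return p_perm, "permutation"
```

The point of the design is that the expensive permutation loop runs only for pairs that pass the loose screen, so its cost scales with the number of promising pairs rather than with all pairs.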

## IMPLEMENTATION

Figure 1. The analysis workflow of the Extended Local Similarity Analysis (ELSA) tools. Users start with raw data (matrices of time series) as input and specify their requirements as parameters. The ELSA tools then F-transform and normalize the raw data and calculate the Local Similarity (LS) scores and/or Local Trend scores. Next, the tools assess the statistical significance (P-values) of these correlation statistics using either a permutation test or a theoretical P-value approximation, and filter out insignificant results. Finally, the tools construct a partially directed association network from the significant associations.
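The preprocessing step of this workflow might look like the sketch below. It rests on assumptions not stated in this README: that the F-transform summarizes the replicates at each time point with a function such as the mean, and that normalization is a rank-based normal score transformation. The names `summarize_replicates` and `normal_score_transform` are hypothetical and not part of the ELSA API.

```python
from statistics import NormalDist, mean


def summarize_replicates(replicates, summary=mean):
    """Collapse the replicates at each time point into one value.

    `replicates` is a list of time points, each a list of measurements.
    Using the mean as the summary function is an assumption.
    """
    return [summary(values) for values in replicates]


def normal_score_transform(series):
    """Map values to normal scores via their ranks (ties share a rank)."""
    n = len(series)
    order = sorted(range(n), key=lambda k: series[k])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and series[order[j + 1]] == series[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in order[i:j + 1]:
            ranks[k] = average_rank
        i = j + 1
    inverse_cdf = NormalDist().inv_cdf
    # Dividing by n + 1 keeps every probability strictly inside (0, 1).
    return [inverse_cdf(rank / (n + 1)) for rank in ranks]
```

For example, `normal_score_transform(summarize_replicates(data))` would turn replicated measurements for one factor into a single normalized series ready for scoring.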

## AVAILABILITY

1. Use the released Docker image (for support with this image, please email Dongmei Ai: aidongmei@ustb.edu.cn):

        sudo docker pull panhongfei/elsa:1.0
        sudo docker run -it panhongfei/elsa:1.0
        cd test/
        sh test.sh
        # run elsa with a mounted local directory
        sudo docker run -it -v /media/d102/disk/panhongfei/charade-elsa-7bed46b84456/test:/install_soft panhongfei/elsa:1.0

2. Use the released version: download the standalone source code package from https://bitbucket.org/charade/elsa/get/release.tar.gz and install it. See the README.txt file in the package (also viewable at https://bitbucket.org/charade/elsa) for detailed installation instructions and other information.

3. Use the development version: the source code is available via git at https://bitbucket.org/charade/elsa. The Python package is open source so that advanced users can build pipelines around the analysis or implement other variants.

## WIKI

ELSA's wiki page (https://bitbucket.org/charade/elsa/wiki) is a growing resource for manuals, FAQs and other information. It is a MUST-read before you actually use the ELSA tool. The documentation is also openly editable, and you are more than welcome to contribute to it.

## CONTACTS

Questions and comments should be addressed to lixia at stanford dot edu.

## CITATIONS

Please cite references 1 and 2 if you use the ELSA Python package and Local Similarity Analysis in your study. Please cite references 2 and 3 if you use the ELSA Python package and Local Trend Analysis in your study. Please also cite reference 4 if you use the old R script or the permutation approach in ELSA.

1. Li C Xia, Joshua A Steele, Jacob A Cram, Zoe G Cardon, Sheri L Simmons, Joseph J Vallino, Jed A Fuhrman and Fengzhu Sun. Extended local similarity analysis (eLSA) of microbial community and other time series data with replicates. BMC Systems Biology 2011, 5(Suppl 2):S15 http://www.biomedcentral.com/1752-0509/5/S2/S15/
2. Li C. Xia, Dongmei Ai, Jacob Cram, Jed A. Fuhrman, Fengzhu Sun. Efficient Statistical Significance Approximation for Local Association Analysis of High-Throughput Time Series Data. Bioinformatics 2013, 29(2):230-237 http://bioinformatics.oxfordjournals.org/content/29/2/230.full
3. LC Xia, D Ai, JA Cram, X Liang, JA Fuhrman, F Sun. Statistical significance approximation in local trend analysis of high-throughput time-series data using the theory of Markov chains. BMC Bioinformatics 2015, 16, 301 https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-015-0732-8
4. Quansong Ruan, Debojyoti Dutta, Michael S. Schwalbach, Joshua A. Steele, Jed A. Fuhrman and Fengzhu Sun. Local similarity analysis reveals unique associations among marine bacterioplankton species and environmental factors. Bioinformatics 2006, 22(20):2532-2538 http://bioinformatics.oxfordjournals.org/content/22/20/2532.short
5. Joshua A Steele, Peter D Countway, Li Xia, Patrick D Vigil, J Michael Beman, Diane Y Kim, Cheryl-Emiliane T Chow, Rohan Sachdeva, Adriane C Jones, Michael S Schwalbach, Julie M Rose, Ian Hewson, Anand Patel, Fengzhu Sun, David A Caron, Jed A Fuhrman. Marine bacterial, archaeal and protistan association networks reveal ecological linkages. The ISME Journal 2011, 5(9):1414–1425 http://www.nature.com/ismej/journal/v5/n9/abs/ismej201124a.html
6. S Weiss, VW Treuren, C Lozupone, K Faust, J Friedman, Y Deng, Li C. Xia, Z Xu, L Ursell, E Alm, A Birmingham, J Cram, J Fuhrman, J Raes, F Sun, J Zhou, R Knight. Correlation detection strategies in microbial data sets vary widely in sensitivity and precision. The ISME Journal 2016, Advanced online access http://www.nature.com/ismej/journal/vaop/ncurrent/full/ismej2015235a.html
