This is the wiki page for the Law and Public Policy Lab text analysis platform. The goal is an integrated system for storing, managing, and analyzing text data and other social-science data for social science research.

Download the code here.

Team Members

Feature Extraction (/features/)

Feature Extraction Overview

Pre-Processing Overview


Word2Vec Word Vectors

Dependencies and Syntactic N-grams



Doc2Vec Paragraph Vectors

Hierarchical Word Clusters

Unsupervised Dimension Reduction (/unsupervised/)

Latent Dirichlet Allocation

Hierarchical Dirichlet Process

Word Vector Clusters

Principal Component Analysis

Supervised Dimension Reduction and Prediction (/supervised/)

Partial Least Squares

Multinomial Inverse Regression

Chi-Squared Threshold

Gram-wise OLS

Elastic Net

Visualization (/visualization/)

Word clouds, topic clusters, binned scatter plots, etc.

Databases (/database/)

Databases on SQLite and PostgreSQL, plus table functionality on PostgreSQL



OpenSUSE Appliance

Server Admin

Pronunciation Guide

We still need to build a test suite. It should include measures of fit, cross-validation, perplexity, etc.
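As a starting point for discussion, here is a minimal sketch of two of the checks mentioned above: splitting documents into cross-validation folds and computing perplexity from per-token log-probabilities. It is not part of the platform code; the function names `k_fold_indices` and `perplexity` are hypothetical, and it uses only the Python standard library so it does not assume anything about how the platform's models are implemented.

```python
import math
import random
from typing import List, Sequence, Tuple


def k_fold_indices(n_items: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_items) into k (train, test) index pairs.

    Hypothetical helper: every item lands in exactly one test fold.
    """
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    indices = list(range(n_items))
    # Shuffle with a fixed seed so the folds are reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds, round-robin.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        # Training set is every fold except the held-out one.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp(-mean natural-log probability per held-out token)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))
```

In a real test suite, each `(train, test)` pair would be used to fit a model on the training documents and score it on the held-out ones; for a topic model, the held-out token log-probabilities would feed into `perplexity`. Lower perplexity indicates a better fit to the held-out text.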

Eventually, we would like to build a GUI. It should be able to pull Google N-grams, for example.