About ark-sage

Ark-SAGE is a Java library that implements the L1-regularized version of Sparse Additive GenerativE models of Text (SAGE). SAGE is an algorithm for learning sparse representations of text. Details of the algorithm are described in

Eisenstein, Jacob, Amr Ahmed, and Eric P. Xing. "Sparse additive generative models of text." In Proceedings of ICML. pp. 1041-1048. 2011.

The idea behind the L1-regularized implementation of SAGE is briefly described in

Yanchuan Sim, Noah A. Smith, David A. Smith. "Discovering Factions in the Computational Linguistics Community." In Proceedings of the Association for Computational Linguistics (ACL 2012) Special Workshop on Rediscovering 50 Years of Discoveries. pp. 22-32. 2012.
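
For readers who want a concrete picture of the model, the following is a minimal sketch of the core SAGE idea as described in the papers above: each word's log probability is a shared background log frequency plus a sparse deviation (eta), and an L1 penalty on eta pushes most deviations to exactly zero. This is not code from ark-sage and does not reflect its API. The class name SageSketch, the method names, and the parameter lambda are hypothetical, and the objective shown (multinomial log likelihood minus lambda times the L1 norm of eta) is an assumption about the general form of the L1-regularized objective.

```java
// Illustrative sketch only; not part of ark-sage. All names are hypothetical.
class SageSketch {

    // p(w) is proportional to exp(background[w] + eta[w]).
    // Uses the log-sum-exp trick for numerical stability.
    static double[] wordDistribution(double[] background, double[] eta) {
        int n = background.length;
        double[] scores = new double[n];
        double max = Double.NEGATIVE_INFINITY;
        for (int w = 0; w < n; w++) {
            scores[w] = background[w] + eta[w];
            max = Math.max(max, scores[w]);
        }
        double sum = 0.0;
        for (int w = 0; w < n; w++) {
            sum += Math.exp(scores[w] - max);
        }
        double logZ = max + Math.log(sum);
        double[] p = new double[n];
        for (int w = 0; w < n; w++) {
            p[w] = Math.exp(scores[w] - logZ);
        }
        return p;
    }

    // Assumed objective: sum_w counts[w] * log p(w) - lambda * sum_w |eta[w]|.
    static double penalizedLogLikelihood(int[] counts, double[] background,
                                         double[] eta, double lambda) {
        double[] p = wordDistribution(background, eta);
        double logLik = 0.0;
        double l1 = 0.0;
        for (int w = 0; w < counts.length; w++) {
            if (counts[w] > 0) {
                logLik += counts[w] * Math.log(p[w]);
            }
            l1 += Math.abs(eta[w]);
        }
        return logLik - lambda * l1;
    }

    public static void main(String[] args) {
        // Background log frequencies for a three-word vocabulary.
        double[] background = {Math.log(0.5), Math.log(0.3), Math.log(0.2)};
        // Sparse deviation: only the third word differs from the background.
        double[] eta = {0.0, 0.0, Math.log(2.0)};
        int[] counts = {1, 0, 2};

        System.out.println(java.util.Arrays.toString(wordDistribution(background, eta)));
        System.out.println(penalizedLogLikelihood(counts, background, eta, 0.5));
    }
}
```

With eta set to all zeros the distribution reduces to the background frequencies, which is why a sparse eta is easy to interpret: only the nonzero entries mark words whose usage departs from the background.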

There are several ways you can use this library. The most straightforward way is to use SAGE for learning sparse effects without latent variables using the tool included in the library. You can run the tool using the shell script

./ --help

See the relevant Javadoc for ark-sage for more details.

Fixes to Version 0.1 (4/7/2013)

  • Fixed usage information being printed more than once.

Fixes to Version 0.1 (3/9/2013)

  • Added regularization penalty to log likelihood calculation.
  • Fixed wrong commons-math library version in script.
  • Added -XX:ParallelGCThreads as a default option in script.

Version 0.1 (3/3/2013)

  • Initial release of SAGE along with SupervisedSAGE implementation.