This library contains the source code used to compute Instance-Context Embeddings (ICE) and perform word sense induction (WSI) in Kågebäck et al. (2015).
Word sense induction and ICE
Word sense induction (WSI) is the task of automatically detecting word senses using only a text corpus. This can be achieved by clustering geometrical embeddings, each corresponding to an instance of the targeted polysemous word.
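To make the clustering step concrete, here is a minimal sketch in plain Python. It assumes k-means as the clustering algorithm, and it uses made-up 2-D vectors in place of real instance embeddings. The function name `kmeans` and the toy data are illustrative only and are not part of this repo.

```python
import math

def kmeans(points, k, iters=20):
    """Plain k-means (Lloyd's algorithm) on lists of floats.

    Initial centroids are picked at evenly spaced indices so the
    example is deterministic (an assumption made for illustration).
    """
    step = len(points) // k
    centroids = [list(points[i * step]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each instance goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels, centroids

# Toy instance embeddings for one polysemous word (hypothetical values):
# the first three instances share one sense, the last three another.
instances = [
    [0.9, 0.1], [1.0, 0.0], [0.8, 0.2],
    [0.1, 0.9], [0.0, 1.0], [0.2, 0.8],
]
labels, centroids = kmeans(instances, k=2)
print(labels)     # one cluster id per instance
print(centroids)  # one centroid per induced sense
```

Each cluster is read as one induced sense, and its centroid acts as a representative vector for that sense.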
We introduce instance-context embeddings (ICE) for use in WSI. ICE leverages a novel approach for combining Skip-gram word embeddings, based on semantic and temporal aspects of the context words (i.e., the words surrounding the target word). For more information about the method and our evaluation of its performance, please read the paper.
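As a rough illustration of the idea, the sketch below builds one instance embedding as a weighted average of context word vectors. The weights here are assumptions made for the example: a sigmoid of the dot product with the target vector stands in for the semantic aspect, and a triangular decay over distance stands in for the temporal aspect. The exact weighting is defined in the paper, and the function name `instance_embedding` is hypothetical rather than taken from this repo.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def instance_embedding(target_vec, context_vecs, offsets, window=5):
    """Weighted average of context word vectors for one target instance.

    offsets holds the signed position of each context word relative to
    the target (-1 = word just before, +1 = word just after).
    Both weighting terms are illustrative assumptions.
    """
    total = [0.0] * len(target_vec)
    weight_sum = 0.0
    for vec, off in zip(context_vecs, offsets):
        # Semantic term (assumed): related words score closer to 1.
        semantic = sigmoid(sum(t * c for t, c in zip(target_vec, vec)))
        # Temporal term (assumed): linear decay with distance to the target.
        temporal = max(0.0, 1.0 - abs(off) / (window + 1))
        w = semantic * temporal
        total = [acc + w * c for acc, c in zip(total, vec)]
        weight_sum += w
    if weight_sum == 0.0:
        return total
    return [x / weight_sum for x in total]

# Toy 2-D vectors (hypothetical): the first context word is related to
# the target, the second is not; both sit at distance 1.
target = [1.0, 0.0]
context = [[1.0, 0.0], [0.0, 1.0]]
emb = instance_embedding(target, context, offsets=[-1, 1])
print(emb)  # the related context word contributes more
```

Under these assumptions, context words that are both related to the target and close to it dominate the resulting vector, which is the intuition behind combining the two aspects.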
Dependencies
- C compiler (tested with gcc on Linux)
Installation
- Clone the repo into a folder of your choosing.
- Add the repo to the path of your MATLAB project (see demo.m for an example).
Running the demo
- Run "run_demo.sh" in the demo folder inside the repo.
The demo downloads a small corpus, runs the Skip-gram model, and computes ICE embeddings for a set of predefined words. The resulting ICE embeddings are then used to perform WSI, and the centroids of the induced senses are saved to a file.
Kågebäck, M., Johansson, F., Johansson, R., & Dubhashi, D. (2015, June). Neural context embeddings for automatic discovery of word senses. In Proceedings of NAACL-HLT (pp. 25-32).