Ad Hoc Monitoring of Vocabulary Shifts over Time - ground truth
This repository contains the ground truth material used to obtain the results reported in "Ad Hoc Monitoring of Vocabulary Shifts over Time", Tom Kenter, Melvin Wevers, Pim Huijnen, Maarten de Rijke, CIKM 2015.
If you use this material, please cite the paper:
@inproceedings{kenter2015vocabulary_shifts,
  title={Ad Hoc Monitoring of Vocabulary Shifts over Time},
  author={Kenter, Tom and Wevers, Melvin and Huijnen, Pim and de Rijke, Maarten},
  booktitle={CIKM},
  year={2015}
}
File format
There are 21 files, all of which are in the same format:
seed words<TAB>time period<TAB>candidate word<TAB>annotator 1 name<TAB>annotator 1 score<TAB>annotator 2 name<TAB>annotator 2 score
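The following is a minimal Python sketch of how lines in this format could be read. It is not part of the original material. The class name `Judgement`, the function `parse_ground_truth` and the sample line are hypothetical. The sketch also assumes there is no header line and keeps both scores as text, because this README does not say how scores are written.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass
class Judgement:
    """One line of a ground-truth file: seven tab-separated fields (hypothetical name)."""
    seed_words: str
    time_period: str
    candidate_word: str
    annotator1_name: str
    annotator1_score: str  # kept as text: the README does not specify the score type
    annotator2_name: str
    annotator2_score: str


def parse_ground_truth(stream: TextIO) -> Iterator[Judgement]:
    """Yield one Judgement per non-empty line; assumes no header line."""
    # QUOTE_NONE treats quote characters (common in OCR output) as literal text
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_no, fields in enumerate(reader, start=1):
        if not fields:
            continue  # skip blank lines
        if len(fields) != 7:
            raise ValueError(f"line {line_no}: expected 7 fields, got {len(fields)}")
        yield Judgement(*fields)


# Made-up sample line for illustration only; it is not taken from the dataset.
sample = "seed\tperiod\tcandidate\tann1\t1\tann2\t0\n"
rows = list(parse_ground_truth(io.StringIO(sample)))
print(rows[0].candidate_word, rows[0].annotator2_score)
```

To read a real file, open it with `open(path, encoding="utf-8", newline="")` and pass the file object to `parse_ground_truth`. The length check makes any line with a field count other than seven fail loudly instead of being silently misaligned.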
Character encoding
The files are UTF-8 encoded; the efficiency_efficiƫntie.txt file, for example, should display correctly. Many words nevertheless contain unusual characters. These are not encoding errors: they come from the original output of the OCR engine.
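One way to confirm that every file decodes as UTF-8 is to read each one with strict decoding and collect any failures. The following is a minimal sketch, not part of the original material. The function name `find_undecodable` and the demo files are hypothetical. A clean result only means the bytes are valid UTF-8, so the OCR-derived characters will still be there.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def find_undecodable(directory: str) -> List[Tuple[str, str]]:
    """Return (file name, error message) for each .txt file that is not valid UTF-8."""
    problems = []
    for path in sorted(Path(directory).glob("*.txt")):
        try:
            path.read_text(encoding="utf-8")  # strict by default: raises on invalid bytes
        except UnicodeDecodeError as err:
            problems.append((path.name, str(err)))
    return problems


# Demo on made-up files: one valid UTF-8 file and one written as Latin-1.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "ok.txt").write_text("seed\tperiod\tefficiƫntie\n", encoding="utf-8")
    Path(tmp, "bad.txt").write_bytes("efficiënt\n".encode("latin-1"))
    problems = find_undecodable(tmp)

print([name for name, _ in problems])
```

To check the actual data, call `find_undecodable` on the directory that holds the 21 files. An empty list means all of them decoded without errors.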