Welcome
CATI is a platform that assists end users in building an annotated corpus. It combines event detection with Active Learning (AL). The platform is supported by LABEX IMU under the IDENUM project (Identités numériques urbaines).
To build the corpus, CATI classifies the documents through a pipeline of three phases. The figure below gives an overview of the proposed method. The first phase detects events, using textual features and, when available, image features. The second phase classifies a subset of the documents based on the detected events and the documents assigned to them. Because event-related documents usually represent only a small part of the dataset, the third phase uses Active Learning to assist the user in classifying the remaining documents.
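The idea behind the third phase can be illustrated with a minimal sketch. It assumes uncertainty sampling, a common Active Learning strategy, and is not CATI's actual implementation; the function names, document identifiers, and probabilities are all hypothetical.

```python
def uncertainty(prob_event):
    """Score a binary prediction: 1.0 when the probability is 0.5, 0.0 at 0 or 1."""
    return 1.0 - abs(prob_event - 0.5) * 2.0


def select_for_labeling(predictions, batch_size):
    """Return the ids of the documents the classifier is least sure about.

    `predictions` maps a document id to the predicted probability that the
    document is event-related (hypothetical data structure).
    """
    ranked = sorted(
        predictions,
        key=lambda doc_id: uncertainty(predictions[doc_id]),
        reverse=True,
    )
    return ranked[:batch_size]


# Hypothetical classifier output for four unlabeled documents.
predictions = {"doc_a": 0.97, "doc_b": 0.52, "doc_c": 0.10, "doc_d": 0.41}

# The user would be asked to label these documents next; the model is then
# retrained with the new labels and the loop repeats.
to_label = select_for_labeling(predictions, batch_size=2)
print(to_label)
```

Asking the user about the least certain documents first tends to improve the classifier with fewer manual labels than picking documents at random.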
Publications on CATI:
- CATI: An Active Learning System For Event Detection On Mibroblogs Large Datasets. Gabriela Bosetti, Elöd Egyed-Zsigmond, Lucas Okumura-Ono. Webist 2019, Vienna, Austria. (to appear)
- Assisted Classification Through Image- and Text-Based Event. Gabriela Bosetti, Elöd Egyed-Zsigmond, Lucas Okumura-Ono. BDA 2019, Lyon. (to appear)
Installing and importing data into CATI
Instructions for installing CATI and importing data are available on the Getting Started page.
Using CATI
- Phase 2: Initial classification
- Phase 3: Active learning process
Running the experiment.py file
To run the experiments, execute the script with at least the following 3 arguments:
python experiment.py -i your_index -s your_target_session -gts your_groundtruth_session
To list all the available options, run:
python experiment.py -h
Example arguments:
-df True -cr True -dl True -i experiment_lyon_2015_gt -s session_lyon2015_test_03 -gts session_lyon2015_gt
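When the experiments are launched from another Python script, the command can be assembled as an argument list. The sketch below is only an illustration: `build_experiment_command` is a hypothetical helper, not part of CATI, and it relies solely on the three required flags shown above (`-i`, `-s`, `-gts`).

```python
import shlex
import sys


def build_experiment_command(index, session, gt_session, extra_args=()):
    """Return the argv list that runs experiment.py with its 3 required arguments."""
    command = [
        sys.executable, "experiment.py",
        "-i", index,         # index to run the experiment on
        "-s", session,       # target session
        "-gts", gt_session,  # ground-truth session
    ]
    command.extend(extra_args)  # any optional flags, passed through unchanged
    return command


command = build_experiment_command(
    "experiment_lyon_2015_gt",
    "session_lyon2015_test_03",
    "session_lyon2015_gt",
)

# Show the command without the interpreter path.
print(shlex.join(command[1:]))

# To actually launch it from the directory containing experiment.py:
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems if an index or session name contains spaces.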
Extending CATI
CATI offers extension points for adding new Search Modules, Active Learning models, and sampling strategies. Please read the dedicated section for developers.
Annexes