Welcome

CATI is a platform that assists end users in building an annotated corpus. It combines event detection with Active Learning (AL). The platform is supported by LABEX IMU under the project IDENUM: Identités numériques urbaines.

To do so, we propose a pipeline of methods that classifies the documents in three phases. The Figure below shows an overview of the proposed method. The first phase detects events, using textual features and, when available, image features. The second phase classifies a subset of the documents based on the detected events and the documents assigned to them. Because event-related documents usually represent only a small part of the dataset, the third phase uses Active Learning to assist the user in classifying the remaining documents.

process.png
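
To make the third phase more concrete, here is a minimal sketch of one common Active Learning query strategy, uncertainty sampling: the documents the classifier is least sure about are shown to the user first. This page does not state which sampling strategy CATI applies, so the strategy, the function names (`uncertainty`, `select_queries`) and the sample probabilities are all assumptions made for illustration.

```python
from typing import Dict, List


def uncertainty(prob_positive: float) -> float:
    """Return a score in [0, 1]; 1 means the classifier is maximally unsure."""
    return 1.0 - abs(prob_positive - 0.5) * 2.0


def select_queries(probabilities: Dict[str, float], batch_size: int) -> List[str]:
    """Pick the documents whose predicted probability is closest to 0.5."""
    ranked = sorted(
        probabilities,
        key=lambda doc_id: uncertainty(probabilities[doc_id]),
        reverse=True,
    )
    return ranked[:batch_size]


# Hypothetical classifier output: P(document is event-related) per document id.
predictions = {"doc_a": 0.97, "doc_b": 0.52, "doc_c": 0.08, "doc_d": 0.41}

# The two least certain documents would be sent to the user for labelling.
queries = select_queries(predictions, batch_size=2)
print(queries)
```

In a full loop, the user's labels for the selected documents would be added to the training set, the classifier retrained, and the selection repeated on the documents that remain unlabelled.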

Publications on CATI:

  • CATI: An Active Learning System For Event Detection On Mibroblogs Large Datasets. Gabriela Bosetti, Elöd Egyed-Zsigmond, Lucas Okumura-Ono. WEBIST 2019, Vienna, Austria. (to appear)

  • Assisted Classification Through Image- and Text-Based Event. Gabriela Bosetti, Elöd Egyed-Zsigmond, Lucas Okumura-Ono. BDA 2019, Lyon (to appear)

Installing and importing data into CATI

Instructions for installing CATI and importing data are available on the Getting Started page.

Using CATI

event-detection-banner.png

banner-classification.png

banner.png

Running the experiment.py file

To run the experiments, execute the script with at least the following three arguments:

python experiment.py -i your_index -s your_target_session -gts your_groundtruth_session
You can access the full list of optional arguments by executing:
python experiment.py -h
If you are using PyCharm, you can also edit the run/debug configuration and add the following example parameters:
-df True -cr True -dl True -i experiment_lyon_2015_gt -s session_lyon2015_test_03 -gts session_lyon2015_gt
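
If you need to launch several runs, the command line above can be assembled from Python. The sketch below is only an illustration: the helper names `build_command` and `run_experiment` are hypothetical and not part of CATI, and it assumes that `experiment.py` is in the current working directory. It uses only the flags shown on this page.

```python
import subprocess
import sys
from typing import List, Optional


def build_command(index: str, session: str, gt_session: str,
                  extra: Optional[List[str]] = None) -> List[str]:
    """Assemble the argument list with the three arguments listed above."""
    command = [sys.executable, "experiment.py",
               "-i", index, "-s", session, "-gts", gt_session]
    # Optional flags (for example "-df", "True") are appended unchanged.
    return command + list(extra or [])


def run_experiment(command: List[str]) -> None:
    """Run the script and raise CalledProcessError if it exits with an error."""
    subprocess.run(command, check=True)


# Same example values as the PyCharm configuration above.
command = build_command(
    "experiment_lyon_2015_gt",
    "session_lyon2015_test_03",
    "session_lyon2015_gt",
    extra=["-df", "True", "-cr", "True", "-dl", "True"],
)
print(" ".join(command[1:]))
```

Calling `run_experiment(command)` would then start the run. Passing the arguments as a list avoids shell quoting problems with index or session names.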

Extending CATI

CATI offers extension points for adding new Search Modules, Active Learning models, and sampling strategies. Please read the dedicated section for developers.

Annexes
