This repository contains an analysis of various methods for solving the EVALITA 2023 ACTI ("Automatic Conspiracy Theory Identification") challenge.
The ACTI challenge is an Italian NLP challenge about recognizing conspiracy theories in Italian Telegram messages. It is composed of two subtasks:
- Subtask A - Conspiratorial Content Classification: the model must recognize whether a message contains conspiratorial content or not.
- Subtask B - Conspiracy Category Classification: the model must determine which of four conspiracy topics a message belongs to (Covid, QAnon, Flat Earth, Pro-Russia).
The paper will soon be published.
The following models are compared on both subtasks:
- BERT: we employed bert-base-italian-xxl-cased, a variant of base BERT pretrained exclusively on Italian text, and fine-tuned it with a custom classification head (see the sketch after this list).
- XLM-RoBERTa: we employed XLM-RoBERTa-large, the multilingual variant of RoBERTa, and fine-tuned it with a custom classification head.
- Llama: we employed Llama 7B as a feature extractor and trained a custom MLP classifier on the extracted features.
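
A minimal sketch of the fine-tuning setup for the encoder models, assuming the Hugging Face checkpoint dbmdz/bert-base-italian-xxl-cased (or xlm-roberta-large) and a simple dropout + linear head; the actual head architecture and training loop used in this repository may differ.

```python
# Hedged sketch: checkpoint name and head architecture are assumptions, not
# necessarily the exact choices used in this repository.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class EncoderClassifier(nn.Module):
    def __init__(self, model_name: str, num_labels: int, dropout: float = 0.1):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # Custom head: dropout + linear layer on the [CLS] token representation.
        self.head = nn.Sequential(nn.Dropout(dropout), nn.Linear(hidden, num_labels))

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] embedding
        return self.head(cls)

model_name = "dbmdz/bert-base-italian-xxl-cased"   # or "xlm-roberta-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = EncoderClassifier(model_name, num_labels=2)  # 2 labels for Subtask A, 4 for Subtask B

batch = tokenizer(["esempio di messaggio Telegram"], return_tensors="pt",
                  truncation=True, padding=True)
logits = model(batch["input_ids"], batch["attention_mask"])
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0]))
loss.backward()  # one fine-tuning step; optimizer and data loading omitted for brevity
```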
Only for Subtask B:
- Topic-specific TF-IDF baseline: the top-K keywords specific to each conspiracy topic are found with a topic-specific TF-IDF metric (computed from base TF-IDF). The occurrences of each topic's top-K keywords are then used as input to a Random Forest classifier (sketched below).
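
A rough sketch of this baseline, assuming mean per-topic TF-IDF as the keyword-ranking metric and one keyword-count feature per topic; the exact metric and feature layout may differ from the repository's implementation.

```python
# Hedged sketch of the topic-specific TF-IDF baseline for Subtask B.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

def topic_keywords(texts, labels, topics, k=20):
    """Rank terms by their mean TF-IDF within each topic and keep the top K."""
    vec = TfidfVectorizer()
    tfidf = vec.fit_transform(texts)               # base TF-IDF over all messages
    vocab = np.array(vec.get_feature_names_out())
    keywords = {}
    for topic in topics:
        mask = np.array([lab == topic for lab in labels])
        mean_tfidf = np.asarray(tfidf[mask].mean(axis=0)).ravel()
        keywords[topic] = vocab[np.argsort(mean_tfidf)[::-1][:k]]
    return keywords

def keyword_count_features(texts, keywords):
    """For each message, count occurrences of each topic's top-K keywords."""
    rows = []
    for text in texts:
        tokens = text.lower().split()
        rows.append([sum(tokens.count(w) for w in kws) for kws in keywords.values()])
    return np.array(rows)

# Illustrative placeholder data; the real texts and labels come from the ACTI Subtask B dataset.
topics = ["covid", "qanon", "flat_earth", "pro_russia"]
train_texts = ["il vaccino covid nasconde un complotto",
               "qanon rivela il piano segreto",
               "la terra è piatta e ce lo nascondono",
               "la propaganda occidentale contro la russia"]
train_labels = ["covid", "qanon", "flat_earth", "pro_russia"]

keywords = topic_keywords(train_texts, train_labels, topics, k=20)
X_train = keyword_count_features(train_texts, keywords)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, train_labels)
```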
We mainly used the PyTorch and Hugging Face Transformers libraries.
Each model's best hyperparameters were selected based on performance on a held-out validation set.
The evaluation metric is the macro-averaged F1 score; reported scores are on the test set.
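
As a quick reference, the macro-averaged F1 score averages the per-class F1 scores with equal weight, so rare and frequent classes count the same; with scikit-learn:

```python
from sklearn.metrics import f1_score

# Macro F1: compute F1 independently for each class, then take the unweighted mean.
# Labels below are illustrative only.
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]
print(f1_score(y_true, y_pred, average="macro"))
```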
Subtask A results:

Model | Test score |
---|---|
BERT | 0.8257 |
XLM-RoBERTa | 0.8203 |
Llama | 0.8022 |

Subtask B results:

Model | Test score |
---|---|
BERT | 0.8265 |
XLM-RoBERTa | 0.8532 |
Llama | 0.7389 |
Topic-specific TF-IDF | 0.7520 |