Code for ACTI (Automatic Conspiracy Theory Identification), a task that is part of the EVALITA 2023 challenges on Natural Language Processing in Italian.


giacomo-cgn/nlp-EVALITA2023-ACTI-challenge


EVALITA 2023 ACTI challenge - HLT project

This repository contains an analysis of various methods for solving the EVALITA 2023 ACTI ("Automatic Conspiracy Theory Identification") challenge.

The ACTI challenge is an Italian NLP challenge about recognizing conspiracy theories in Italian Telegram messages. It is composed of two subtasks:

  • SubtaskA - Conspiratorial Content Classification: the model must recognize whether or not a message contains conspiratorial content.
  • SubtaskB - Conspiracy Category Classification: the model must determine which of 4 possible conspiracy topics (Covid, QAnon, Flat Earth, Pro-Russia) a message belongs to.

The paper will soon be published.

Models

The following models are compared on both subtasks:

  • BERT: we employed bert-base-italian-xxl-cased, a variant of base BERT pretrained exclusively on Italian text, and fine-tuned it with a custom classification head.
  • XLM-RoBERTa: we employed XLM-RoBERTa-large, the multilingual variant of RoBERTa, and fine-tuned it with a custom classification head.
  • Llama: we employed Llama 7B as a feature extractor on the samples and trained a custom MLP classifier on the extracted features.

Only for SubtaskB:

  • Topic-specific TF-IDF baseline: the top K keywords specific to each conspiracy topic are found with topic-specific TF-IDF (a metric calculated from base TF-IDF). Occurrences of the top K keywords of each topic are then used as input to a Random Forest classifier.
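The keyword step of the baseline can be illustrated with a minimal pure-Python sketch. The exact definition of "topic-specific TF-IDF" is not given here, so this sketch makes an assumption: all messages of a topic are merged into one pseudo-document and plain TF-IDF is computed across those pseudo-documents. The function names `top_k_keywords` and `keyword_count_features`, the whitespace tokenization, and the one-feature-per-keyword layout are all hypothetical choices for illustration, not the repository's code.

```python
import math
from collections import Counter


def top_k_keywords(texts_by_topic, k):
    """Return the k highest-scoring TF-IDF terms for each topic.

    Assumption: each topic's messages are merged into a single
    pseudo-document, and TF-IDF is computed across those documents.
    """
    # Term counts per topic, using naive lowercase whitespace tokenization.
    term_counts = {
        topic: Counter(word for text in texts for word in text.lower().split())
        for topic, texts in texts_by_topic.items()
    }
    n_topics = len(term_counts)
    # Document frequency: in how many topics each term appears.
    doc_freq = Counter(term for counts in term_counts.values() for term in counts)

    keywords = {}
    for topic, counts in term_counts.items():
        total = sum(counts.values())
        scores = {
            term: (count / total) * math.log(n_topics / doc_freq[term])
            for term, count in counts.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keywords[topic] = ranked[:k]
    return keywords


def keyword_count_features(text, keywords):
    """Count occurrences of every top-k keyword in `text`, topic by topic."""
    token_counts = Counter(text.lower().split())
    return [token_counts[kw] for topic in sorted(keywords) for kw in keywords[topic]]


toy_corpus = {
    "covid": ["vaccine hoax vaccine", "the vaccine lie"],
    "flat_earth": ["the earth is flat", "flat earth proof"],
}
toy_keywords = top_k_keywords(toy_corpus, k=2)
toy_features = keyword_count_features("vaccine vaccine flat", toy_keywords)
```

Feature vectors built this way could then be passed to a Random Forest implementation, for example scikit-learn's `RandomForestClassifier`; whether the repository uses that library is not stated here.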

Implementation

We mainly used the PyTorch and Transformers libraries.

The best hyperparameters of each model were selected based on performance on a hold-out validation set.
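Hold-out model selection can be sketched in a few lines. This is only an illustration of the general procedure: the split fraction, the seed, the candidate configurations, and the `train_and_score` callback (which stands in for training a model and returning its validation score) are hypothetical and not taken from the repository.

```python
import random


def holdout_split(samples, val_fraction=0.2, seed=0):
    """Shuffle once and split into (train, validation) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def select_best_config(configs, train_and_score, train_set, val_set):
    """Return the config with the highest validation score, plus that score."""
    best_config, best_score = None, float("-inf")
    for config in configs:
        # train_and_score is expected to fit on train_set and evaluate on val_set.
        score = train_and_score(config, train_set, val_set)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Toy usage: the scorer below is a stand-in that peaks at lr = 2e-5.
train_set, val_set = holdout_split(range(10), val_fraction=0.2, seed=0)
candidate_configs = [{"lr": 1e-5}, {"lr": 2e-5}, {"lr": 5e-5}]
best_config, best_score = select_best_config(
    candidate_configs,
    lambda config, train, val: -abs(config["lr"] - 2e-5),
    train_set,
    val_set,
)
```

The test set stays untouched during this loop, so the scores reported on it are not biased by the hyperparameter search.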

Results

The metric is the macro-averaged F1 score. Reported scores are test set results.
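For reference, macro-averaged F1 is the unweighted mean of the per-class F1 scores, so every class counts equally regardless of how many samples it has. A minimal pure-Python sketch follows; the function name `macro_f1` and the toy labels are illustrative, and the repository may well compute the metric with a library instead.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Macro-averaged F1: unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for true_label, pred_label in zip(y_true, y_pred):
        if true_label == pred_label:
            tp[true_label] += 1
        else:
            fp[pred_label] += 1  # predicted class gains a false positive
            fn[true_label] += 1  # true class gains a false negative
    f1_scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1_scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


toy_true = ["covid", "covid", "qanon", "flat_earth"]
toy_pred = ["covid", "qanon", "qanon", "flat_earth"]
toy_score = macro_f1(toy_true, toy_pred)  # per-class F1: 2/3, 1, 2/3 -> mean 7/9
```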

SubtaskA

| Model       | Test score (macro F1) |
|-------------|-----------------------|
| BERT        | 0.8257                |
| XLM-RoBERTa | 0.8203                |
| Llama       | 0.8022                |

SubtaskB

| Model                 | Test score (macro F1) |
|-----------------------|-----------------------|
| BERT                  | 0.8265                |
| XLM-RoBERTa           | 0.8532                |
| Llama                 | 0.7389                |
| Topic-specific TF-IDF | 0.7520                |
