The Causal News Corpus: Annotating Causal Relations in Event Sentences from News

Tan, Fiona Anting and Hürriyetoğlu, Ali and Caselli, Tommaso and Oostdijk, Nelleke and Nomoto, Tadashi and Hettiarachchi, Hansi and Ameer, Iqra and Uca, Onur and Liza, Farhana Ferdousi and Hu, Tiancheng (2022) The Causal News Corpus: Annotating Causal Relations in Event Sentences from News. In: 13th Conference on Language Resources and Evaluation (LREC 2022), 20th - 25th June 2022, Marseille, France.

2022.lrec-1.246.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Abstract

Despite the importance of understanding causality, corpora addressing causal relations are limited. There is a discrepancy between existing annotation guidelines of event causality and conventional causality corpora that focus more on linguistics. Many guidelines restrict themselves to explicit relations or clause-based arguments. Therefore, we propose an annotation schema for event causality that addresses these concerns. We annotated 3,559 event sentences from protest event news with labels indicating whether each sentence contains causal relations. Our corpus is known as the Causal News Corpus (CNC). A neural network built upon a state-of-the-art pre-trained language model performed well, with an 81.20% F1 score on the test set and 83.46% in 5-fold cross-validation. CNC is transferable across two external corpora: CausalTimeBank (CTB) and Penn Discourse Treebank (PDTB). Leveraging each of these external datasets for training, we achieved up to approximately 64% F1 on the CNC test set without additional fine-tuning. CNC also served as an effective training and pre-training dataset for the two external corpora. Lastly, we demonstrate the difficulty of our task for the layman in a crowd-sourced annotation exercise. Our annotated corpus is publicly available, providing a valuable resource for causal text mining researchers.

Item Type: Conference or Workshop Item (Paper)
Dates:
Accepted: 4 April 2022
Published Online: 1 June 2022
Uncontrolled Keywords: causality, event causality, text mining, natural language understanding
Subjects: CAH11 - computing > CAH11-01 - computing > CAH11-01-01 - computer science
Divisions: Faculty of Computing, Engineering and the Built Environment > School of Computing and Digital Technology
Depositing User: Hansi Hettiarachchi
Date Deposited: 07 Dec 2022 16:26
Last Modified: 07 Dec 2022 16:26
URI: https://www.open-access.bcu.ac.uk/id/eprint/13996
