Text Embedding-based Event Detection for Social and News Media

Hettiarachchi, Hansi (2023) Text Embedding-based Event Detection for Social and News Media. Doctoral thesis, Birmingham City University.

Full text: Hansi Hettiarachchi PhD Thesis published_Final version_Submitted Dec 2022_Final Award Jun 2023 .pdf (Accepted Version, PDF, 10MB)


Today, social and news media are the leading platforms for distributing newsworthy content, and most internet users access them regularly to obtain information. However, because these data are unstructured and vast in volume, extracting information through manual analysis requires enormous effort, which makes automated intelligent mechanisms crucial. The literature presents several emerging approaches to event detection in social and news media, which have evolved along distinct lines, mainly because of the differences between these media. Most available social media event detection approaches rely primarily on data statistics and ignore linguistics, making them vulnerable to information loss. Likewise, the available news media event detection approaches mostly fail to capture long-range text dependencies and to support predictions for low-resource languages (i.e. languages with relatively little data). The possibility of utilising interconnections between different data levels to improve final predictions has also not been adequately explored.

This research investigates how text embeddings built using prediction-based models, which have proven capabilities to capture linguistics, can be used for event detection while overcoming these limitations. First, it redefines the problem of event detection at two data granularities, the coarse- and fine-grained levels, so that systems can address different information requirements: the coarse-grained level targets notification of event occurrences, while the fine-grained level targets the provision of event details. Following this new definition, the research proposes two novel approaches for coarse- and fine-grained event detection on social media, Embed2Detect and WhatsUp. Both mainly utilise the linguistics captured by self-learned word embeddings and their hierarchical relationships in dendrograms. For news media event detection, it proposes a TRansformer-based Event Document classification architecture (TRED), involving long-sequence and cross-lingual transformer encoders, together with a novel learning strategy, Two-phase Transfer Learning (TTL). These support capturing long-range dependencies and data-level interconnections.
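To make "hierarchical relationships in dendrograms" concrete, the following is a minimal sketch of one generic way such relationships could be read from word vectors. It is not the thesis's implementation of Embed2Detect or WhatsUp. Several assumptions are made: toy 2-D vectors stand in for self-learned word embeddings, the clustering is naive average-linkage with cosine distance, and `dendrogram_similarity` is a hypothetical measure defined as shared ancestors divided by the ancestor count of the deeper word.

```python
import math
from itertools import combinations

# Toy 2-D vectors standing in for self-learned word embeddings (illustrative only).
example_vectors = {
    "goal": [1.0, 0.1],
    "score": [0.9, 0.2],
    "rain": [0.1, 1.0],
    "storm": [0.2, 0.9],
}


def cosine_distance(u, v):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def build_dendrogram(vectors):
    """Naive average-linkage agglomerative clustering.

    Returns a child -> parent map. Leaves are the word strings; internal
    (merged) nodes are integer ids assigned in merge order.
    """
    clusters = {word: [word] for word in vectors}  # node id -> member words
    parent = {}
    next_id = 0
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Average linkage: mean pairwise distance between cluster members.
            total = sum(
                cosine_distance(vectors[x], vectors[y])
                for x in clusters[a]
                for y in clusters[b]
            )
            dist = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or dist < best[0]:
                best = (dist, a, b)
        _, a, b = best
        # Merge the closest pair under a new internal node.
        parent[a] = next_id
        parent[b] = next_id
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return parent


def ancestors(node, parent):
    """Internal nodes on the path from `node` up to the root (root included)."""
    path = []
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def dendrogram_similarity(w1, w2, parent):
    """Hypothetical measure: shared ancestors / ancestors of the deeper word.

    Words that merge lower in the tree share more ancestors and score higher.
    """
    anc1, anc2 = ancestors(w1, parent), ancestors(w2, parent)
    shared = len(set(anc1) & set(anc2))
    return shared / max(len(anc1), len(anc2))


if __name__ == "__main__":
    tree = build_dendrogram(example_vectors)
    print(dendrogram_similarity("goal", "score", tree))  # merged early: 1.0
    print(dendrogram_similarity("goal", "rain", tree))   # joined only at the root: 0.5
```

Reading similarity from the tree structure rather than from raw vector distances captures how words group together hierarchically. Two words that merge deep in the tree are treated as closely related even if their raw vector distance is moderate. This sketch covers that idea only, not the event-detection logic built on top of it.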

All the proposed approaches have been evaluated on recent real datasets, covering four aspects crucial for event detection: accuracy, efficiency, expandability and scalability. The evaluations mainly involve social media data from two diverse domains and news media data in four high- and low-resource languages. The results show that the proposed approaches outperform state-of-the-art methods despite the diversity of the data, demonstrating their accuracy and expandability. Additionally, the efficiency and scalability evaluations confirm that the methods are appropriate for (near) real-time processing and can handle large data volumes. In summary, meeting all of these crucial requirements evidences the potential and utility of the proposed approaches for event detection in social and news media.

Item Type: Thesis (Doctoral)
Dates: 1 December 2022 (Submitted)
6 June 2023 (Accepted)
Uncontrolled Keywords: Event Detection, Social Media, News Media, Text Embeddings
Subjects: CAH10 - engineering and technology > CAH10-03 - materials and technology > CAH10-03-06 - others in technology
CAH24 - media, journalism and communications > CAH24-01 - media, journalism and communications > CAH24-01-05 - media studies
Divisions: Doctoral Research College > Doctoral Theses Collection
Faculty of Computing, Engineering and the Built Environment > School of Computing and Digital Technology
Depositing User: Jaycie Carter
Date Deposited: 20 Jul 2023 14:41
Last Modified: 20 Jul 2023 14:41
URI: https://www.open-access.bcu.ac.uk/id/eprint/14623
