Transformer-based active learning for multi-class text annotation and classification

Afzal, Muhammad and Hussain, Jamil and Abbas, Asim and Hussain, Maqbool and Attique, Muhammad and Lee, Sungyoung (2024) Transformer-based active learning for multi-class text annotation and classification. Digital Health, 10. ISSN 2055-2076

afzal-et-al-2024-transformer-based-active-learning-for-multi-class-text-annotation-and-classification.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Abstract

Objective

Data-driven methodologies in healthcare necessitate labeled data for effective decision-making. However, medical data, particularly in unstructured formats, such as clinical notes, often lack explicit labels, making manual annotation challenging and tedious.
Methods

This paper introduces a novel deep active learning framework designed to facilitate the annotation process for multiclass text classification, specifically using the SOAP (subjective, objective, assessment, plan) framework, a widely recognized medical protocol. Our methodology leverages transformer-based deep learning techniques to automatically annotate clinical notes, significantly easing the manual labor involved and enhancing classification performance. Transformer-based deep learning models, with their ability to capture complex patterns in large datasets, represent a cutting-edge approach for advancing natural language processing tasks.
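As an illustration of the active learning idea described above, the following is a minimal sketch of one query round. It assumes a least-confidence query strategy, which the abstract does not specify, and it uses hand-written probabilities in place of the output of a fine-tuned transformer classifier. The names `split_pool`, `least_confidence`, and the `note-*` identifiers are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple

# The four SOAP classes named in the abstract.
SOAP_LABELS = ("subjective", "objective", "assessment", "plan")


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty score: 1 minus the probability of the most likely class."""
    return 1.0 - max(probs)


def split_pool(
    pool_probs: Dict[str, Sequence[float]],
    query_size: int,
) -> Tuple[List[str], Dict[str, str]]:
    """Send the `query_size` most uncertain items to a human annotator and
    auto-label the remaining items with the model's top prediction."""
    ranked = sorted(
        pool_probs,
        key=lambda key: least_confidence(pool_probs[key]),
        reverse=True,  # most uncertain first
    )
    to_annotate = ranked[:query_size]
    auto_labels = {}
    for key in ranked[query_size:]:
        probs = pool_probs[key]
        best = max(range(len(SOAP_LABELS)), key=lambda i: probs[i])
        auto_labels[key] = SOAP_LABELS[best]
    return to_annotate, auto_labels


# Hypothetical class probabilities (assumption: stand-ins for the softmax
# output of a transformer classifier over unlabeled text items).
pool = {
    "note-1": [0.90, 0.05, 0.03, 0.02],
    "note-2": [0.30, 0.28, 0.22, 0.20],
    "note-3": [0.10, 0.70, 0.10, 0.10],
    "note-4": [0.40, 0.35, 0.15, 0.10],
}
to_annotate, auto_labels = split_pool(pool, query_size=2)
print(to_annotate)
print(auto_labels)
```

In a full loop, the manually annotated items would be added to the training set, the model would be retrained, and the remaining pool would be rescored. Other query strategies, such as margin or entropy sampling, would slot into the same structure by swapping the scoring function.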
Results

We validate our approach through experiments on a diverse set of clinical notes from publicly available datasets, comprising over 426 documents. Our model not only demonstrates superior classification accuracy, with an F1 score improvement of 4.8% over existing methods, but also provides a practical tool for healthcare professionals, potentially improving clinical documentation practices and patient care.
Conclusions

The research underscores the synergy between active learning and advanced deep learning, paving the way for future exploration of automatic text annotation and its implications for clinical informatics. Future studies will aim to integrate multimodal data and large language models to enhance the richness and accuracy of clinical text analysis, opening new pathways for comprehensive healthcare insights.

Item Type: Article
Identification Number: 10.1177/20552076241287357
Dates: 1 October 2024 (Accepted); 17 October 2024 (Published Online)
Uncontrolled Keywords: Text classification, text annotation, active learning, transfer learning, deep learning, BERT, clinical text, SOAP
Subjects: CAH11 - computing > CAH11-01 - computing > CAH11-01-01 - computer science
Divisions: Faculty of Computing, Engineering and the Built Environment > College of Computing
Depositing User: Gemma Tonks
Date Deposited: 13 Jan 2025 14:15
Last Modified: 13 Jan 2025 14:15
URI: https://www.open-access.bcu.ac.uk/id/eprint/16071
