"Your blog is (the) shit": a corpus linguistic approach to the identification of swearing in computer mediated communication

Lutzky, Ursula and Kehoe, Andrew (2016) "Your blog is (the) shit": a corpus linguistic approach to the identification of swearing in computer mediated communication. International Journal of Corpus Linguistics, 21 (2). pp. 165-191. ISSN 1384-6655

Lutzky & Kehoe [finalised].pdf - Accepted Version


The study of swearing has increased in the last decade, diversifying to include a wider range of data and methods of analysis. Nevertheless, certain types of data and specifically large corpora of computer mediated communication (CMC) have not been studied extensively. In this paper, we fill a gap in research by studying the use of swearwords in blog data, and illustrate ways of identifying swearing in a large corpus by taking context into account. This approach, based on the examination of shared and unique collocates of known expletives, facilitates the distinction of attestations of swearing from non-swearing in the case of polysemous lexemes, and the analysis of overlaps in usage and meaning of swearwords. This work therefore goes beyond basic sentiment analysis and offers new insights into the use of collocation for refining profanity filters, providing innovative perspectives on issues of growing importance as online interaction becomes more widespread.
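The idea of comparing shared and unique collocates can be illustrated with a minimal sketch. It is not the authors' implementation: the toy sentence, the node words, the window size and the helper name `collocates` are all assumptions made for illustration, and it uses raw co-occurrence rather than any statistical association measure.

```python
from collections import Counter

def collocates(tokens, node, window=2, min_freq=1):
    """Return the set of words found within `window` tokens of `node`.

    Hypothetical helper: counts raw co-occurrences only and applies
    a simple frequency threshold, with no association statistic.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                counts[tokens[j]] += 1
    return {word for word, freq in counts.items() if freq >= min_freq}

# Toy data invented for this example, not taken from the blog corpus.
text = "this is utter crap and that is utter rubbish but this is holy crap"
tokens = text.split()

crap = collocates(tokens, "crap")
rubbish = collocates(tokens, "rubbish")

shared = crap & rubbish            # collocates common to both node words
unique_to_crap = crap - rubbish    # collocates seen only with "crap"
unique_to_rubbish = rubbish - crap # collocates seen only with "rubbish"

print(sorted(shared))
print(sorted(unique_to_crap))
print(sorted(unique_to_rubbish))
```

On a real corpus, function words would dominate such raw counts, so a stop list or an association measure would probably be needed before the shared and unique sets become informative.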

Item Type: Article
Identification Number: https://doi.org/10.1075/ijcl.21.2.02lut
Date: September 2016
Uncontrolled Keywords: Blogs, CMC, Collocation, Pragmatics, Swearing
Subjects: Q100 Linguistics; Q300 English studies
Divisions: Faculty of Arts, Design and Media > Birmingham Institute of Media and English > School of English; REF UoA Output Collections > REF2021 UoA27: English Language and Literature
Depositing User: Andrew Kehoe
Date Deposited: 10 Nov 2016 15:46
Last Modified: 27 Nov 2017 15:39
URI: http://www.open-access.bcu.ac.uk/id/eprint/3463
