Efficient Textual Similarity using Semantic MinHashing
Nawaz, Waqas and Baig, Maryam and Khan, Kifayat Ullah (2024) Efficient Textual Similarity using Semantic MinHashing. In: 2024 IEEE International Conference on Big Data and Smart Computing, 18th-21st February 2024, Bangkok, Thailand.
Abstract
Quantifying the likeness between words, sentences, paragraphs, and documents plays a crucial role in various applications of natural language processing (NLP). Contemporary methodologies, exemplified by BERT, ELMo, and RoBERTa, leverage neural networks to generate embeddings, which requires substantial data and training time to reach cutting-edge performance. Alternatively, semantic similarity metrics are based on knowledge bases like WordNet, using approaches such as the shortest path between words. MinHashing is a lightweight technique that quickly approximates Jaccard similarity scores for document pairs. In this study, we propose employing MinHashing to gauge semantic scores by enhancing original documents with information from semantic networks, incorporating relationships such as synonyms, antonyms, hyponyms, and hypernyms. This augmentation lets the lexical similarity between documents reflect their semantic relatedness. The MinHash algorithm calculates compact signatures for the extended vectors, mitigating dimensionality concerns. The similarity of these signatures reflects the semantic score between the documents. Our method achieves approximately 64% accuracy on the MRPC and SICK datasets.
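The pipeline described in the abstract (expand each document with related words, compute MinHash signatures, compare signatures) can be illustrated with a minimal sketch. This is not the authors' implementation: `TOY_RELATIONS` is a hypothetical hand-written stand-in for a semantic network such as WordNet, and the function names, the 128-hash signature length, and the linear hash family `(a*x + b) mod PRIME` are assumptions chosen for illustration only.

```python
import random
import zlib

# Hypothetical stand-in for a semantic network such as WordNet.
# A real system would look up synonyms, antonyms, hyponyms, and hypernyms.
TOY_RELATIONS = {
    "car": {"automobile", "vehicle"},
    "automobile": {"car", "vehicle"},
    "quick": {"fast"},
    "fast": {"quick"},
}

PRIME = 4294967311  # a prime slightly larger than 2**32


def expand(tokens, relations=TOY_RELATIONS):
    """Return the token set augmented with related words."""
    expanded = set(tokens)
    for tok in tokens:
        expanded |= relations.get(tok, set())
    return expanded


def make_hash_params(num_hashes=128, seed=0):
    """Draw (a, b) coefficients for hashes of the form (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(tokens, params):
    """Compute a MinHash signature; assumes `tokens` is non-empty."""
    ids = [zlib.crc32(t.encode("utf-8")) for t in tokens]
    return [min((a * x + b) % PRIME for x in ids) for a, b in params]


def estimate_similarity(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of Jaccard."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def jaccard(a, b):
    """Exact Jaccard similarity of two sets."""
    return len(a & b) / len(a | b)


doc_a = "the quick car".split()
doc_b = "a fast automobile".split()

params = make_hash_params()
plain = jaccard(set(doc_a), set(doc_b))            # no shared words
exp_a, exp_b = expand(doc_a), expand(doc_b)
exact_expanded = jaccard(exp_a, exp_b)             # 5 shared of 7 total
estimated = estimate_similarity(
    minhash_signature(exp_a, params),
    minhash_signature(exp_b, params),
)
print(plain, exact_expanded, estimated)
```

In this toy case the two sentences share no words, so their plain Jaccard similarity is zero, while the expanded sets overlap substantially; the signature comparison approximates that expanded overlap using fixed-length signatures regardless of vocabulary size.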
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Identification Number: | 10.1109/BigComp60711.2024.00048 |
Dates: | 11 December 2023: Accepted; 11 April 2024: Published Online |
Uncontrolled Keywords: | MinHashing, Semantic similarity, WordNet, Natural Language Processing (NLP), Jaccard similarity, Algorithm |
Subjects: | CAH11 - computing > CAH11-01 - computing > CAH11-01-03 - information systems; CAH11 - computing > CAH11-01 - computing > CAH11-01-05 - artificial intelligence |
Divisions: | Faculty of Business, Law and Social Sciences > College of Accountancy, Finance and Economics; Faculty of Business, Law and Social Sciences > College of Accountancy, Finance and Economics > Centre for Accountancy Finance and Economics |
Depositing User: | Kifayat Khan |
Date Deposited: | 02 Jan 2025 14:35 |
Last Modified: | 02 Jan 2025 14:35 |
URI: | https://www.open-access.bcu.ac.uk/id/eprint/16059 |