k-NN Embedding Stability for word2vec Hyper-Parametrisation in Scientific Text

Dridi, Amna and Gaber, Mohamed Medhat and Azad, R. Muhammad Atif and Bhogal, J. (2018) k-NN Embedding Stability for word2vec Hyper-Parametrisation in Scientific Text. In: Discovery Science, October 29–31, 2018, Limassol, Cyprus.


Abstract

Word embeddings are increasingly attracting the attention of researchers dealing with semantic similarity and analogy tasks. However, finding the optimal hyper-parameters remains an important challenge because of their impact on the revealed analogies, mainly for domain-specific corpora. Since analogies are widely used for hypothesis synthesis, it is crucial to optimise word embedding hyper-parameters for precise hypothesis synthesis. We therefore propose a methodological approach for tuning word embedding hyper-parameters using the stability of the k-nearest neighbors of word vectors within scientific corpora, more specifically Computer Science corpora, with Machine Learning adopted as a case study. This approach is tested on a dataset created from NIPS (Conference on Neural Information Processing Systems) publications, and evaluated with a curated ACM hierarchy and the Wikipedia Machine Learning outline as the gold standard. Our quantitative and qualitative analyses indicate that our approach not only reliably captures interesting patterns like "unsupervised_learning is to kmeans as supervised_learning is to knn", but also captures the analogical hierarchy structure of Machine Learning and consistently outperforms the state-of-the-art embeddings on syntactic accuracy, reaching 68% against their 61%.

Item Type: Conference or Workshop Item (Speech)
Subjects: G400 Computer Science
Divisions: Faculty of Computing, Engineering and the Built Environment > School of Computing and Digital Technology > Enterprise Systems
Depositing User: Mohamed Gaber
Date Deposited: 29 Oct 2018 14:27
Last Modified: 29 Oct 2018 14:27
URI: http://www.open-access.bcu.ac.uk/id/eprint/6501
