Leveraging Temporal Word Embeddings for the Detection of Scientific Trends

Dridi, Amna (2021) Leveraging Temporal Word Embeddings for the Detection of Scientific Trends. Doctoral thesis, Birmingham City University.

Amna Dridi PhD Thesis published_Final version_Submitted Feb 2021_Final Award Jul 2021.pdf - Accepted Version



Tracking the dynamics of science and detecting emerging research trends early could potentially revolutionise the way research is done. For this reason, and because of their significant implications for research funding and public policy, computational history of science and trend analysis have become important areas in academia and industry. The literature presents several approaches to detecting new research trends, most of which rely mainly on citation counting. While citations have been widely used as indicators of emerging research topics, they have several limitations. Most importantly, citations can take months or even years to accumulate before they reveal a trend. Furthermore, they do not take the content of the papers into account.

To overcome these limitations, this thesis leverages a natural language processing method, namely temporal word embeddings, that learns semantic and syntactic relations among words over time. The principal objective of this method is to study how the similarity between pairs of scientific keywords changes over time, which helps to track the dynamics of science and detect emerging scientific trends. To this end, this thesis first proposes a methodological approach to tuning the hyper-parameters of word2vec, the word embedding technique used in this thesis, for scientific text. It then provides a suite of novel approaches that perform the computational history of science by detecting emerging scientific trends and tracking the dynamics of science. Emerging scientific trends are detected through two approaches, Hist2vec and Leap2Trend, which are devoted, respectively, to the detection of converging keywords and contextualising keywords. The dynamics of science, in turn, are tracked by Vec2Dynamics, which follows the evolution of the semantic neighbourhood of keywords over time.
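As a rough illustration of the core idea (how the similarity between two keywords changes across time slices), the following minimal sketch may help. It is not the thesis's implementation of Hist2vec, Leap2Trend or Vec2Dynamics: the toy vectors, the years, and the helper names `cosine`, `similarity_over_time` and `is_converging` are all hypothetical, and in practice the vectors would come from a word2vec model trained separately on each time slice.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_over_time(embeddings_by_year, kw_a, kw_b):
    """Similarity of one keyword pair in each time slice, in chronological order."""
    return {
        year: cosine(vectors[kw_a], vectors[kw_b])
        for year, vectors in sorted(embeddings_by_year.items())
    }

def is_converging(series):
    """True if the pair's similarity strictly increases from slice to slice."""
    values = list(series.values())
    return all(later > earlier for earlier, later in zip(values, values[1:]))

# Hypothetical 2-D embeddings; real embeddings would have hundreds of dimensions.
toy_embeddings = {
    2010: {"neural": [1.0, 0.0], "vision": [0.0, 1.0]},
    2012: {"neural": [1.0, 0.2], "vision": [0.5, 1.0]},
    2014: {"neural": [1.0, 0.5], "vision": [1.0, 0.8]},
}

series = similarity_over_time(toy_embeddings, "neural", "vision")
for year, sim in series.items():
    print(year, round(sim, 3))
print("converging:", is_converging(series))
```

Because each similarity is computed between two vectors from the same time slice, this sketch does not need to align the embedding spaces of different slices with one another.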

All of the proposed approaches have been applied to the area of machine learning and validated against different gold standards. The results show that the proposed approaches are effective at detecting trends and tracking the dynamics of science. More specifically, Hist2vec correlates strongly with citation counts, reaching a positive Spearman’s correlation of 100%. Additionally, Leap2Trend achieves more than 80% accuracy and 90% precision in detecting emerging trends. Vec2Dynamics also shows great potential to trace the history of the machine learning literature exactly as the machine learning timeline does. These findings demonstrate the utility of the proposed approaches for performing the computational history of science.
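To make the Spearman figure concrete, the sketch below computes a Spearman rank correlation for two short lists. The numbers are invented for illustration and are not results from the thesis; the helpers `ranks` and `spearman` are hypothetical names, and the simple formula used assumes there are no tied values.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman(x, y):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)); valid without ties."""
    n = len(x)
    rank_x, rank_y = ranks(x), ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Invented example: similarity gain of four keyword pairs vs. citation counts.
similarity_gain = [0.05, 0.20, 0.11, 0.40]
citation_counts = [12, 90, 30, 400]

rho = spearman(similarity_gain, citation_counts)
print(rho)
```

A value of 1.0 (that is, 100%) means the two lists order the items identically, even though the raw values are on very different scales.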

Item Type: Thesis (Doctoral)
Date Submitted: 8 February 2021
Date Accepted: 1 July 2021
Uncontrolled Keywords: Scholarly data mining, trend analysis, computational history of science, machine learning, temporal word embeddings
Subjects: CAH11 - computing > CAH11-01 - computing > CAH11-01-01 - computer science
CAH11 - computing > CAH11-01 - computing > CAH11-01-08 - others in computing
Divisions: Doctoral Research College > Doctoral Theses Collection
Faculty of Computing, Engineering and the Built Environment > School of Computing and Digital Technology
Depositing User: Jaycie Carter
Date Deposited: 11 Jul 2022 16:01
Last Modified: 11 Jul 2022 16:01
URI: https://www.open-access.bcu.ac.uk/id/eprint/13410
