Audio equalisation using natural language

Spyridon, Stasis (2019) Audio equalisation using natural language. Doctoral thesis, Birmingham City University.

PhD Thesis - Spyridon STASIS - Audio Equalisation Using Natural Language.pdf - Submitted Version

Equalisation allows a user to control a series of frequency-dependent gains by adjusting the parameters of a network of filters, and so to manipulate the timbre of a sound. In sound production, engineers often use natural language to refer to these timbral transformations, resulting in a shared vocabulary of descriptive terms. This lexicon of semantic terminology allows for simplified and compact descriptions of complex processing actions performed by sound engineers. However, because these operations are outlined in natural language, the descriptive terms may be misunderstood, or may carry divergent meanings for different individuals. The problems inherent to natural language can be alleviated by analysing the semantic terms that are used in music production and developing computational models based on their function. To perform this analysis, crowdsourcing techniques are used to gather an extensive dataset of terms. In this manner it is possible to expose the ways in which producers and engineers approach creative audio processing, and this analysis can then be used as a foundation for intuitive interface design. This thesis presents findings from a number of studies on the semantic terminology used in music production, and formalises taxonomies of descriptive terms to provide novel methods for users to interface with equalisation parameters. Initially, the salience of equalisation in the context of a full processing chain is evaluated. Furthermore, the relationship between a number of key datasets in the field is explored, and synonymous and antonymous definitions within a core list of adjectives are established. In addition, the agreement between descriptive term definitions and the structural similarity of the datasets are analysed. Moreover, the extent to which a term can have multiple definitions, each of which is perceptually divergent, is examined.
By clustering different definitions of the same term, the concept of semantic sub-representations is introduced. To perform this analysis, a model of stacked autoencoders is implemented. The same model is then used to create a novel audio production interface, through which users are able to control equalisation parameters using descriptive language. An unweighted model is first presented, which allows users to navigate between different descriptors using a low-dimensional slider. Signal processing techniques are then implemented so that these term definitions adapt to a user’s input signal, and the resulting model can be trained arbitrarily on parameter data.
Overall, this thesis validates the use of descriptive language as a medium for controlling equalisation parameters. It is shown that within this vocabulary, there are consistent relationships between recognised terms, from which a thesaurus of synonymous terminology is constructed. Using these concepts, methods for reducing barriers for inexperienced users are introduced through the development of intuitive abstract interfaces.
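
As an illustration of the opening statement of the abstract, that equalisation controls frequency-dependent gains through the parameters of a network of filters, the following is a minimal sketch only. It assumes a cascade of second-order peaking (bell) filters with coefficients taken from the widely used Audio EQ Cookbook formulas; this record does not state which filter design the thesis uses, and the function names, the 44.1 kHz sample rate, and the band values are hypothetical.

```python
import cmath
import math


def peaking_band_gain_db(freq_hz, centre_hz, gain_db, q, sample_rate=44100.0):
    """Gain in dB that one peaking filter applies at freq_hz.

    Coefficients follow the Audio EQ Cookbook peaking-EQ biquad
    (an assumption for illustration, not the thesis's stated design).
    """
    a = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * centre_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)

    # Numerator (b) and denominator (a) coefficients of the biquad.
    b0, b1, b2 = 1.0 + alpha * a, -2.0 * cos_w0, 1.0 - alpha * a
    a0, a1, a2 = 1.0 + alpha / a, -2.0 * cos_w0, 1.0 - alpha / a

    # Evaluate the transfer function on the unit circle at freq_hz.
    z1 = cmath.exp(-1j * 2.0 * math.pi * freq_hz / sample_rate)
    numerator = b0 + b1 * z1 + b2 * z1 * z1
    denominator = a0 + a1 * z1 + a2 * z1 * z1
    return 20.0 * math.log10(abs(numerator / denominator))


def eq_gain_db(freq_hz, bands, sample_rate=44100.0):
    """Total gain in dB of a cascade of peaking bands at freq_hz.

    Each band is a (centre_hz, gain_db, q) tuple. Filters in series
    multiply their magnitudes, so their dB gains add.
    """
    return sum(
        peaking_band_gain_db(freq_hz, centre, gain, q, sample_rate)
        for centre, gain, q in bands
    )


if __name__ == "__main__":
    # Hypothetical parameter setting: a low-mid cut and a presence boost.
    example_bands = [(200.0, -3.0, 0.7), (3000.0, 4.0, 1.2)]
    for f in (100.0, 200.0, 1000.0, 3000.0, 10000.0):
        print(f"{f:8.0f} Hz: {eq_gain_db(f, example_bands):+.2f} dB")
```

A descriptive term in the sense of the abstract would correspond to one such set of band parameters; how the thesis maps terms to parameters is not reproduced here.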

Item Type: Thesis (Doctoral)
Date: 22 February 2019 (Completed)
Uncontrolled Keywords: Audio equalisation, intelligent music production, machine learning, neural networks, semantic audio
Subjects: CAH11 - computing > CAH11-01 - computing > CAH11-01-01 - computer science
CAH11 - computing > CAH11-01 - computing > CAH11-01-04 - software engineering
Divisions: Doctoral Research College > Doctoral Theses Collection
Depositing User: Doris Riou
Date Deposited: 30 Jan 2020 14:47
Last Modified: 12 Jan 2022 12:57
