Leveraging Evaluation Time to Produce Simple Machine Learning Models with Genetic Programming

Sambo, Aliyu Sani (2022) Leveraging Evaluation Time to Produce Simple Machine Learning Models with Genetic Programming. Doctoral thesis, Birmingham City University.

Aliyu Sani Sambo_PhD Thesis_Final Version_Submitted Jan 2022_Final Award Apr 2022.pdf - Accepted Version

Download (7MB)


The quest for simple solutions is not new in machine learning (ML) and related methods such as genetic programming (GP). GP is a nature-inspired approach to the automatic programming of computers used to create solutions to a broad range of computational problems. However, the evolving solutions can grow unnecessarily complex, which presents considerable challenges. Typically, the control of complexity in GP means reducing the sizes of the evolved expressions – known as bloat-control. However, size is a function of solution representation, and hence it does not consistently capture complexity across diverse GP applications. Instead, this thesis proposes to estimate the complexity of the evolving solutions by their evaluation time – the computational time required to evaluate a GP evolved solution on the given task. After all, the evaluation time depends not only on the size of the evolved expressions but also on other aspects such as their composition, thus acting as a more nuanced measure of model complexity than the expression size alone. Also, GP evaluates all the solutions in a population identically to determine their relative performance, for example, with the same dataset. Therefore, evaluation time can consistently compare the relative complexity.

To discourage complexity using the proposed evaluation time, two approaches are used. The first approach explicitly penalises models with long evaluation times by customising well-tested techniques that traditionally control the size. The second uses a novel technique that implicitly discourages long evaluation times by incorporating a race condition in the GP process. The proposed methods yield accurate yet simple solutions; furthermore, the implicit method improves the runtime and training speed of GP.

Across a diverse suite of GP applications, the evaluation time methods proffer several qualitative advantages over the bloat-control methods. They effectively manage the functional complexity of regression models to enable them to predict unseen data (generalise) better than those produced by bloat-control. In two feature engineering applications, they decrease the number of features – principally responsible for model complexity – while bloat-control does not. In a robot control application, they evolve accurate and efficient routines – efficient routines use fewer time steps to complete their tasks; bloat-control could not detect the efficiency of the programs. In Boolean logic problems where size emerges as the major cause of complexity, these methods are not hindered and perform at least as well as bloat-control. Overall, the proposed system characterises and manages various forms of complexity; also, it is broadly applicable and, hence, suitable for an automatic programming system.

Item Type: Thesis (Doctoral)
25 January 2022Submitted
28 April 2022Accepted
Uncontrolled Keywords: Genetic programming, Model complexity, Evolutionary Computing, Parallel computing, Evaluation time, Machine Learning, Artificial intelligence, Symbolic Regression, Automatic Programming, Program Synthesis
Subjects: CAH11 - computing > CAH11-01 - computing > CAH11-01-01 - computer science
CAH11 - computing > CAH11-01 - computing > CAH11-01-05 - artificial intelligence
Divisions: Doctoral Research College > Doctoral Theses Collection
Faculty of Computing, Engineering and the Built Environment > School of Computing and Digital Technology
Depositing User: Jaycie Carter
Date Deposited: 27 Jul 2022 10:04
Last Modified: 27 Jul 2022 10:04
URI: https://www.open-access.bcu.ac.uk/id/eprint/13439

Actions (login required)

View Item View Item


In this section...