DatApollo: Orchestration of Serverless Functions for Scalable Data Mining

Shahin, Mahtab and Bertl, Markus and Janatian, Nasim and Aznar-Poveda, Juan and Shah, Syed Attique and Fahringer, Thomas and Arakkal Peious, Sijo and Draheim, Dirk (2025) DatApollo: Orchestration of Serverless Functions for Scalable Data Mining. IEEE Access, 13. pp. 142813-142828. ISSN 2169-3536

DatApollo_Orchestration_of_Serverless_Functions_for_Scalable_Data_Mining-1.pdf - Published Version
Available under License Creative Commons Attribution.


Abstract

With the exponential growth of data generated by enterprise systems, social networks, and the Internet of Things, traditional data mining techniques face major challenges in scalability and efficiency. Association Rule Mining (ARM), a foundational unsupervised learning method for detecting patterns in transactional datasets, commonly suffers performance bottlenecks in distributed environments due to excessive memory consumption, static resource provisioning, and costly data shuffling. This paper presents DatApollo, a novel serverless orchestration framework that enables scalable and efficient execution of distributed ARM workflows. Built on the Apollo orchestration engine, DatApollo offers stateless cloud functions, dynamic scheduling, intermediate state persistence, and fault-tolerant coordination to address the limitations of both traditional cluster-based architectures and existing Function-as-a-Service models. By decomposing ARM pipelines into orchestrated microfunctions, the framework enables elastic, cloud-native execution with minimal idle time. We describe the architectural design, algorithmic components, and computational complexity of DatApollo and perform a comprehensive experimental evaluation on real-world healthcare and meteorological datasets. DatApollo executes up to five times faster than Apache Spark and lowers infrastructure costs through elastic scaling and event-driven function invocations. The results demonstrate that DatApollo is a robust, cost-effective, and high-performance alternative for ARM in dynamic, large-scale data environments.
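To illustrate what decomposing an ARM pipeline into small stateless functions can look like, the following is a minimal sketch of level-wise Apriori frequent-itemset counting split into a per-partition counting function and a merge function. It is not the DatApollo implementation and does not use the Apollo engine's API; the function names (`count_candidates`, `merge_counts`, `next_candidates`, `mine`) are hypothetical, and the `mine` loop is only a local stand-in for an orchestrator that would invoke the functions as cloud functions.

```python
from collections import Counter
from itertools import combinations


def count_candidates(partition, candidates):
    """Stateless 'map' step: count candidate itemsets within one data partition."""
    counts = Counter()
    for transaction in partition:
        items = set(transaction)
        for cand in candidates:
            if cand <= items:  # candidate is a subset of the transaction
                counts[cand] += 1
    return counts


def merge_counts(partial_counts, min_count):
    """Stateless 'reduce' step: sum partial counts and keep frequent itemsets."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return {itemset: n for itemset, n in total.items() if n >= min_count}


def next_candidates(frequent, k):
    """Build size-k candidates whose (k-1)-subsets are all frequent (Apriori pruning)."""
    prev = set(frequent)
    items = sorted({item for itemset in prev for item in itemset})
    return {
        frozenset(combo)
        for combo in combinations(items, k)
        if all(frozenset(sub) in prev for sub in combinations(combo, k - 1))
    }


def mine(partitions, min_count):
    """Local stand-in for an orchestrator (assumption): one fan-out/fan-in per level."""
    candidates = {frozenset([i]) for p in partitions for t in p for i in t}
    result, k = {}, 1
    while candidates:
        # Fan-out: each call is independent and could run as a separate function.
        partials = [count_candidates(p, candidates) for p in partitions]
        # Fan-in: merge the partial results and filter by the support threshold.
        frequent = merge_counts(partials, min_count)
        result.update(frequent)
        k += 1
        candidates = next_candidates(frequent, k)
    return result
```

Calling `mine(partitions, min_count)` with a list of partitions, each a list of transactions, returns a dictionary mapping each frequent itemset (a `frozenset`) to its absolute support count. Because the counting step shares no state across partitions, the per-level fan-out is the part a serverless orchestrator could scale elastically.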

Item Type: Article
Identification Number: 10.1109/ACCESS.2025.3591712
Dates: Accepted 14 July 2025; Published Online 24 July 2025
Uncontrolled Keywords: Association rule mining, big data, serverless computing, function-as-a-service (FaaS), workflow orchestration, Apriori algorithm, cloud scalability, Apache Spark
Subjects: CAH11 - computing > CAH11-01 - computing > CAH11-01-01 - computer science
Divisions: Architecture, Built Environment, Computing and Engineering > Computer Science
Depositing User: Gemma Tonks
Date Deposited: 12 May 2026 12:13
Last Modified: 12 May 2026 12:13
URI: https://www.open-access.bcu.ac.uk/id/eprint/17033
