DatApollo: Orchestration of Serverless Functions for Scalable Data Mining
Shahin, Mahtab and Bertl, Markus and Janatian, Nasim and Aznar-Poveda, Juan and Shah, Syed Attique and Fahringer, Thomas and Arakkal Peious, Sijo and Draheim, Dirk (2025) DatApollo: Orchestration of Serverless Functions for Scalable Data Mining. IEEE Access, 13. pp. 142813-142828. ISSN 2169-3536
Text: DatApollo_Orchestration_of_Serverless_Functions_for_Scalable_Data_Mining-1.pdf - Published Version (1MB). Available under License Creative Commons Attribution.
Abstract
With the exponential growth of data generated by enterprise systems, social networks, and the Internet of Things, traditional data mining techniques face major challenges in scalability and efficiency. Association Rule Mining (ARM), a foundational unsupervised learning method for detecting patterns in transactional datasets, commonly encounters performance bottlenecks in distributed environments due to excessive memory consumption, static resource provisioning, and costly data shuffling. This paper presents DatApollo, a novel serverless orchestration framework that enables scalable and efficient execution of distributed ARM workflows. DatApollo, based on the Apollo orchestration engine, offers stateless cloud functions, dynamic scheduling, intermediate state persistence, and fault-tolerant coordination to address the limitations of both traditional cluster-based architectures and existing Function-as-a-Service models. By decomposing ARM pipelines into orchestrated microfunctions, the framework enables elastic, cloud-native execution with minimal idle time. We describe the architectural design, algorithmic components, and computational complexity of DatApollo and perform a comprehensive experimental evaluation using real-world healthcare and meteorological datasets. DatApollo executes up to five times faster than Apache Spark and lowers infrastructure costs through elastic scaling and event-driven function invocations. The results demonstrate that DatApollo is a robust, cost-effective, and high-performance alternative for ARM in dynamic, large-scale data environments.
| Item Type: | Article |
|---|---|
| Identification Number: | 10.1109/ACCESS.2025.3591712 |
| Dates: | 14 July 2025 (Accepted); 24 July 2025 (Published Online) |
| Uncontrolled Keywords: | Association rule mining, big data, serverless computing, function-as-a-service (FaaS), workflow orchestration, Apriori algorithm, cloud scalability, Apache Spark |
| Subjects: | CAH11 - computing > CAH11-01 - computing > CAH11-01-01 - computer science |
| Divisions: | Architecture, Built Environment, Computing and Engineering > Computer Science |
| Depositing User: | Gemma Tonks |
| Date Deposited: | 12 May 2026 12:13 |
| Last Modified: | 12 May 2026 12:13 |
| URI: | https://www.open-access.bcu.ac.uk/id/eprint/17033 |