A Real-Life Evaluation of Supervised and Semi-Supervised Machine Learning Approaches for Indirect Estimation of Indoor Occupancy

Mena-Martinez, Alma and Davila Delgado, Manuel and Alvarado-Uribe, Joanna and Ceballos, Hector G. (2024) A Real-Life Evaluation of Supervised and Semi-Supervised Machine Learning Approaches for Indirect Estimation of Indoor Occupancy. IEEE Access, 12. pp. 118673-118693. ISSN 2169-3536

A_Real-Life_Evaluation_of_Supervised_and_Semi-Supervised_Machine_Learning_Approaches_for_Indirect_Estimation_of_Indoor_Occupancy.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Abstract

Occupancy information is essential for space management, energy efficiency, and, in times of the COVID-19 pandemic, crowd control. Obtaining labeled data is challenging due to hardware limitations, privacy considerations, and the associated costs. This study demonstrates the benefits of using Semi-Supervised Learning (SSL) for occupancy estimation in enclosed spaces, an approach that requires less labeled data than supervised Machine Learning (ML) methods. It presents an empirical comparison between supervised ML and SSL models in three real-life university classrooms under uncontrolled conditions. Data were collected for three weeks in each classroom using an in-house developed Internet of Things (IoT) device that measures air temperature, relative humidity, and atmospheric pressure. The ground truth records were gathered through manual logging of occupancy levels. The datasets averaged 2350 entries each, of which only 280 were labeled. Support Vector Machine (SVM), Random Forest (RF), and Multi-Layer Perceptron (MLP) were used to define a performance baseline for supervised ML. Self-Training (ST) and Label Propagation (LP) were tested for SSL. ST outperformed the baseline models (SVM, RF, MLP), reaching the highest average accuracy of 87.33%, compared with 86.66% for SVM, the best baseline. These results demonstrate the effectiveness of SSL for indirect occupancy estimation while reducing the need for extensive data collection and labeling.
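
The Self-Training idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-centroid base classifier, the confidence measure, the 0.8 threshold, the two-class "empty"/"occupied" labels, and the synthetic temperature and humidity readings are all assumptions chosen to keep the example short and dependency-free. It shows only the general loop: fit on the labeled samples, pseudo-label the unlabeled samples the model is confident about, and refit.

```python
import random
from statistics import mean


def fit_centroids(X, y):
    """Mean feature vector per class, using labeled samples only (label is not None)."""
    groups = {}
    for features, label in zip(X, y):
        if label is not None:
            groups.setdefault(label, []).append(features)
    return {label: [mean(col) for col in zip(*rows)] for label, rows in groups.items()}


def predict_with_confidence(centroids, x):
    """Nearest-centroid prediction; confidence is 1 - d1 / (d1 + d2) for the two nearest classes."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5, label)
        for label, c in centroids.items()
    )
    (d1, label), (d2, _) = dists[0], dists[1]
    confidence = 1.0 - d1 / (d1 + d2) if (d1 + d2) > 0 else 0.5
    return label, confidence


def self_train(X, y_partial, threshold=0.8, max_rounds=10):
    """Self-training loop: y_partial uses None for unlabeled samples."""
    y = list(y_partial)
    for _ in range(max_rounds):
        centroids = fit_centroids(X, y)
        added = 0
        for i, x in enumerate(X):
            if y[i] is None:
                label, confidence = predict_with_confidence(centroids, x)
                if confidence >= threshold:  # accept only confident pseudo-labels
                    y[i] = label
                    added += 1
        if added == 0:  # nothing new was labeled, so stop early
            break
    return fit_centroids(X, y), y


def make_demo_data(seed=0, n=200, n_labeled=24):
    """Synthetic (temperature, humidity) readings; hypothetical values, not the paper's data."""
    rng = random.Random(seed)
    X, y_true, y_partial = [], [], []
    for i in range(n):
        label = "occupied" if i % 2 else "empty"
        center = (25.0, 55.0) if label == "occupied" else (21.0, 40.0)
        X.append([rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)])
        y_true.append(label)
        y_partial.append(label if i < n_labeled else None)  # about 12% labeled
    return X, y_true, y_partial


if __name__ == "__main__":
    X, y_true, y_partial = make_demo_data()
    _, y_final = self_train(X, y_partial)
    scored = [(p, t) for p, t in zip(y_final, y_true) if p is not None]
    print("labeled after self-training:", len(scored), "of", len(X))
    print("agreement with ground truth:", sum(p == t for p, t in scored) / len(scored))
```

The labeled fraction in the demo (24 of 200) loosely mirrors the roughly 12% labeled share reported in the abstract (280 of 2350). In practice one would more likely wrap an SVM or similar model in a library routine, and would standardize the sensor features first.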

Item Type: Article
Identification Number: 10.1109/ACCESS.2024.3449810
Dates: Accepted 18 August 2024; Published Online 26 August 2024
Uncontrolled Keywords: Occupancy estimation, semi-supervised learning, environmental sensors, machine learning
Subjects: CAH17 - business and management > CAH17-01 - business and management > CAH17-01-02 - business studies
Divisions: Business School > Management, Business and Marketing
Depositing User: Gemma Tonks
Date Deposited: 31 Jul 2025 15:09
Last Modified: 31 Jul 2025 15:09
URI: https://www.open-access.bcu.ac.uk/id/eprint/16550
