Modular Framework for Responsive and Explainable Robotic Assistance with Intention Prediction Using Human-Centric Digital Twins

Asad, Usman; Khalid, Azfar; Lughmani, Waqas Akbar; Rasheed, Shummaila; Khan, Muhammad Mahabat

Asad, Usman and Khalid, Azfar and Lughmani, Waqas Akbar and Rasheed, Shummaila and Khan, Muhammad Mahabat (2026) Modular Framework for Responsive and Explainable Robotic Assistance with Intention Prediction Using Human-Centric Digital Twins. Sensors, 26 (12). p. 3810. ISSN 1424-8220

sensors-26-03810-v2.pdf - Published Version
Available under License Creative Commons Attribution.

Official URL: https://www.mdpi.com/1424-8220/26/12/3810

Abstract

Proactive robotic assistance in human–robot collaboration (HRC) requires systems that can perceive evolving task contexts, anticipate user needs, and intervene appropriately without disrupting human workflow. We present the Agentic Unified Robotic Assistance (AURA) Framework, which couples Large Language Model (LLM) reasoning grounded by Standard Operating Procedures (SOPs) with a modular layer of specialized Intent, Motion, Perception, Sound, Affordance, and Performance Monitors that supply structured context to a central decision-making module, making the framework reconfigurable and auditable without retraining or re-prompting. We introduce a human-in-the-loop teleoperation data collection methodology and an offline evaluation scheme with an Appropriateness Score (A-Score) tailored to proactive intervention timing, and release a benchmark dataset of annotated multimodal HRC episodes containing workspace and robot wrist camera videos, robot joint states, and labeled intervention events. Across three tasks of varying complexity, we observe progressive gains in intent prediction and decision-making as the modules are supplied with richer grounded context (prior-state memory and tracked object locations), with Combined F1 rising by over 20 points between context-poor and context-rich conditions. The structured grounding allows lightweight multimodal backbones such as Gemini 3.1 Flash Lite to perform on par with heavier reasoning-tier models at roughly one-fifth the inference latency. Together, these contributions establish a scalable framework, benchmark, and evaluation methodology for advancing proactive robotic assistance in collaborative environments.

Item Type:	Article
Identification Number:	10.3390/s26123810
Dates:	Accepted 13 June 2026; Published Online 15 June 2026
Uncontrolled Keywords:	human–robot collaboration, proactive assistance, vision-language models, intent prediction, explainable AI, digital twins, industry 5.0
Subjects:	CAH10 - engineering and technology > CAH10-01 - engineering > CAH10-01-01 - engineering (non-specific)
Divisions:	Architecture, Built Environment, Computing and Engineering > Engineering
Depositing User:	Gemma Tonks
Date Deposited:	30 Jun 2026 12:49
Last Modified:	30 Jun 2026 12:49
URI:	https://www.open-access.bcu.ac.uk/id/eprint/17094