Modular Framework for Responsive and Explainable Robotic Assistance with Intention Prediction Using Human-Centric Digital Twins
Asad, Usman and Khalid, Azfar and Lughmani, Waqas Akbar and Rasheed, Shummaila and Khan, Muhammad Mahabat (2026) Modular Framework for Responsive and Explainable Robotic Assistance with Intention Prediction Using Human-Centric Digital Twins. Sensors, 26 (12). p. 3810. ISSN 1424-8220
Text: sensors-26-03810-v2.pdf (Published Version, 44MB). Available under License Creative Commons Attribution.
Abstract
Proactive robotic assistance in human–robot collaboration (HRC) requires systems that can perceive evolving task contexts, anticipate user needs, and intervene appropriately without disrupting human workflow. We present the Agentic Unified Robotic Assistance (AURA) Framework, which couples Large Language Model (LLM) reasoning grounded by Standard Operating Procedures (SOPs) with a modular layer of specialized Intent, Motion, Perception, Sound, Affordance, and Performance Monitors that supply structured context to a central decision-making module, making the framework reconfigurable and auditable without retraining or re-prompting. We introduce a human-in-the-loop teleoperation data collection methodology and an offline evaluation scheme with an Appropriateness Score (A-Score) tailored to proactive intervention timing, and release a benchmark dataset of annotated multimodal HRC episodes containing workspace and robot wrist camera videos, robot joint states, and labeled intervention events. Across three tasks of varying complexity, we observe progressive gains in intent prediction and decision-making as the modules are supplied with richer grounded context (prior-state memory and tracked object locations), with Combined F1 rising by over 20 points between context-poor and context-rich conditions. The structured grounding allows lightweight multimodal backbones such as Gemini 3.1 Flash Lite to perform on par with heavier reasoning-tier models at roughly one-fifth the inference latency. Together, these contributions establish a scalable framework, benchmark, and evaluation methodology for advancing proactive robotic assistance in collaborative environments.
| Item Type: | Article |
|---|---|
| Identification Number: | 10.3390/s26123810 |
| Dates: | Accepted: 13 June 2026; Published Online: 15 June 2026 |
| Uncontrolled Keywords: | human–robot collaboration, proactive assistance, vision-language models, intent prediction, explainable AI, digital twins, industry 5.0 |
| Subjects: | CAH10 - engineering and technology > CAH10-01 - engineering > CAH10-01-01 - engineering (non-specific) |
| Divisions: | Architecture, Built Environment, Computing and Engineering > Engineering |
| Depositing User: | Gemma Tonks |
| Date Deposited: | 30 Jun 2026 12:49 |
| Last Modified: | 30 Jun 2026 12:49 |
| URI: | https://www.open-access.bcu.ac.uk/id/eprint/17094 |