Augmented Reality LLM Training System for Complex Machine Operations

Our AR-MLLM system integrates augmented reality (AR) with multimodal large language models (MLLMs, e.g., ChatGPT) to provide context-aware guidance and activity recognition during complex machine tasks, reducing errors and training time for non-experts.

Key Components

  • Context-Aware Guidance: Converts technical manuals and machine feedback into step-by-step AR overlays anchored to the physical equipment, using the TARCO prompt framework to obtain precise, deterministic outputs.
  • Hardware and Software: Microsoft HoloLens 2 for immersive AR, the Unity engine for development, and Azure OpenAI for MLLM integration, with real-time image capture and spatial anchoring.
  • Algorithm: Dual validation (a heuristic check for image quality and a semantic check via the MLLM) screens each input; the system then extracts features and actions and activates the corresponding prefabs for interactive training.
  • Simple Operation: The user captures images of manuals or machine states; the system interprets them via prompts and renders the guidance. Supported tasks include stylus qualification and feature measurement.
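
The dual-validation step described above can be illustrated with a minimal Python sketch. It is not the system's actual implementation (which runs in Unity); it only shows the general shape of a cheap heuristic gate followed by a semantic check. The thresholds, the ordering of the two checks, the function names, the prefab name, and the `semantic_check` callable (a stand-in for the MLLM call) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence

# Grayscale image as rows of pixel values in 0..255 (assumed representation).
Gray = Sequence[Sequence[float]]


def mean_brightness(img: Gray) -> float:
    pixels = [p for row in img for p in row]
    return sum(pixels) / len(pixels)


def sharpness(img: Gray) -> float:
    """Variance of a 4-neighbour Laplacian; low values suggest a blurry image."""
    h, w = len(img), len(img[0])
    if h < 3 or w < 3:
        return 0.0  # too small to evaluate
    lap = [
        img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1] - 4 * img[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    mean = sum(lap) / len(lap)
    return sum((v - mean) ** 2 for v in lap) / len(lap)


def passes_heuristics(img: Gray, min_brightness: float = 40.0,
                      max_brightness: float = 220.0,
                      min_sharpness: float = 50.0) -> bool:
    # Hypothetical thresholds; a real deployment would tune these.
    return (min_brightness <= mean_brightness(img) <= max_brightness
            and sharpness(img) >= min_sharpness)


@dataclass
class Result:
    accepted: bool
    reason: str
    prefab: Optional[str] = None


def dual_validate(img: Gray,
                  semantic_check: Callable[[Gray], Optional[str]],
                  prefabs: Dict[str, str]) -> Result:
    # Stage 1: heuristic image-quality gate (no model call needed).
    if not passes_heuristics(img):
        return Result(False, "image quality too low; recapture")
    # Stage 2: semantic validation; semantic_check stands in for the MLLM
    # and returns a recognised action label, or None.
    action = semantic_check(img)
    if action is None or action not in prefabs:
        return Result(False, "content not recognised")
    # Both checks passed: report which prefab to activate.
    return Result(True, "ok", prefabs[action])


if __name__ == "__main__":
    checkerboard = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]
    prefabs = {"stylus_qualification": "StylusQualificationPrefab"}  # hypothetical
    print(dual_validate(checkerboard, lambda _: "stylus_qualification", prefabs))
```

Running the heuristic gate first means that blurry or badly exposed captures are rejected before any model request is made, which is one plausible reason to pair the two checks.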

Benefits

  • Enhances efficiency with reduced workload and faster task completion.
  • Improves accuracy in activity recognition and measurements.
  • Adapts to broader industrial applications and reduces the need for supervision.