From Prompt to Service: An SLM-Based Agent Orchestration Gateway for AI-Driven Virtual Worlds

2026-06-02 • Artificial Intelligence

Artificial Intelligence, Human-Computer Interaction

AI summary

The authors address the challenge of managing different AI models needed for virtual worlds where users interact in many ways. They propose a system called an Agent Orchestration Gateway that uses a small language model (SLM) at the edge to understand user intent and send requests to the right AI backend without changing the virtual world app. Their tests in a virtual museum setup show that using compact models for routing and larger models for conversation works efficiently on typical edge devices. This architecture helps virtual worlds add new AI features more easily and run them more smoothly across both cloud and edge systems.

generative AI, virtual worlds, agent orchestration, semantic intent, service routing, small language models, edge computing, cloud infrastructure, fine-tuning, multimodal interaction

Authors

Louis Nisiotis, Aimilios Hadjiliasi

Abstract

As generative AI capabilities expand, AI-driven virtual worlds face a growing architectural challenge. Users interact through in-world interfaces in multimodal ways, yet their requests demand fundamentally different AI backend models and computational resources. Embedding these capabilities directly into virtual world systems reduces extensibility, complicates maintenance, and limits the ability to coordinate services distributed across edge and cloud infrastructure. This paper presents an SLM-based Agent Orchestration Gateway, a lightweight runtime coordination mechanism that decouples a virtual world client from heterogeneous AI backends through intent-driven service routing. An edge-deployed SLM classifies the semantic intent of each user prompt, a configurable service registry validates and resolves the routing decision, and the selected backend is invoked transparently, enabling new AI capabilities to be introduced in the virtual world without modifying the client application. The gateway is implemented and evaluated within the InterwovenXR virtual museum testbed. The evaluation shows that compact SLMs can serve as reliable intent routers on edge hardware, and that task-specific fine-tuning can transform sub-billion-parameter models into practical, low-latency routers. A layered configuration pairing a fine-tuned sub-billion-parameter model as router with a larger SLM for conversational response generation is shown to be deployable on mid-range edge hardware and more efficient than delegating both responsibilities to a single model. The findings show that SLMs can support practical AI service orchestration in virtual worlds, and the work contributes an evaluated architecture for scalable, extensible, and edge-supported AI interaction, enabling virtual agents to become access points to distributed generative AI services.
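The routing flow described in the abstract (classify intent, validate and resolve against a registry, invoke the selected backend) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `ServiceRegistry`, `Gateway`, and `classify_intent`, the intent labels, and the backend functions are all hypothetical, the keyword classifier is only a stand-in for the edge-deployed SLM, and falling back to a conversational backend on an unrecognized label is an assumption about what "validates" might mean.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# A backend is modeled as any callable that takes a prompt and returns text.
Backend = Callable[[str], str]


@dataclass
class ServiceRegistry:
    """Hypothetical registry mapping intent labels to backend callables."""
    backends: Dict[str, Backend] = field(default_factory=dict)
    fallback: str = "conversation"  # assumed default when a label is unknown

    def register(self, intent: str, backend: Backend) -> None:
        self.backends[intent] = backend

    def resolve(self, intent: str) -> str:
        # Validate the router's output: unknown labels map to the fallback.
        return intent if intent in self.backends else self.fallback


def classify_intent(prompt: str) -> str:
    """Keyword stand-in for the SLM router (assumption, not the paper's model)."""
    text = prompt.lower()
    if "draw" in text or "image" in text:
        return "image_generation"
    return "conversation"


class Gateway:
    """Couples a classifier with a registry; the client only calls handle()."""

    def __init__(self, classifier: Callable[[str], str], registry: ServiceRegistry):
        self.classifier = classifier
        self.registry = registry

    def handle(self, prompt: str) -> str:
        intent = self.registry.resolve(self.classifier(prompt))
        return self.registry.backends[intent](prompt)


# Placeholder backends; real ones would call edge or cloud AI services.
def chat_backend(prompt: str) -> str:
    return f"[chat backend] {prompt}"


def image_backend(prompt: str) -> str:
    return f"[image backend] {prompt}"


registry = ServiceRegistry()
registry.register("conversation", chat_backend)
registry.register("image_generation", image_backend)
gateway = Gateway(classify_intent, registry)

print(gateway.handle("Please draw the statue"))
print(gateway.handle("Who made this vase?"))
```

In this sketch, adding a capability means registering one more intent label and backend; the code that calls `gateway.handle` stays unchanged, which mirrors the decoupling the abstract describes.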
