Neural Router: Semantic Content Matching for Agentic AI

2026-05-25 • Distributed, Parallel, and Cluster Computing

Distributed, Parallel, and Cluster Computing • Computation and Language • Information Retrieval • Networking and Internet Architecture

AI summary

The authors study how large language models (LLMs) can act as the semantic-matching engine of a content-based publish/subscribe broker that routes relevant content to AI agents across edge and cloud devices. They characterise two thresholds: a context-window threshold below which a compression pipeline reduces the number of LLM calls, and a model-dependent threshold, tied to parameter count and training generation, above which matching accuracy collapses regardless of context budget. Their results show that only frontier-scale models handle large subscription sets, and that beyond the second threshold choosing the right model matters more than tuning the pipeline. They also propose three composable algorithms and a per-cluster Quality-of-Experience framework for automatically selecting the LLM tier.

Large language models • Content-based publish/subscribe • Semantic matching • Context window • Compression pipeline • Multi-label retrieval • Edge-cloud computing • Model discrimination capacity • Model selection • Quality-of-Experience

Authors

Lauri Lovén, Abhishek Kumar, Alexander Engelhardt, Alaa Saleh, Roberto Morabito, Xiaoli Liu, Naser Hossein Motlagh, Sasu Tarkoma

Abstract

Large language models (LLMs) can serve as the semantic-matching engine of a content-based publish/subscribe broker for agentic AI across the edge-cloud computing continuum, bridging the vocabulary and modality gaps that defeat keyword and embedding filters. Framed as offline multi-label retrieval over three public datasets spanning social-media, legal, and smart-home sensor domains (six LLMs, seven baselines), our central contribution is a two-crossover cost-accuracy characterisation: an analytical context-window crossover below which a CoverAndMerge compression pipeline reduces LLM invocations, and an empirical discrimination-capacity crossover above which matching accuracy collapses independently of context budget, by a model-dependent factor of parameter count and training generation. Two findings carry practical weight: above the discrimination crossover, compression cannot recover accuracy and only frontier-scale models clear large subscription sets; and there backend choice dominates configuration choice, so model selection, not pipeline tuning, is the primary operator lever. We accompany this with three composable algorithms and a per-cluster Quality-of-Experience framework for autonomic LLM-tier selection.
