TTFT-Aware Graph Chain-of-Thought: Distance-Indexed Neural A* for Low-Hallucination Multi-Hop Medical Reasoning

2026-06-22 | Artificial Intelligence

AI summary

The authors address hallucinations and opaque reasoning in clinical language models by building a GraphRAG-based system. The system uses a large medical knowledge graph and constrains the model to verifiable, step-by-step reasoning paths, which it finds using exact distance checks and a lightweight learned heuristic. On fertility-related questions, the approach lowers time to first token and reduces hallucinations compared with text-only retrieval. Overall, the authors provide a practical method to make AI medical reasoning more trustworthy and explainable for real-world use.

GraphRAG, knowledge graph, chain-of-thought, Pruned Landmark Labeling (PLL), AStarNet heuristic, retrieval-augmented generation (RAG), latency, hallucinations, clinical language models, semantic types
Authors
Bechir Dardouri, Kaïs Zhioua, Yassine Msaddak
Abstract
Hallucinations and opaque reasoning remain unacceptable failure modes for clinical LLMs. We present a production-grade GraphRAG stack that constrains answers to verifiable graph chain-of-thought paths in a heterogeneous, ~700K-node medical knowledge graph powering a fertility assistant. The core idea is targeted navigation: a directed Pruned Landmark Labeling (PLL) oracle provides exact distances for sub-millisecond feasibility checks and simple-path enumeration, while a lightweight AStarNet heuristic operates strictly within the PLL corridor to prioritize clinically plausible expansions. We score and pack a small, diverse set of paths (CUI/semantic-type overlap, length prior, provenance priors) to condition generation, yielding compact prompts and improved Time to First Token (TTFT). On fertility-focused queries, the hybrid (PLL+AStarNet) establishes a better latency/recall Pareto frontier than text-only RAG and single-component baselines, lowers TTFT, and reduces clinician-audited hallucinations while preserving explanation clarity. The result is a practical recipe for explainable, low-hallucination multi-hop medical reasoning ready for real-world deployment.
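
The navigation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes unit-weight directed edges, and all function names and the toy graph are hypothetical illustrations rather than clinical data. In particular, `uniform_plausibility` is a placeholder for the learned AStarNet heuristic. The sketch builds 2-hop distance labels with pruned BFS, then runs a best-first enumeration of simple paths that discards any expansion whose exact lower bound exceeds the hop budget (the "corridor").

```python
import heapq
from collections import deque

def build_pll(nodes, out_adj, in_adj):
    """Directed Pruned Landmark Labeling for unit-weight edges (toy scale)."""
    l_out = {v: {} for v in nodes}  # l_out[u][h] = dist(u -> h)
    l_in = {v: {} for v in nodes}   # l_in[u][h]  = dist(h -> u)

    def query(s, t):
        # Exact distance: best hub shared by s's out-labels and t's in-labels.
        common = l_out[s].keys() & l_in[t].keys()
        return min((l_out[s][h] + l_in[t][h] for h in common), default=float("inf"))

    def pruned_bfs(hub, adj, forward):
        seen, frontier = {hub}, deque([(hub, 0)])
        while frontier:
            u, d = frontier.popleft()
            known = query(hub, u) if forward else query(u, hub)
            if known <= d:
                continue  # existing labels already cover this pair: prune
            (l_in if forward else l_out)[u][hub] = d
            for w in adj.get(u, ()):
                if w not in seen:
                    seen.add(w)
                    frontier.append((w, d + 1))

    # Process high-degree vertices first so that they become shared hubs.
    order = sorted(nodes, key=lambda v: -(len(out_adj.get(v, ())) + len(in_adj.get(v, ()))))
    for hub in order:
        pruned_bfs(hub, out_adj, forward=True)
        pruned_bfs(hub, in_adj, forward=False)
    return query

def corridor_paths(source, target, out_adj, dist, max_hops, plausibility, top_k=3):
    """Best-first enumeration of simple paths that stay inside the corridor."""
    if dist(source, target) > max_hops:
        return []  # feasibility check: no path fits the hop budget
    heap = [(dist(source, target), 0.0, [source])]
    found = []
    while heap and len(found) < top_k:
        _, neg_score, path = heapq.heappop(heap)
        u = path[-1]
        if u == target:
            found.append(path)
            continue
        for w in out_adj.get(u, ()):
            if w in path:
                continue  # keep paths simple
            # Edges used after stepping to w, plus exact remaining distance.
            lower = len(path) + dist(w, target)
            if lower <= max_hops:
                # Order by lower bound, then by accumulated plausibility.
                heapq.heappush(heap, (lower, neg_score - plausibility(u, w), path + [w]))
    return found

def uniform_plausibility(u, w):
    """Hypothetical stand-in for a learned edge-scoring heuristic."""
    return 1.0

# Toy graph for illustration only.
out_adj = {
    "pcos": ["anovulation", "insulin_resistance"],
    "insulin_resistance": ["anovulation", "hyperandrogenism"],
    "hyperandrogenism": ["anovulation"],
    "anovulation": ["infertility"],
}
in_adj = {}
for u, ws in out_adj.items():
    for w in ws:
        in_adj.setdefault(w, []).append(u)
nodes = sorted(set(out_adj) | set(in_adj))

dist = build_pll(nodes, out_adj, in_adj)
paths = corridor_paths("pcos", "infertility", out_adj, dist, 3, uniform_plausibility)
```

With a budget of three hops, the four-hop route through `hyperandrogenism` is never expanded, because its lower bound already exceeds the budget. In this sketch the heuristic only breaks ties among candidates with the same lower bound; a production system could weight it differently, and the scoring and packing of the returned paths described in the abstract is not shown here.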