Compliance-Scored Best-of-N Guardrail Orchestration for Multimodal Document Generation in Payments Dispute Defense

2026-06-01 • Distributed, Parallel, and Cluster Computing

Distributed, Parallel, and Cluster Computing • Artificial Intelligence • Computation and Language • Machine Learning

AI summary

The authors developed a system to generate high-stakes business documents, such as dispute narratives and compliance notices, more reliably and quickly. Instead of chaining separate tools for privacy checks, content rules, and format validation, they built one integrated layer that generates multiple candidate versions, scores each against weighted rules, and returns the best-scoring one, using the compliance score to exit early. The reported operational readout is 5 generation attempts within 20 seconds and 91% compliance. For payment dispute summaries, the authors analyzed aggregate operational data rather than a randomized A/B test and found win rates higher than controls by about 11 percentage points overall and about 7.5 percentage points for adjusted item-not-received cases. They also report reviewer-calibrated evidence-quality signals and document what is needed to reproduce the results.

enterprise document generation • schema validation • PII detection • content moderation • compliance scoring • multi-candidate generation • low-latency systems • payment dispute summaries • Responsible AI • OCR

Authors

Nataraj Agaram Sundar, Tejas Morabia

Abstract

High-stakes enterprise document generation, including financial dispute narratives, compliance notices, and audit summaries, demands schema correctness, policy compliance, and low-latency operation at scale. Prior to a unified guardrail layer, production systems often stitched together separate PII redaction, content moderation, and format validation steps, leading to fragmented logic, slower request paths, and higher operational cost. We present a guardrail orchestration layer for text and image inputs that couples multi-candidate generation with an explicit compliance score used for early exit. The framework runs configurable parallel generation heads, scores candidates against weighted guardrails including PII detection, content moderation, schema constraints, and domain rules, and returns the best-scoring output with selection metadata. The available operational readout reports 5 attempts within 20 seconds and 91 percent compliance. For payments dispute defense summaries, we analyze aggregate operational scenario readouts rather than a randomized A/B test. Variable cohorts show higher count win rates than controls overall, 301/659 versus 536/1548, corresponding to +11.0 percentage points with 95 percent confidence interval [6.6, 15.5] and p < 0.001, and for adjusted item-not-received cases, +7.5 percentage points with 95 percent confidence interval [0.2, 15.7] and p = 0.045. Fraud and local evidence-ranking deltas are directionally positive but not statistically significant from the aggregate count data. We also report reviewer-calibrated Responsible-AI evidence-quality signals from 770 generated-evidence reviews and a 70-case OCR slice, and document the reproducibility boundary through the request interface, scoring logic, pseudocode, and operational evidence boundary.
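The generate, score, and early-exit loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `best_of_n`, `compliance_score`, `Selection`, and `TOY_GUARDRAILS`, the weighted-average scoring rule, the `0.9` threshold, and the two toy guardrails are all hypothetical. The sketch also runs attempts sequentially for readability, whereas the abstract describes configurable parallel generation heads, and it omits the latency budget.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical guardrail type: maps a candidate document to a score in [0, 1].
Guardrail = Callable[[str], float]


@dataclass
class Selection:
    """Best candidate plus selection metadata (field names are assumptions)."""
    text: str
    score: float
    attempts_used: int
    early_exit: bool


def no_card_number(text: str) -> float:
    # Toy stand-in for PII detection: fail if a 13 to 16 digit run appears.
    return 0.0 if re.search(r"\b\d{13,16}\b", text) else 1.0


def has_summary_header(text: str) -> float:
    # Toy stand-in for a schema constraint: require a fixed prefix.
    return 1.0 if text.startswith("Summary:") else 0.0


# Guardrail name -> (check, weight). Weights are illustrative only.
TOY_GUARDRAILS: Dict[str, Tuple[Guardrail, float]] = {
    "pii": (no_card_number, 2.0),
    "schema": (has_summary_header, 1.0),
}


def compliance_score(text: str, guardrails: Dict[str, Tuple[Guardrail, float]]) -> float:
    """Weighted average of guardrail scores (an assumed scoring rule)."""
    total_weight = sum(weight for _, weight in guardrails.values())
    weighted = sum(check(text) * weight for check, weight in guardrails.values())
    return weighted / total_weight


def best_of_n(
    generate: Callable[[int], str],
    guardrails: Dict[str, Tuple[Guardrail, float]],
    max_attempts: int = 5,
    threshold: float = 0.9,
) -> Selection:
    """Score up to max_attempts candidates; stop once one meets the threshold."""
    best_text, best_score = "", -1.0
    for attempt in range(1, max_attempts + 1):
        text = generate(attempt)
        score = compliance_score(text, guardrails)
        if score > best_score:
            best_text, best_score = text, score
        if score >= threshold:
            # Early exit: a sufficiently compliant candidate was found.
            return Selection(text, score, attempt, True)
    # No candidate met the threshold; fall back to the best one seen.
    return Selection(best_text, best_score, max_attempts, False)
```

A caller would pass its own generator and guardrails, for example `best_of_n(my_generate, TOY_GUARDRAILS, max_attempts=5)`, and inspect `early_exit` and `attempts_used` on the result.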
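The overall win-rate comparison can be checked directly from the counts quoted in the abstract (301/659 versus 536/1548). The sketch below assumes an unpooled normal-approximation (Wald) interval for a difference of two proportions; the abstract does not say which interval method the authors used, and `win_rate_delta` is a hypothetical helper name. The counts behind the item-not-received comparison are not given in the abstract, so only the overall figure is covered.

```python
import math
from typing import Tuple


def win_rate_delta(
    wins_a: int, n_a: int, wins_b: int, n_b: int, z: float = 1.96
) -> Tuple[float, float, float]:
    """Difference in win rates (a minus b) with a Wald confidence interval.

    Returns (delta, lower, upper) as fractions, not percentage points.
    The interval method is an assumption; z=1.96 gives roughly 95% coverage.
    """
    p_a = wins_a / n_a
    p_b = wins_b / n_b
    delta = p_a - p_b
    # Unpooled standard error of the difference of two independent proportions.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return delta, delta - z * se, delta + z * se


# Counts quoted in the abstract: variable cohorts versus controls.
delta, lower, upper = win_rate_delta(301, 659, 536, 1548)
print(f"{100 * delta:+.1f} pp, 95% CI [{100 * lower:.1f}, {100 * upper:.1f}]")
```

Under this assumed method the result rounds to +11.0 percentage points with an interval of about [6.6, 15.5], in line with the figures quoted above.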
