BC Protocol: Structured Dual-Expert Dialogue for Eliciting High-Quality Chain-of-Thought Post-Training Data

2026-05-25

Computation and Language, Artificial Intelligence, Machine Learning
AI summary

The authors address challenges in creating detailed step-by-step reasoning data for training large language models. They introduce the BC Protocol, a method pairing a domain expert with a knowledge engineer to better capture the expert's implicit thought process. They also propose a model for participant traits that affect data quality and emphasize selecting the right people over redesigning processes. In tests with narrative fiction, their method produced reasoning chains that judges rated as much more natural than those written solo by experts.

chain-of-thought, large language models, dual-expert elicitation, crystallized intelligence, fluid intelligence, Participant Aptitude Model, Calibrated Ignorance, Selection-over-Prescription, narrative fiction, reasoning process
Authors
Bo Zou, Chao Xu
Abstract
High-quality expert chain-of-thought (CoT) data is one of the core bottlenecks in large language model (LLM) post-training. Existing data production methods each have structural limitations: crowdsourced annotation lacks deep reasoning paths; expert solo writing is constrained by the "expert blind spot" -- experts structurally skip reasoning steps they consider obvious; RLHF only produces preference signals rather than reasoning chains. This paper proposes the BC Protocol -- a structured dual-expert elicitation method for LLM post-training data production. The method carefully pairs a domain expert (crystallized intelligence) with a knowledge engineer (fluid intelligence), systematically externalizing the expert's implicit judgments as natural language reasoning chains. We introduce the Participant Aptitude Model, which defines six participant characteristic dimensions that affect elicitation quality. "Calibrated Ignorance" is an original concept proposed in this paper. We further propose "Selection-over-Prescription" as a methodological principle: for implicit knowledge elicitation tasks, investing quality-control resources in personnel selection yields a higher return than investing the same resources in process design. In a controlled experiment in the narrative fiction domain, we directly compared CoT produced by BC Protocol dual dialogue (Group A, $n=20$) against CoT written independently by the same domain expert (Group B, $n=20$). Three cross-vendor judge models -- GPT-4o, Claude Opus 4.5, and Gemini 2.5 Pro -- conducted blind evaluation across five dimensions (600 ratings total). Results show that the BC Protocol achieves an overwhelming advantage in "naturalness of reasoning process" (Group A mean 4.80 vs. Group B mean 1.30, $p=2.4\times10^{-8}$, Cliff's $\delta=1.0$).
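For readers unfamiliar with the effect size reported above, the following is a minimal sketch of how Cliff's δ is conventionally computed, together with a check of the rating count. The score lists are hypothetical placeholders chosen for illustration, not the paper's data, and the function name `cliffs_delta` is our own. The rating count assumes every judge scored every sample on every dimension.

```python
from itertools import product


def cliffs_delta(xs, ys):
    """Cliff's delta: (#pairs with x > y minus #pairs with x < y) / (len(xs) * len(ys))."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x, y in product(xs, ys) if x > y)
    less = sum(1 for x, y in product(xs, ys) if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Hypothetical 1-5 ratings, NOT taken from the paper.
group_a = [5, 5, 4, 5]
group_b = [1, 2, 1, 1]

# Every Group A score exceeds every Group B score, so delta reaches its maximum.
delta = cliffs_delta(group_a, group_b)

# Rating count under the stated assumption:
# (20 + 20) samples x 3 judge models x 5 dimensions.
ratings_total = (20 + 20) * 3 * 5

print(delta, ratings_total)
```

A value of δ = 1.0 means complete separation: in every cross-group pair, the Group A rating is strictly higher than the Group B rating.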