Boosting Self-Consistency with Ranking
2026-06-03 • Computation and Language
AI summary
The authors improve how large language models pick a final answer from several sampled responses by treating selection as a ranking task instead of simply voting for the most common answer. They use a lightweight LambdaRank model to score candidate answers with five features covering how often an answer appears, how semantically central it is among the samples, and how consistent its reasoning trace is. Their method, called RISC, achieves a better accuracy-efficiency trade-off than standard self-consistency and strong baselines across three datasets, with the largest gains on question answering benchmarks. They also show that the features are individually useful and complementary, so combining them selects the best answer more reliably.
large language models, self-consistency, ranking, LambdaRank, answer selection, semantic centrality, reasoning trace, question answering, accuracy-efficiency trade-off
Authors
Maria Marina, Daniil Moskovskiy, Sergey Pletenev, Mikhail Salnikov, Alexander Panchenko, Viktor Moskvoretskii
Abstract
Self-consistency improves large language models by sampling multiple reasoning paths and selecting the most frequent answer, but majority voting often fails to recover correct answers that are already present among the samples. We address this limitation with Ranking-Improved Self-Consistency (RISC), which reformulates answer selection in self-consistency as a ranking problem. Instead of relying on a single uncertainty or confidence signal, RISC uses a lightweight LambdaRank model to score candidate answers with five carefully designed features that capture answer frequency, semantic centrality, and reasoning-trace consistency. We evaluate RISC on three datasets under a range of test-time budgets. Across datasets, RISC consistently achieves a better accuracy-efficiency trade-off than standard self-consistency and strong baselines, with particularly large gains on question answering benchmarks. Further analysis shows that the proposed features are individually useful and, more importantly, complementary, highlighting the value of learning to combine multiple informative signals for test-time answer selection.
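The contrast between majority voting and ranking-based selection can be illustrated with a minimal sketch. This is not the authors' implementation: the two features here (answer frequency and a token-overlap centrality), the hand-set weights, and the sample answers are all assumptions made for illustration, whereas RISC learns a LambdaRank model over five features that this page does not spell out.

```python
from collections import Counter


def majority_vote(answers):
    """Standard self-consistency: pick the most frequent sampled answer."""
    return Counter(answers).most_common(1)[0][0]


def jaccard(a, b):
    """Token-overlap similarity, a crude stand-in for semantic similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def features(candidate, answers):
    """Two illustrative features for one candidate (hypothetical, not the paper's five)."""
    frequency = answers.count(candidate) / len(answers)
    others = [a for a in answers if a != candidate]
    centrality = (
        sum(jaccard(candidate, a) for a in others) / len(others) if others else 1.0
    )
    return [frequency, centrality]


def rank_select(answers, weights):
    """Score each distinct candidate with a linear model and return the top one.

    The fixed weights stand in for a learned ranker; they are an assumption.
    """
    candidates = list(dict.fromkeys(answers))  # distinct, first-seen order

    def score(candidate):
        return sum(w * f for w, f in zip(weights, features(candidate, answers)))

    return max(candidates, key=score)


# Hypothetical samples: the correct answer appears in several surface forms,
# so no single form wins the vote.
samples = ["Paris", "Lyon", "Lyon", "Paris France", "city of Paris"]

print(majority_vote(samples))                    # Lyon
print(rank_select(samples, weights=[1.0, 2.0]))  # Paris
```

With the centrality weight set to zero, `rank_select` reduces to frequency alone and agrees with majority voting, which shows in miniature why combining signals can recover an answer that voting misses.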