Verifier-Backed Hard Problem Generation for Mathematical Reasoning

2026-05-07

Machine Learning, Artificial Intelligence, Computation and Language
AI summary

The authors address the problem that large language models (LLMs) are good at solving scientific and math problems but poor at creating new problems that are valid, challenging, and novel. To address this, they developed a method called VHG (verifier-enhanced hard problem generation), which adds a third role, a verifier, to check whether generated problems are valid and sufficiently hard. The verifier works alongside the problem setter and solver to improve problem quality. Evaluated on mathematical tasks, this approach outperforms earlier methods.

Large Language Models, Problem Generation, Self-play, Verifier, Problem Validity, Problem Difficulty, Mathematical Reasoning, Integral Calculus, Reward Hacking
Authors
Yuhang Lai, Jiazhan Feng, Yee Whye Teh, Ning Miao
Abstract
Large Language Models (LLMs) demonstrate strong capabilities for solving scientific and mathematical problems, yet they struggle to produce valid, challenging, and novel problems, an essential component for advancing LLM training and enabling autonomous scientific research. Existing problem generation approaches either depend on expensive human expert involvement or adopt naive self-play paradigms, which frequently yield invalid problems due to reward hacking. This work introduces VHG, a verifier-enhanced hard problem generation framework built upon three-party self-play. By integrating an independent verifier into the conventional setter-solver duality, our design constrains the setter's reward to be jointly determined by problem validity (evaluated by the verifier) and difficulty (assessed by the solver). We instantiate two verifier variants: a Hard symbolic verifier and a Soft LLM-based verifier, with evaluations conducted on indefinite integral tasks and general mathematical reasoning tasks. Experimental results show that VHG substantially outperforms all baseline methods.
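The abstract's core design, gating the setter's reward on both validity and difficulty, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `setter_reward` and the specific reward shape (zero for invalid problems, higher reward for lower solver success) are assumptions chosen to show why an independent validity check blocks reward hacking.

```python
def setter_reward(is_valid: bool, solver_success_rate: float) -> float:
    """Hypothetical setter reward in a three-party self-play loop.

    is_valid: verdict from the independent verifier (symbolic or LLM-based).
    solver_success_rate: fraction of solver attempts that succeed, in [0, 1].
    """
    if not is_valid:
        # Invalid problems earn nothing, so the setter cannot hack the
        # difficulty reward by emitting unsolvable or ill-posed problems.
        return 0.0
    # Among valid problems, harder ones (lower solver success) score higher.
    return 1.0 - solver_success_rate


# A naive setter-solver duality would reward the unsolvable invalid problem;
# the verifier-gated reward instead favors the valid, hard one.
print(setter_reward(False, 0.0))   # invalid "hard" problem -> no reward
print(setter_reward(True, 0.25))   # valid problem solved 25% of the time
```

In a naive two-party setup the first case would receive the maximum difficulty reward; the verifier's veto is what removes that incentive.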