Step-TP: A Grounded, Step-Level Dataset with Chain-of-Thought Reasoning for LLM-Guided Tensor Program Optimization
2026-05-25 • Machine Learning
Machine Learning, Artificial Intelligence
AI summary
The authors created Step-TP, a new dataset that helps large language models optimize tensor programs one step at a time, rather than by imitating only the final optimized programs. The dataset breaks complex optimizations into small, atomic steps, each paired with reasoning that can be checked and interpreted. This makes it easier for language models to learn precise decisions in large, combinatorial optimization spaces. Step-TP represents programs with a token-efficient, verifiable intermediate representation that lowers deterministically to TVM TIR, which improves the reliability of multi-step program optimization.
tensor program optimization, large language models, intermediate representation, chain-of-thought reasoning, TVM TIR, composable optimization, step-level supervision, combinatorial optimization, program transformations
Authors
Mengfan Liu, Da Zheng, Junwei Su, Chuan Wu
Abstract
Despite the strong reasoning capabilities of large language models (LLMs), optimizing the execution efficiency of tensor programs remains challenging because it requires precise, composable transformation decisions. Recent LLM-guided approaches frame tensor program optimization as an iterative decision process, but existing datasets provide only end-to-end optimized program pairs in token-inefficient representations, lacking verifiable step-level supervision and interpretability. As a result, LLMs struggle to make reliable single-step decisions in large combinatorial optimization spaces. We introduce Step-TP, a post-training dataset for tensor program optimization that provides grounded, atomic, step-level supervision with structured chain-of-thought (CoT) reasoning. Step-TP forms a closed reasoning loop over intermediate program states, enabling reliable multi-step optimization rather than outcome imitation. Its design is guided by four principles: (i) a token-efficient, verifiable intermediate representation (IR) that deterministically lowers to TVM TIR; (ii) atomic and composable optimization strategies that decompose complex trajectories into interpretable single-step decisions; (iii) structured CoT supervision coupled with explicit IR-to-IR state transitions; and (iv) strategy filtering to balance coverage while preventing shortcut exploitation. The dataset and implementation are available at https://github.com/LIUMENGFAN-gif/StepTP.
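To make the idea of grounded, step-level supervision more concrete, here is a minimal Python sketch of atomic strategies producing explicit IR-to-IR state transitions, with each recorded step checked by replay. This is an illustration under stated assumptions, not the Step-TP schema: the toy loop-nest representation, the `Split`/`Reorder` strategies, the `Step` record, and the `verify_step`/`verify_trajectory` checks are all hypothetical, and none of them reflect the dataset's actual IR or its lowering to TVM TIR.

```python
from dataclasses import dataclass
from math import prod
from typing import Sequence, Tuple

# Hypothetical toy IR (not Step-TP's): a loop nest as an ordered tuple of (loop_name, extent).
LoopNest = Tuple[Tuple[str, int], ...]


@dataclass(frozen=True)
class Split:
    """Hypothetical atomic strategy: split loop `name` into outer/inner loops, inner extent `factor`."""
    name: str
    factor: int

    def apply(self, nest: LoopNest) -> LoopNest:
        out = []
        for loop, extent in nest:
            if loop == self.name:
                if extent % self.factor != 0:
                    raise ValueError(f"{self.factor} does not divide extent {extent}")
                out.append((loop + "_o", extent // self.factor))
                out.append((loop + "_i", self.factor))
            else:
                out.append((loop, extent))
        return tuple(out)


@dataclass(frozen=True)
class Reorder:
    """Hypothetical atomic strategy: permute the existing loops into `order`."""
    order: Tuple[str, ...]

    def apply(self, nest: LoopNest) -> LoopNest:
        extents = dict(nest)
        if sorted(self.order) != sorted(extents):
            raise ValueError("reorder must be a permutation of the existing loops")
        return tuple((name, extents[name]) for name in self.order)


@dataclass(frozen=True)
class Step:
    """Hypothetical step-level record: state before, CoT text, atomic action, state after."""
    before: LoopNest
    cot: str
    action: object
    after: LoopNest


def verify_step(step: Step) -> bool:
    # Grounding check: replaying the action on `before` must reproduce `after`
    # exactly, and the total iteration count must be preserved.
    try:
        replayed = step.action.apply(step.before)
    except ValueError:
        return False
    same_work = prod(e for _, e in step.before) == prod(e for _, e in step.after)
    return replayed == step.after and same_work


def verify_trajectory(steps: Sequence[Step]) -> bool:
    # Closed loop: each step must start from the state the previous step produced.
    for prev, nxt in zip(steps, steps[1:]):
        if prev.after != nxt.before:
            return False
    return all(verify_step(s) for s in steps)


if __name__ == "__main__":
    s0: LoopNest = (("i", 64), ("j", 64))
    a1 = Split("j", 8)
    s1 = a1.apply(s0)
    a2 = Reorder(("j_o", "i", "j_i"))
    s2 = a2.apply(s1)
    trajectory = [
        Step(s0, "Split j so the inner loop is short and contiguous.", a1, s1),
        Step(s1, "Move j_o outermost to change the traversal order.", a2, s2),
    ]
    print(s2, verify_trajectory(trajectory))
```

Because each record stores both states and the action, a verifier can reject a step whose recorded output does not follow from its input, which is one plausible reading of what "verifiable" IR-to-IR transitions buy over end-to-end program pairs.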