Does Traversal Order Matter? A Systematic Study of Tree Traversal Methods in Transformer Grammars

2026-06-15 • Computation and Language


AI summary

The authors studied how different ways of turning syntactic tree structures into sequences affect Transformer Grammars, which are models that use these trees to better understand language. They tested traditional Depth-First Traversal, Breadth-First Traversal, and a new hybrid method called Production-Rule Traversal. By comparing these methods across tasks like language modeling, grammar understanding, and summarization, they found trade-offs between focusing on detailed nested structure versus broad global context. Their work helps guide how to design these models for different language tasks.

Transformer Grammars, syntactic trees, Depth-First Traversal, Breadth-First Traversal, Production-Rule Traversal, language modeling, syntactic generalization, tree linearization, transformer models

Authors

Zongru Liu, Pengyu Ji, Pengcheng Wang, Kewei Tu

Abstract

Transformer Grammars (TGs) enhance language modeling by incorporating syntactic tree structures. Although the way syntactic trees are linearized in TGs can significantly affect model performance, existing studies rely solely on Depth-First Traversal (DFT) for linearization. In this paper, we expand the traversal design space by exploring Breadth-First Traversal (BFT) and a novel hybrid traversal strategy, Production-Rule Traversal (PRT), which combines the structural lookahead of BFT with the early lexical generation of DFT. We integrate these traversal methods with varying tree configurations and masking strategies, and empirically evaluate their performance on language modeling, syntactic generalization, and summarization. We reveal the inherent trade-offs between nested composition and global lookahead, providing actionable recommendations for designing task-aware Transformer Grammars.
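To make the contrast between the two baseline orders concrete, here is a minimal Python sketch, assuming a toy tree `(S (NP the dog) (VP barks))` and hypothetical helper names (`Node`, `dft_linearize`, `bft_linearize`) that do not come from the paper. The DFT output follows the opening `(X` / closing `X)` bracketing convention commonly used for Transformer Grammars. The BFT output is a deliberately simplified level-order listing that drops bracketing, so it is not invertible on its own and is not the paper's exact scheme. PRT is not sketched because the abstract does not specify its token format.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A constituency-tree node; leaves are words, internal nodes are nonterminals."""
    label: str
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def dft_linearize(node: Node) -> List[str]:
    """Depth-first (pre-order) linearization with explicit opening "(X" and
    closing "X)" tokens, so each word is emitted as soon as its ancestors open."""
    if node.is_leaf():
        return [node.label]
    tokens = [f"({node.label}"]
    for child in node.children:
        tokens.extend(dft_linearize(child))
    tokens.append(f"{node.label})")
    return tokens


def bft_linearize(root: Node) -> List[str]:
    """Hypothetical, simplified breadth-first (level-order) linearization:
    every node at depth d is emitted before any node at depth d + 1.
    Bracketing is omitted, so this listing alone cannot recover the tree."""
    tokens: List[str] = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        tokens.append(node.label)
        queue.extend(node.children)
    return tokens


if __name__ == "__main__":
    # Toy tree: (S (NP the dog) (VP barks))
    tree = Node("S", [
        Node("NP", [Node("the"), Node("dog")]),
        Node("VP", [Node("barks")]),
    ])
    print(dft_linearize(tree))
    # ['(S', '(NP', 'the', 'dog', 'NP)', '(VP', 'barks', 'VP)', 'S)']
    print(bft_linearize(tree))
    # ['S', 'NP', 'VP', 'the', 'dog', 'barks']
```

In the DFT sequence the first word is the third token, right after the nonterminals that dominate it; in the BFT sequence all three nonterminals precede the first word. This is the early-lexical-generation versus structural-lookahead trade-off that the abstract attributes to the two orders.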
