PHAGE: Patent Heterogeneous Attention-Guided Graph Encoder for Representation Learning
2026-05-11 • Computation and Language
AI summary
The authors study how patent claims are connected in a hierarchy, where some claims depend on and refine earlier ones. They find that existing methods ignore this structure by treating claims as plain text. Their new method, PHAGE, keeps track of the different types of connections between claims and teaches a model to pay attention to these relationships at both the claim and word levels. This helps the model better understand patent documents and improves tasks like classifying and grouping patents.
patent claims • dependency structure • self-attention • Transformers • graph encoding • heterogeneous edges • contrastive learning • token-level attention • legal citations
Authors
Yongmin Yoo, Qiongkai Xu, Zhangkai Wu, Longbing Cao
Abstract
Patent claims form a directed dependency structure in which dependent claims inherit and refine the scope of earlier claims; however, existing patent encoders linearize claims as plain text and discard this hierarchy. Directly encoding this structure into self-attention poses two challenges: claim dependencies mix relation types that differ in semantics and extraction reliability, and the dependency graph is defined over claims while Transformers attend over tokens. We introduce PHAGE, a Patent Heterogeneous Attention-Guided Graph Encoder, which addresses the first challenge through a deterministic graph-construction pipeline that separates near-deterministic legal citations from noisier rule-based technical relations, preserving type distinctions as heterogeneous edges. It addresses the second through a connectivity mask and learnable relation-aware biases that lift claim-level topology into token-level attention, allowing the encoder to weight each relation type differently. A dual-granularity contrastive objective then aligns representations with both the inter-patent taxonomy and the intra-patent claim topology. PHAGE outperforms all baselines on classification, retrieval, and clustering, showing that intra-document claim topology is a stronger inductive bias than inter-document structure and that this bias persists in the encoder weights after training.
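The mask-and-bias mechanism the abstract describes can be illustrated with a short sketch. Below is a minimal PyTorch rendering of lifting a claim-level heterogeneous graph into token-level attention; all names (`RelationAwareAttentionBias`, `claim_of_token`, `edge_type`) and the exact parameterization are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch, assuming one learnable scalar bias per relation type
# and a (C, C) claim-dependency matrix; names are hypothetical.
import torch
import torch.nn as nn

class RelationAwareAttentionBias(nn.Module):
    """Connectivity mask plus learnable per-relation biases over token pairs."""

    def __init__(self, num_relations: int):
        super().__init__()
        # One learnable scalar per relation type (e.g., legal citation vs.
        # rule-based technical relation), so the encoder can weight each
        # relation type differently.
        self.relation_bias = nn.Parameter(torch.zeros(num_relations))

    def forward(self, claim_of_token: torch.Tensor, edge_type: torch.Tensor) -> torch.Tensor:
        """
        claim_of_token: (L,) long tensor, index of the claim each token belongs to
        edge_type:      (C, C) long tensor, relation-type id per claim pair,
                        with -1 marking unconnected claims
        Returns an additive (L, L) bias for the pre-softmax attention logits.
        """
        # Lift claim-level edges to token pairs: tokens i and j inherit the
        # relation between their parent claims.
        tok_edge = edge_type[claim_of_token][:, claim_of_token]  # (L, L)
        connected = tok_edge >= 0

        bias = torch.where(
            connected,
            self.relation_bias[tok_edge.clamp(min=0)],  # relation-aware bias
            torch.full(tok_edge.shape, float("-inf"),   # connectivity mask
                       device=tok_edge.device),
        )
        # Tokens within the same claim always attend to each other.
        same_claim = claim_of_token.unsqueeze(0) == claim_of_token.unsqueeze(1)
        return bias.masked_fill(same_claim, 0.0)
```

Adding this bias to the attention logits (Q·Kᵀ/√d) before the softmax blocks attention between unconnected claims entirely, while each connected pair receives a learned, relation-specific offset.

The dual-granularity contrastive objective can likewise be sketched as two InfoNCE-style terms: one over patent-level embeddings with taxonomy-sharing positives, one over claim-level embeddings with dependency-linked positives. The pairing scheme, temperature, and equal weighting below are assumptions.

```python
# A minimal sketch, assuming each item has a single structure-linked
# positive; the pairing and temperature are illustrative.
import torch.nn.functional as F

def dual_granularity_loss(doc_emb, claim_emb, taxonomy_pos, topology_pos, tau=0.07):
    """
    doc_emb:      (B, d) patent-level embeddings
    claim_emb:    (N, d) claim-level embeddings
    taxonomy_pos: (B,) index of a taxonomy-positive patent for each patent
    topology_pos: (N,) index of a dependency-linked claim for each claim
    tau: temperature (a common default, not the paper's value)
    """
    def info_nce(z, pos_idx):
        z = F.normalize(z, dim=-1)
        logits = z @ z.t() / tau                  # pairwise cosine similarities
        logits.fill_diagonal_(float("-inf"))      # exclude trivial self-pairs
        return F.cross_entropy(logits, pos_idx)   # positives are structure-linked items

    # Inter-patent term aligns representations with the taxonomy;
    # intra-patent term aligns them with claim-dependency topology.
    return info_nce(doc_emb, taxonomy_pos) + info_nce(claim_emb, topology_pos)
```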