Predicate Importance Estimation and Decoupled Rationale-Score Distillation for Entity Alignment
2026-06-22 • Computation and Language
AI summary
The authors focus on making it easier to combine different knowledge graphs (KGs) by linking matching entities, a challenge when simple word matching fails. They created a new dataset and two methods: one (PIE) that better understands connections between entities by focusing on the roles of relationships, and another (DRSD) that teaches a smaller language model to decide if entities match by learning from a bigger model's detailed explanations. Their approach improves entity alignment and allows the system to flag uncertain matches for human checking, making the process more reliable in real-world settings.
Knowledge Graphs, Entity Alignment, Large Language Models, Predicate Importance, Embedding, Distillation, Confidence Estimation, Human-in-the-loop, Pseudo-labeling, Rationale
Authors
Keunha Kim, Yoonjin Jang, Hyeon-gu Lee, Sihyung Kim, Youngjoong Ko
Abstract
Knowledge graphs (KGs) are increasingly used as structured context for Large Language Models (LLMs), but industrial KG-RAG systems often need to integrate public and domain-specific KGs constructed from heterogeneous databases. This integration relies on Entity Alignment (EA), where lexical matching alone is insufficient under predicate-name variation and incomplete local neighborhoods. We address EA for KG integration by constructing a pairwise EA dataset and proposing two complementary modules: Predicate Importance Estimation (PIE) and Decoupled Rationale-Score Distillation (DRSD). PIE is a compact embedding-based approach that removes the subject information from each 1-hop triple, encodes the resulting subjectless triples, and aggregates them with learnable predicate-importance weights to build predicate-aware entity embeddings. DRSD trains a distilled small language model (SLM) with pseudo-answers produced by a teacher LLM through distinct prompts. By converting binary EA labels into text-based supervision and decoupling confidence-score estimation from label-consistent rationales, DRSD enables the SLM to learn task-specific reasoning while retaining a less label-biased confidence signal. Experiments show that PIE and DRSD improve EA classification. Moreover, because DRSD decouples confidence-score estimation from the decision, a discrepancy between the two flags an uncertain prediction for human review, thereby enabling a practical trade-off between automatic acceptance and human-in-the-loop verification.
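As a rough illustration of the two mechanisms the abstract describes, the following Python sketch shows (1) a PIE-style aggregation of subjectless 1-hop triples under per-predicate importance weights and (2) a discrepancy check that routes a prediction to human review. Everything here is an assumption made for illustration and is not taken from the paper: `toy_encode` is a hypothetical stand-in for a real text encoder, the softmax normalization of the importance weights, the fixed (rather than learned) weight values, the function names, and the 0.5 confidence threshold are all invented for this sketch.

```python
import math
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def toy_encode(text: str, dim: int = 4) -> List[float]:
    """Hypothetical stand-in for a text encoder: a normalized bag-of-characters vector."""
    vec = [0.0] * dim
    for ch in text:
        vec[ord(ch) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def pie_embedding(
    triples: Sequence[Triple],
    importance: Dict[str, float],
    dim: int = 4,
) -> List[float]:
    """Aggregate subjectless 1-hop triples with predicate-importance weights.

    `importance` maps a predicate name to a scalar logit. In a trained model
    these would be learnable parameters; here they are fixed (assumption).
    """
    # Drop the subject so the embedding depends only on (predicate, object).
    subjectless = [(p, o) for _, p, o in triples]
    if not subjectless:
        return [0.0] * dim

    # Softmax over predicate logits (assumed normalization); unknown predicates get 0.0.
    logits = [importance.get(p, 0.0) for p, _ in subjectless]
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the encoded subjectless triples.
    out = [0.0] * dim
    for w, (p, o) in zip(weights, subjectless):
        enc = toy_encode(f"{p} {o}", dim)
        for k in range(dim):
            out[k] += w * enc[k]
    return out


def needs_review(decision: bool, confidence: float, threshold: float = 0.5) -> bool:
    """Flag a pair when the match decision and the confidence score disagree."""
    return decision != (confidence >= threshold)
```

For example, `needs_review(True, 0.2)` flags a pair that the model labels as a match while assigning it low confidence, whereas `needs_review(True, 0.9)` would let the pair be accepted automatically. Because the subject is dropped before encoding, two entities with identical predicate-object neighborhoods receive the same embedding in this sketch regardless of their identifiers.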