Contrastive-Difference CKA Reveals Concept-Specific Structural Alignment Across Language Model Architectures

2026-06-15

Computation and Language
AI summary

The authors investigated whether different large language model (LLM) architectures represent high-level concepts in structurally compatible ways. They found that although the models showed only moderate similarity in the geometry of their internal representations, those concepts transferred almost perfectly between architectures in functional terms. To measure this, they developed contrastive-difference CKA, which computes kernel alignment on per-sample contrastive differences and separates concept-specific alignment from generic similarity where standard CKA cannot. The pattern held across six concept domains and several architecture families, suggesting shared functional structure despite geometric differences. They also propose the method as a training-free way to flag architectures that behave as outliers.

Large Language Models, Geometric Convergence, Functional Transfer, Contrastive-Difference CKA, Kernel Alignment, Concept Representation, Architectural Universality, Model Diagnostics, Cross-Architecture Similarity, High-Level Concepts
Authors
Xueping Gao
Abstract
Do different LLM architectures encode high-level concepts in structurally compatible ways? We systematically characterize a geometric-functional universality dissociation: across multiple concept domains and architectural families, moderate geometric convergence coexists with near-perfect functional transfer. Using contrastive-difference CKA (CKA_Delta), a training-free diagnostic that computes kernel alignment on per-sample contrastive differences, we isolate concept-specific convergence from generic similarity, achieving significant discrimination where standard CKA cannot. The dissociation replicates across all six concept domains we test: five show significant geometric discrimination (p ≤ 0.017), and safety shows a converging functional trend (p = 0.08). These include two non-instruction concepts (code vs. natural language, reasoning vs. recall) validated without system prompts. A single pair of 70B models offers an observational indication that universality may strengthen with scale; this requires replication with additional ≥70B models. We position CKA_Delta as a practical regime classifier and architectural outlier detector (Gemma: d = 1.08, AUC = 0.79) rather than as a predictor of absolute transfer accuracy, providing a training-free diagnostic for cross-architecture concept monitoring.
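The abstract describes CKA_Delta only as kernel alignment computed on per-sample contrastive differences. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the standard linear-kernel form of CKA, paired concept/control prompts (row i of every matrix comes from the same pair), and activations taken from one chosen layer per model. The helper name `contrastive_difference_cka` and all data below are hypothetical and synthetic.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two activation matrices whose rows are matched samples."""
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have the same number of rows (samples)")
    # Center every feature column; equivalent to centering the linear Gram matrices.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Linear CKA = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def contrastive_difference_cka(a_pos, a_neg, b_pos, b_neg) -> float:
    """Hypothetical CKA_Delta helper: linear CKA on per-sample (concept - control) differences.

    a_* are model-A activations (n x d_a); b_* are model-B activations (n x d_b).
    Row i of every matrix must come from the same contrastive prompt pair i.
    """
    delta_a = np.asarray(a_pos, dtype=float) - np.asarray(a_neg, dtype=float)
    delta_b = np.asarray(b_pos, dtype=float) - np.asarray(b_neg, dtype=float)
    return linear_cka(delta_a, delta_b)


# --- Synthetic illustration (made-up data, not results from the paper) ---
rng = np.random.default_rng(0)
n_pairs = 200
generic = rng.normal(size=(n_pairs, 16))   # content shared by both prompts of a pair
concept = rng.normal(size=(n_pairs, 4))    # per-sample concept signal
unrelated = rng.normal(size=(n_pairs, 4))  # a signal unrelated to `concept`

# Random linear read-outs stand in for two architectures with different widths.
w_gen_a, w_con_a = rng.normal(size=(16, 64)), rng.normal(size=(4, 64))
w_gen_b, w_con_b = rng.normal(size=(16, 48)), rng.normal(size=(4, 48))

a_neg = generic @ w_gen_a
a_pos = a_neg + 0.3 * concept @ w_con_a
b_neg = generic @ w_gen_b
b_pos_shared = b_neg + 0.3 * concept @ w_con_b       # B encodes the same concept signal
b_pos_unrelated = b_neg + 0.3 * unrelated @ w_con_b  # B encodes something else

# Standard CKA on raw activations is dominated by the shared generic content...
std_shared = linear_cka(a_pos, b_pos_shared)
std_unrelated = linear_cka(a_pos, b_pos_unrelated)
# ...while CKA on the differences isolates the concept-specific component.
delta_shared = contrastive_difference_cka(a_pos, a_neg, b_pos_shared, b_neg)
delta_unrelated = contrastive_difference_cka(a_pos, a_neg, b_pos_unrelated, b_neg)
print(f"standard CKA: {std_shared:.2f} vs {std_unrelated:.2f}")
print(f"CKA_Delta:    {delta_shared:.2f} vs {delta_unrelated:.2f}")
```

In this toy setup, standard CKA comes out similar for both versions of model B because the shared generic content dominates the raw activations. The difference-based score separates them, which mirrors the kind of discrimination the abstract attributes to CKA_Delta. The paper's actual kernel choice, layer selection, and significance testing are not stated in the abstract and may differ.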