See More, Match Better: Multi-Source Feature Fusion for Two-View Correspondence Learning

2026-06-08 · Computer Vision and Pattern Recognition

AI summary

The authors propose TriMatch, a method for more reliably identifying true matching points between two images by combining several types of information: geometric (coordinate-based), texture, and structural. They align these features with one another so the system is less easily misled by repetitive or textureless regions that look alike but do not actually correspond. The approach then refines the candidate matches progressively, which makes the separation of correct from incorrect matches more accurate. Experiments indicate that TriMatch is effective and robust and that it generalizes well.

Two-view correspondence, Inliers and outliers, Feature extraction, Geometric consistency, Semantic features, Texture features, Feature fusion, Hierarchical refinement, Image matching, Cross-modal alignment
Authors
Xiaojie Li, Xin Jiang, Luanyuan Dai, Jinnan Yang, Yongdong Zhang, Zechao Li
Abstract
Two-view correspondence learning aims to distinguish true correspondences (inliers) from false ones (outliers) in image pairs by leveraging their underlying differences. Existing methods mainly rely on coordinate-based geometric consistency. However, they often struggle with pseudo-consistent outliers in scenes containing repetitive structures, textureless regions, or locally similar geometric patterns. To address this limitation, we propose TriMatch, a multi-source feature fusion framework for two-view correspondence learning, which consists of two parts: feature extraction and feature refinement. In feature extraction, TriMatch jointly extracts geometric, texture semantic, and structural semantic features to provide complementary evidence for correspondence discrimination. To bridge the gap between semantic and geometric features, texture and structural semantic features are aligned with geometric features through dedicated Texture-Geometric Alignment and Structural-Geometric Alignment modules, respectively. We further introduce a Semantic-Guided Correspondence Modulation module, which modulates geometric features using semantic information to suppress geometrically plausible but semantically inconsistent correspondences. In feature refinement, a Hierarchical Semantic-Enhanced Correspondence Refinement strategy progressively models correspondence dependencies and recalibrates multi-context feature responses, enabling more reliable inlier-outlier discrimination. Extensive experiments demonstrate the effectiveness, robustness, and generalization capability of TriMatch.
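The abstract does not say how the Semantic-Guided Correspondence Modulation module is built. Purely as an illustration of the idea of "modulating geometric features using semantic information", the sketch below assumes one common way for one feature stream to modulate another: sigmoid gating. Semantic features are projected to per-channel weights in (0, 1) that rescale the geometric features. The function name `semantic_gate`, the tensor shapes, and the single linear projection are hypothetical choices and are not taken from TriMatch.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def semantic_gate(geo: np.ndarray, sem: np.ndarray,
                  w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Hypothetical semantic-guided modulation via sigmoid gating (not TriMatch's code).

    geo: (N, C) geometric features, one row per putative correspondence.
    sem: (N, D) semantic features for the same correspondences, assumed to be
         already aligned with geo by some upstream alignment step.
    w:   (D, C) projection weights (learned in practice; fixed here).
    b:   (C,)   projection bias.
    Returns (N, C) geometric features scaled channel-wise by a gate in (0, 1).
    """
    if geo.shape[0] != sem.shape[0]:
        raise ValueError("geo and sem must describe the same N correspondences")
    # Gate near 0 damps a channel; gate near 1 leaves it almost unchanged.
    gate = sigmoid(sem @ w + b)  # (N, C)
    return geo * gate


# Toy example: two correspondences with identical geometric features.
# Row 0 has "consistent" semantic evidence, row 1 "inconsistent" evidence.
geo = np.ones((2, 2))
sem = np.array([[5.0, 5.0], [-5.0, -5.0]])
w = np.eye(2)
b = np.zeros(2)
out = semantic_gate(geo, sem, w, b)
print(np.round(out, 3))
```

In this toy run the first correspondence keeps nearly all of its geometric response, while the second is damped to under 0.01 even though its geometric features are identical. Because the gate is multiplicative and bounded by 1, semantic evidence can only attenuate a correspondence, never amplify it. This matches the stated goal of suppressing geometrically plausible but semantically inconsistent matches, though the actual module may well be designed differently.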