MDGMIX: Boundary-Aware Subgraph Mixing for Multi-Domain Graph Pre-Training

2026-05-25 · Machine Learning

Machine Learning, Artificial Intelligence
AI summary

The authors study how to build graph models that generalize across different types of graph data (domains) without heavy computational cost. They find that much of the multi-domain training data is redundant, so they design MDGMIX, a method that mixes subgraphs built around boundary nodes from different domains into harder training examples. MDGMIX separates patterns shared across domains from domain-specific ones using two losses, a coarse-grained domain discrimination loss and a fine-grained domain decomposition loss, and transfers the learned knowledge to new tasks with a lightweight prompt weighting mechanism. In experiments, MDGMIX outperforms existing methods on few-shot classification while using less time and memory.

graph pre-training, multi-domain graphs, subgraph mixing, domain discrimination, transfer learning, few-shot classification, computational efficiency, domain adaptation, hierarchical discrimination, prompt weighting
Authors
Ziyu Zheng, Yaming Yang, Ziyu Guan, Wei Zhao, Xinyan Huang
Abstract
Multi-domain graph pre-training is a crucial step in building graph foundation models with cross-domain generalization capabilities. However, existing methods predominantly rely on jointly training on all source-domain graphs, which incurs high computational costs. Furthermore, it remains unclear whether all source-domain graph data contribute equally to effective transfer. This paper empirically reveals significant data redundancy in multi-domain graph pre-training. Based on this finding, we propose MDGMIX, a multi-domain graph pre-training framework that combines boundary-aware subgraph mixing with hierarchical discrimination. MDGMIX selects boundary nodes to construct challenging mixed-domain subgraphs and applies a coarse-grained domain discrimination loss and a fine-grained domain decomposition loss to decouple shared patterns from domain-specific ones. During adaptation, a lightweight prompt weighting mechanism transfers source-domain knowledge to the target task. Extensive experiments demonstrate that MDGMIX consistently outperforms strong baselines on few-shot classification tasks while exhibiting superior time and memory efficiency. The code is available at: https://github.com/zhengziyu77/MDGMIX.
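The abstract names four mechanisms (boundary node selection, mixed-domain subgraphs, a coarse/fine loss pair, and prompt weighting) without defining them. The pure-Python sketch below illustrates one plausible reading of each, under loudly stated assumptions; it is not the MDGMIX implementation (the linked repository is the authority for that). All function names are hypothetical. Assumed choices: a "boundary" node is one whose predicted domain distribution has high entropy; the coarse-grained loss is cross-entropy over domain labels; the fine-grained decomposition loss is a squared-dot-product orthogonality penalty between shared and domain-specific embeddings, a technique borrowed from domain-separation-style methods; and prompt weighting adds a softmax-weighted mix of per-domain prompt vectors to target features. Mixing is simplified to pooling node IDs, ignoring edges.

```python
import math
from typing import Dict, List, Sequence, Tuple

# HYPOTHETICAL sketch: every function name and criterion below is an
# illustrative assumption, not the MDGMIX implementation.


def domain_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one node's predicted domain distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_boundary_nodes(domain_probs: Dict[int, Sequence[float]], k: int) -> List[int]:
    """Return the k nodes whose domain prediction is most uncertain.

    ASSUMPTION: a "boundary" node is one a domain classifier cannot place
    confidently (high entropy); the paper's actual criterion may differ.
    """
    ranked = sorted(domain_probs, key=lambda n: domain_entropy(domain_probs[n]), reverse=True)
    return ranked[:k]


def mix_subgraph(boundary_by_domain: Dict[str, List[int]]) -> List[Tuple[str, int]]:
    """Pool boundary nodes from all source domains into one mixed-domain node set.

    Each node keeps its source-domain tag, which serves as the target label
    for the coarse-grained domain discriminator. Edges are ignored here.
    """
    return [(d, n) for d in sorted(boundary_by_domain) for n in boundary_by_domain[d]]


def domain_discrimination_loss(pred_probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Coarse-grained loss (ASSUMED form): mean cross-entropy over domain labels."""
    return -sum(math.log(p[y]) for p, y in zip(pred_probs, labels)) / len(labels)


def orthogonality_loss(shared: Sequence[float], specific: Sequence[float]) -> float:
    """Fine-grained decomposition loss (ASSUMED form): squared dot product,
    which is zero when the shared and domain-specific embeddings are orthogonal."""
    return sum(a * b for a, b in zip(shared, specific)) ** 2


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max logit before exponentiating)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prompt_weighted_features(
    x: Sequence[float], prompts: Sequence[Sequence[float]], logits: Sequence[float]
) -> List[float]:
    """Adaptation step (ASSUMED form): add a softmax-weighted mix of
    per-source-domain prompt vectors to a target node's features."""
    weights = softmax(logits)
    return [xi + sum(w * p[i] for w, p in zip(weights, prompts)) for i, xi in enumerate(x)]


if __name__ == "__main__":
    # Toy domain predictions for three nodes over two source domains.
    probs = {0: [0.9, 0.1], 1: [0.5, 0.5], 2: [0.7, 0.3]}
    print(select_boundary_nodes(probs, 2))  # [1, 2]
```

Pure Python keeps the sketch self-contained; a practical version would presumably operate on batched tensors and on each boundary node's neighborhood (for example a k-hop subgraph) rather than on bare node IDs, and would learn the prompt logits during adaptation.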