Gate the Filter, Not the Message: Node-Channel Mixtures for Pre-Propagation GNNs
2026-06-01 • Machine Learning
AI summary
The authors study pre-propagation GNNs (PPGNNs), a type of graph neural network that performs all graph-dependent computation in a preprocessing step, which makes training faster and more scalable. They find that existing PPGNNs differ mainly in how filter coefficients are shared across graph nodes and feature channels, not in how complex the aggregator is. They propose FilterMoE, a mixture-of-experts model that routes a small bank of learnable filters jointly over nodes and channels. It outperforms strong PPGNN baselines on nine of eleven benchmarks, which suggests that joint node-channel adaptation is a useful strategy.
graph neural networks, pre-propagation, hop aggregation, MLP aggregator, hop attention, graph filters, Chebyshev filters, mixture of experts, node adaptation, channel adaptation
Authors
Zichao Yue, Zhiru Zhang
Abstract
Pre-propagation graph neural networks (PPGNNs) push all graph-dependent computation into a preprocessing step and train only on the resulting dense hop features, which makes them highly scalable. A puzzle in this regime is that more complex hop aggregators do not reliably outperform simpler ones: on many benchmarks, a plain MLP-based aggregator matches or beats hop-attention variants. We revisit this behavior from a graph-filter perspective. Over a precomputed diffusion basis, existing PPGNNs differ mainly in how filter coefficients are shared across nodes and feature channels, rather than simply in raw aggregator capacity. MLP-based architectures learn channel-dependent filters that are largely shared across nodes, while hop-attention-based architectures learn node-dependent mixtures that are largely shared across channels. This reveals a missing regime in standard PPGNN designs: joint node- and channel-adaptive filtering under the pre-propagation computational contract. We propose FilterMoE, a mixture-of-experts PPGNN in which a small bank of learnable Chebyshev filter experts is routed jointly over nodes and channels by a 3D gating tensor. Across eleven homophilic and heterophilic benchmarks, FilterMoE outperforms strong PPGNN baselines on nine datasets and ranks first on all three large-scale benchmarks, improving the average test score by 1.53 points. These results establish joint node-channel filter routing as a robust alternative to dataset-specific hop-aggregator selection.
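The idea of routing a bank of Chebyshev filter experts jointly over nodes and channels can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`chebyshev_basis`, `route_filters`), the softmax gate, the choice of lambda_max = 2 for the normalized Laplacian, and the way the gate logits are simply passed in as an array are all assumptions made for illustration, since the abstract does not specify how the 3D gating tensor is parameterized.

```python
import numpy as np

def chebyshev_basis(adj, x, num_hops):
    """Precompute [T_0(L)x, ..., T_K(L)x]; the only graph-dependent step."""
    deg = adj.sum(axis=1)
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1.0)), 0.0)
    # Scaled Laplacian L_norm - I, assuming lambda_max = 2 (an assumption).
    l_scaled = -(inv_sqrt[:, None] * adj * inv_sqrt[None, :])
    basis = [x, l_scaled @ x]
    for _ in range(2, num_hops + 1):
        # Chebyshev recurrence: T_k = 2 L T_{k-1} - T_{k-2}
        basis.append(2.0 * (l_scaled @ basis[-1]) - basis[-2])
    return np.stack(basis[: num_hops + 1])  # shape (K+1, N, C)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def route_filters(basis, theta, gate_logits):
    """Mix E filter experts with a gate indexed by (node, channel, expert).

    basis:       (K+1, N, C) precomputed hop features
    theta:       (E, K+1)    Chebyshev coefficients, one row per expert
    gate_logits: (N, C, E)   hypothetical 3D gating logits
    """
    gate = softmax(gate_logits, axis=-1)                      # (N, C, E)
    expert_out = np.einsum("ek,knc->enc", theta, basis)       # (E, N, C)
    mixed = np.einsum("nce,enc->nc", gate, expert_out)        # (N, C)
    return mixed, gate

# Toy usage on a 4-node path graph with 3 feature channels.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = rng.normal(size=(4, 3))
basis = chebyshev_basis(adj, x, num_hops=2)       # preprocessing
theta = rng.normal(size=(2, 3))                   # 2 experts, 3 coefficients each
gate_logits = rng.normal(size=(4, 3, 2))          # stand-in for a learned gate
out, gate = route_filters(basis, theta, gate_logits)
```

In this sketch the mixture is equivalent to applying a separate set of filter coefficients at every (node, channel) pair, namely the gate-weighted average of the expert coefficients. Under the same assumptions, tying the gate across the node axis would give channel-dependent filters shared across nodes, and tying it across the channel axis would give node-dependent mixtures shared across channels, which loosely mirrors the two regimes the abstract attributes to MLP-based and hop-attention-based aggregators.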