Rethinking Molecular Graph Backdoors under Chemistry-aware Admission

2026-06-22 • Machine Learning

Machine Learning, Artificial Intelligence

AI summary

The authors point out that many backdoor attacks on molecular graph neural networks (GNNs), which plant hidden triggers by editing input graphs, work poorly in practice because real pipelines first check whether the molecular data is valid and consistent. They formalize these checks as ChemGuard, a protocol that only accepts molecular records passing chemical sanity tests, and show that many existing attacks fail at this step. However, they also develop ChemBack, an attack that builds chemically valid, target-like molecules that pass these checks and still fool the models. Their work shows that while such checks stop many simple attacks, more sophisticated, chemistry-aware attacks remain a risk.

Backdoor attacks, Graph neural networks (GNNs), Molecular graph, Sanitization, Canonicalization, ChemGuard, ChemBack, Tanimoto similarity, Molecular fingerprints, Poisoning attack

Authors

Thinh T. H. Nguyen, Sze Jue Yang, Khoa D. Doan, Chee Seng Chan, Kok-Seng Wong

Abstract

Backdoor attacks on molecular graph neural networks (GNNs) are typically evaluated as abstract graph edits, but real molecular learning pipelines do not train on arbitrary graphs. Molecular records must first survive parsing, sanitization, canonicalization, and graph-string consistency checks. We formalize this overlooked admission stage as ChemGuard, an operational protocol that tests whether a submitted molecular record can enter a realistic learning pipeline and that complements existing defenses. ChemGuard admits a record only when its molecular string is sanitizable and the graph reconstructed from that string matches the submitted molecular graph. Under this operational view, many existing graph-based backdoors lose much of their apparent efficacy because their poisons are chemically invalid or representation-inconsistent. We then show that admission checks alone are insufficient to rule out molecular backdoors. We propose ChemBack, an admission-aware molecular backdoor attack that constructs chemically feasible motif-anchor attachments and ranks admitted candidates by fingerprint-based Tanimoto similarity to clean target-class molecules. ChemBack is model-free during trigger selection: it uses molecular structures, target labels, fingerprints, and public validity checks, but requires no access to the victim model, a surrogate GNN, learned embeddings, gradients, logits, or training code. Across molecular benchmarks, validators, architectures, and defenses, ChemBack achieves high attack success with fully admitted poisons while preserving clean accuracy. Our results reveal a two-sided lesson: chemistry-aware admission suppresses many graph-only backdoors, yet chemically valid and target-aligned molecular backdoors remain a practical threat.
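To make the admission rule and the similarity ranking concrete, here is a minimal sketch in plain Python that runs without a cheminformatics toolkit. Every piece is a hypothetical stand-in, not the authors' code: a toy linear notation such as `C=C-O` and the helper `parse_chain` replace a real SMILES parser, `sanitizable` is a simple valence check rather than full sanitization, `fingerprint` builds a small feature set instead of Morgan/ECFP bit vectors, and averaging Tanimoto scores over target-class molecules is an assumed aggregation. The admission test follows the rule stated in the abstract: parse the string, require it to sanitize, and require the reconstructed graph to match the submitted one.

```python
import re
from itertools import permutations
from statistics import mean
from typing import Callable, Dict, FrozenSet, List, Optional, Sequence, Tuple

# Hypothetical toy molecule: atom symbols plus undirected bonds {frozenset({i, j}): order}.
Atoms = Tuple[str, ...]
Bonds = Dict[FrozenSet[int], int]
Graph = Tuple[Atoms, Bonds]

# Illustrative maximum valences (assumption: neutral atoms, no charges or aromaticity).
MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "S": 2, "F": 1, "Cl": 1}
BOND_ORDER = {"-": 1, "=": 2, "#": 3}


def parse_chain(record: str) -> Optional[Graph]:
    """Toy parser for a hypothetical linear notation like 'C=C-O'; None if malformed."""
    tokens = re.split(r"([-=#])", record)  # keeps bond symbols as separate tokens
    atoms, seps = tokens[0::2], tokens[1::2]
    if any(a == "" for a in atoms):
        return None
    bonds = {frozenset((i, i + 1)): BOND_ORDER[s] for i, s in enumerate(seps)}
    return tuple(atoms), bonds


def sanitizable(graph: Graph) -> bool:
    """Stand-in for sanitization: every atom is known and within its maximum valence."""
    atoms, bonds = graph
    used = [0] * len(atoms)
    for pair, order in bonds.items():
        for idx in pair:
            used[idx] += order
    return all(a in MAX_VALENCE and used[i] <= MAX_VALENCE[a] for i, a in enumerate(atoms))


def same_graph(g1: Graph, g2: Graph) -> bool:
    """Brute-force labeled-graph isomorphism; only practical for very small graphs."""
    (a1, b1), (a2, b2) = g1, g2
    if sorted(a1) != sorted(a2) or len(b1) != len(b2):
        return False
    for perm in permutations(range(len(a1))):
        if any(a1[i] != a2[perm[i]] for i in range(len(a1))):
            continue  # this mapping does not preserve atom labels
        if {frozenset(perm[k] for k in pair): o for pair, o in b1.items()} == b2:
            return True
    return False


def admit(record: str, submitted: Graph,
          parse: Callable[[str], Optional[Graph]] = parse_chain) -> bool:
    """Admit only if the string parses, sanitizes, and reconstructs the submitted graph."""
    rebuilt = parse(record)
    return rebuilt is not None and sanitizable(rebuilt) and same_graph(rebuilt, submitted)


def fingerprint(graph: Graph) -> FrozenSet[str]:
    """Toy fingerprint: atom symbols plus atom-bond-atom triples (not ECFP/Morgan)."""
    atoms, bonds = graph
    feats = set(atoms)
    for pair, order in bonds.items():
        x, y = sorted(atoms[k] for k in pair)
        feats.add(f"{x}{order}{y}")
    return frozenset(feats)


def tanimoto(f1: FrozenSet[str], f2: FrozenSet[str]) -> float:
    """|A & B| / |A | B| over feature sets; 0.0 when both sets are empty."""
    union = len(f1 | f2)
    return len(f1 & f2) / union if union else 0.0


def rank_admitted(candidates: Sequence[Tuple[str, Graph]],
                  targets: Sequence[Graph]) -> List[Tuple[str, float]]:
    """Drop non-admitted candidates, then sort by mean Tanimoto to target-class graphs."""
    target_fps = [fingerprint(t) for t in targets]
    scored = [
        (rec, mean(tanimoto(fingerprint(g), t) for t in target_fps))
        for rec, g in candidates
        if admit(rec, g)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For example, `admit("O-C=C", parse_chain("C=C-O"))` accepts the record because the two strings describe the same graph, while `admit("C-C-O", parse_chain("C=C-O"))` rejects it as representation-inconsistent and `"C=O=C"` fails the valence check. A real pipeline would typically use a toolkit such as RDKit for parsing, sanitization, canonicalization, and fingerprints; the brute-force `same_graph` is exponential and only suitable for toy inputs.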
