Beyond Drug Discovery: The Nanotechnology Molecular Optimization (NMO) Benchmark

2026-06-29 • Machine Learning

Machine Learning • Artificial Intelligence • Computational Engineering, Finance, and Science

AI summary

The authors explain that current AI methods for designing molecules mostly target drug-like properties and rely on models pretrained on large pharmaceutical datasets, which limits how well they transfer to other kinds of materials. To address this, they created a benchmark called NMO that replaces proxy scoring functions with quantum simulations and tests AI models on nanotechnology tasks under strict evaluation protocols. They found that advanced optimization methods perform worse than much simpler approaches on these tasks, so they developed a new baseline method with a novel representation for structural constraints and a domain-agnostic pretraining strategy that avoids pharmaceutical dataset bias. Their work shows that AI can help discover new scientific knowledge beyond just optimizing drugs.

Generative molecular design, Proxy benchmarks, Pretrained models, Quantum simulations, Nanotechnology, Molecular optimization, Fitness landscapes, Structural constraints, Domain-agnostic pretraining, Machine learning

Authors

Matthias Blaschke, Daniel Kienzle, Zsuzsanna Koczor-Benda, Julian Lorenz, Rainer Lienhart, Fabian Pauly

Abstract

Generative molecular design is shaped by simple proxy benchmarks for drug-like properties and models pretrained on large pharmaceutical datasets. This combination yields strong benchmark metrics but limits transferability to domains structurally distinct from drug discovery. To overcome this limitation and drive discovery toward real, scientifically grounded targets, we introduce the Nanotechnology Molecular Optimization (NMO) Benchmark, which bridges machine learning (ML) and quantum materials science. NMO acts simultaneously as a rigorous testbed for the ML community and a discovery engine for nanotechnology research. The suite replaces proxy oracles with quantum simulations and introduces strict protocols that prioritize scientific utility over leaderboard-oriented overfitting. The physics-based NMO tasks impose hard structural constraints and rugged fitness landscapes, posing fundamentally new requirements on generative models. Notably, advanced molecular optimization methods underperform much simpler approaches on the NMO tasks. We develop a new baseline method identifying the critical components to solve the NMO tasks, including a novel representation for modeling structural constraints and a domain-agnostic pretraining strategy to eliminate pharmaceutical dataset bias. Our results surpass state-of-the-art physical properties and reveal previously unknown structural motifs, offering new insights for the nanotechnology community and demonstrating that ML can drive genuine scientific discovery.
