Closing the Prior-Posterior Loop: Self-Reflective Molecular Design with Analysis-Driven LLM Iteration

2026-06-08 • Artificial Intelligence


AI summary

The authors show that large language models (LLMs) can design molecules more precisely when they receive detailed scientific feedback rather than a single score. Instead of guessing and checking, their system feeds first-principles chemistry data, such as orbital energies, atomic charges, and electron densities, back to the model so it can learn why a molecule succeeds or fails. This approach designs molecules that hit target values of properties like the HOMO-LUMO energy gap and the dipole moment with very high accuracy. The authors found that it works well across different LLMs, making molecular design less like random trial-and-error and more like solving a puzzle by understanding it.

large language models, molecular design, HOMO-LUMO gap, physicochemical rationale, orbital energies, atomic charges, electron densities, dipole moment, retrieval-augmented generation, self-reflection module

Authors

Junyi Gong, Zijie Qiu, Ben Zhong Tang

Abstract

Can a general-purpose large language model design molecules with the precision of a seasoned chemist? Current LLM-based frameworks answer this question with scalar feedback loops (generate, score, reject) that amount to informed trial-and-error. Here we show that replacing a single number with the full physicochemical rationale from first-principles calculations transforms the LLM from a stochastic sampler into a causal reasoner. Our system couples retrieval-augmented generation with a self-reflection module that feeds orbital energies, atomic charges, and electron densities, rather than compressed scores, back into the design loop. On HOMO-LUMO gap targets from 1.0 to 5.0 eV, this structure-property relationship (SPR) reflection achieves a deviation as low as 0.0003 eV and a 100% success rate on moderate tasks, decisively outperforming scalar-feedback and non-reflective baselines. The framework generalizes seamlessly to dipole-moment design and proves robust across five distinct LLM backbones. These results establish a new paradigm: when the model understands not only that a molecule fails, but why, iterative molecular design becomes genuinely mechanistic.
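The abstract describes the generate, evaluate, reflect loop only at a high level. The following is a minimal sketch under stated assumptions, not the authors' implementation: `propose` stands in for the retrieval-augmented LLM, `evaluate` stands in for the first-principles calculation, and `QMResult`, `scalar_feedback`, `spr_feedback`, `design_loop`, and the 0.1 eV success tolerance are all hypothetical. It contrasts a scalar score with a rationale-style message that carries orbital energies and atomic charges (electron densities are omitted for brevity) back to the generator.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class QMResult:
    """Hypothetical container for the first-principles output of one candidate."""
    name: str
    homo_ev: float              # HOMO energy (eV)
    lumo_ev: float              # LUMO energy (eV)
    charges: dict[int, float]   # atom index -> partial charge (e)

    @property
    def gap_ev(self) -> float:
        return self.lumo_ev - self.homo_ev


def scalar_feedback(r: QMResult, target_gap: float) -> str:
    # Baseline: compress the whole calculation into a single error value.
    return f"{r.name}: |gap - target| = {abs(r.gap_ev - target_gap):.3f} eV"


def spr_feedback(r: QMResult, target_gap: float, top_k: int = 2) -> str:
    # Rationale-style feedback: keep the orbital energies, the sign of the
    # error, and the most strongly charged atoms so the generator can reason
    # about why the gap is off, not just how far off it is.
    delta = r.gap_ev - target_gap
    if delta > 0:
        direction = "too wide; narrow it"
    elif delta < 0:
        direction = "too narrow; widen it"
    else:
        direction = "on target"
    polar = sorted(r.charges.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_k]
    sites = ", ".join(f"atom {i} ({q:+.2f} e)" for i, q in polar)
    return (
        f"{r.name}: HOMO {r.homo_ev:.2f} eV, LUMO {r.lumo_ev:.2f} eV, "
        f"gap {r.gap_ev:.2f} eV vs target {target_gap:.2f} eV ({direction}). "
        f"Most charged atoms: {sites}."
    )


def design_loop(
    propose: Callable[[list[str]], str],   # stand-in for the RAG-backed LLM
    evaluate: Callable[[str], QMResult],   # stand-in for the QM calculation
    target_gap: float,
    tol: float = 0.1,                      # assumed success tolerance (eV)
    max_iters: int = 10,
    feedback: Callable[[QMResult, float], str] = spr_feedback,
) -> tuple[Optional[QMResult], list[str]]:
    """Generate -> evaluate -> reflect until a candidate is within tol of target."""
    history: list[str] = []
    for _ in range(max_iters):
        candidate = propose(history)       # the generator sees all earlier feedback
        result = evaluate(candidate)
        if abs(result.gap_ev - target_gap) <= tol:
            return result, history
        history.append(feedback(result, target_gap))
    return None, history


if __name__ == "__main__":
    # Toy stand-ins with arbitrary numbers; no real chemistry or LLM involved.
    table = {
        "mol_A": QMResult("mol_A", -6.00, -1.00, {0: -0.40, 1: 0.10}),
        "mol_B": QMResult("mol_B", -5.50, -2.00, {0: -0.30, 2: 0.25}),
        "mol_C": QMResult("mol_C", -5.20, -2.25, {1: 0.35, 3: -0.15}),
    }
    order = ["mol_A", "mol_B", "mol_C"]
    best, log = design_loop(lambda h: order[len(h)], table.__getitem__, target_gap=3.0)
    for line in log:
        print(line)
    print("accepted:", best.name if best else None)
```

Passing `feedback=scalar_feedback` to `design_loop` gives a scalar-feedback baseline of the kind the abstract compares against. In a real system, `propose` would presumably build a prompt from `history` plus retrieved examples and parse a candidate structure from the model's reply; the toy `propose` here ignores the feedback text and simply walks a fixed list.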
