Scaling Novel Graph Generation via Lightweight Structure-Guided Autoregressive Models
2026-06-02 • Machine Learning
Machine Learning, Artificial Intelligence
AI summary
The authors present a method for generating realistic, diverse graphs more efficiently than existing approaches. The method serializes each graph into an edge sequence using a structure-guided topological ordering, which lets an autoregressive model generate graphs in near log-linear time. A two-phase training process helps the model avoid copying training graphs and instead produce new, valid ones. Experiments on molecular and non-molecular benchmarks show that the approach improves novelty while preserving high validity and uniqueness.
graph generation, autoregressive models, topological ordering, diffusion models, sequence modeling, LSTM, graph novelty, machine learning scalability, data augmentation, causal sequence models
Authors
Alessio Barboni, Massimiliano Lupo Pasini, Bishal Lakha, Edoardo Serra
Abstract
Generating realistic and diverse graphs is a key problem in machine learning, with applications in molecular discovery, circuit design, cybersecurity, and beyond. However, current graph generative models remain limited by scalability and novelty. Diffusion-based methods often require costly full-adjacency operations and long denoising chains, while many autoregressive and hybrid models have at least quadratic complexity. In addition, these models often imitate training graphs rather than generalize beyond them. We propose a lightweight autoregressive framework to address these issues. It uses a structure-guided topological ordering to serialize graphs into regular edge sequences, enabling near log-linear generation, and a two-phase training strategy that combines exploration-oriented augmentation with iterative refinement to reduce overfitting and promote controlled novelty. Experiments on molecular and non-molecular benchmarks show that our approach improves novelty while preserving high validity and uniqueness. The framework also supports both LSTM and Mamba-style causal sequence backbones, with large-memory accelerators enabling longer graph-sequence experiments beyond typical GPU limits.
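To make the idea of serializing a graph into a regular edge sequence concrete, the following is a minimal sketch, not the authors' method. The abstract does not specify the structure-guided ordering, so a plain breadth-first ordering is used here as a stand-in; the function names (bfs_order, serialize, deserialize) and the toy graph are hypothetical. The sketch assumes an undirected, connected graph given as an adjacency dictionary.

```python
from collections import deque


def bfs_order(adj, start):
    """Return nodes in BFS visit order, visiting neighbors in sorted order."""
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):
            if v not in seen:
                seen.add(v)
                order.append(v)
                queue.append(v)
    return order


def serialize(adj, start):
    """Relabel nodes by BFS rank and emit edges (i, j), i < j, sorted by (j, i).

    Assumption: BFS rank stands in for the paper's structure-guided ordering.
    Assumes the graph is connected, so every node receives a rank.
    """
    rank = {node: i for i, node in enumerate(bfs_order(adj, start))}
    edges = set()
    for u in adj:
        for v in adj[u]:
            i, j = sorted((rank[u], rank[v]))
            if i != j:  # ignore self-loops in this sketch
                edges.add((i, j))
    # Each edge appears when its later endpoint is introduced.
    return sorted(edges, key=lambda e: (e[1], e[0]))


def deserialize(seq):
    """Rebuild an undirected adjacency dictionary from an edge sequence."""
    adj = {}
    for i, j in seq:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    return adj


# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
seq = serialize(toy, "a")
print(seq)  # [(0, 1), (0, 2), (1, 2), (2, 3)]
```

A sequence of this form is what a causal sequence model (for example an LSTM) would be trained to predict token by token; in this sketch, serializing the rebuilt toy graph from node 0 reproduces the same sequence.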