Graph2Idea: Retrieval-Augmented Scientific Idea Generation with Graph-Structured Contexts

2026-06-08 • Artificial Intelligence


AI summary

The authors created Graph2Idea, a method that helps large language models (LLMs) come up with new scientific research ideas by using knowledge graphs built from related research papers instead of just flat text like abstracts. This approach organizes information into clear relationships, making it easier for the system to see connections between different studies. The method then uses this structure to guide idea generation in two steps: finding promising research directions and creating ideas based on those connections. In automatic evaluations on a benchmark, Graph2Idea scored higher on novelty, quality, and feasibility than the baseline methods it was compared with. This suggests that using graphs to structure scientific knowledge can improve how language models generate research ideas.

Large Language Models, Knowledge Graph, Scientific Idea Generation, Retrieval-Augmented Generation, Structured Knowledge Triples, Novelty, Feasibility, Research Benchmark, Text Summarization, Cross-paper Relations

Authors

Xu Li, Hanzhe Tu, Xun Han

Abstract

Generating novel, feasible, and high-quality research ideas is an important yet challenging task in scientific discovery. Recent Large Language Model (LLM)-based methods often ground idea generation with retrieved literature, but the retrieved evidence is usually provided as flat text, such as titles, abstracts, or summaries. Such flat contexts may contain redundant or weakly relevant information, while making cross-paper relations among problems, methods, mechanisms, and findings difficult to identify and trace. To address this challenge, we propose Graph2Idea, a knowledge graph-guided framework for retrieval-augmented scientific idea generation. Graph2Idea first retrieves papers according to the input topic, transforms them into structured knowledge triples, and dynamically constructs a target-centered knowledge graph to make literature relations explicit. It then extracts compact graph-derived contexts that retain target-relevant relational evidence while reducing noisy textual input. Based on these contexts, a two-stage generation process first identifies promising research directions and then guides the LLM to synthesize candidate ideas from graph-grounded evidence. Experiments on a scientific idea generation benchmark show that Graph2Idea outperforms representative baselines under the automatic evaluation protocol. Compared with the strongest baseline scores, it improves Novelty from 0.45 to 0.52, Quality from 0.24 to 0.29, and Feasibility from 0.22 to 0.28. These results suggest that graph-structured evidence helps LLMs generate research ideas through more explicit, compact, and traceable recombination of prior scientific knowledge.
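The abstract describes turning retrieved papers into knowledge triples, assembling them into a target-centered graph, and extracting a compact context from it, but it does not say how that context is selected. The Python sketch below illustrates one plausible reading only, under loudly stated assumptions: the triples are already extracted (here they are hand-written stand-ins for LLM output), the graph is treated as undirected, and the context is the set of triples within a fixed number of hops of the target entity, nearest first, up to a budget. `Triple`, `target_context`, `max_hops`, `max_triples`, `serialize`, and the example triples are all hypothetical names, not Graph2Idea's actual API.

```python
from collections import defaultdict, deque
from typing import NamedTuple


class Triple(NamedTuple):
    """A (head, relation, tail) fact, e.g. as extracted from a paper (hypothetical format)."""
    head: str
    relation: str
    tail: str


def build_graph(triples: list[Triple]) -> dict[str, list[Triple]]:
    """Index triples as an undirected adjacency map: entity -> triples that touch it."""
    adj: dict[str, list[Triple]] = defaultdict(list)
    for t in triples:
        adj[t.head].append(t)
        adj[t.tail].append(t)
    return adj


def target_context(triples: list[Triple], target: str,
                   max_hops: int = 2, max_triples: int = 20) -> list[Triple]:
    """Assumed selection rule: breadth-first search from the target entity,
    keeping triples within max_hops of it (nearest first), capped at max_triples."""
    adj = build_graph(triples)
    seen_entities = {target}
    seen_triples: set[Triple] = set()
    context: list[Triple] = []
    frontier = deque([(target, 0)])
    while frontier and len(context) < max_triples:
        entity, depth = frontier.popleft()
        if depth >= max_hops:
            continue  # edges leaving this entity would exceed the hop budget
        for t in adj.get(entity, []):
            if t in seen_triples:
                continue
            seen_triples.add(t)
            context.append(t)
            if len(context) >= max_triples:
                break
            neighbor = t.tail if t.head == entity else t.head
            if neighbor not in seen_entities:
                seen_entities.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return context


def serialize(context: list[Triple]) -> str:
    """Render the selected triples as compact lines for an LLM prompt."""
    return "\n".join(f"({t.head}; {t.relation}; {t.tail})" for t in context)


# Hypothetical triples standing in for facts extracted from retrieved papers.
example_triples = [
    Triple("PaperA", "proposes", "graph-RAG"),
    Triple("graph-RAG", "addresses", "noisy context"),
    Triple("PaperB", "studies", "noisy context"),
    Triple("PaperC", "uses", "unrelated method"),
]

# Prints the three triples within two hops of "noisy context"; the PaperC triple is excluded.
print(serialize(target_context(example_triples, "noisy context", max_hops=2)))
```

In this sketch, the hop limit and the triple budget are what keep the context compact: the disconnected `PaperC` fact never reaches the prompt. A real system would likely also rank triples by relevance to the target, which is omitted here.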
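The improvements quoted in the abstract are absolute score differences. Expressing them relative to the strongest baseline scores is simple arithmetic on the six reported numbers, shown here as a small Python check. The dictionary layout and the `gains` helper are illustrative only.

```python
# Scores reported in the abstract: metric -> (strongest baseline, Graph2Idea).
reported = {
    "Novelty": (0.45, 0.52),
    "Quality": (0.24, 0.29),
    "Feasibility": (0.22, 0.28),
}


def gains(scores: dict[str, tuple[float, float]]) -> dict[str, tuple[float, float]]:
    """Return (absolute gain, relative gain in percent) for each metric, rounded."""
    out = {}
    for metric, (baseline, ours) in scores.items():
        absolute = ours - baseline
        out[metric] = (round(absolute, 2), round(100 * absolute / baseline, 1))
    return out


for metric, (absolute, relative) in gains(reported).items():
    print(f"{metric}: +{absolute:.2f} absolute, +{relative:.1f}% relative")
```

The relative gains come to roughly 15.6% for Novelty, 20.8% for Quality, and 27.3% for Feasibility, so the largest relative improvement is on the metric with the lowest baseline score.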
