GPIC: A Giant Permissive Image Corpus for Visual Generation

2026-05-28 • Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition • Artificial Intelligence

AI summary

The authors introduce GPIC, a corpus of roughly 28 trillion pixels drawn from diverse internet images, each captioned by a state-of-the-art vision-language model. The images are safety-filtered, deduplicated, and permissively licensed for both research and commercial use. The authors also define a benchmarking protocol for generative models on the dataset and provide a pixel-space flow matching baseline as a reference. The dataset, benchmark, models, and evaluation code are openly available online.

visual generative modeling • image dataset • vision-language model • captioning • license • dataset filtering • deduplication • benchmarking • flow matching • Hugging Face

Authors

Keshigeyan Chandrasegaran, Kyle Sargent, Suchir Agarwal, Michael Jang, Michael Poli, Juan Carlos Niebles, Justin Johnson, Jiajun Wu, Li Fei-Fei

Abstract

Studying scalable methods for visual generative modeling requires large, accessible, and stable datasets. We introduce GPIC, a Giant Permissive Image Corpus of approximately 28 trillion pixels. GPIC comprises diverse internet images captioned by a state-of-the-art vision-language model, including 100M training, 200K validation, and 1M test examples. Moreover, all GPIC images are permissively licensed for both research and commercial use. GPIC is safety-filtered, deduplicated, and centrally hosted on Hugging Face. We provide a benchmarking protocol for generative modeling on GPIC. Finally, we provide a reference baseline for pixel-space flow matching on GPIC. Our dataset, benchmark, and models are available at https://huggingface.co/datasets/stanford-vision-lab/gpic. Evaluation toolkit and code are available at https://gpic.stanford.edu.
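The abstract names pixel-space flow matching as the reference baseline but does not spell out its formulation. As a rough illustration only, the sketch below builds one training example for a generic linear-path flow matching objective: a noise sample is interpolated toward a data sample, and the regression target is the constant velocity between them. The linear path, the Gaussian noise source, the toy four-pixel "image", and the names `flow_matching_pair`, `mse`, and `zero_velocity_model` are all assumptions made for this example and are not taken from the GPIC paper or its code.

```python
import random


def flow_matching_pair(x1, rng):
    """Build one training example for linear-path flow matching.

    x1 is a flattened image given as a list of floats (toy data here).
    Returns (t, x_t, v), where x_t lies on the straight line from a noise
    sample x0 to x1, and v = x1 - x0 is the velocity a model would regress.
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]                  # Gaussian noise sample
    t = rng.random()                                        # time in [0, 1)
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]   # point on the path
    v = [b - a for a, b in zip(x0, x1)]                     # target velocity
    return t, x_t, v


def mse(pred, target):
    """Mean squared error between two equal-length lists."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


def zero_velocity_model(x_t, t):
    """Hypothetical placeholder for a neural network; always predicts zero."""
    return [0.0 for _ in x_t]


rng = random.Random(0)
image = [0.1, 0.5, 0.9, 0.3]                 # stand-in for real pixel values
t, x_t, v = flow_matching_pair(image, rng)
loss = mse(zero_velocity_model(x_t, t), v)   # loss a training step would minimize
```

In this setup, following the target velocity from `x_t` for the remaining time `1 - t` lands exactly on the data sample, which is the property a trained velocity model is meant to approximate when sampling starts from pure noise.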
