Viral Images: Identifying Reprintings within 1.5 Million Photographs in Chronicling America
2026-06-15 • Digital Libraries
Digital Libraries • Information Retrieval
AI summary
The authors studied 1.5 million photographs from historic American newspapers to find pictures that were printed multiple times across different papers. They used a computer model called CLIP to compare and group similar images without needing labels. This helped them find repeated photos and ads, showing that newspapers shared visual content widely, much as they reprinted viral text. They also created a website to help researchers explore these image groups easily.
Chronicling America, historic newspapers, CLIP, image clustering, visual culture, newspaper reprinting, unsupervised learning, contrastive learning, digital humanities
Authors
Bruno Buccalon, Yueran Sun, Benjamin Charles Germain Lee
Abstract
Within the millions of digitized historic American newspapers in the Chronicling America initiative are tens of millions of photographs, illustrations, cartoons, and advertisements. Much of this visual culture is shared across newspaper titles and issues. Just as reprinted texts within these newspapers speak to the virality of textual content, so too does this reprinted visual culture speak to newspapers as sites of constant information circulation and exchange. In this paper, we introduce Viral Images, a project to identify reprintings within 1.5 million photographs in Chronicling America. For our analysis, we adopt the Newspaper Navigator dataset of extracted photographs from over 16 million pages in Chronicling America. We introduce an unsupervised method of identifying reprintings by leveraging contrastive language-image pretraining (CLIP) to embed these 1.5 million photographs and applying clustering to identify reprinted content. We detail our public interface, https://viral-images.org, which we designed to enable humanists to interactively browse and study these identified clusters. In addition, we analyze the identified clusters, uncovering a diversity of photographs and advertisements that have been circulated across different newspapers over time.
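The embed-then-cluster idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm or similarity threshold the authors use, so everything below is an assumption made for illustration: the function names, the threshold values, the choice of linking pairs by cosine similarity and merging them with union-find, and the tiny 3-dimensional vectors standing in for real CLIP embeddings. The all-pairs comparison is quadratic and would not scale to 1.5 million images without an approximate nearest-neighbor index.

```python
import math
from collections import defaultdict


def normalize(vec):
    """Scale a non-zero vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cluster_by_similarity(embeddings, threshold=0.9):
    """Group indices whose embeddings are linked by cosine similarity >= threshold.

    Hypothetical helper: links every sufficiently similar pair and merges
    linked items with union-find. Compares all pairs, so toy inputs only.
    """
    unit = [normalize(v) for v in embeddings]
    parent = list(range(len(unit)))

    def find(i):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            sim = sum(a * b for a, b in zip(unit[i], unit[j]))
            if sim >= threshold:
                parent[find(j)] = find(i)

    groups = defaultdict(list)
    for i in range(len(unit)):
        groups[find(i)].append(i)
    # Keep only groups with more than one member: candidate reprintings.
    return sorted(g for g in groups.values() if len(g) > 1)


# Assumed toy data: 3-dimensional stand-ins for real CLIP image embeddings.
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.99, 0.05, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.98, 0.1],
    [0.0, 0.0, 1.0],
]
clusters = cluster_by_similarity(toy_embeddings, threshold=0.95)
print(clusters)
```

In this toy input, items 0 and 1 point in nearly the same direction, as do items 2 and 3, while item 4 has no close neighbor and is dropped as a singleton. Raising the threshold makes the grouping stricter, which trades missed reprintings for fewer false matches.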