Intent Lenses: Inferring Capture-Time Intent to Transform Opportunistic Photo Captures into Structured Visual Notes

2026-04-10

Human-Computer Interaction
AI summary

The authors studied how people take quick photos to save information, only to end up with disorganized collections that do not convey why each photo was taken. They introduce Intent Lenses, which use large language models to infer what the user intended at capture time and turn that inference into structured, interactive notes. The notes are laid out on a spatial canvas, where users can add, link, and arrange lenses to explore their captures. In a study with nine academics using conference photos, the notes aligned with users' expectations and supported deeper sensemaking.

opportunistic photo capture, automatic note generation, intent inference, large language models, interactive notes, sensemaking, spatial canvas, academic conferences, structured visual notes, user intent
Authors
Ashwin Ram, Aeneas Leon Sommer, Martin Schmitz, Jürgen Steimle
Abstract
Opportunistic photo capture (e.g., slides, exhibits, or artifacts) is a common strategy for preserving information encountered in information-rich environments for later revisitation. While fast and minimally disruptive, such photo collections rarely become meaningful notes. Existing automatic note-generation approaches provide some support but often produce generic summaries that fail to reflect what users intended to capture. We introduce Intent Lenses, a conceptual primitive for intent-mediated note generation and sensemaking. Intent Lenses reify users' capture-time intent inferred from captured information into reusable interactive objects that encode the function to perform, the information sources to focus on, and how results are represented at an appropriate level of detail. These lenses are dynamically generated using the reasoning capabilities of large language models. To investigate this concept, we instantiate Intent Lenses in the context of academic conference photos and present an interactive system that infers lenses from presentation captures to generate structured visual notes on a spatial canvas. Users can further add, link, and arrange lenses across captures to support exploration and sensemaking. A study with nine academics showed that intent-mediated notes aligned with users' expectations, providing effective overviews of their captures while facilitating deeper sensemaking.
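
As an illustration only, the three things the abstract says a lens encodes (the function to perform, the information sources to focus on, and the representation at a given level of detail) could be sketched as a small data structure. The class name `IntentLens`, every field name, the `describe` helper, and the example values below are hypothetical and are not taken from the authors' system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IntentLens:
    """Hypothetical container for one inferred capture-time intent."""
    function: str                    # what to do with the captures (assumed free text)
    sources: List[str]               # which captures to focus on (assumed photo ids)
    representation: str              # how results are presented (assumed free text)
    detail_level: str = "overview"   # assumed label for the level of detail
    linked_to: List[str] = field(default_factory=list)  # assumed ids of linked lenses


def describe(lens: IntentLens) -> str:
    """Render a lens as a one-line, human-readable description."""
    sources = ", ".join(lens.sources)
    return (f"{lens.function} from [{sources}] "
            f"as {lens.representation} ({lens.detail_level})")


# Made-up example: a lens over two slide photos.
lens = IntentLens(
    function="extract key results",
    sources=["slide_03.jpg", "slide_04.jpg"],
    representation="table",
)
summary = describe(lens)
```

In this sketch a lens is plain data, so it can be stored, re-applied to other captures, or linked to other lenses by id. How the actual system represents lenses and prompts the language model is not specified in the abstract.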