Personal Salience: Highlighting Is Social, but Individuality Lives in Selection

2026-06-08 · Information Retrieval

Information Retrieval, Computation and Language, Human-Computer Interaction, Social and Information Networks
AI summary

The authors studied how much you can learn about a person from the passages they highlight in texts that many people read. They found that most people highlight very similar passages, a strong group pattern called crowd salience. The small differences that reveal individual taste show up more in which of the already-important passages a person chooses than in which passages stand out overall. This personal uniqueness is subtle and mostly reflects consistent topic preferences. The analysis avoids common evaluation mistakes that make personalization seem stronger than it really is.

social highlighters, personal salience, crowd salience, generic salience, co-readership, topic decomposition, embedding scorer, history-conditioning, personalization, naturalistic traces
Authors
Kazuki Nakayashiki, Keisuke Watanabe
Abstract
Social highlighters let people mark passages that matter to them. We ask how much of an individual is recoverable from these naturalistic traces, using a co-readership identity control (the same document highlighted by many users) that holds document and topic fixed and asks whether a person's own history predicts their marks better than another reader's does. We separate generic salience (structure), crowd salience (what others marked), and personal salience (the individual residual). First, highlighting is social: which sentences you mark is predicted far better by the crowd than by structure or by a personal model, and even a well-estimated crowd, an information-privileged baseline that sees others' marks on the same document, beats a frontier LLM twin built from your other-document history; the within-document personal signal is at most a whisper (own-vs-other gap +0.017 by an embedding scorer, small but significant). Second, in sharp contrast, individuality lives in selection: asked which of the already-salient passages are yours, your own history is a strong, leakage-free predictor (gap +0.14). A topic decomposition shows this is largely stable thematic preference: it shrinks ~6-8x against a topically matched peer, and a thin residual cannot be separated from finer topic. The non-obvious part is an asymmetry: under the same scorer the individual signal is ~6-8x weaker in salience than in selection. Methodologically, naive history-conditioning evaluations leak (the target's own marks enter the profile in ~42% of pairs, inflating personal scores by up to +0.15 AP) and small crowds overstate personalization; our results are leakage-free and use a dense crowd and a model-matched control. Highlights carry a genuine individual signature, but it is a thin layer over a strong shared one, surfacing far more in which salient things a person selects than in what is salient.
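The own-vs-other gap described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine-to-mean-profile scorer, the function names (`build_profile`, `own_vs_other_gap`), and the two-dimensional toy embeddings are all assumptions made for illustration. The sketch scores each sentence of a shared document against a reader's profile, computes average precision (AP) against the target reader's marks, and builds each profile only from other documents, so the target document's marks cannot leak into it.

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity; returns 0.0 if either vector is empty or all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mean_vector(vectors):
    # Component-wise mean; an empty input yields an empty profile.
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def average_precision(scores, positives):
    # AP of the ranking induced by `scores` against the set `positives`.
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in positives:
            hits += 1
            total += hits / rank
    return total / len(positives) if positives else 0.0

def build_profile(marks, embeddings, user, exclude_doc):
    # Leakage guard (assumed design): skip every mark made on the target document.
    vecs = [embeddings[doc][s]
            for doc, sents in marks[user].items() if doc != exclude_doc
            for s in sents]
    return mean_vector(vecs)

def own_vs_other_gap(marks, embeddings, target, other, doc):
    # AP using the target's own history minus AP using another reader's history,
    # both evaluated on the same document and the same positives.
    positives = marks[target][doc]

    def ap_with(profile_user):
        profile = build_profile(marks, embeddings, profile_user, exclude_doc=doc)
        scores = {s: cosine(v, profile) for s, v in embeddings[doc].items()}
        return average_precision(scores, positives)

    return ap_with(target) - ap_with(other)

# Hypothetical toy data: one co-read document plus one history document per reader.
embeddings = {
    "shared": {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.7, 0.7]},
    "a_hist": {0: [0.9, 0.1]},
    "b_hist": {0: [0.1, 0.9]},
}
marks = {
    "alice": {"shared": {0}, "a_hist": {0}},
    "bob": {"shared": {1}, "b_hist": {0}},
}

gap = own_vs_other_gap(marks, embeddings, target="alice", other="bob", doc="shared")
print(f"own-vs-other AP gap: {gap:.3f}")
```

Dropping the `doc != exclude_doc` filter would let the target's marks on the evaluated document enter the profile, which is the kind of leakage the abstract reports as inflating personal scores.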