Creative Collision: Directorial Persona Steering and Competition in Large Language Models

2026-06-15

Computation and Language, Machine Learning
AI summary

The authors explore a new way to guide language models by applying two opposing persona styles at the same time, which they call a 'Creative Collision.' They build style vectors from screenplay-derived writing associated with filmmakers Steven Spielberg and Martin Scorsese, representing optimistic and dark moral tones respectively, and blend them to see how the model's output changes. They find that Spielberg's style dominates across almost the whole blending range, that intermediate blends can make the output more coherent than strong steering toward a single director, and that both styles are most strongly tied to the same layer (layer 28 of a 40-layer model). These insights help explain how language models handle conflicting ideas and how to better control them for creative writing.

activation steering, semantic direction, transformer residual stream, persona vectors, moral valence, layer localization, text generation coherence, mean-difference contrast, decoder-only transformer, controllable generation
Authors
Subramanyam Sahoo, Justin Shenk
Abstract
Activation steering has emerged as a powerful tool for shaping the behaviour of large language models at inference time, yet most prior work injects a \emph{single} semantic direction into the residual stream. We study the richer setting in which two semantically opposing steering vectors are superimposed -- a regime we call \textbf{Creative Collision}. Concretely, we construct directorial persona vectors for Steven Spielberg (optimistic, redemptive moral valence) and Martin Scorsese (dark, morally ambiguous) via mean-difference activation contrast on curated screenplay-derived corpora, then interpolate between them with a scalar mixing parameter $\alpha \in [0,1]$ and a steering coefficient $\lambda$. Across five evaluation axes -- moral valence, generation coherence, surface style, directional dominance, and vector geometry -- three principal findings emerge: (i)~Spielberg's representational signature exhibits robust \emph{directional dominance}, suppressing Scorsese's moral influence across almost the entire interpolation range; (ii)~intermediate collision points paradoxically \emph{improve} generation coherence relative to pure single-director steering at high $\lambda$; and (iii)~both personas localise maximally to layer~28 of a 40-layer decoder-only transformer, revealing a shared \emph{moral-tone substrate}. These results illuminate the geometry of competing semantic directions in transformer residual streams and have direct implications for controllable creative generation and value-aligned narrative synthesis.
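
The steering setup described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not state the exact mixing formula, so the convex combination $\alpha v_a + (1-\alpha) v_b$, the choice of which persona corresponds to $\alpha = 1$, and the additive update $h + \lambda v$ are all assumptions here. The function names (`mean_difference_vector`, `collide`, `steer`) are hypothetical, and the activations are random stand-ins rather than real residual-stream data.

```python
import numpy as np


def mean_difference_vector(persona_acts, baseline_acts):
    """Persona direction: mean persona activation minus mean baseline activation.

    Both inputs have shape (n_samples, d_model), e.g. residual-stream
    activations collected at a single layer.
    """
    return persona_acts.mean(axis=0) - baseline_acts.mean(axis=0)


def collide(v_a, v_b, alpha):
    """Convex mix of two steering vectors.

    ASSUMPTION: alpha = 1 yields v_a and alpha = 0 yields v_b; the abstract
    does not say which endpoint belongs to which persona.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * v_a + (1.0 - alpha) * v_b


def steer(hidden, vector, lam):
    """Add the scaled steering vector to every position's hidden state."""
    return hidden + lam * vector


# Synthetic stand-ins for real activations (random data, illustration only).
rng = np.random.default_rng(0)
d_model = 8
baseline = rng.normal(size=(32, d_model))
acts_a = rng.normal(loc=0.5, size=(32, d_model))   # hypothetical persona A corpus
acts_b = rng.normal(loc=-0.5, size=(32, d_model))  # hypothetical persona B corpus

v_a = mean_difference_vector(acts_a, baseline)
v_b = mean_difference_vector(acts_b, baseline)

# Hidden states of shape (seq_len, d_model) at the steered layer.
hidden = rng.normal(size=(5, d_model))
steered = steer(hidden, collide(v_a, v_b, alpha=0.5), lam=4.0)
```

In a real model the same update would typically be applied inside a forward hook on the chosen layer; sweeping `alpha` over $[0,1]$ at fixed `lam` then traces out the interpolation range the abstract refers to.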