Kathleen: Oscillator-Based Byte-Level Text Classification Without Tokenization or Attention
2026-04-09 • Computation and Language
AI summary
The authors introduce Kathleen, a lightweight text classification model that operates directly on raw UTF-8 bytes, with no tokenizer and no attention mechanism, using only 733K parameters. Kathleen combines three new components: damped sinusoid convolutions that process sequences in O(L) time, a compact wavetable encoder that replaces a conventional embedding table with a single 256-float learnable vector, and PhaseHarmonics, a sinusoidal non-linearity with 6 learnable phase parameters that accounts for a 2.6% accuracy gain in the ablation. In that ablation, the frequency-domain components contribute more than a much larger bio-inspired framework. Kathleen reaches 88.6% on IMDB, 92.3% on AG News, and 83.3% on SST-2, and it handles byte sequences at lengths where O(L^2) Transformers run out of GPU memory.
text classification, UTF-8 bytes, frequency-domain processing, convolution, embedding tables, sinusoidal functions, PhaseHarmonics, FFT (Fast Fourier Transform), sequence modeling, Transformer
Authors
George Fountzoulas
Abstract
We present Kathleen, a text classification architecture that operates directly on raw UTF-8 bytes using frequency-domain processing -- requiring no tokenizer, no attention mechanism, and only 733K parameters. Kathleen introduces three novel components: (1) RecurrentOscillatorBanks -- damped sinusoid convolutions with temporal memory for O(L) sequence processing; (2) an FFT-Rotate Wavetable Encoder that maps all 256 byte values using a single learnable vector (256 floats), replacing conventional embedding tables (65K parameters) while improving accuracy; (3) PhaseHarmonics -- a sinusoidal non-linearity with just 6 learnable phase parameters that our ablation identifies as the single most impactful component (+2.6% accuracy, <0.001% of model parameters). Through comprehensive ablation of a 1.8M-parameter predecessor, we show that frequency-domain components systematically outperform complex cognitive architectures: removing a 560K-parameter bio-inspired framework lowers accuracy by only 0.2%, while removing the 6-parameter PhaseHarmonics lowers it by 2.6%. The resulting Kathleen-Clean achieves 88.6% accuracy on IMDB, 92.3% on AG News, and 83.3% on SST-2 -- outperforming a tokenized counterpart with 16x more parameters on IMDB (+1.6%) and AG News (+2.1%). Kathleen processes sequences in O(L) time and memory, enabling byte-level operation at sequence lengths where O(L^2) Transformers exhaust GPU memory.
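To make the O(L) claim concrete, the following is a minimal NumPy sketch, not the paper's implementation. It shows one generic way a causal convolution with a damped sinusoid kernel can be computed as a single-pass recurrence over a byte sequence, and it checks the result against an explicit convolution. The function names, the decay and frequency values, and the scaling of bytes to [0, 1] are all illustrative assumptions; the abstract does not specify how RecurrentOscillatorBanks are parameterized. The 256-dimensional embedding width used in the parameter count is likewise an assumption, chosen because it matches the 65K figure quoted above.

```python
import numpy as np

def utf8_bytes(text: str) -> np.ndarray:
    """Return the raw UTF-8 byte values (0..255); no tokenizer or vocabulary."""
    return np.frombuffer(text.encode("utf-8"), dtype=np.uint8)

def damped_sinusoid_kernel(length: int, decay: float, omega: float) -> np.ndarray:
    """Kernel k[t] = decay**t * cos(omega * t) for t = 0..length-1."""
    t = np.arange(length)
    return decay ** t * np.cos(omega * t)

def oscillator_scan(x: np.ndarray, decay: float, omega: float) -> np.ndarray:
    """Causal convolution with the damped sinusoid, computed in O(L).

    Uses the complex recurrence h[t] = decay * exp(i*omega) * h[t-1] + x[t];
    the real part of h[t] equals sum_s x[s] * k[t - s] for real-valued x.
    """
    pole = decay * np.exp(1j * omega)
    h = 0j
    out = np.empty(len(x))
    for i, value in enumerate(x):
        h = pole * h + value  # constant work and constant state per byte
        out[i] = h.real
    return out

# Hypothetical input and oscillator settings, for illustration only.
signal = utf8_bytes("Kathleen reads raw bytes.").astype(np.float64) / 255.0
decay, omega = 0.9, 0.3

via_scan = oscillator_scan(signal, decay, omega)
kernel = damped_sinusoid_kernel(len(signal), decay, omega)
via_conv = np.convolve(signal, kernel)[: len(signal)]  # explicit O(L^2) reference
assert np.allclose(via_scan, via_conv)

# Parameter arithmetic behind the encoder comparison (embedding width assumed).
embedding_table_params = 256 * 256  # 256 byte values x assumed 256-dim embedding
wavetable_params = 256              # single learnable vector of 256 floats
print(embedding_table_params, wavetable_params)
```

The recurrence keeps only one complex state per oscillator, which is why time and memory grow linearly with the number of bytes, whereas attention materializes pairwise interactions between all positions.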