The Importance of Phase in Neural Representations: An Internal Oppenheim-Lim Test of Image Classifiers

2026-06-15 | Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
AI summary

The authors explored how different image classifiers recognize images by focusing on two parts of the image’s Fourier transform: phase and magnitude. They found that in most models, the phase carries the important identity information, while the magnitude isn't crucial for the model's decision. One model, ResNet-50, appeared different at first, but further testing showed it also uses phase information, just in a subtler way affected by its internal operations. Overall, the study shows that different architectures rely on phase signals for understanding images but represent that information differently inside their layers.

Fourier transform, phase, magnitude, image classifiers, hidden layers, ReLU, ResNet-50, ViT, transplanting features, texture-shape gap
Authors
Alper Yıldırım
Abstract
Oppenheim and Lim (1981) showed that natural images stay recognizable when reconstructed from their Fourier phase alone, while the magnitude carries little of their identity. We ask whether trained image classifiers reproduce this asymmetry inside their hidden layers, and we test it causally: given two images, we transplant the phase of one onto the magnitude of the other at a chosen layer and record which image the prediction follows. In PRISM2D, GFNet, and ViT-B/16 the prediction follows the phase or sign donor, and deleting all image-specific magnitude barely moves accuracy, so identity rides on phase while image-specific magnitude is largely dispensable to the readout. ResNet-50 at first seems to break the pattern, because transplanting sign after its ReLUs does nothing; a fair intervention before the ReLU reveals a strong latent sign code in the late blocks, and a DC-only control shows the readout consumes a channel-wise spatial average. Controls rule out the trivial case in which magnitude simply stops depending on the image. The architectures therefore share a phase/sign identity code but expose it in different bases, set by rectification and readout geometry, which gives a mechanistic account of the texture-shape gap between CNNs and attention models.
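The phase transplant described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' code: it operates on plain 2D arrays rather than on hidden-layer activations of a trained classifier, and the function names `transplant_phase` and `phase_only` are hypothetical, chosen only for this illustration. It assumes real-valued inputs of equal shape and uses the 2D FFT over the last two axes.

```python
import numpy as np


def transplant_phase(phase_donor, magnitude_donor):
    """Combine the Fourier phase of one array with the magnitude of another.

    Hypothetical helper for illustration; both inputs are assumed to be
    real-valued arrays with the same shape (..., H, W).
    """
    f_phase = np.fft.fft2(phase_donor, axes=(-2, -1))
    f_mag = np.fft.fft2(magnitude_donor, axes=(-2, -1))
    # Keep |F| from the magnitude donor and arg(F) from the phase donor.
    hybrid = np.abs(f_mag) * np.exp(1j * np.angle(f_phase))
    # For real inputs the hybrid spectrum is Hermitian up to rounding,
    # so the imaginary part of the inverse transform is numerical noise.
    return np.fft.ifft2(hybrid, axes=(-2, -1)).real


def phase_only(x):
    """Discard all image-specific magnitude by setting |F| = 1 at every frequency."""
    f = np.fft.fft2(x, axes=(-2, -1))
    return np.fft.ifft2(np.exp(1j * np.angle(f)), axes=(-2, -1)).real


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((8, 8))  # stand-in for the phase donor
    b = rng.standard_normal((8, 8))  # stand-in for the magnitude donor
    h = transplant_phase(a, b)
    # The hybrid carries b's magnitude spectrum.
    print(np.allclose(np.abs(np.fft.fft2(h)), np.abs(np.fft.fft2(b))))
```

In the experiment the abstract describes, the same swap would be applied to activations at a chosen layer, and the quantity of interest is whether the classifier's prediction follows the phase donor or the magnitude donor. `phase_only` corresponds loosely to the magnitude-deletion control; how the paper actually flattens the magnitude is not specified in the abstract.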