Representation Fréchet Loss for Visual Generation
2026-04-30 • Computer Vision and Pattern Recognition
AI summary
The authors show that Fréchet Distance, a metric long seen as hard to use for training AI models, can in fact work well when applied through an approach they call FD-loss. The key idea is to separate the number of samples used to estimate the distance from the number used to compute gradients, which makes training more effective. This approach improves image quality and can turn complex multi-step models into simpler one-step ones without extra tricks. The authors also find that common quality scores can be misleading, so they propose a new metric that looks at multiple ways of representing images. Their work suggests new directions for training and evaluating generative models.
Fréchet Distance, FD-loss, representation space, gradient computation, generator model, FID score, Inception feature space, one-step generator, multi-step generator, distributional distance
Authors
Jiawei Yang, Zhengyang Geng, Xuan Ju, Yonglong Tian, Yue Wang
Abstract
We show that Fréchet Distance (FD), long considered impractical as a training objective, can in fact be effectively optimized in the representation space. Our idea is simple: decouple the population size for FD estimation (e.g., 50k) from the batch size for gradient computation (e.g., 1024). We term this approach FD-loss. Optimizing FD-loss reveals several surprising findings. First, post-training a base generator with FD-loss in different representation spaces consistently improves visual quality. Under the Inception feature space, a one-step generator achieves 0.72 FID on ImageNet 256x256. Second, the same FD-loss repurposes multi-step generators into strong one-step generators without teacher distillation, adversarial training, or per-sample targets. Third, FID can misrank visual quality: modern representations can yield better samples despite worse Inception FID. This motivates FDr$^k$, a multi-representation metric. We hope this work will encourage further exploration of distributional distances in diverse representation spaces as both training objectives and evaluation metrics for generative models.
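To make the decoupling idea concrete, below is a minimal NumPy sketch (not the authors' implementation) of the Fréchet Distance between Gaussian fits of two feature populations. The population sizes, feature dimension, and variable names are illustrative; in actual training, only the fresh batch's contribution to the generated-population statistics would be differentiated, while the large cached population keeps the FD estimate low-variance.

```python
import numpy as np

def sqrtm_sym(a):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)  # guard against tiny negative eigenvalues
    return (v * np.sqrt(w)) @ v.T

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """FD between Gaussians N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1^1/2 sigma2 sigma1^1/2)^1/2)."""
    diff = mu1 - mu2
    s1h = sqrtm_sym(sigma1)
    covmean = sqrtm_sym(s1h @ sigma2 @ s1h)
    return diff @ diff + np.trace(sigma1 + sigma2) - 2.0 * np.trace(covmean)

def stats(x):
    """Fit a Gaussian (mean, covariance) to a feature population."""
    return x.mean(axis=0), np.cov(x, rowvar=False)

rng = np.random.default_rng(0)
dim = 8  # stand-in for a real feature dimension (e.g., 2048 for Inception)

# Large reference population (stands in for ~50k real-image features).
ref = rng.normal(size=(5000, dim))

# Generated population: many cached samples plus one fresh batch.
# In training, gradients would flow only through the fresh batch.
cached = rng.normal(size=(4000, dim))
batch = rng.normal(size=(1000, dim))
gen = np.concatenate([cached, batch])

fd = frechet_distance(*stats(gen), *stats(ref))
```

Since both populations here are drawn from the same standard normal, `fd` comes out near zero; shifting the reference features would increase it by roughly the squared mean displacement.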