RaysUp: Ultra-light Universal Feature Upsampling via Geometry-Aware Ray Representation

2026-06-22 | Computer Vision and Pattern Recognition

AI summary

The authors present RaysUp, a lightweight method for increasing the resolution of feature maps from vision foundation models while preserving their semantic detail. Unlike earlier methods that either degrade semantic fidelity or require model-specific retraining and heavy architectures, RaysUp performs the reconstruction in a geometry-aware ray domain. Its components are a Spatially Decoupled Guidance Encoder, an Any-Resolution Cross-Attention mechanism, a Ray Positional Encoding (RayPE) built on 6D Plücker ray coordinates, and a Geometry-Aware Neighborhood Attention module. Across diverse dense prediction tasks, the authors report state-of-the-art performance with only 16% of the parameters of AnyUp and approximately 7x faster inference.

Vision Foundation Models, Feature Upsampling, Cross-Attention, Plücker Coordinates, Positional Encoding, Dense Prediction, Geometry-Aware Processing, Neural Networks, Image Resolution, Semantic Representation
Authors
Yuchuan Ding, Linfei Li, Lin Zhang, Ying Shen
Abstract
Pre-trained Vision Foundation Models (VFMs) have become central to modern computer vision due to their powerful semantic representations and strong generalization ability. However, their patchified or pooled outputs are inherently low-resolution, limiting their effectiveness in tasks requiring fine-grained, pixel-level reasoning. Existing feature upsampling approaches either degrade semantic fidelity or rely on VFM-specific retraining and heavy architectures, hindering efficiency and scalability. To address these challenges, we propose RaysUp, an ultra-lightweight, task-agnostic, and VFM-agnostic feature upsampling framework that reconstructs high-resolution feature maps at arbitrary resolutions. Unlike conventional 2D interpolation or attention-based schemes, RaysUp lifts feature reconstruction into a geometry-aware ray domain. Specifically, we introduce a Spatially Decoupled Guidance Encoder for direction-aware guidance encoding, an Any-Resolution Cross-Attention mechanism for resolution-flexible reconstruction, and a novel Ray Positional Encoding (RayPE) that injects implicit 3D geometric priors via 6D Plücker ray coordinates. Finally, a Geometry-Aware Neighborhood Attention module further ensures content-adaptive bilateral aggregation while preserving geometric consistency. Extensive experiments across diverse dense prediction tasks demonstrate that RaysUp achieves state-of-the-art performance while using only 16% of the parameters of AnyUp and delivering approximately 7x faster inference. These results highlight a substantially improved accuracy-efficiency trade-off and establish RaysUp as a practical and scalable solution for universal feature upsampling. Code is available at https://github.com/MAP-RaysUp/RaysUp.
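
For readers unfamiliar with the 6D Plücker ray coordinates mentioned in the abstract, the following is a minimal sketch of how a pixel can be mapped to such a ray. It is not taken from the RaysUp code base. The function name `pixel_ray_plucker`, the pinhole camera model, the normalization of pixel centres to [-1, 1], and the `focal` and `origin` parameters are all illustrative assumptions. A ray with origin o and unit direction d is represented as the pair (d, o × d), where o × d is called the moment.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def pixel_ray_plucker(u, v, width, height, focal=1.0, origin=(0.0, 0.0, 0.0)):
    """Return a 6-tuple (d, m) of Plücker coordinates for the ray through pixel (u, v).

    Assumptions (hypothetical, not from the paper): a pinhole camera looking
    along +z, and pixel centres mapped to [-1, 1] on both axes so that the
    encoding does not depend on the absolute grid resolution.
    """
    # Normalized image-plane coordinates of the pixel centre.
    x = 2.0 * (u + 0.5) / width - 1.0
    y = 2.0 * (v + 0.5) / height - 1.0

    # Unit direction of the ray leaving the camera through that point.
    d = (x, y, focal)
    norm = math.sqrt(sum(c * c for c in d))
    d = tuple(c / norm for c in d)

    # Moment vector o x d; it is the same for every point o on the ray.
    m = cross(origin, d)
    return d + m


# Example: one ray on a 16x16 grid, with a hypothetical camera offset from the
# world origin so that the moment is non-zero.
ray = pixel_ray_plucker(3, 5, 16, 16, origin=(0.5, 0.0, 0.0))
```

Two properties make this representation convenient as a positional signal: the direction and moment are always orthogonal, and the moment does not change when the origin slides along the ray, so the six numbers identify the ray itself and not a particular point on it. With the camera placed exactly at the world origin the moment is zero, which is why the example uses an offset.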