Zero-Shot Depth from Defocus

2026-03-27 • Computer Vision and Pattern Recognition


AI summary

The authors introduce ZEDD, a large, high-quality real-world benchmark for estimating depth from focus stacks, that is, sets of images of the same scene captured at different focus settings. They design FOSSA, a Transformer-based network whose stack attention layer, combined with a focus distance embedding, exchanges information across the images in a stack. They also build a pipeline that generates synthetic focus stacks for training from existing large-scale RGBD datasets. The approach targets zero-shot generalization and reduces errors by up to 55.7% relative to the baselines on ZEDD and other benchmarks. The benchmark, code, and checkpoints are publicly released.

Depth from Defocus, focus stack, Transformer, zero-shot generalization, RGBD dataset, neural network, focus distance embedding, attention layer, synthetic training data, depth estimation

Authors

Yiming Zuo, Hongyu Wen, Venkat Subramanian, Patrick Chen, Karhan Kayan, Mario Bijelic, Felix Heide, Jia Deng

Abstract

Depth from Defocus (DfD) is the task of estimating a dense metric depth map from a focus stack. Unlike previous work that overfits to a particular dataset, this paper focuses on the challenging and practical setting of zero-shot generalization. We first propose a new real-world DfD benchmark, ZEDD, which contains 8.3x more scenes and significantly higher-quality images and ground-truth depth maps than previous benchmarks. We also design a novel network architecture named FOSSA, a Transformer-based architecture with designs tailored to the DfD task. The key contribution is a stack attention layer with a focus distance embedding, which allows efficient information exchange across the focus stack. Finally, we develop a new training data pipeline that allows us to use existing large-scale RGBD datasets to generate synthetic focus stacks. Experimental results on ZEDD and other benchmarks show a significant improvement over the baselines, reducing errors by up to 55.7%. The ZEDD benchmark is released at https://zedd.cs.princeton.edu. The code and checkpoints are released at https://github.com/princeton-vl/FOSSA.
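
To illustrate why an RGBD dataset is enough to synthesize a focus stack, the sketch below computes the per-pixel blur size from a depth map using the standard thin-lens circle-of-confusion formula. It is not the paper's pipeline, whose details are not given in this abstract: the thin-lens model, the function names `coc_diameter_mm` and `coc_stack_px`, and the default lens parameters (50 mm focal length, f/2, 5 µm pixel pitch) are all assumptions chosen for illustration. A renderer would then blur each pixel of the sharp RGB image according to these maps.

```python
def coc_diameter_mm(depth_mm, focus_mm, focal_length_mm, f_number):
    """Thin-lens circle-of-confusion diameter on the sensor, in mm.

    c = A * f * |d - s| / (d * (s - f)), with aperture diameter A = f / N,
    object depth d, focus distance s, and focal length f.
    """
    if focus_mm <= focal_length_mm:
        raise ValueError("focus distance must exceed the focal length")
    if depth_mm <= 0:
        raise ValueError("depth must be positive")
    aperture_mm = focal_length_mm / f_number
    return (aperture_mm * focal_length_mm * abs(depth_mm - focus_mm)
            / (depth_mm * (focus_mm - focal_length_mm)))


def coc_stack_px(depth_map_mm, focus_distances_mm,
                 focal_length_mm=50.0, f_number=2.0, pixel_pitch_mm=0.005):
    """Blur diameter in pixels for every pixel, one map per focus distance.

    depth_map_mm is a list of rows of metric depths (hypothetical input
    layout); the lens parameters are illustrative defaults, not values
    taken from the paper.
    """
    return [
        [[coc_diameter_mm(d, s, focal_length_mm, f_number) / pixel_pitch_mm
          for d in row]
         for row in depth_map_mm]
        for s in focus_distances_mm
    ]


# Example: a 2x2 depth map rendered at two focus distances.
depth = [[500.0, 1000.0], [2000.0, 4000.0]]
stack = coc_stack_px(depth, focus_distances_mm=[500.0, 2000.0])
```

Pixels whose depth equals the focus distance get a blur diameter of zero, and the blur grows as a pixel moves away from the focal plane. That dependence of blur on depth and focus distance is the cue a DfD method exploits.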
