SynFlow: Scaling Up LiDAR Scene Flow Estimation with Synthetic Data
2026-04-10 • Computer Vision and Pattern Recognition
AI summary
The authors address the challenge of teaching computers to understand 3D motion without relying on scarce, hard-to-obtain real-world annotations. They created SynFlow, a system that generates a large volume of simulated 3D motion data for LiDAR sensors, prioritizing diverse movement patterns over sensor-level realism. Models trained on this synthetic data alone perform well on real-world benchmarks, and fine-tuning them with a small amount of labeled real data improves results further. The approach offers a new, scalable way to improve 3D motion estimation across different environments.
3D dynamic perception, LiDAR, scene flow, synthetic dataset, motion priors, simulation, zero-shot learning, domain invariance, fine-tuning, self-supervision
Authors
Qingwen Zhang, Xiaomeng Zhu, Chenhan Jiang, Patric Jensfelt
Abstract
Reliable 3D dynamic perception requires models that can anticipate motion beyond predefined categories, yet progress is hindered by the scarcity of dense, high-quality motion annotations. While self-supervision on unlabeled real data offers a path forward, empirical evidence suggests that scaling unlabeled data fails to close the performance gap because the proxy training signals are noisy. In this paper, we propose a paradigm shift: learning robust real-world motion priors entirely from scalable simulation. We introduce SynFlow, a data generation pipeline that produces a large-scale synthetic dataset designed specifically for LiDAR scene flow. Unlike prior works that prioritize sensor-specific realism, SynFlow employs a motion-oriented strategy, synthesizing diverse kinematic patterns across 4,000 sequences (~940k frames). The resulting dataset, SynFlow-4k, represents a 34× scale-up in annotated volume over existing real-world benchmarks. Our experiments demonstrate that SynFlow-4k provides a highly domain-invariant motion prior. In a zero-shot regime, models trained exclusively on our synthetic data generalize across multiple real-world benchmarks, rivaling in-domain supervised baselines on nuScenes and outperforming state-of-the-art methods on TruckScenes by 31.8%. Furthermore, SynFlow-4k serves as a label-efficient foundation: fine-tuning with only 5% of real-world labels surpasses models trained from scratch on the full available label budget. We open-source the pipeline and dataset to facilitate research in generalizable 3D motion estimation. More details are available at https://kin-zhang.github.io/SynFlow.
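Simulation can supply dense scene flow annotations because every object's pose is known at each timestamp. The abstract does not describe how SynFlow computes its labels, so the following is only a generic sketch, not the authors' pipeline. It assumes rigid objects whose points are stored in the object's own frame and poses given as (x, y, z, yaw) in a world frame with yaw-only rotation. Under these assumptions, a point's ground-truth flow is its world position under the pose at t+1 minus its world position under the pose at t. The names `to_world`, `rigid_flow_labels`, and `Pose` are hypothetical.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
# Hypothetical pose format: (x, y, z, yaw) in the world frame, yaw in radians.
Pose = Tuple[float, float, float, float]


def to_world(p_local: Vec3, pose: Pose) -> Vec3:
    """Place a point given in the object's frame into the world frame (yaw-only rotation)."""
    x, y, z = p_local
    tx, ty, tz, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * x - s * y + tx, s * x + c * y + ty, z + tz)


def rigid_flow_labels(
    points_local: List[Vec3], pose_t: Pose, pose_t1: Pose
) -> Tuple[List[Vec3], List[Vec3]]:
    """Return world-frame points at time t and their ground-truth flow vectors to t+1.

    Assumes the object is rigid: its local geometry is identical at both timestamps,
    so all motion comes from the change of pose.
    """
    points_t: List[Vec3] = []
    flows: List[Vec3] = []
    for p in points_local:
        a = to_world(p, pose_t)   # position at time t
        b = to_world(p, pose_t1)  # same physical point at time t+1
        points_t.append(a)
        flows.append((b[0] - a[0], b[1] - a[1], b[2] - a[2]))
    return points_t, flows
```

For example, translating a two-point object by 1 m along x, from pose `(0.0, 0.0, 0.0, 0.0)` to `(1.0, 0.0, 0.0, 0.0)`, yields the flow `(1.0, 0.0, 0.0)` for both points. Static points get zero flow here because the sketch works in the world frame. Expressing points in the moving LiDAR sensor frame instead would add an ego-motion term to every flow vector.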