MicroViTv2: Beyond the FLOPS for Edge Energy-Friendly Vision Transformers
2026-05-11 • Computer Vision and Pattern Recognition
AI summary
The authors improved a lightweight Vision Transformer called MicroViT so that it runs faster and uses less energy on real devices such as the Jetson AGX Orin. They introduced reparameterized building blocks called RepEmbed and RepDW for faster inference, and an attention mechanism called SDTA to capture long-range dependencies in images. Although the new model requires slightly more computation (FLOPs), it is more accurate than comparable lightweight models such as MobileViTv2, EdgeNeXt, and EfficientViT while remaining fast and energy-efficient. Their experiments show that hardware-aware model design is important for balancing accuracy and energy use.
Vision Transformer, Reparameterization, Patch Embedding, Depth-Wise Convolution, Attention Mechanism, FLOPs, Energy Efficiency, Jetson AGX Orin, ImageNet-1K, COCO Dataset
Authors
Novendra Setyawan, Chi-Chia Sun, Mao-Hsiu Hsu, Wen-Kai Kuo, Jun-Wei Hsieh
Abstract
The Vision Transformer (ViT) achieves remarkable accuracy across visual tasks but remains computationally expensive for edge deployment. This paper presents MicroViTv2, a lightweight Vision Transformer optimized for real-device efficiency. Built upon the original MicroViT, the proposed model adopts structural reparameterization, specifically a Reparameterized Patch Embedding (RepEmbed) and a Reparameterized Depth-Wise convolution mixer (RepDW), for faster inference, and introduces Single Depth-Wise Transposed Attention (SDTA) to capture long-range dependencies with minimal redundancy. Despite slightly higher FLOPs, MicroViTv2 improves accuracy by up to 0.5% over its predecessor and surpasses MobileViTv2, EdgeNeXt, and EfficientViT while maintaining fast inference and high energy efficiency on the Jetson AGX Orin. Experiments on ImageNet-1K and COCO demonstrate that hardware-aware design and structural reparameterization are key to achieving high accuracy and low energy consumption, validating the need to evaluate efficiency beyond FLOPs. Code is available at https://github.com/novendrastywn/MicroViT.
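The abstract does not spell out the internals of RepDW, so the sketch below illustrates only the generic structural-reparameterization idea it builds on. It assumes a RepVGG/RepViT-style training-time block with three parallel branches: a 3x3 depthwise conv + BatchNorm, a 1x1 depthwise conv + BatchNorm, and an identity shortcut. For inference, these branches collapse into a single 3x3 depthwise conv with a bias. All names (`multi_branch`, `reparameterize`, the `params` keys) are hypothetical and are not taken from the MicroViT code.

```python
import numpy as np

def depthwise_conv2d(x, w, b):
    """Depthwise 2D cross-correlation, stride 1, 'same' zero padding.
    x: (C, H, W), w: (C, k, k) with odd k, b: (C,)."""
    C, H, W = x.shape
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((C, H, W))
    for i in range(k):
        for j in range(k):
            out += w[:, i, j][:, None, None] * xp[:, i:i + H, j:j + W]
    return out + b[:, None, None]

def bn(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm applied per channel of a (C, H, W) map."""
    s = lambda a: a[:, None, None]
    return s(gamma) * (y - s(mean)) / np.sqrt(s(var) + eps) + s(beta)

def fold_bn(w, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm into a bias-free depthwise kernel: BN(w*x) = w'*x + b'."""
    scale = gamma / np.sqrt(var + eps)
    return w * scale[:, None, None], beta - mean * scale

def multi_branch(x, p):
    """Assumed training-time block: 3x3 DW+BN, 1x1 DW+BN, and identity, summed."""
    zero = np.zeros(x.shape[0])
    y3 = bn(depthwise_conv2d(x, p["w3"], zero), *p["bn3"])
    y1 = bn(depthwise_conv2d(x, p["w1"], zero), *p["bn1"])
    return y3 + y1 + x

def reparameterize(p):
    """Merge all three branches into one 3x3 depthwise kernel plus bias."""
    C = p["w3"].shape[0]
    w3, b3 = fold_bn(p["w3"], *p["bn3"])
    w1, b1 = fold_bn(p["w1"], *p["bn1"])
    w1 = np.pad(w1, ((0, 0), (1, 1), (1, 1)))  # 1x1 kernel -> centered 3x3
    w_id = np.zeros((C, 3, 3))
    w_id[:, 1, 1] = 1.0                          # identity as a centered 3x3 kernel
    return w3 + w1 + w_id, b3 + b1

rng = np.random.default_rng(0)
C, H, W = 4, 8, 8

def rand_bn():
    # (gamma, beta, running_mean, running_var) with positive variance
    return (rng.uniform(0.5, 1.5, C), rng.normal(size=C),
            rng.normal(size=C), rng.uniform(0.5, 1.5, C))

params = {"w3": rng.normal(size=(C, 3, 3)), "bn3": rand_bn(),
          "w1": rng.normal(size=(C, 1, 1)), "bn1": rand_bn()}
x = rng.normal(size=(C, H, W))

y_train = multi_branch(x, params)      # three branches, three memory passes
w, b = reparameterize(params)
y_infer = depthwise_conv2d(x, w, b)    # one fused depthwise conv
print(np.allclose(y_train, y_infer))   # True
```

The fused kernel gives the same output with a single convolution and no extra branch reads or additions. This is the kind of latency and energy saving that a FLOP count alone does not show.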
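"Transposed" attention is usually understood, for example in XCiT and Restormer, as attention computed across channels rather than across tokens. The result is a C x C attention map whose cost grows linearly with the number of tokens N, in contrast to the N x N map of standard self-attention. The sketch below shows only that single-head channel-attention core under this assumption. The depthwise component implied by SDTA's name is not modeled, and the projection matrices `wq`, `wk`, `wv` and the `temperature` parameter are hypothetical placeholders rather than details from the paper.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def transposed_attention(x, wq, wk, wv, temperature=1.0):
    """Single-head transposed (channel-wise) attention.
    x: (N, C) tokens; wq, wk, wv: (C, C) hypothetical projections.
    Returns the (N, C) output and the (C, C) attention map."""
    q, k, v = x @ wq, x @ wk, x @ wv  # each (N, C)
    # L2-normalize every channel across the token dimension
    q = q / np.linalg.norm(q, axis=0, keepdims=True)
    k = k / np.linalg.norm(k, axis=0, keepdims=True)
    # Channel-to-channel similarity: O(N * C^2) instead of O(N^2 * C)
    attn = softmax(temperature * (q.T @ k), axis=-1)  # (C, C)
    # Output channel i is a weighted mix of the value channels
    out = v @ attn.T  # (N, C)
    return out, attn

rng = np.random.default_rng(0)
N, C = 56 * 56, 32  # e.g. a 56x56 feature map with 32 channels
x = rng.normal(size=(N, C))
wq, wk, wv = (rng.normal(size=(C, C)) / np.sqrt(C) for _ in range(3))
out, attn = transposed_attention(x, wq, wk, wv)
print(out.shape, attn.shape)  # (3136, 32) (32, 32)
```

The attention map stays 32 x 32 however large the feature map gets. This is why channel-wise attention is attractive for capturing global context cheaply in the high-resolution stages of an edge model.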