Implementation and Optimization of HQC Decoding on NPU-Integrated Devices
2026-06-01 • Cryptography and Security
Cryptography and Security • Hardware Architecture • Performance
AI summary
The authors study how to speed up decoding for Hamming Quasi-Cyclic (HQC), a code-based post-quantum key-encapsulation mechanism, on mobile devices with Qualcomm Hexagon processors. They observe that HQC decoding is naturally built from vector-structured computations, which map well onto the Hexagon Vector eXtensions (HVX). By redesigning the dominant decoding kernels around HVX-friendly data layouts and execution patterns, their implementation substantially reduces decoding latency and energy consumption, improving energy efficiency by up to 18.13× while offloading work from the host CPU. This suggests that mobile platforms with neural processing units can handle structured cryptographic decoding effectively when the kernels are reformulated around vector execution.
Hamming Quasi-Cyclic (HQC) • post-quantum cryptography • Reed-Muller code • Reed-Solomon code • Hexagon Vector eXtensions (HVX) • Neural Processing Unit (NPU) • finite-field arithmetic • Hadamard transform • syndrome computation • vectorization
Authors
Vu Minh Chau, Nguyen Ngoc Kiet, Pham Quang Minh, Mai Xuan Ngoc, Nguyen Duc Anh, Hoang Ta
Abstract
Hamming Quasi-Cyclic (HQC) has been selected by NIST for standardization as an additional code-based key-encapsulation mechanism, providing algorithmic diversity alongside lattice-based post-quantum cryptography. Efficient deployment of HQC on mobile and embedded platforms, however, requires careful optimization of its decoding procedure, whose Reed-Muller and Reed-Solomon components dominate the computational cost. This paper studies HQC decoding on Qualcomm Hexagon processors in NPU-integrated devices, focusing on the Hexagon Vector eXtensions (HVX) backend rather than a tensor-inference engine. We observe that HQC decoding naturally exposes vector-structured computation, including Reed-Muller reliability vectors, Hadamard-transform coefficients, Reed-Solomon syndrome vectors, finite-field products, and packed support-point evaluations. Based on this observation, we redesign the dominant decoding kernels around HVX-friendly data layouts and execution patterns, including a vectorized Reed-Muller Hadamard transform, scalar-equivalent peak selection, HVX-oriented finite-field arithmetic, vectorized syndrome computation, and shortened-support locator-root evaluation. We implement and evaluate the optimized decoder using both Hexagon simulator measurements and real-device experiments on a Snapdragon 8 Gen 2 hardware development kit. The results show that Hexagon/HVX-assisted decoding substantially reduces latency and energy consumption, improving energy efficiency by up to $18.13\times$ while significantly offloading host CPU work. These results indicate that NPU-integrated mobile platforms can serve as effective backends for structured post-quantum cryptographic decoding when the underlying kernels are reformulated around vector execution.
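To make the Reed-Muller step concrete, here is a minimal scalar sketch in Python of the textbook pipeline the abstract names: build a reliability vector, apply a Hadamard transform, and select the peak. It is not the authors' HVX implementation. The function names, the message-bit convention (low 7 bits linear, top bit constant), the default of a length-128 first-order code, and the first-maximum tie-breaking rule are all assumptions made for illustration. Each butterfly stage of `fwht` applies the same add/subtract to independent pairs, which is the kind of regular structure a vector unit can exploit.

```python
def rm_encode(msg, m=7):
    """Encode an (m+1)-bit message as a 2**m-bit first-order Reed-Muller codeword.

    Assumed convention: the low m bits select the linear form, bit m is the constant.
    """
    n = 1 << m
    const = (msg >> m) & 1
    lin = msg & (n - 1)
    # Bit j of the codeword is const XOR parity(lin AND j).
    return [const ^ (bin(lin & j).count("1") & 1) for j in range(n)]


def fwht(values):
    """Fast Walsh-Hadamard transform; returns a new list of the same length."""
    v = list(values)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b  # one butterfly
        h *= 2
    return v


def rm_decode(copies, m=7):
    """Decode from one or more received copies of the same codeword."""
    n = 1 << m
    # Reliability vector: map bit 0 -> +1, bit 1 -> -1 and sum over the copies.
    reliability = [sum(1 - 2 * c[j] for c in copies) for j in range(n)]
    spectrum = fwht(reliability)
    # Peak selection: first index of maximum magnitude (assumed tie-break rule).
    peak = max(range(n), key=lambda k: abs(spectrum[k]))
    const = 1 if spectrum[peak] < 0 else 0  # a negative peak means the constant bit is set
    return (const << m) | peak


# Usage: three noisy copies of the same codeword still decode to the message.
received = [rm_encode(0xA5) for _ in range(3)]
for copy, start in zip(received, (0, 30, 60)):
    for j in range(start, start + 20):
        copy[j] ^= 1  # inject 20 bit errors per copy
decoded = rm_decode(received)
```

Summing the copies before the transform means only one Hadamard transform is needed per codeword block, regardless of how many times the block is repeated.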