Feasibility of Time-Domain DNN-Based Speech Enhancement on Embedded FPGA for Hearing Aid

2026-06-02

Sound · Hardware Architecture
AI summary

The authors studied how well a small neural network model for improving speech runs on embedded hardware of the kind hearing aids would need. They tested two tasks, separating overlapping speech signals and removing background noise, on an FPGA board called the AMD-Xilinx Kria KV260. They found that the main source of delay is moving data inside the chip, specifically whether the model's parameters can be kept in on-chip memory, rather than the arithmetic itself. Storing numbers with 16 bits instead of 32 halved the model's memory use without lowering speech quality on objective measures. The 16-bit noise removal version reached 9.7 ms, meeting the 10 ms delay limit for hearing aids, while speech separation took 16.0 ms. The results show what hardware resources such systems need and how far the technology still is from deployment in hearing aids.

DNN (Deep Neural Network), speech enhancement, latency, fixed-point precision, AMD-Xilinx Kria KV260, SuDoRM-RF++, speech separation, denoising, embedded hardware, parameter caching
Authors
Feyisayo Olalere, Umut Altin, Kiki van der Heijden, Marcel van Gerven
Abstract
Hearing aids impose strict latency and power constraints that current DNN-based speech enhancement systems struggle to meet on embedded hardware. We characterize this gap by deploying both speech separation and denoising using the lightweight SuDoRM-RF++ architecture on the AMD-Xilinx Kria KV260, evaluated at FP32 and 16-bit fixed-point precision for each task. Across these configurations, first-sample latency tracks with on-chip parameter caching rather than arithmetic throughput, identifying data movement as the primary bottleneck. Precision reduction halves the model memory footprint without compromising objective speech quality. The fixed-point denoising accelerator achieves a first-sample latency of 9.7 ms, meeting the 10 ms clinical threshold, while speech separation reaches 16.0 ms. These measurements establish concrete resource requirements for embedded DNN-based speech enhancement and quantify the remaining gap to hearing aid deployment.
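The halved memory footprint follows directly from word width: a 16-bit value takes half the storage of a 32-bit float. The following is a minimal NumPy sketch of FP32 to 16-bit fixed-point conversion. It assumes a signed Q3.12 format (1 sign bit, 3 integer bits, 12 fractional bits) with round-to-nearest and saturation. The abstract does not state which fixed-point format the authors used. The helpers `to_fixed16` and `from_fixed16`, and the random weights, are hypothetical stand-ins and not the SuDoRM-RF++ parameters.

```python
import numpy as np

# Hypothetical format: signed 16-bit Q3.12 (12 fractional bits).
# The actual integer/fraction split used in the paper is an assumption here.
FRAC_BITS = 12
SCALE = 1 << FRAC_BITS
INT16_MIN, INT16_MAX = -32768, 32767


def to_fixed16(x: np.ndarray, frac_bits: int = FRAC_BITS) -> np.ndarray:
    """Quantize floats to int16 fixed point with round-to-nearest and saturation."""
    scaled = np.round(x * (1 << frac_bits))
    return np.clip(scaled, INT16_MIN, INT16_MAX).astype(np.int16)


def from_fixed16(q: np.ndarray, frac_bits: int = FRAC_BITS) -> np.ndarray:
    """Convert int16 fixed-point values back to float32 for comparison."""
    return q.astype(np.float32) / (1 << frac_bits)


rng = np.random.default_rng(0)
# Hypothetical stand-in weights; small magnitudes, so nothing saturates in Q3.12.
weights = rng.normal(0.0, 0.1, size=100_000).astype(np.float32)

q = to_fixed16(weights)
recovered = from_fixed16(q)

print(weights.nbytes, q.nbytes)  # 400000 200000
max_err = float(np.max(np.abs(weights - recovered)))
print(max_err <= 0.5 / SCALE)  # True
```

As long as no value saturates, the round-trip error stays within half a quantization step (2^-13 in this format). Choosing the number of fractional bits trades that step size against the representable range.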