Towards Delta Aware Training: Efficient DNN Weight Storage for Resource-Constrained FPGAs

2026-06-15 · Hardware Architecture

AI summary

The authors address the problem of running deep neural networks on resource-constrained FPGAs by reducing the memory needed to store the network's weights. They store the differences (deltas) between weight values at a lower bit width and train the network to cope with this compression. They evaluated two delta schemes with a multi-layer perceptron on the FashionMNIST dataset and found that deltas taken relative to a fixed reference value outperform consecutive deltas. With 4-bit deltas, the fixed-reference scheme reaches a validation accuracy of about 78.6 %, an accuracy loss of roughly 8.3 % compared to an 8-bit fixed-point network. Their specialized hardware accelerator compresses the weights by nearly 50 % and reaches a maximum throughput of 7.992M MACs/s on an AMD Spartan-7 S15 FPGA.

embedded deep neural networks, FPGA, memory compression, weight deltas, fixed-reference delta, multi-layer perceptron, FashionMNIST, multiply-and-accumulate, hardware accelerator
Authors
David Peter Federl, Lukas Einhaus, Andreas Erbslöh, Gregor Schiele
Abstract
The deployment of embedded deep neural networks on resource-constrained field-programmable gate arrays (FPGAs) is challenging due to limited memory and computational capacity. We introduce a new compression technique that reduces the memory footprint by storing weights as deltas with a lower bit width and training the network to cope with the compressed deltas. Two delta schemes are investigated: consecutive deltas and deltas relative to a fixed reference value. We evaluate both on the FashionMNIST data set with a multi-layer perceptron. The results indicate that fixed-reference delta compression outperforms the consecutive variant, achieving a validation accuracy of approximately 78.6 % with 4-bit weight deltas, which represents an accuracy loss of roughly 8.3 % compared to an 8-bit fixed-point network. Our specialized hardware accelerator with a delta-compressed multiply-and-accumulate operator compresses weights by nearly 50 % and achieves a maximum throughput of 7.992M MACs/s on an AMD Spartan-7 S15 FPGA.
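The abstract does not spell out how the deltas are encoded, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It assumes signed integer weights, deltas saturated to a signed 4-bit range, one full-width reference value per group of weights, and a consecutive scheme that takes each delta against the previously reconstructed weight. All function names (`clamp`, `encode_fixed_reference`, `encode_consecutive`, `delta_mac`, `compressed_ratio`) are hypothetical.

```python
def clamp(value, bits):
    """Saturate value to the signed two's-complement range of `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def encode_fixed_reference(weights, ref, bits=4):
    """Assumed scheme: every delta is taken against the same reference."""
    return [clamp(w - ref, bits) for w in weights]


def decode_fixed_reference(deltas, ref):
    return [ref + d for d in deltas]


def encode_consecutive(weights, bits=4):
    """Assumed scheme: each delta is taken against the previously
    reconstructed weight; the first weight is kept at full width."""
    first = weights[0]
    deltas, prev = [], first
    for w in weights[1:]:
        d = clamp(w - prev, bits)
        deltas.append(d)
        prev += d  # follow the value the decoder will actually see
    return first, deltas


def decode_consecutive(first, deltas):
    out = [first]
    for d in deltas:
        out.append(out[-1] + d)
    return out


def delta_mac(inputs, deltas, ref):
    """Multiply-and-accumulate that rebuilds each weight from its delta."""
    acc = 0
    for x, d in zip(inputs, deltas):
        acc += x * (ref + d)
    return acc


def compressed_ratio(n, weight_bits=8, delta_bits=4):
    """Stored bits relative to n full-width weights, assuming one
    full-width reference plus n deltas (fixed-reference layout)."""
    return (weight_bits + delta_bits * n) / (weight_bits * n)


weights = [10, 12, 7, 30, 11]
fixed = encode_fixed_reference(weights, ref=10)  # [0, 2, -3, 7, 1]
first, consecutive = encode_consecutive(weights)  # 10, [2, -5, 7, -3]
print(decode_fixed_reference(fixed, 10))          # [10, 12, 7, 17, 11]
print(decode_consecutive(first, consecutive))     # [10, 12, 7, 14, 11]
print(delta_mac([1, 2, 3, 4, 5], fixed, 10))      # 178
print(compressed_ratio(64))                       # 0.515625
```

In this toy example the weight 30 lies outside the 4-bit delta range in both schemes, so it is reconstructed with an error (17 and 14 respectively). Saturation of this kind is presumably what delta-aware training has to compensate for. Under the assumed layout, a group of 64 weights needs about 51.6 % of the original storage, which would be in line with the reported compression of nearly 50 %, although the actual grouping used in the paper is not given here.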