Probabilistic Memory for Trustworthy Edge Intelligence

2026-07-02 | Hardware Architecture

AI summary

The authors explain that making edge AI more trustworthy often requires computing with probabilities, but generating the Gaussian random numbers this needs is usually far slower than the computation itself and adds instruction overhead. They propose probabilistic memory (p-MEM), which stores distribution parameters such as mean and standard deviation and produces samples directly where the data is stored, at the memory's native bandwidth. Using a layout-validated simulator, they report more than 1000 GSa/s/mm^2 of random number throughput. For Bayesian neural network workloads on CPU/GPU systems, they report reducing instruction count by up to 2.19x/4.37x, sampling latency by 562x/3.45x, and energy by 295.5x/3.53x. This lets CPUs and GPUs handle uncertainty in data with far less sampling overhead.

Probabilistic computation, Gaussian random number generation (GRNG), Probabilistic memory (p-MEM), Bayesian neural networks, Memory bandwidth, Uncertainty quantification, Edge intelligence, Sampling latency, Instruction overhead, Hardware acceleration
Authors
Likai Pei, Jiahao Zheng, Xueji Zhao, Emilie Ye, Jianbo Liu, Hanqing Tao, Ming-Yen Lee, Ruiyang Qin, Yiyu Shi, Shimeng Yu, X. Sharon Hu, Ningyuan Cao
Abstract
Probabilistic computation plays an important role in trustworthy edge intelligence to quantify uncertainty, enhance robustness, reconstruct data, and protect privacy, but its adoption is limited by the orders-of-magnitude data throughput gap between Gaussian random number generation (GRNG) and computation, as well as instruction overhead. This paper introduces probabilistic memory (p-MEM), a unified memory primitive that stores distribution parameters, such as mean and standard deviation, and samples directly at the native memory bandwidth, where deterministic data becomes the zero-variance special case. Using a layout-validated p-MEM simulator, we comprehensively explore device choices, memory specifications, and technology nodes, showing that p-MEM can achieve more than 1000 GSa/s/mm^2 GRNG throughput, including memory-array access. Integrated into CPU/GPU systems, p-MEM reduces instruction count by up to 2.19x/4.37x, sampling latency by 562x/3.45x, and energy by 295.5x/3.53x for Bayesian neural network workloads, providing a scalable hardware substrate for trustworthy probabilistic AI.
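
The read semantics described in the abstract (store a mean and standard deviation, sample on access, with deterministic data as the zero-variance case) can be illustrated with a small software model. This is only a sketch under stated assumptions: the class `PMemModel`, the helper `bayesian_neuron`, the string addresses, and the use of Python's `random.Random.gauss` as a stand-in for the in-memory GRNG are all hypothetical and are not the paper's hardware design or simulator.

```python
import random
import statistics


class PMemModel:
    """Toy software model of a memory whose reads return Gaussian samples.

    Hypothetical illustration only; not the paper's hardware or simulator.
    """

    def __init__(self, seed=0):
        self._cells = {}                 # address -> (mean, std)
        self._rng = random.Random(seed)  # software stand-in for in-memory GRNG

    def write(self, addr, mean, std=0.0):
        if std < 0:
            raise ValueError("standard deviation must be non-negative")
        self._cells[addr] = (float(mean), float(std))

    def read(self, addr):
        mean, std = self._cells[addr]
        # Sample as mean + std * eps with eps ~ N(0, 1).
        # With std == 0 the read is deterministic, like ordinary memory.
        return mean + std * self._rng.gauss(0.0, 1.0)


def bayesian_neuron(mem, weight_addrs, inputs):
    """One stochastic forward pass: each weight is sampled as it is read."""
    return sum(mem.read(a) * x for a, x in zip(weight_addrs, inputs))


mem = PMemModel(seed=42)
mem.write("w0", mean=0.5, std=0.1)   # probabilistic weight
mem.write("w1", mean=-1.0, std=0.0)  # deterministic weight (zero variance)

# Repeated forward passes give a predictive distribution, not a single value.
outputs = [bayesian_neuron(mem, ["w0", "w1"], [2.0, 1.0]) for _ in range(10_000)]
pred_mean = statistics.fmean(outputs)
pred_std = statistics.stdev(outputs)
```

In this toy setup the output is 2.0 * w0 + 1.0 * w1, so the predictive mean should sit near 0 and the predictive standard deviation near 0.2. The spread across repeated passes is the uncertainty estimate that a Bayesian neural network workload needs. In software, every `read` pays for a random number draw; the abstract's claim is that p-MEM moves that cost into the memory access itself.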