Parent-Hash DAG: A Cost Analysis of Constant-Time Append for On-Chain Registries
2026-06-08 • Distributed, Parallel, and Cluster Computing
Distributed, Parallel, and Cluster Computing • Cryptography and Security
AI summary
The authors study provenance trees, a way of recording artifact registrations and their lineage on a public blockchain. They focus on a pattern called a parent-hash DAG (PHDAG), in which adding a new record has a fixed, predictable cost no matter how large the tree grows. They compare it with the more common alternative, the incremental Merkle tree (IMT), whose per-insert cost grows with tree depth. Their on-chain tests show that IMT is cheaper only for very shallow trees, far shallower than any production registry they surveyed. They also show that anyone can rebuild the entire registry from public event logs alone, without relying on an off-chain service.
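The append-and-replay pattern described in the summary can be sketched in Python. This is a toy model, not the authors' contract, and it rests on several assumptions. A dict stands in for contract storage and a list stands in for on-chain event logs. SHA-256 stands in for the EVM's Keccak-256. Each node has at most one parent (the tree case). One logical record write stands in for the paper's constant number of storage writes. The names `Registry`, `node_id`, `append`, `reconstruct`, and `slot_writes` are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Event = Tuple[bytes, Optional[bytes], bytes]  # (node id, parent id, content)


def node_id(content: bytes, parent: Optional[bytes]) -> bytes:
    """Hypothetical node identifier committing to the parent id and the content.

    SHA-256 stands in for the EVM's Keccak-256; roots hash against a zero parent.
    """
    return hashlib.sha256((parent or bytes(32)) + content).digest()


@dataclass
class Registry:
    # Stand-in for contract storage: node id -> parent id (None for roots).
    storage: Dict[bytes, Optional[bytes]] = field(default_factory=dict)
    # Stand-in for the chain's append-only event log.
    log: List[Event] = field(default_factory=list)
    # Counts logical storage writes so the per-append cost can be checked.
    slot_writes: int = 0

    def append(self, content: bytes, parent: Optional[bytes]) -> bytes:
        if parent is not None and parent not in self.storage:
            raise KeyError("unknown parent")
        nid = node_id(content, parent)
        if nid in self.storage:
            raise ValueError("node already registered")
        # One write to a previously untouched slot, regardless of how many
        # nodes exist or how deep the parent is; no path is rehashed.
        self.storage[nid] = parent
        self.slot_writes += 1
        self.log.append((nid, parent, content))
        return nid


def reconstruct(log: List[Event]):
    """Rebuild parent and child maps from the event log in a single O(n) pass."""
    parents: Dict[bytes, Optional[bytes]] = {}
    children: Dict[bytes, List[bytes]] = {}
    for nid, parent, content in log:
        if node_id(content, parent) != nid:
            raise ValueError("event id does not match its contents")
        if nid in parents:
            raise ValueError("duplicate event")
        if parent is not None and parent not in parents:
            raise ValueError("event references a parent not yet seen")
        parents[nid] = parent
        children[nid] = []
        if parent is not None:
            children[parent].append(nid)
    return parents, children


if __name__ == "__main__":
    reg = Registry()
    tip = reg.append(b"genesis", None)
    for depth in range(1, 1001):
        before = reg.slot_writes
        tip = reg.append(f"revision {depth}".encode(), tip)
        assert reg.slot_writes - before == 1  # same write count at every depth
    parents, _ = reconstruct(reg.log)
    assert parents == reg.storage
```

A node's identifier commits to its parent's identifier, and a parent must exist before a child can be appended. The replay in `reconstruct` therefore visits each event exactly once and rejects events whose identifier does not match their contents or that reference an unseen parent. This is the kind of linear-time, log-only reconstruction the authors describe. In this model, the integrity of the log itself is assumed to come from the chain.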
provenance trees, blockchain, directed acyclic graph (DAG), parent-hash DAG (PHDAG), incremental Merkle tree (IMT), gas cost, registry, public event logs, data structure append, stochastic cost model
Authors
Ian C. Moore, Fernando Paredes Garcia
Abstract
Provenance trees are append-only directed acyclic graphs of artifact registrations anchored on a public blockchain, recently introduced as the data substrate of operator-gated provenance infrastructure. Their defining data-structural pattern is a parent-hash directed acyclic graph (PHDAG), in which each append performs a constant number of storage writes to previously untouched slots. This pattern has not previously been isolated as a standalone primitive, formally bounded with explicit constants, or benchmarked against the standard alternative, the incremental Merkle tree (IMT). We formally bound PHDAG append at O(1) gas, independent of registry size and tree depth. We also develop a stochastic cost model for IMT in which the per-insert cost is a random variable over the leaf index, and derive closed-form expressions for its mean and variance. We validate both analyses empirically on Base Sepolia across tree depths 1 to 25. PHDAG append cost is observed to be depth-invariant at 76,276 gas (standard deviation about 6 gas), while IMT cost grows linearly with depth. The crossover depth, below which IMT is cheaper, lies far below the depth of every production registry surveyed. We further show that the registry can be reconstructed trustlessly from public event logs in linear time, with no off-chain dependency.
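The Python sketch below makes the shape of a stochastic IMT cost model concrete. It assumes a common filledSubtrees-style IMT. In this layout, inserting leaf `index` hashes once per level and writes a cached subtree value at exactly the levels where bit i of the index is 0. The leaf index is taken to be uniform over the 2^d slots, so the number of writes follows Binomial(d, 1/2). This gives E[C] = base + (per_level + per_write/2)·d and Var[C] = per_write²·d/4. This is an illustrative model, not the authors' derivation. The constants `base`, `per_level`, and `per_write` are hypothetical abstract units, not measured gas. Real SSTORE pricing (zero versus non-zero slots, cold versus warm access) is ignored.

```python
import math
from fractions import Fraction


def insert_cost(index: int, depth: int, base: int, per_level: int, per_write: int) -> int:
    """Cost of inserting leaf `index` into a depth-`depth` IMT (toy model).

    Assumes one hash per level, plus one cached-subtree write at every level
    where bit i of `index` is 0 (the filledSubtrees update pattern).
    """
    zero_bits = sum(1 for i in range(depth) if not (index >> i) & 1)
    return base + per_level * depth + per_write * zero_bits


def empirical_moments(depth, base, per_level, per_write):
    """Exact mean and variance of the cost over all 2**depth leaf indices."""
    n = 2 ** depth
    costs = [insert_cost(k, depth, base, per_level, per_write) for k in range(n)]
    mean = Fraction(sum(costs), n)
    var = Fraction(sum(c * c for c in costs), n) - mean * mean
    return mean, var


def closed_form_moments(depth, base, per_level, per_write):
    """Closed form: the zero bits of a uniform index are Binomial(depth, 1/2)."""
    mean = base + (per_level + Fraction(per_write, 2)) * depth
    var = Fraction(per_write ** 2 * depth, 4)
    return mean, var


def crossover_depth(phdag_cost, base, per_level, per_write):
    """Smallest depth at which the expected IMT cost reaches a constant PHDAG cost."""
    slope = per_level + Fraction(per_write, 2)
    return max(0, math.ceil((phdag_cost - base) / slope))


if __name__ == "__main__":
    # Hypothetical constants in abstract units, chosen only for illustration.
    params = dict(base=50, per_level=3, per_write=4)
    for d in (1, 4, 8, 12):
        assert empirical_moments(d, **params) == closed_form_moments(d, **params)
        mean, var = closed_form_moments(d, **params)
        print(f"depth={d:2d}  mean={float(mean):.1f}  std={math.sqrt(var):.2f}")
```

The brute-force enumeration matches the closed form exactly. Under this toy model, the expected cost grows linearly in depth while its standard deviation grows only with the square root of depth. `crossover_depth` shows how a constant PHDAG cost and a linearly growing expected IMT cost determine a single crossover depth. Reproducing the paper's figures would require the measured gas constants rather than these placeholders.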