Caching for Dollars, Not Hits: An Exact Offline Reference for Cloud-Egress Caching and the Crossover That Decides When It Pays

2026-06-18Databases

DatabasesData Structures and Algorithms
AI summary

The authors study caching strategies for cloud storage, where costs depend on both the number of fetch requests and data size, not just speed. They show that traditional caching methods aim to reduce misses but often ignore the actual cost differences of fetching items. They develop a way to find the lowest possible cost caching decision and compare common methods to this ideal. Their findings reveal that cost-aware caching performs much better when fetch costs vary, and they identify a clear rule for when cost-aware caching is necessary based on cloud pricing details. They validate their approach using real-world data from Twitter and provide a benchmark for future studies.

cache misscloud object storagecaching policycost optimizationLRUGreedyDualoffline optimallinear programmingcost-aware cachingcloud pricing
Authors
Madhulatha Mandarapu, Sandeep Kunkunuru
Abstract
When a cache miss fetches from cloud object storage, the bill is per GET request and per byte of egress, not latency. Classic caching minimizes the miss rate, the wrong objective: a rarely but expensively fetched object can cost thousands of times more dollars than a frequently but cheaply fetched one. Generalized-caching theory bounds the miss-cost objective, but no reported benchmark measures how far deployed heuristics sit from the dollar-optimal offline policy on real cloud prices. We supply that reference. For uniform-size page caches with heterogeneous miss costs the offline dollar-optimum is exact in polynomial time via an integral interval linear program -- validated against brute force; variable sizes are NP-hard, so we extend the flow-based offline bound from the hit-ratio objective to dollars (cost-FOO), tight to about four percent. Against this reference we find: (i) a heterogeneity-regret law -- LRU's dollar-regret rises with miss-cost dispersion (Spearman 0.87) while cost-aware GreedyDual cuts it to roughly a tenth; (ii) a contention frontier -- GreedyDual's residual regret collapses to near zero exactly when the budget fits the expensive working set, and is the open slice otherwise; and (iii) a closed-form crossover s* = GET_fee/egress_rate (about 4 KB on S3, 330 B on GCS) that predicts which deployments need dollar-aware caching at all. On a real Twitter trace the price vector alone moves the workload across s*, shifting the regime as predicted. The artifact is a reproducible billing-faithful benchmark; heuristics and bounds it builds on are prior work, credited.