ODIN-Based CPU-GPU Architecture with Replay-Driven Simulation and Emulation
2026-03-17 • Distributed, Parallel, and Cluster Computing
Distributed, Parallel, and Cluster Computing • Artificial Intelligence • Hardware Architecture
AI summary
The authors address the challenge of validating a system that combines CPUs and GPUs, especially one built from multiple small chips (chiplets). They developed a method that records system behavior and replays it, making problems easier to find and fix before the hardware is manufactured. Because the same recordings can be replayed in both simulation and emulation, complex workloads can be repeated consistently, which speeds up debugging and integration. With this approach, the team booted the full system and ran its workloads within a single quarter, suggesting that the method works well for chiplet-based designs.
CPU, GPU, chiplet architecture, pre-silicon validation, simulation, emulation, Network-on-Chip (NoC), waveform capture, replay-driven validation
Authors
Nij Dorairaj, Debabrata Chatterjee, Hong Wang, Hong Jiang, Alankar Saxena, Altug Koker, Thiam Ern Lim, Cathrane Teoh, Chuan Yin Loo, Bishara Shomar, Anthony Lester
Abstract
The integration of CPU and GPU technologies is a key enabler for modern AI and graphics workloads, combining control-oriented processing with massively parallel compute capability. As systems evolve toward chiplet-based architectures, pre-silicon validation of tightly coupled CPU-GPU subsystems becomes increasingly challenging due to complex validation framework setup, large design scale, high concurrency, non-deterministic execution, and intricate protocol interactions at chiplet boundaries, often resulting in long integration cycles. This paper presents a replay-driven validation methodology developed during the integration of a CPU subsystem, multiple Xe GPU cores, and a configurable Network-on-Chip (NoC) within a foundational SoC building block targeting the ODIN integrated chiplet architecture. Using deterministic waveform capture and replay across both simulation and emulation from a single design database, the methodology reliably reproduces complex GPU workloads and protocol sequences at the system level. This approach significantly accelerates debug, improves integration confidence, and enabled end-to-end system boot and workload execution within a single quarter, demonstrating the effectiveness of replay-based validation as a scalable methodology for chiplet-based systems.
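The abstract does not describe the trace format or the replay tooling, so the following is only a hedged, conceptual sketch of the record-and-replay idea, not the authors' flow. It assumes that boundary activity from a reference run has already been captured as (cycle, channel, payload, direction) events: inbound events become replay stimulus, outbound events become expected responses, the stimulus is replayed in a fixed (cycle, channel, payload) order so every run sees identical input, and the first divergence is reported. The `Txn` record, `toy_gpu_port`, `buggy_gpu_port`, and all function names are hypothetical. A real flow replays captured waveforms into RTL simulation or emulation rather than into a Python model.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple

@dataclass(frozen=True, order=True)
class Txn:
    """Hypothetical boundary transaction; ordering is (cycle, channel, payload)."""
    cycle: int
    channel: str
    payload: int

Event = Tuple[int, str, int, str]  # (cycle, channel, payload, "in" | "out")

def capture(events: Iterable[Event]) -> Tuple[List[Txn], List[Txn]]:
    """Split a recorded trace into replay stimulus and expected responses."""
    events = list(events)  # materialize so the trace can be scanned twice
    stimulus = sorted(Txn(c, ch, p) for c, ch, p, d in events if d == "in")
    expected = sorted(Txn(c, ch, p) for c, ch, p, d in events if d == "out")
    return stimulus, expected

def replay(stimulus: List[Txn], dut_step: Callable[[Txn], List[Txn]]) -> List[Txn]:
    """Drive the model with stimulus in a fixed order and collect its outputs."""
    observed: List[Txn] = []
    for txn in stimulus:  # fixed order removes run-to-run ordering nondeterminism
        observed.extend(dut_step(txn))
    return sorted(observed)

def first_divergence(expected: List[Txn], observed: List[Txn]) -> Optional[int]:
    """Index of the first mismatching transaction, or None if the traces agree."""
    for i, (e, o) in enumerate(zip(expected, observed)):
        if e != o:
            return i
    if len(expected) != len(observed):
        return min(len(expected), len(observed))
    return None

def toy_gpu_port(txn: Txn) -> List[Txn]:
    # Hypothetical model: each request returns payload + 1 two cycles later.
    return [Txn(txn.cycle + 2, "resp", txn.payload + 1)]

def buggy_gpu_port(txn: Txn) -> List[Txn]:
    # Hypothetical regression: the response arrives one cycle late.
    return [Txn(txn.cycle + 3, "resp", txn.payload + 1)]

# Hypothetical recorded trace, deliberately listed out of cycle order.
RECORDED: List[Event] = [
    (5, "req", 7, "in"),
    (1, "req", 3, "in"),
    (3, "resp", 4, "out"),
    (7, "resp", 8, "out"),
]

if __name__ == "__main__":
    stim, exp = capture(RECORDED)
    print(first_divergence(exp, replay(stim, toy_gpu_port)))    # prints None
    print(first_divergence(exp, replay(stim, buggy_gpu_port)))  # prints 0
```

Because the same captured stimulus can be fed to any model that implements `dut_step`, the sketch mirrors, in miniature, the abstract's point that one capture can drive both simulation and emulation; here the model is just swapped between a matching and a regressed version.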