Raja Koduri and SanDisk aim to redefine AI hardware with high bandwidth memory

Raja Koduri, one of the most influential figures in GPU engineering, is back in the spotlight—this time with a bold new mission: to overhaul how memory interacts with AI accelerators. After decades at AMD, Intel, and Apple, Koduri’s newest venture focuses on pushing the limits of high bandwidth memory (HBM) to meet the rising demands of artificial intelligence workloads. His latest collaboration with SanDisk signals a strategic pivot toward hardware that supports massive datasets, high-speed processing, and unmatched memory scalability. In this article, we’ll explore why HBM is crucial for the AI era, the potential of Koduri’s partnership with SanDisk, and how this vision could reshape the future of GPU design and computing infrastructure.

Why AI needs more than traditional memory

Modern AI models—particularly large language models (LLMs) and generative AI frameworks—demand immense computational throughput and massive memory bandwidth. Traditional DRAM and GDDR architectures, although effective for gaming and general GPU tasks, are increasingly becoming bottlenecks in AI workflows. As model parameters grow into the hundreds of billions, the need for a memory solution that can keep pace with data transfer and real-time model execution becomes critical.
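
To make that scale concrete, here is a rough back-of-the-envelope sketch of how much memory large models need just to hold their weights. The parameter counts and byte widths are illustrative assumptions, not figures tied to any specific model.

```python
# Illustrative back-of-the-envelope estimate of model memory footprints.
# Parameter counts and byte widths are assumptions for illustration only.

def weights_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold model weights, in GiB."""
    return num_params * bytes_per_param / 2**30

for params in (70e9, 175e9, 400e9):            # 70B, 175B, 400B parameters
    for label, width in (("FP16", 2), ("FP8", 1)):
        print(f"{params/1e9:.0f}B params @ {label}: "
              f"{weights_gib(params, width):,.0f} GiB")

# Even at FP8, a 400B-parameter model needs ~373 GiB for its weights alone,
# before activations, optimizer state, or KV caches -- far beyond the
# 16-32 GiB typical of GDDR-based cards and beyond even the largest
# current HBM configurations.
```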

HBM addresses these challenges by stacking memory closer to the compute cores and dramatically widening the memory bus. This architecture allows for considerably higher bandwidth while maintaining energy efficiency—a non-trivial advantage in data centers where thermal and power constraints are as important as raw performance. With AI training cycles consuming terabytes of data daily, a rethinking of how VRAM is structured and delivered is necessary to avoid compute starvation.
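
A simple way to see why the wider bus matters is to multiply interface width by transfer rate. The figures below are representative approximations for GDDR6- and HBM3-class parts, not specifications for any particular product.

```python
# Rough peak-bandwidth comparison: bandwidth ≈ bus width (bytes) × data rate.
# The bus widths and data rates below are representative approximations only.

def peak_gbs(bus_bits: int, gigatransfers_per_s: float) -> float:
    """Approximate peak bandwidth in GB/s."""
    return bus_bits / 8 * gigatransfers_per_s

configs = {
    "GDDR6 card (384-bit bus, ~20 GT/s)":        (384, 20.0),
    "Single HBM3 stack (1024-bit, ~6.4 GT/s)":   (1024, 6.4),
    "Six HBM3 stacks on one package":            (6 * 1024, 6.4),
}
for name, (bits, rate) in configs.items():
    print(f"{name}: ~{peak_gbs(bits, rate):,.0f} GB/s")

# One HBM stack's very wide interface roughly matches an entire GDDR card's
# bus, and placing several stacks on-package multiplies that again.
```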

A new kind of partnership: Koduri and SanDisk

In a notable move, Raja Koduri has teamed up with SanDisk—historically a titan of NAND flash innovation—to explore what he calls high bandwidth flash (HBF). If successful, this effort could marry the density and capacity of NAND flash with the bandwidth and latency demands of GPU accelerators. Unlike conventional SSDs used for bulk storage, these HBF modules would be architected for low-latency access patterns optimized for AI inference and training tasks.

The collaboration is focused on integrating SanDisk’s flash storage capabilities directly into the GPU memory stack, bypassing traditional I/O bottlenecks. Koduri believes that by bridging the gap between flash performance and GPU memory needs, it’s possible to deliver significantly larger memory pools without sacrificing speed. This could translate into GPUs that are less dependent on external DRAM while offering TB-scale usable capacity tuned specifically for AI workflows.
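
The claim about bypassing I/O bottlenecks is easiest to picture as a transfer-time comparison. The sketch below is purely hypothetical: the link speeds are assumed round numbers, and the on-package HBF figure is a placeholder rather than anything SanDisk or Koduri has published.

```python
# Hypothetical comparison of how long it takes to move 1 TiB of model data
# to the accelerator. Link speeds are assumed round numbers for illustration.

TIB = 2**40  # bytes

paths_gbs = {
    "NVMe SSD -> host DRAM -> PCIe 5.0 x16 -> GPU": 14.0,    # limited by the SSD
    "PCIe 5.0 x16 alone (data already in host DRAM)": 63.0,
    "Hypothetical on-package HBF tier":              1000.0, # assumed figure
}
for path, gbs in paths_gbs.items():
    seconds = TIB / (gbs * 1e9)
    print(f"{path}: ~{seconds:,.1f} s per TiB")

# Keeping flash-resident data inside the GPU's own memory hierarchy removes
# the host-DRAM bounce and the PCIe hop, which is where the proposal's
# bandwidth and latency gains would have to come from.
```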

The 4TB GPU: A vision for AI’s next leap

Perhaps the most headline-grabbing idea in Koduri’s roadmap is his vision for GPUs equipped with up to 4 terabytes of effective VRAM. This level of memory would be a seismic leap forward—today’s top-tier AI accelerators, such as NVIDIA’s H100, top out at roughly 80 to 94GB of HBM. A jump to 4TB would fundamentally change how datasets are managed in memory, enabling entire corpora, training environments, or real-time simulations to live fully in-GPU.
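
To put 4TB of effective VRAM in perspective, here is a quick illustrative calculation of how many parameters could stay resident on a single device. The byte-per-parameter figures are assumptions and ignore activations, optimizer state, and KV caches.

```python
# Illustrative capacity math for a hypothetical 4 TB GPU memory pool.
# Bytes-per-parameter values are assumptions; overheads are ignored.

CAPACITY_TB = 4
capacity_bytes = CAPACITY_TB * 1e12

for label, bytes_per_param in (("FP16", 2), ("FP8", 1), ("4-bit", 0.5)):
    params = capacity_bytes / bytes_per_param
    print(f"{label}: ~{params/1e12:.1f} trillion parameters resident")

# Versus ~80 GB of HBM today, which holds on the order of 40B parameters
# at FP16, a 4 TB pool would let multi-trillion-parameter models -- or many
# large models side by side -- live entirely in one accelerator's memory.
```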

Such massive memory density isn’t just about size—it’s also about accessibility and throughput. HBF-based memory stacks could let AI processors access data at flash-like capacities with bandwidth approaching that of on-package HBM. If realized, this could reduce training times, enable higher model concurrency, and significantly lower the cost of scaling AI workloads in production. It may also stimulate a wave of new hardware startups looking to follow this integrated memory direction.

What this means for hardware and developers

Koduri’s blueprint has broader implications beyond silicon design. Developers working on next-gen AI systems could soon target platforms where local memory is no longer the limiting factor. Engineers could spend less effort on memory streaming and paging workarounds and more on performance-per-watt and data locality within their compute graphs.
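
In practical terms, the shift developers would feel is the difference between paging weights in and out of a small device cache and simply indexing into a resident pool. The sketch below is schematic only; the function names and the eviction policy are hypothetical placeholders, not any vendor’s API.

```python
# Schematic contrast between today's streamed execution and a fully
# memory-resident model. All names here are hypothetical placeholders.

import numpy as np

def run_layer_streamed(layer_id, activations, fetch_weights, device_cache, max_cached=4):
    """Today: weights are paged in from host DRAM or SSD per layer and may be evicted."""
    if layer_id not in device_cache:
        if len(device_cache) >= max_cached:               # crude eviction policy
            device_cache.pop(next(iter(device_cache)))    # evict oldest entry
        device_cache[layer_id] = fetch_weights(layer_id)  # slow round trip
    return activations @ device_cache[layer_id]

def run_layer_resident(layer_id, activations, all_weights):
    """With TB-scale device memory: every layer's weights are already local."""
    return activations @ all_weights[layer_id]

# Tiny usage example with random matrices standing in for real layers.
weights = {i: np.random.rand(8, 8) for i in range(16)}
x = np.random.rand(1, 8)
cache = {}
y_streamed = run_layer_streamed(3, x, lambda i: weights[i], cache)
y_resident = run_layer_resident(3, x, weights)
assert np.allclose(y_streamed, y_resident)

# In the resident version, the scheduling, prefetching, and eviction logic
# disappears, and tuning effort shifts toward locality and perf-per-watt.
```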

On the hardware vendor side, this development could challenge NVIDIA and AMD to rethink their design cycles. GPUs with native support for HBF, or similar memory architectures, could emerge as a new competitive class alongside traditional accelerators. For cloud providers, integrating these types of GPUs could mean consolidating workloads—fewer nodes doing more work, thanks to vastly improved memory throughput and scale.

Final thoughts

Raja Koduri’s pursuit of radical improvements in GPU memory architecture could mark a turning point in how the industry designs for artificial intelligence. With his partnership with SanDisk laying the groundwork for high bandwidth flash integration, the concept of 4TB GPUs may soon shift from ambition to product roadmap. As AI workloads continue to balloon in size and complexity, aligning compute with equally high-performance memory isn’t just an optimization—it’s a necessity. If Koduri’s plans unfold as envisioned, the next generation of AI accelerators might look nothing like what we use today—and the performance ceiling will be vastly higher for developers, researchers, and enterprises alike.


