A technical review paper written by a team of engineers at the University of California, Riverside, published in the journal Device, explores wafer-scale accelerators, a new form of computing chip that is expected to revolutionize the future of artificial intelligence and be more environmentally friendly.
Unlike traditional graphics processing units (GPUs), which are roughly the size of a postage stamp, wafer-scale accelerators are fabricated from a single silicon wafer and are as large as a dinner plate. These chips, pioneered by Cerebras, represent a fundamental shift in computing architecture.
The peer-reviewed paper, co-authored by a multidisciplinary research team at UC Riverside, argues that wafer-scale processors not only deliver breakthroughs in computing performance but also offer significant gains in energy efficiency – a key bottleneck as AI models grow ever larger and more complex.
"Wafer-scale technology is a game-changing advance," said Mihri Ozkan, first author of the paper and a professor of electrical and computer engineering at UCR. "It enables AI models with trillions of parameters to run faster and more efficiently than ever before."
GPUs have been at the heart of AI development because of their powerful parallel processing capabilities, which let them perform thousands of computing tasks simultaneously – ideal for processing images, language, and real-time data streams. This parallelism allows self-driving vehicles to perceive their environment in real time, text prompts to become images, and ChatGPT to suggest recipes from a list of ingredients.
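The data parallelism described above can be sketched loosely in Python (our illustration, not code from the paper); the `brighten` operation and slice sizes are arbitrary stand-ins for real GPU workloads, which run thousands of hardware threads at once:

```python
from concurrent.futures import ThreadPoolExecutor

def brighten(pixel: int) -> int:
    """Toy per-pixel operation, standing in for one step of image processing."""
    return min(pixel + 40, 255)

def sequential(pixels):
    # One core walks through the pixels one at a time.
    return [brighten(p) for p in pixels]

def data_parallel(pixels, workers=4):
    # GPU-style data parallelism: split the input into slices and process
    # the slices concurrently; every slice gets the same operation.
    size = -(-len(pixels) // workers)  # ceiling division
    slices = [pixels[i:i + size] for i in range(0, len(pixels), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        processed = pool.map(sequential, slices)
    return [p for chunk in processed for p in chunk]

if __name__ == "__main__":
    image = list(range(256)) * 8
    assert sequential(image) == data_parallel(image)
```

Both paths produce identical results; the point is that the per-element work is independent, which is exactly what lets a GPU spread it across thousands of cores.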
Figure: Wafer-scale chips: the super-engine powering the next generation of AI
However, as AI models grow exponentially in parameter count, even top-end GPUs are hitting limits in performance and power consumption.
"AI computing today is not only about speed," Ozkan said. "It is also about heat dissipation and energy consumption, and about building a powerful, efficient data-transmission architecture."
In the paper, the UCR team compares mainstream GPUs with systems such as Cerebras' third-generation Wafer-Scale Engine (WSE-3), which integrates 4 trillion transistors and 900,000 compute cores optimized for AI on a single wafer. Another example is Tesla's Dojo system, whose D1-based training tiles each pack about 1.25 trillion transistors and nearly 9,000 cores.
The advantage of these wafer-scale systems is that they bypass the latency and energy overhead of chip-to-chip communication in traditional multi-chip architectures.
"Centralizing all compute units on a single wafer significantly reduces the energy loss and latency of moving data between chips," Ozkan explains. "That not only improves performance but also lays the foundation for sustainable AI systems."
In short, wafer-scale chips are opening a new door for next-generation AI computing, striking a better balance among performance, efficiency, and sustainability, and providing key support for the evolution of future intelligent systems.