Generative AI refers to AI systems capable of creating new content, such as text, images, audio, or video. These systems learn from large amounts of data, capture the patterns and structures within it, and can then generate new, previously unseen instances. Through deep learning, generative AI can closely approximate the distribution of its training data, and it is having a profound impact on industries such as education, entertainment, media, and design.
Advances in large-model techniques and growing computing power have provided strong support for generative AI, and abundant training data has further accelerated its development.
None of the existing edge AI accelerators are well suited to transformers, and the semiconductor industry is working to close this gap. The demanding compute requirements are being addressed at three distinct levels: innovative architectures, silicon scaling to smaller technology nodes, and multi-chip stacking. However, advances in digital logic have not solved the memory bottleneck; instead, they run into the undesirable effect known as the "memory wall".
The memory wall is a metaphor for the situation in which system performance is limited because processor speed grows faster than memory access can keep up with. Processor technology has made CPUs ever faster, but memory technology has not kept pace, so memory has become the performance bottleneck. It acts like a wall that blocks the rapid flow of data between processor and memory.
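The memory wall can be made concrete with a roofline-style estimate: a workload's attainable throughput is capped either by peak compute or by memory bandwidth, depending on how many operations it performs per byte moved. The hardware numbers below are illustrative assumptions, not the specs of any particular chip:

```python
# Roofline-style estimate: is a workload compute-bound or memory-bound?
# All hardware numbers are illustrative assumptions, not real chip specs.

PEAK_FLOPS = 100e12      # 100 TFLOP/s of peak compute (assumed)
PEAK_BW    = 1e12        # 1 TB/s of memory bandwidth (assumed)

def attainable_flops(arithmetic_intensity):
    """Attainable throughput (FLOP/s) for a given arithmetic
    intensity, i.e. FLOPs per byte moved to/from memory."""
    return min(PEAK_FLOPS, PEAK_BW * arithmetic_intensity)

# The "ridge point" where the compute and bandwidth limits meet:
ridge = PEAK_FLOPS / PEAK_BW   # FLOPs per byte

# A large matrix multiply reuses data heavily (high intensity), while
# streaming workloads touch each byte only a few times (low intensity).
for name, intensity in [("matmul-like", 500.0), ("streaming-like", 1.0)]:
    bound = "compute-bound" if intensity >= ridge else "memory-bound"
    print(f"{name}: {attainable_flops(intensity)/1e12:.1f} TFLOP/s ({bound})")
```

With these assumed numbers, the low-intensity workload reaches only 1% of peak compute: no amount of extra digital logic helps until the bandwidth side of the roofline rises.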
Figure: Generative AI and memory walls: A wake-up call for the IC industry
The memory wall has plagued the semiconductor industry for years, growing more severe with each processor generation. To combat it, the industry uses a multi-level memory hierarchy that places faster but more expensive memory technologies close to the processor. Closest to the processor sit the multi-level caches, which minimize traffic to the slower main memory and the slowest external storage. Inevitably, the more levels an access traverses, the higher the latency and the less efficiently the processor runs.
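The cost of traversing the hierarchy can be quantified with the classic average-memory-access-time (AMAT) formula, where each slower level is charged only on a miss at the faster one. The latencies and hit rates below are illustrative assumptions:

```python
# Average memory access time (AMAT) for a multi-level hierarchy.
# AMAT = t_L1 + miss_L1 * (t_L2 + miss_L2 * t_DRAM)
# All latencies (ns) and hit rates are illustrative assumptions.

levels = [
    # (name, access latency in ns, hit rate)
    ("L1 cache", 1.0,   0.95),
    ("L2 cache", 4.0,   0.90),
    ("DRAM",     100.0, 1.00),   # main memory always "hits"
]

def amat(levels):
    """Average access time, walking the hierarchy fastest to slowest."""
    total, miss_prob = 0.0, 1.0
    for _, latency, hit_rate in levels:
        total += miss_prob * latency      # charged only if faster levels missed
        miss_prob *= (1.0 - hit_rate)
    return total

print(f"AMAT: {amat(levels):.2f} ns")
```

Even with 95% L1 hits, the rare trips to DRAM dominate: the 100 ns penalty contributes 0.5 ns of the 1.70 ns average, which is why each added layer of cache targets exactly those misses.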
The rapid growth of generative AI places ever greater demands on compute and memory bandwidth. As models grow in size and computational cost, memory bandwidth becomes a key limiting factor in their performance: when processing large datasets or running complex models, memory access speed throttles both efficiency and throughput. For the IC industry this translates into higher technical requirements, including more computing power, greater memory bandwidth, and lower latency, and meeting them will demand continuous innovation.
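One way to see why bandwidth, rather than raw compute, often limits generative models: during autoregressive text generation, each new token requires streaming the full set of model weights from memory at least once. A rough upper bound on generation speed follows directly, using assumed (not measured) figures:

```python
# Rough upper bound on token generation speed when memory-bound:
# every token requires reading all model weights from memory once.
# Model size and bandwidth figures are illustrative assumptions.

params = 7e9            # 7-billion-parameter model (assumed)
bytes_per_param = 2     # 16-bit weights
bandwidth = 1e12        # 1 TB/s memory bandwidth (assumed)

weight_bytes = params * bytes_per_param
tokens_per_sec = bandwidth / weight_bytes  # ignores activations and compute

print(f"Upper bound: {tokens_per_sec:.0f} tokens/s")
```

Under these assumptions the ceiling is about 71 tokens per second regardless of how much compute the chip offers, which is why greater memory bandwidth sits alongside raw computing power in the requirements above.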