Today we are going to share the article "Lines Blurring Between Supercomputing And HPC" from semiengineering.com, a site focused on reporting and analysis of semiconductor engineering and related technology fields, providing professional, up-to-date information for industry insiders and technology enthusiasts. The article examines how the line between supercomputing and high-performance computing (HPC) is increasingly blurring. Here are the key takeaways:
At present, the rapid development of AI and the disaggregation of computing components are profoundly reshaping frontier computing. Supercomputers and HPC systems have historically served different markets: supercomputers are mostly used for scientific and academic computing, with performance measured in exaFLOPS (10^18 floating-point operations per second), while HPC focuses on more traditional applications, relying on high-bandwidth memory, fast processor-to-processor communication, and very high floating-point throughput. However, as AI training and inference have become the focus of computing, the two architectures are converging.
AI has had a profound impact on both modes of computing. On the one hand, the convergence of CPUs and GPUs in heterogeneous environments continues to deepen: GPUs have moved beyond gaming and cryptocurrency mining to become the core of AI computing, and their excellent scalability is now key to improving overall performance. Google, for example, combines CPUs and GPUs in its data centers to train AI models for tasks such as image recognition and natural language processing, optimizing performance (see the sketch below). This has led to the rise of hybrid computing systems that blend classical computing, supercomputing, and even quantum computing to meet the performance, reliability, and security needs of a wide range of applications. On the other hand, the computing demands of AI also pose serious challenges. Supercomputers are extremely energy-intensive: the Stargate system planned by Microsoft, OpenAI, and SoftBank, for example, is expected to require 5 gigawatts of electricity, far more than any single nuclear power plant in the United Kingdom or the United States, so energy efficiency and sustainability have become urgent problems. Even for machines as powerful as the "Apex" supercomputer at Oak Ridge National Laboratory, engineers are exploring new cooling technologies and power-management strategies to reduce energy consumption.
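To make the heterogeneous CPU/GPU division of labor concrete, here is a minimal generic sketch in PyTorch. It is only an illustration under assumed toy data and a toy model, not Google's or any vendor's actual pipeline: the CPU prepares batches while the GPU, if one is present, does the heavy arithmetic.

```python
# Minimal sketch of a heterogeneous CPU/GPU training loop (PyTorch assumed installed).
# The model and data are toy placeholders, not a real workload.
import torch
import torch.nn as nn

# Use the accelerator if one is present, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A tiny stand-in for an image-classification model.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    # Data is prepared on the CPU side ...
    images = torch.randn(64, 1, 28, 28)
    labels = torch.randint(0, 10, (64,))
    # ... then shipped to the accelerator, which performs the dense math.
    images, labels = images.to(device), labels.to(device)

    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
```

The design point is simply that the two kinds of processors complement each other: serial control and data handling stay on the CPU, while the highly parallel tensor arithmetic scales out on the GPU.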
Figure: The line between supercomputing and HPC is blurring
Technological progress is the core driving force behind this convergence. High-bandwidth memory, high-speed chip-to-chip communication, and chiplet-based solutions are serving both supercomputing and HPC while meeting the demands of AI computing. At the same time, data-movement challenges have emerged: the cost of moving data now outpaces the cost of computing on it. Advanced packaging can mitigate this, but it introduces its own challenges in heat dissipation and power delivery. NVIDIA's DGX A100 system, for example, which combines powerful GPUs, high-bandwidth memory, and high-speed interconnects, is used by AI research institutes to train large language models as well as for some supercomputing-style workloads, yet it too faces data-movement and cooling challenges.
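A quick back-of-the-envelope, roofline-style check shows why data movement rather than raw arithmetic so often sets the limit. The hardware numbers below are illustrative assumptions only, loosely in the range of a modern HBM-equipped accelerator, not figures quoted from the article or any datasheet:

```python
# Roofline-style check: is a kernel limited by compute or by memory traffic?
# All hardware numbers are illustrative assumptions, not vendor specifications.
PEAK_FLOPS = 20e12       # assumed peak compute: 20 TFLOP/s
MEM_BANDWIDTH = 2e12     # assumed HBM bandwidth: 2 TB/s

# Machine balance: how many FLOPs the chip can do per byte it can move.
machine_balance = PEAK_FLOPS / MEM_BANDWIDTH   # = 10 FLOPs per byte

def attainable_gflops(arithmetic_intensity):
    """Attainable throughput for a kernel doing `arithmetic_intensity` FLOPs per byte moved."""
    return min(PEAK_FLOPS, arithmetic_intensity * MEM_BANDWIDTH) / 1e9

# A streaming kernel like y = a*x + y does ~2 FLOPs while moving ~24 bytes
# (about 0.08 FLOP/byte), so it is deeply memory-bound and touches only a
# sliver of the peak compute; dense matrix math sits far to the right.
for ai in (0.08, 1, 10, 100):
    bound = "memory-bound" if ai < machine_balance else "compute-bound"
    print(f"intensity {ai:6.2f} FLOP/byte -> {attainable_gflops(ai):10.1f} GFLOP/s ({bound})")
```

Under these assumed numbers, any kernel doing fewer than about 10 FLOPs per byte moved is capped by the memory system, which is exactly the pressure that high-bandwidth memory and advanced packaging are meant to relieve.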
From the perspective of computing precision, there is a tension between the probabilistic results of AI and the high precision required by supercomputing: the double-precision 64-bit arithmetic (FP64) common in scientific computing may be replaced by 8-bit or 16-bit formats in AI, which complicates hardware applicability. The timeliness of data processing is also challenging, as latency between processors affects how quickly results can be fed back. In aerospace, for example, supercomputing tasks such as computational fluid dynamics simulations for aircraft design require high-precision calculations to ensure reliable results, whereas some AI-based flight-trajectory prediction models may be trained at lower precision for speed, and that reduced precision can undermine reliability in practical applications.
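The practical gap between 64-bit and 16-bit arithmetic is easy to see. The short NumPy check below is a generic illustration (not an example from the article): in FP16, small contributions can be absorbed entirely when added to a large running value, which is exactly the kind of accumulation that long scientific simulations cannot tolerate.

```python
# Illustration of precision loss when moving from FP64 to FP16.
import numpy as np

# Near 1000, adjacent FP16 values are 0.5 apart, so adding 0.1 changes nothing.
print(np.float16(1000) + np.float16(0.1))   # 1000.0 -- the small update is absorbed
print(np.float64(1000) + np.float64(0.1))   # 1000.1

# FP16 carries roughly 3-4 significant decimal digits; FP64 carries ~15-16.
print(np.float16(np.pi))   # 3.14
print(np.float64(np.pi))   # 3.141592653589793
```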
It is worth mentioning that the "supercomputer" is not just a technical concept; it also carries significant cultural and inspirational value as a symbol of the frontier of science and technology, motivating a new generation of engineers and scientists to keep exploring. Every year, the Supercomputing Conference attracts large numbers of students and technology enthusiasts, sparking their interest in the field, and many of them resolve to devote themselves to related research.
Overall, the blurring of the line between supercomputing and HPC is an inevitable trend in technology development, with opportunities and challenges in equal measure. As the underlying technologies continue to evolve, the integration of the two will bring further changes to the broader computing field and deserves our continued attention.