According to Jiemian News, Xiaomi is actively building its own ten-thousand-GPU ("wanka") cluster and plans to invest heavily in the research and development of large AI models. Xiaomi's large-model team had 6,500 GPUs at its founding in April 2023. Lei Jun emphasized the company's long-term commitment to AI when the team was formed, pointing to Xiaomi's AI Lab, the Xiaoai voice assistant, and the autonomous driving team. Later in 2023, at Xiaomi's annual event, Lei Jun elaborated on the team's progress with large models, emphasizing its focus on lightweight, locally deployable solutions. Xiaomi has already run a 1.3-billion-parameter model on its smartphones, and in some scenarios its performance approaches that of a 6-billion-parameter cloud model. In December 2024, Luo Fuli, a key developer of the open-source DeepSeek-V2 model, joined Xiaomi's AI Lab to lead its large-model work. Since its establishment in 2016, Xiaomi's AI department has grown to more than 3,000 people, covering vision, speech, natural language processing, machine learning, multimodal AI, and other fields.

Beyond Xiaomi, other domestic technology giants have also bet on large AI models, showing strong technical capability and broad application potential. From improvements in base-model performance to expanding applications, domestic large models have achieved breakthroughs on many fronts and are gradually moving toward the center of the global artificial intelligence stage.
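Before surveying those models, a quick back-of-envelope estimate shows why the 1.3-billion-parameter on-device model mentioned above is plausible on a phone. This is a rough sketch only: the weight precisions below are common deployment choices, not details Xiaomi has disclosed.

```python
# Rough memory-footprint estimate for a 1.3B-parameter on-device model.
# The parameter count comes from the article; the precisions are common
# deployment choices (assumptions), not figures Xiaomi has published.

PARAMS = 1.3e9  # parameters in the on-device model

for label, bytes_per_param in [("FP16", 2.0), ("INT8", 1.0), ("INT4", 0.5)]:
    gigabytes = PARAMS * bytes_per_param / 1024**3
    print(f"{label}: ~{gigabytes:.1f} GB of weights")

# Prints roughly: FP16 ~2.4 GB, INT8 ~1.2 GB, INT4 ~0.6 GB. The lower-precision
# variants are what make a model of this size practical next to the memory the
# OS and other apps already occupy on a phone.
```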
1. Baidu Wenxin model: multi-dimensional innovation drives development
Baidu's Wenxin continued to advance in 2024, achieving notable results in performance, technology, and applications. Wenxin 4.0 Turbo, released in June, delivered a 48% improvement in model performance and powers a wide range of AI applications. Baidu's retrieval-augmented text-to-image technology, iRAG, effectively reduces hallucination in image generation, making the generated images closer to user intent and real-world logic. The Wenxin Agent Platform has brought together 150,000 enterprises and 800,000 developers, forming a large innovation ecosystem that jointly pushes AI technology into deep application across industries. Backed by hundreds of multimodal AI capabilities, and in particular innovative features such as the free canvas, Baidu Wenku has attracted more than 70 million monthly active users, completing its transformation from a traditional document platform into an intelligent service provider.
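As a rough illustration of the retrieval-augmented idea behind iRAG, the sketch below grounds image generation in retrieved reference images so the generator invents less. Every function here is a hypothetical stand-in written for illustration; the article does not describe Baidu's implementation, and no Baidu API is used.

```python
# Generic retrieval-augmented text-to-image flow (illustrative stand-ins only).
from dataclasses import dataclass

@dataclass
class Reference:
    url: str
    caption: str

def search_reference_images(prompt: str, k: int = 3) -> list[Reference]:
    # Stand-in for an image-search step that returns real photos matching
    # the prompt (e.g. the actual landmark or product the user named).
    return [Reference(url=f"https://example.com/img{i}.jpg",
                      caption=f"reference {i} for: {prompt}") for i in range(k)]

def generate_image(prompt: str, references: list[Reference]) -> str:
    # Stand-in for a text-to-image model that conditions on the retrieved
    # references instead of inventing visual details from scratch.
    return prompt + " | grounded on: " + ", ".join(r.url for r in references)

if __name__ == "__main__":
    query = "the Great Wall at sunrise"
    print(generate_image(query, search_reference_images(query)))
```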
2. Tele-FLM-1T: an innovative low-carbon, high-efficiency model
Tele-FLM-1T, the world's first low-carbon, single dense trillion-parameter language model, was jointly developed by the Beijing Academy of Artificial Intelligence (BAAI) and the China Telecom Artificial Intelligence Research Institute (TeleAI), and it strikes a new balance between energy efficiency and performance. The model not only reflects the exploratory spirit of domestic large-model research but also offers valuable experience for the global AI field in addressing environmental challenges. While pursuing high-performance computing, Tele-FLM-1T reduces energy consumption through optimized algorithms and better hardware utilization, laying a foundation for deploying large models at scale within a framework of sustainable development.
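To get a feel for why energy efficiency matters at this scale, the sketch below applies the standard C ≈ 6·N·D training-compute approximation to a dense trillion-parameter model. The token count and sustained throughput are illustrative assumptions, not figures reported for Tele-FLM-1T.

```python
# Back-of-envelope training-compute estimate for a dense 1T-parameter model,
# using the common approximation C ≈ 6 * N * D (N = parameters, D = tokens).

N = 1.0e12     # parameters (dense, ~1 trillion)
D = 2.0e12     # assumed training tokens -- illustrative only
C = 6 * N * D  # total training FLOPs

# Convert to GPU-days assuming a sustained 400 TFLOP/s per accelerator
# (also an assumption; real utilization varies widely).
sustained_flops = 400e12
gpu_days = C / sustained_flops / 86400
print(f"~{C:.1e} FLOPs, ~{gpu_days:,.0f} GPU-days at 400 TFLOP/s sustained")
# At this scale, even modest gains in algorithmic or hardware efficiency
# translate into very large savings in energy and cost.
```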
Figure: Xiaomi increases investment in AI large models (Source: Jiemian News)
3. DeepSeek V3: A double breakthrough in performance and efficiency
With 671B total parameters and a Mixture-of-Experts (MoE) architecture, DeepSeek V3 stands out across multiple evaluations and reaches a leading level among open-source models. It surpasses Llama 3.1 405B and competes with top models such as GPT-4o and Claude 3.5 Sonnet. Particularly notable is its efficient training: the full run took less than 2.8 million GPU hours, significantly reducing training time and cost compared with comparable models, and each trillion training tokens can be completed in 3.7 days. A generation speed of 60 tokens per second gives DeepSeek V3 high responsiveness and throughput in practical applications.
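The MoE design is what lets the total parameter count and the per-token compute scale separately: each token activates only a handful of experts. The sketch below is a generic top-k MoE layer for illustration, not DeepSeek's actual routing implementation.

```python
# A generic top-k mixture-of-experts (MoE) feed-forward layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, n_experts: int, k: int):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Each token is routed to its top-k experts,
        # so per-token compute scales with k, not with n_experts.
        scores = self.router(x)                     # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)  # k experts per token
        weights = F.softmax(weights, dim=-1)        # mixing weights over the k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

# 64 experts in total but only 2 active per token: total parameters grow with
# the expert count, while per-token compute grows only with k.
layer = TopKMoE(d_model=256, d_hidden=1024, n_experts=64, k=2)
print(layer(torch.randn(8, 256)).shape)  # torch.Size([8, 256])
```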
4. CodeGeeX4-ALL-9B: a breakthrough in code generation
CodeGeeX4-ALL-9B from Zhipu AI marks an important breakthrough in code generation. Built on the strong language capabilities of GLM-4, the model substantially strengthens the full range of coding abilities, including code completion, code generation, code interpretation, web search, tool calling, and repository-level long-code Q&A. From simple snippet generation to complex project-level assistance, CodeGeeX4-ALL-9B provides accurate and efficient support for developers, markedly improving the efficiency and quality of software development.
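For readers who want to try the model locally, the sketch below shows a typical Hugging Face transformers loading pattern. The repository id is the publicly listed one to the best of our knowledge, and the exact prompt or chat format should be taken from the official model card rather than from this example.

```python
# Minimal local-inference sketch for CodeGeeX4-ALL-9B with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "THUDM/codegeex4-all-9b"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, device_map="auto"
)

prompt = "# Write a Python function that checks whether a string is a palindrome\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```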
5. vivo Blue Heart Model (BlueLM): Industry Exploration of Device-Cloud Integration
As a representative example of China's device-cloud approach to AI, vivo's Blue Heart model (BlueLM) plays to the strengths of on-device deployment and its model matrix. By deeply integrating large-model technology with its phone business, vivo is working to rebuild phone features and explore practical application scenarios, bringing users a smarter and more convenient experience. From the intelligent voice assistant to image-recognition optimization, BlueLM's on-phone applications not only improve the user experience but also offer new ideas and a working example for the intelligent evolution of mobile devices.
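To make the device-cloud idea concrete, the sketch below shows one possible routing policy between a small on-device model and a larger cloud model. The heuristics and functions are hypothetical illustrations, not vivo's actual BlueLM design.

```python
# Hypothetical device-cloud routing policy (illustrative only).

def run_on_device(prompt: str) -> str:
    # Stand-in for a small on-device model: fast, private, works offline.
    return f"[on-device answer] {prompt[:40]}..."

def run_in_cloud(prompt: str) -> str:
    # Stand-in for a large cloud model: needs a network, but more capable.
    return f"[cloud answer] {prompt[:40]}..."

def answer(prompt: str, online: bool, privacy_sensitive: bool) -> str:
    # Simple policy: keep private or short requests on the device, and
    # escalate long or complex requests to the cloud when a network exists.
    if privacy_sensitive or not online or len(prompt) < 200:
        return run_on_device(prompt)
    return run_in_cloud(prompt)

print(answer("Summarize today's meeting notes ...", online=True, privacy_sensitive=True))
```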
6. Application prospects and challenges of domestic large models
The progress of these domestic large models has not only driven the broad application of AI in China, in fields such as intelligent writing, image generation, intelligent customer service, and programming assistance, but has also provided technical support for the upgrading and innovation of related industries. At the same time, domestic large models still face challenges, including data quality and privacy protection, algorithm interpretability, and competition for talent.