The Earthquake That Shook Silicon Valley
In late January 2025, Silicon Valley, the heart of the tech industry, was rocked by an unprecedented shock. On January 27, the Nasdaq index plummeted by 3.1%, and NVIDIA, the king of AI chips, lost approximately $600 billion in market capitalization in a single day, the largest single-day loss for any company in U.S. stock market history. In the aftermath, over $1 trillion evaporated from the U.S. technology market as a whole. The epicenter of this event, dubbed the ‘DeepSeek Shock,’ was not a market collapse or an economic crisis. The cause was a single announcement from DeepSeek, a small Chinese startup with fewer than 200 employees, founded less than two years earlier.
The purpose of this article is to analyze the paradox inherent in this event. DeepSeek's R1 model achieved performance nearly on par with the world's top proprietary models at a fraction of the cost. This achievement, a dramatic demonstration of capital and resource ‘efficiency,’ seemed to herald a de-escalation of the costly AI development race. This report, however, reaches the opposite conclusion: the event has clarified the strategic landscape and will act as a catalyst that intensifies the massive, capital-based AI arms race led by U.S. hyperscalers to an unprecedented level.
The DeepSeek Shock demonstrated China's emerging innovative capability, but it paradoxically reaffirmed the primacy of the Scaling Law. While the barrier to entry for developing ‘good enough’ AI models has been lowered, it has become clear that the cost of developing the true frontier models that will dominate the market is higher than ever. This ultimately solidifies the strategic advantage of U.S. hyperscalers and is ushering in an era of even more intense, capital-centric geopolitical competition.
 

DeepSeek's Innovation Strategy

 
What: A Frontier Model at a Breakthrough Price
The core of the DeepSeek Shock is the R1 model, unveiled on January 20, 2025. The model demonstrated performance on par with, or even surpassing, OpenAI's o1 on key benchmarks in reasoning, mathematics, and coding. The most shocking aspect was the cost. While the training cost of U.S. frontier models such as GPT-4 is estimated at roughly $100 million, DeepSeek reported a training cost of just $6 million for R1, a reduction of more than 90%. It was achieved with only about 2,000 NVIDIA H800 GPUs, lower-spec chips designed for export to China under U.S. trade controls, whereas U.S. frontier models are trained on more than 16,000 top-of-the-line NVIDIA H100 GPUs, which analysts at the time priced at $25,000 to $30,000 per unit. The H800's performance limitations were real, but DeepSeek overcame them with algorithmic efficiency.
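To put these headline figures in perspective, the back-of-the-envelope calculation below shows how a run on a few thousand lower-cost GPUs can plausibly land in the single-digit millions while a frontier-scale H100 run lands near $100 million. The GPU-hour totals and hourly rental rates are illustrative assumptions, not figures reported in this article.

```python
# Rough, illustrative cost arithmetic. GPU-hour totals and rental rates are
# assumptions for illustration, not figures disclosed by DeepSeek or U.S. labs.

h800_gpu_hours = 2_800_000      # assumed total H800 GPU-hours for one training run
h800_rate = 2.00                # assumed rental cost per H800 GPU-hour (USD)
print(f"Efficient run: ${h800_gpu_hours * h800_rate / 1e6:.1f}M")    # ~$5.6M

h100_count = 16_000             # frontier-scale cluster size cited in the text
h100_days = 90                  # assumed duration of a frontier training run
h100_rate = 3.00                # assumed rental cost per H100 GPU-hour (USD)
h100_cost = h100_count * 24 * h100_days * h100_rate
print(f"Frontier run: ${h100_cost / 1e6:.0f}M")                      # ~$100M+
```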
How (1): A Paradigm Shift in Reinforcement Learning (RL)
DeepSeek's success went beyond mere cost reduction; it raised fundamental questions about AI training methodology itself. The predominant method used by existing U.S. models is Reinforcement Learning from Human Feedback (RLHF). This is a complex and costly process that involves training a separate ‘reward model’ with vast amounts of human-labeled data, which is then used to optimize the language model.
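For readers unfamiliar with the mechanics, the sketch below illustrates the first, expensive stage of this pipeline: fitting a separate reward model on human preference pairs before any policy optimization happens. The architecture, dummy data, and Bradley-Terry style pairwise loss shown here are a generic textbook illustration, not any particular lab's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stage 1 of RLHF: train a standalone reward model on human preference pairs.
# Everything below (sizes, data) is illustrative only.

class TinyRewardModel(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.score = nn.Linear(dim, 1)      # maps a response embedding to a scalar reward

    def forward(self, emb):                 # emb: (batch, dim)
        return self.score(emb).squeeze(-1)

reward_model = TinyRewardModel()
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Each human label says "response A was preferred over response B".
chosen_emb = torch.randn(32, 128)           # embeddings of preferred responses (dummy)
rejected_emb = torch.randn(32, 128)         # embeddings of rejected responses (dummy)

# Bradley-Terry pairwise loss: push the chosen score above the rejected score.
loss = -F.logsigmoid(reward_model(chosen_emb) - reward_model(rejected_emb)).mean()
loss.backward()
opt.step()

# Stage 2 (not shown) then uses this learned reward model to drive a PPO update
# of the language model itself, which is where the labeling effort pays off.
```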
DeepSeek pioneered two distinct, innovative paths. First, with its DeepSeek-R1-Zero model, it demonstrated for the first time that reasoning capabilities can be elicited through pure reinforcement learning, without a Supervised Fine-Tuning (SFT) stage. This is a fundamental breakthrough in the field of AI research.
Second, for the publicly released DeepSeek-R1 model, it applied a technique called Group Relative Policy Optimization (GRPO). Instead of training a separate critic (value) model, GRPO samples a group of candidate outputs for each prompt and uses the group's own average reward as the baseline; in R1, the rewards themselves are automatically verifiable signals, such as ‘Did the code compile?’ or ‘Is the math problem's answer correct?’, rather than scores from a human-trained reward model. This dramatically reduces the immense cost and time required for data labeling compared to the PPO (Proximal Policy Optimization)/RLHF approach, an ingenious solution that bypasses the bottlenecks of traditional AI training methods.
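The sketch below isolates the group-relative idea: score several sampled answers with a cheap rule-based check, then normalize within the group to obtain advantages, with no learned reward model and no critic. The reward function, group size, and data are simplified illustrations, not DeepSeek's actual code.

```python
import torch

def verifiable_reward(answer: str, reference: str) -> float:
    # Illustrative rule-based reward, e.g. "is the final math answer correct?"
    return 1.0 if answer.strip() == reference.strip() else 0.0

def group_relative_advantages(answers, reference):
    rewards = torch.tensor([verifiable_reward(a, reference) for a in answers])
    # The baseline is the group's own mean reward, replacing a learned value model.
    return (rewards - rewards.mean()) / (rewards.std() + 1e-6)

# Four sampled answers to the same prompt; the reference answer is "42".
advantages = group_relative_advantages(["42", "41", "42", "7"], "42")
print(advantages)   # positive advantage for correct answers, negative for wrong ones

# In full GRPO, these advantages weight a clipped, PPO-style policy-gradient
# update of the language model; no separate critic network is needed.
```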
How (2): Maximizing Architectural Efficiency with Mixture of Experts (MoE)
DeepSeek also made effective use of the Mixture of Experts (MoE) architecture. MoE composes a model from numerous small ‘expert’ sub-networks instead of a single, massive network. When a specific input is given, a ‘gating network’ activates only the few most relevant experts (e.g., 2 out of 8) to handle the task. The core objective of MoE is to increase the model's total number of parameters (i.e., its knowledge capacity) while preventing a proportional increase in computational cost during inference, since only a fraction of the model is active for any given token. This is a key strategy for extracting maximum performance from limited computing resources. While the MoE architecture itself was not a new concept and earlier implementations had seen limited commercial impact, DeepSeek's significance lies in using it to implement a commercially viable, near-frontier model.
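A toy version of this routing is sketched below: a gating network scores all experts for each token, and only the top-k experts actually run. Dimensions, expert count, and the top-k value are arbitrary illustrative choices, not DeepSeek's configuration.

```python
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    """Minimal Mixture-of-Experts layer: only top_k of n_experts run per token."""
    def __init__(self, dim=64, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_experts)])
        self.gate = nn.Linear(dim, n_experts)     # the gating network
        self.top_k = top_k

    def forward(self, x):                         # x: (tokens, dim)
        scores = self.gate(x)                     # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = torch.softmax(weights, dim=-1)  # mixing weights over the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                routed = idx[:, k] == e           # tokens whose k-th choice is expert e
                if routed.any():
                    out[routed] += weights[routed, k:k+1] * expert(x[routed])
        return out

tokens = torch.randn(10, 64)
print(ToyMoE()(tokens).shape)   # torch.Size([10, 64]); only 2 of 8 experts ran per token
```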
The Strategic Gamble of Adopting Open Source
Crucially, DeepSeek immediately released the weights of its R1 model as open source. This announcement, timed to coincide with the U.S. Presidential Inauguration, was a clear strategic signal. It positioned DeepSeek not just as a competitor but as a contributor to the global developer community, directly challenging the closed-model paradigm led by OpenAI and Anthropic. As a result, the DeepSeek app achieved the feat of surpassing ChatGPT to become the number one downloaded app on the U.S. iOS App Store within days.
These technical and strategic moves show that DeepSeek's innovation was not mere imitation but a creative response to its given constraints. When U.S. export controls restricted access to top-tier H100 GPUs, DeepSeek had to maximize the performance of lower-spec H800 chips. Simultaneously, unable to afford the costly RLHF methodology, it developed a new alternative in GRPO. This is a clear example of how constraints can trigger asymmetric innovation.
Furthermore, the shock that DeepSeek sent through the market was not just about surprise at its cost-efficiency; it stemmed from the realization that America's 'methodological moat' could be eroded. Until then, U.S. AI leadership was believed to stand on three pillars: superior hardware (NVIDIA), massive capital, and leading development methodologies (RLHF). However, DeepSeek proved that frontier-level performance could be reached with inferior hardware and less capital. More importantly, through its GRPO and pure reinforcement learning experiments, it showed that RLHF was not the only path to achieving high-level reasoning abilities. This sparked a sense of crisis that America's technological superiority might be more fragile than previously thought.
 

The Big Tech Empire Strikes Back: Why the Scaling Law Remains Unbroken

 
The Unshakeable Law of Scale
While DeepSeek's efficiency innovation struck the market, the fundamental principle of AI development remained unchanged: the Scaling Law. This is an empirically observed relationship stating that a model's performance (measured by test loss) improves in a predictable way as three key factors increase: the number of model parameters (N), the size of the dataset (D), and the amount of computation used for training (C). The key point is that reaching the absolute frontier of AI performance requires scaling all three to unprecedented levels, and because the relationship is a power law, each further increment of capability demands disproportionately more parameters, data, and compute.
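As a concrete illustration, one widely cited empirical parameterization of this relationship (the Chinchilla form from Hoffmann et al., 2022, not a formula given in this article) writes the test loss as a power law in parameters and data:

```latex
% Chinchilla-style scaling law (Hoffmann et al., 2022); E, A, B, \alpha, \beta are
% empirically fitted constants, shown here only for illustration.
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
% L: test loss, N: parameter count, D: training tokens, E: irreducible loss.
% Training compute scales roughly as C \approx 6ND, and because \alpha and \beta are
% well below 1, each further drop in loss requires disproportionately more N, D, and C.
```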
Even DeepSeek's developers have openly acknowledged building on the scaling law. Among the papers DeepSeek has published, "DeepSeek LLM: Scaling Open-Source Language Models with Longtermism" most directly reflects the company's technical philosophy, stating from its very first paragraph that scaling laws are the foundation of its work:
"Guided by the scaling laws, we introduce DeepSeek LLM, a project dedicated to advancing open-source language models with a long-term perspective."
Moreover, there are undisclosed aspects to DeepSeek's cost-efficiency. While the DeepSeek R1 model was stated to have utilized a small amount of 'cold-start' data and synthetic data during its training process, these datasets were not released as open source like the model weights, and the cost required to generate them remains unknown. Considering that building high-quality data for RLHF entails significant costs, the reported $6 million training cost may only be a fraction of the total development cost. This implies that even DeepSeek's innovation requires substantial initial investment, further highlighting the importance of the scaling law.
From Panic to Reassurance: The ‘DeepSeek Shock’ as the Starting Gun for the Global AI Arms Race
The initial market plunge was a reflexive reaction to the fear that if a small startup could compete so effectively, the billions of dollars invested in AI infrastructure might be a bad bet. However, the market quickly rebounded. This rebound was not irrational but the result of a sophisticated reassessment. The market began to understand that while DeepSeek had lowered the 'floor' for competitive AI, the 'ceiling' for the next-generation AI that will truly dominate the market is still determined by the scaling law.
The most definitive evidence that scale, in accordance with the scaling law, remains the path to an edge in frontier model development is the capital expenditure (capex) plans of U.S. hyperscalers, which became even more resolute after the DeepSeek Shock:
 
  • Microsoft has budgeted about $80 billion in capex for the 2025 fiscal year and plans to exceed $100 billion the following year; most of this will be invested in building AI data centers.
  • Meta has announced a capex forecast of $60–65 billion for 2025.
  • Amazon is targeting $100 billion in spending and has already made its commitment to the competition clear by launching its own frontier model family, Amazon Nova, in December 2024.
  • Alphabet (Google) announced it would spend $85 billion in 2025.
These numbers are not just budgets. They are a strategic declaration that the U.S. will absorb tactical algorithmic innovations and win this great AI war through overwhelming industrial and financial capability.

The Moat of Data and Distribution
In addition to computing power, hyperscalers have two decisive advantages that create synergy with the scaling law: vast proprietary data and global distribution channels. xAI trains its Grok model on the real-time data stream from X (formerly Twitter), and Microsoft integrates OpenAI's technology directly into its vast suite of enterprise and consumer software products. The global user bases of Google and Meta are at once proprietary data sources and the revenue base that supports this astronomical capital expenditure. This combination of computing, proprietary data, and customer access builds a multi-layered, robust moat that efficiency alone cannot overcome.
This situation suggests that the DeepSeek Shock paradoxically justified the massive spending strategies of hyperscalers and accelerated the AI arms race. Before DeepSeek, massive capital expenditures might have seemed speculative and inefficient to some investors. However, DeepSeek's success signaled the emergence of a tangible and credible external competitor, and the threat was no longer theoretical. This rallied U.S. tech giants and their investors. The debate shifted from ‘Is this spending wise?’ to ‘Is this spending a strategic imperative for national and economic security?’ The market's rebound reflects this new consensus. Now, the purpose of the spending is not just to innovate, but to create a gap so large that even highly efficient competitors cannot catch up.
Consequently, the nature of AI competition is shifting from an algorithmic competition to an industrial-scale supply chain competition. Early AI development was driven by algorithmic breakthroughs like the Transformer architecture. DeepSeek's GRPO is also an algorithmic innovation, but its impact is limited by access to hardware and data. In contrast, the hyperscalers' response is primarily focused on logistical and industrial aspects. This is an all-out competition to secure hundreds of thousands of GPUs, build city-sized data centers, secure energy contracts, and control data pipelines. The key competitive metrics are now capex budgets, GPU cluster sizes, and data throughput. This is a game that only a few trillion-dollar companies can play, and it effectively excludes all other competitors from the frontier race.
 

[Prediction] The Dual AI Arms Race

 
America's Response: An All-Out Focus on Scale
The DeepSeek Shock shattered the concept of a linear path of AI development led by Silicon Valley. It proved that China, under the pressure of U.S. sanctions, is not just a ‘fast follower’ but a true innovator capable of forging its own path.
Faced with a competitor that can innovate by bypassing hardware constraints, the strategic response of the U.S., especially the hyperscalers, is to go all-in on the one advantage that China cannot easily replicate: overwhelming capital and existing global infrastructure. The AI competition has now been clearly established as a core pillar of the U.S.-China great power competition.
Potential Bifurcation of the AI LLM Market
These dynamics will likely bifurcate the global AI market into two tiers.
Tier 1: The Frontier Model Layer. This is a high-risk, capital-intensive war to capture state-of-the-art (SOTA) performance. This competition will be waged among U.S. hyperscalers (Microsoft/OpenAI, Google, Amazon, Meta, xAI, Anthropic) and potentially state-backed champions from China. This is the stage where the geopolitical ‘Chip War’ and ‘AI Arms Race’ are actually taking place. Here, the goal is to secure strategic and structural dominance, beyond commercial success.
Tier 2: The Commercialized Application Layer Based on Open Source. Thanks to high-quality, efficient open-source models like DeepSeek R1 and Meta's Llama, a vibrant ecosystem will emerge where innovation and value creation happen at the application layer. Companies will compete not by creating the best foundation models, but by integrating these commoditized models into specific products and services.
 
Intensification of the Arms Race
Now, the competition will accelerate on all fronts. Spending on computing will explode (see Table 2), and the ‘AI talent war’ will intensify as companies offer massive rewards to recruit top researchers. Furthermore, government-level measures like the US AI Action Plan to secure a national advantage will be strengthened, and efforts to prevent model replication and intellectual property theft will also increase. The open-sourcing of models like DeepSeek R1 is a double-edged sword for the U.S. On one hand, it accelerates global innovation and allows U.S. companies to build applications on top of powerful free models, reducing their R&D costs. On the other hand, it commoditizes technology that U.S. companies spent billions to develop, weakening their competitive advantage at the model level. More decisively, it forms a kind of soft power, increasing global dependence on and familiarity with non-U.S. technical standards. As Microsoft President Brad Smith pointed out, this gives China a significant foothold in the crucial battlefield of the global developer community.
Ultimately, the long-term competitive landscape can be seen not as ‘U.S. vs. China,’ but as a confrontation between two conflicting philosophies of technological advancement: ‘Scale’ and ‘Efficiency.’ U.S. hyperscalers represent the ‘Scale’ philosophy, pushing the scaling laws to their logical conclusion by investing overwhelming resources. DeepSeek symbolizes the ‘Efficiency’ philosophy, creating clever algorithmic and architectural shortcuts that deliver most of the performance at a fraction of the cost. In the short term, ‘Scale’ will win the race for the ultimate frontier model, but the much larger market of everyday AI applications is likely to be dominated by the ‘Efficiency’ paradigm. The future will likely see a cyclical pattern of disruption and integration, where efficiency innovations from Tier 2 are periodically absorbed and scaled by the giants of Tier 1.
 

Conclusion: The Starting Gun for Full-Scale AI Industrial Competition
 

The 'DeepSeek Shock' of January 2025 was not the prelude to the end of America's AI hegemony. Rather, it was the end of the initial phase and the opening of an era of full-scale competition. This event shook off the industry's complacency and more clearly revealed the true nature and direction of the technological race.
DeepSeek's astonishing efficiency did not negate the scaling law; on the contrary, it simultaneously proved both its validity and its limits. By showing the world how quickly a determined and radical technological pursuer can achieve innovation, DeepSeek paradoxically gave U.S. hyperscalers the strategic justification to fully mobilize their core competitive strengths: capital power and infrastructure.
As a result, the global AI competition has been reshaped into a comprehensive, high-intensity industrial and geopolitical contest that goes beyond simple algorithmic performance to encompass infrastructure, supply chains, capital, regulatory frameworks, and even diplomatic strategy. The 'shock' itself was a one-time event, but its aftershocks will structurally define the landscape of the AI industry and the international order for the next decade or more.