The 'DeepSeek Shock': The Starting Gun for the True AI Arms Race
By Yongkwon Lee | Partner, Gorilla PE | Feb 2025
The Earthquake That Shook Silicon Valley
In late January 2025, the heart of Silicon Valley was struck by an unprecedented shockwave. On January 27th, the Nasdaq plunged 3.1%, and Nvidia, the undisputed king of AI chips, saw nearly $600 billion wiped from its market cap in a single day. It was the largest single-day loss for an individual company in US stock market history. Across the broader US tech sector, over $1 trillion in value simply evaporated.
The trigger for this seismic event wasn't a macroeconomic shock or a financial crisis. It was a single AI model released by DeepSeek, a Chinese startup founded in 2023 with fewer than 200 employees.
The core question driving this piece is whether DeepSeek's breakthrough heralds a cooling-off period for the exorbitantly expensive AI development race. Many read it that way. We argue the exact opposite: rather than ending the race, this event has clarified the strategic landscape and acted as a catalyst, escalating the capital-intensive AI arms race led by US hyperscalers to unprecedented heights.
The DeepSeek shock certainly proved China's capacity for innovation. But paradoxically, it also reaffirmed the absolute dominance of the Scaling Laws. While the barrier to entry for building a "good enough" AI model has dropped drastically, the cost to develop a true frontier model that commands the market is now higher than ever.
DeepSeek’s Innovation Playbook
What: A Frontier Model at a Radical Price
At the center of the DeepSeek shock is the R1 model, released on January 20, 2025. Across core benchmarks like reasoning, math, and coding, R1 matched—and in some cases exceeded—OpenAI’s o1 model.
But the most staggering metric was the cost. While the training cost for a US-made frontier model like GPT-4 is estimated at roughly $100 million, DeepSeek reported a training cost of only about $6 million. That figure comes from the technical report for DeepSeek-V3, the base model underlying R1, and covers only the GPU time of the final training run. Even more remarkably, while US labs rely on clusters of 16,000+ top-tier Nvidia H100 GPUs, DeepSeek achieved this using only about 2,000 H800 GPUs, a downgraded chip designed specifically to comply with US export controls on China.
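The arithmetic behind the headline number is worth seeing. As a rough sketch, the snippet below reproduces it from the figures stated in DeepSeek's V3 technical report: about 2.788 million H800 GPU-hours, priced at an assumed rental rate of $2 per GPU-hour. The variable names are ours, and the 2,048-GPU cluster size is used only to convert GPU-hours into approximate wall-clock time.

```python
# Figures as stated in the DeepSeek-V3 technical report. The $2/hour rental
# rate is the report's own pricing assumption, not a measured cost.
GPU_HOURS = 2_788_000        # total H800 GPU-hours for the final training run
RATE_USD_PER_GPU_HOUR = 2.0  # assumed rental price per GPU-hour
CLUSTER_GPUS = 2_048         # cluster size used for the run

# Headline cost is simply GPU-hours multiplied by the assumed hourly rate.
training_cost_usd = GPU_HOURS * RATE_USD_PER_GPU_HOUR

# Spreading the GPU-hours across the cluster gives a rough wall-clock duration.
wall_clock_days = GPU_HOURS / CLUSTER_GPUS / 24

print(f"Estimated training cost: ${training_cost_usd / 1e6:.3f}M")
print(f"Approximate wall-clock time: {wall_clock_days:.0f} days")
```

The result, about $5.6 million, is a rental-equivalent price for one training run. It says nothing about hardware purchases, salaries, or the experiments that preceded the run.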
How to (1): The Paradigm Shift to Reinforcement Learning (RL)
DeepSeek’s success wasn’t just about cutting costs; it posed a fundamental question to the AI training methodology itself. The dominant approach among US models has been Reinforcement Learning from Human Feedback (RLHF). This is a complex, wildly expensive process that involves training a separate reward model using massive amounts of human-labeled data, which is then used to optimize the language model.
DeepSeek charted two innovative alternative paths:
Pure Reinforcement Learning: With the DeepSeek-R1-Zero model, they showed, in what the team describes as the first open research to validate it, that reasoning capabilities can be unlocked purely through RL, without a preliminary Supervised Fine-Tuning (SFT) stage. This is a foundational result in AI research.
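GRPO (Group Relative Policy Optimization): Both R1-Zero and the officially released R1 model were trained with GRPO, a technique DeepSeek introduced in its earlier DeepSeekMath work. GRPO drops the separate critic (value) model used in standard PPO-style RL: for each prompt, it samples a group of answers and scores each one relative to the group average. For reasoning tasks, DeepSeek paired it with rule-based, automatically verifiable rewards, like "Did the code pass its tests?" or "Is the math answer correct?", rather than a reward model trained on human preferences. Together, these choices dramatically slash the compute, time, and capital required for human data labeling.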
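To make the "group relative" idea concrete, here is a minimal sketch of the scoring step only, not DeepSeek's training code. It assumes an exact-match reward for a math answer and normalizes each reward against the group's mean and population standard deviation. The function names, the sample answers, and the `eps` constant are hypothetical, chosen for illustration.

```python
from statistics import mean, pstdev


def verifiable_reward(answer: str, expected: str) -> float:
    """Rule-based reward: 1.0 if the answer matches exactly, else 0.0."""
    return 1.0 if answer.strip() == expected.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Score each sampled answer relative to its own group.

    The group mean acts as the baseline, which is what lets this approach
    skip a separately trained critic (value) model.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps guards against division by zero
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical answers sampled for one prompt whose correct answer is "42".
samples = ["42", "41", "42", "7"]
rewards = [verifiable_reward(s, "42") for s in samples]
advantages = group_relative_advantages(rewards)

for s, r, a in zip(samples, rewards, advantages):
    print(f"answer={s!r} reward={r} advantage={a:+.2f}")
```

Correct answers receive a positive advantage and incorrect ones a negative advantage, so the policy update pushes probability toward the better samples within each group. No human labeler or learned reward model appears anywhere in the loop.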
How to (2): Maximizing Architectural Efficiency with MoE
DeepSeek also masterfully leveraged the Mixture of Experts (MoE) architecture. Instead of building one monolithic model, MoE structures the model into multiple smaller "expert" sub-networks. For each token it processes, a gating network activates only the handful of experts most relevant to the task. The goal here is clever: increase the total parameter count of the model to boost capability while activating only a small fraction of those parameters at a time, so that compute cost does not grow in step with model size.
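The routing idea can be sketched in a few lines. The toy example below is an illustration under simplifying assumptions, not DeepSeek's architecture: the "experts" are scalar functions, the gate logits are fixed by hand rather than computed by a learned router, and all names are hypothetical.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_logits: list[float], k: int) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x: float, gate_logits: list[float], experts, k: int = 2):
    """Run only the selected experts and blend their outputs by gate weight."""
    routes = top_k_route(gate_logits, k)
    output = sum(weight * experts[i](x) for i, weight in routes)
    return output, [i for i, _ in routes]


# Four toy experts; a real model would use feed-forward sub-networks.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
gate_logits = [0.1, 2.0, -1.0, 1.5]  # hand-picked; normally produced by the router

y, used = moe_forward(3.0, gate_logits, experts, k=2)
print(f"experts used: {used} of {len(experts)}, output: {y:.3f}")
```

Only two of the four experts run for this input, so adding more experts grows the parameter count without growing the per-token compute, which depends on `k` rather than on the total number of experts.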
The Strategic Bet on Open Source
Crucially, DeepSeek immediately open-sourced the weights for the R1 model. Timing this release with the US Presidential Inauguration Day was a glaringly obvious strategic signal. It positioned DeepSeek not just as a competitor, but as a core contributor to the global developer community, launching a frontal assault on the closed-model paradigm championed by OpenAI and Anthropic.
These technical and strategic maneuvers prove that DeepSeek’s innovation wasn't mere imitation—it was a creative masterclass in operating under severe constraints. Blocked from accessing top-tier H100 GPUs by US export controls, they were forced to squeeze every drop of performance out of H800 chips. Lacking the massive budgets required for RLHF, they invented GRPO.
Constraints bred asymmetric innovation.
The Empire Strikes Back: Why Scaling Laws Remain Unbroken
The Unwavering Law of Scale
DeepSeek's efficiency breakthroughs shocked the market, but the fundamental physics of AI development remain unchanged: the Scaling Laws. This is the empirically observed rule that a model's performance improves predictably, following a power law, as the number of parameters (N), dataset size (D), and compute volume (C) increase. To reach the absolute bleeding edge of AI capability, all three variables must be pushed to unprecedented scales, and because returns diminish along that power law, each further step in performance demands a multiplicative increase in cost.
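A back-of-the-envelope sketch shows why the bill climbs so steeply. It assumes the reducible loss follows a simple power law in compute, L = a * C^(-b), and uses a purely illustrative exponent of b = 0.05. Both the functional form as applied here and the exponent value are assumptions for illustration, not figures from DeepSeek or any specific lab.

```python
def compute_multiplier(loss_ratio: float, exponent: float) -> float:
    """Factor by which compute must grow to scale loss by `loss_ratio`.

    Assumes loss = a * compute ** (-exponent), so
    compute_new / compute_old = loss_ratio ** (-1 / exponent).
    """
    if not 0.0 < loss_ratio <= 1.0:
        raise ValueError("loss_ratio must be in (0, 1]")
    return loss_ratio ** (-1.0 / exponent)


EXPONENT = 0.05  # hypothetical scaling exponent, for illustration only

for target in (0.9, 0.8, 0.5):
    factor = compute_multiplier(target, EXPONENT)
    print(f"cut loss to {target:.0%} of today -> {factor:,.0f}x the compute")
```

Under these assumptions, a 10% reduction in loss already costs roughly 8x the compute, and halving the loss costs about a million times more. An efficiency gain lowers the constant `a`, but it does not change the shape of the curve.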
Even DeepSeek's own developers openly acknowledged that their work is grounded in these very laws. The abstract of their January 2024 paper, "DeepSeek LLM: Scaling Open-Source Language Models with Longtermism," states:
“Guided by the scaling laws, we introduce DeepSeek LLM...”
Furthermore, the narrative of DeepSeek's extreme cost efficiency leaves out hidden variables. DeepSeek's own technical report notes that the figure excludes the cost of prior research and ablation experiments. The cold-start data and synthetic data used during training were not open-sourced like the model weights, and the true cost of generating that data remains unknown. The publicized $6 million price tag is likely only a fraction of the total development cost.
From Panic to Reaffirmation: The Hyperscaler Response
The initial market plunge was a knee-jerk reaction driven by fear: If a tiny startup can compete this effectively, are the billions poured into AI infrastructure a bad bet? But the market rebounded quickly. Investors realized that while DeepSeek dramatically lowered the "floor" for competitive AI, the "ceiling"—the threshold for the next-generation models that will truly dominate the market—is still governed by the brute force of Scaling Laws.
If anything, the hyperscalers' capex plans have only hardened since the DeepSeek shock:
Microsoft: ~$80B for FY2025, projecting >$100B the following year.
Meta: $60B – $65B.
Amazon: Targeting $100B.
Alphabet (Google): ~$75B, per its February 2025 guidance.
These aren't just budgets anymore. The conversation has shifted from "Is this spending prudent?" to "Is this spending a strategic imperative for national and economic security?" Paradoxically, DeepSeek handed US hyperscalers the ultimate strategic justification to double down.
The Moats: Data and Distribution
Beyond raw compute, the US incumbents possess two formidable advantages: proprietary data and global distribution channels. xAI leverages X's (formerly Twitter) real-time data stream to train Grok; Microsoft injects OpenAI's tech directly into its ubiquitous enterprise software suite. The holy trinity of compute, proprietary data, and customer access creates a multi-layered moat that algorithmic efficiency alone cannot cross.
The Two Tiers of the AI Race
The DeepSeek shock has cleanly bifurcated the global AI market into two distinct tiers:
Tier 1: The Frontier Model Layer. This is a high-stakes, hyper-capital-intensive war for absolute state-of-the-art performance. It is the arena where the geopolitical "AI arms race" is actually being fought—primarily among US hyperscalers and, potentially, state-backed Chinese champions.
Tier 2: The Open-Source Application Layer. Thanks to high-quality open-source models like DeepSeek R1 and Meta’s Llama, a vibrant ecosystem is emerging where innovation and value creation happen at the application level. Instead of trying to build the best foundation model from scratch, the vast majority of companies will compete on how effectively they integrate these models into specific products and workflows.
The essence of the AI race is shifting from algorithmic competition to industrial-scale supply chain warfare. The core KPIs are now capex budgets, GPU cluster sizes, and data throughput. This is a game only the trillion-dollar club can play.
Final Thoughts
The 'DeepSeek Shock' of January 2025 did not signal the end of US AI hegemony. It signaled the end of the beginning, and the dawn of an all-out, industrialized race.
DeepSeek’s phenomenal efficiency didn't disprove the Scaling Laws; it simultaneously proved their validity and their harsh realities. By demonstrating exactly how fast a determined challenger can innovate, DeepSeek paradoxically gave US hyperscalers the perfect justification to mobilize their ultimate weapons: bottomless capital and massive physical infrastructure.
But as I write this, a lingering question remains. If scaling isn't dead but actually accelerating—where is all this compute going to be consumed?
Post-DeepSeek, compute is migrating from the training phase outward into inference, multi-agent systems, and complex tool use. The next big question we must answer is: Where will this dispersion create the next massive bottleneck, and who is positioned to capture it?