Originally written: August 2025 | Published on Gorilla PE Insights: March 2026

This article is a public version of an internal partner memo shared in August 2025.

 


 

August 2025. Seven months have passed since the "DeepSeek Shock" in January. The market has moved quickly since then: in some respects as anticipated, in others in ways far more complex than expected.

 

This piece is not a victory lap declaring "we were right." Rather, it is an observation of what is actually unfolding in the AI infrastructure market right now, and a guide to where our focus should shift next.

 


 

Part 1. Post-DeepSeek — The Bottleneck Has Shifted

 

Inference Became the New Battleground

 

Immediately after the DeepSeek shock, the market jumped to a single conclusion: training compute is no longer needed, and GPU demand will plummet. NVIDIA's stock dropped 17% in a single day.

 

Fast forward seven months—what actually happened?

 

Inference traffic exploded. As compute costs fell, the number of services integrating AI surged, so total GPU demand did not shrink. What changed was where the compute goes. Compute that was once concentrated in single, massive training runs has spread into the time models spend "thinking" during inference (test-time compute scaling). If DeepSeek's GRPO (Group Relative Policy Optimization) revolutionized training efficiency, large-scale inference compute is now piling up on top of those optimized techniques.

 

This is a shift in the locus of demand. The total volume of compute hasn't decreased; the layer consuming it has simply changed.

 

HBM — The First Wave of the Memory Wall

 

As inference compute exploded, a new bottleneck emerged: the KV cache, the per-session store of attention keys and values that grows with context length. With context windows expanding from 128K to over 1M tokens, a single inference session can exceed the H100's 80GB of on-package memory by hundreds of gigabytes. Demand for HBM (High Bandwidth Memory) hit an all-time high. SK Hynix's HBM3E supply could not keep pace, stretching waitlists past six months.
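
As a rough back-of-envelope check on that claim, the sketch below estimates KV cache size for one session. The model shape (80 layers, 8 key/value heads, head dimension 128, 16-bit values) is a hypothetical configuration chosen for illustration; it is not a figure from this memo or from any specific vendor.

```python
GIB = 1024 ** 3  # bytes per gibibyte


def kv_cache_bytes(tokens, layers=80, kv_heads=8, head_dim=128, bytes_per_value=2):
    """Estimate KV cache size in bytes for a single session.

    All default values are illustrative assumptions, not measured figures.
    The leading factor of 2 accounts for storing both keys and values.
    """
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token * tokens


# Compare a 128K-token context with a 1M-token context.
for label, tokens in (("128K", 128 * 1024), ("1M", 1024 * 1024)):
    size_gib = kv_cache_bytes(tokens) / GIB
    print(f"{label} context: {size_gib:.0f} GiB of KV cache")
```

Under these assumptions a 128K-token session needs about 40 GiB and a 1M-token session about 320 GiB, which is consistent with a single long-context session overflowing an 80GB card by hundreds of gigabytes. Real deployments vary with model shape, precision, and cache compression.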

 

DeepSeek may have triggered fears that "GPUs are obsolete," but the ultimate outcome was a massive amplification of memory demand.

 

Compute Is Starting to Sit Idle

 

But here, a highly unexpected phenomenon began to surface.

 

We are seeing cases where clusters are fully installed but cannot operate at full capacity. This isn't due to a GPU shortage, but rather a lack of power and insufficient cooling.

 

A single NVIDIA Rubin GPU has a TDP (Thermal Design Power) of around 1,200W. A single data center packed with H100 clusters consumes as much power as a small-to-medium-sized city. In parts of the US, the wait time for a new data center power grid connection exceeds five years. We are literally seeing fully built facilities sitting idle, waiting for electricity.
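
To give a sense of scale, the sketch below turns a per-GPU TDP into a facility-level power estimate. The 1,200W figure is the one quoted above; the server overhead factor (CPUs, networking, storage) and the PUE (power usage effectiveness, covering cooling and power delivery losses) are assumed values for illustration only, as are the cluster sizes.

```python
def facility_power_mw(gpu_count, gpu_tdp_watts=1200, server_overhead=1.5, pue=1.3):
    """Rough facility power draw in megawatts.

    server_overhead and pue are assumptions for illustration:
    - server_overhead scales GPU power up to total IT load
    - pue scales IT load up to total facility load
    """
    it_load_watts = gpu_count * gpu_tdp_watts * server_overhead
    return it_load_watts * pue / 1_000_000


# Hypothetical cluster sizes.
for gpus in (10_000, 100_000):
    print(f"{gpus:>7,} GPUs -> roughly {facility_power_mw(gpus):.0f} MW")
```

With these inputs, a 100,000-GPU site lands at roughly 234 MW. The exact number matters less than the order of magnitude: a load of that size has to be negotiated with the grid operator, which is where the multi-year connection queues come in.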

 

For the first time, we are entering a phase where compute (the silicon itself) is not the choke point. This is the signal that the bottleneck is shifting from silicon to physical infrastructure.

 


 

Part 2. The AI Model Race — A More Complex Landscape

 

The Shakeout of Mid-Tier Players

 

Most small and mid-sized foundation model startups that positioned themselves as the "next ChatGPT" in 2023–2024 failed to survive as independent businesses. Inflection AI saw its core team absorbed by Microsoft, while Adept AI transferred its talent and tech to Amazon. The lesson from these cases is that without proprietary data, massive compute, or strong monetization channels, competing is nearly impossible.

This consolidation happened much faster and more radically than anticipated.

 

The Big Player Dynamics — Harder to Call Than Expected

 

The real question lies with the survivors. Ranking them is significantly harder now than it was 18 months ago.

 

OpenAI: With an ARR surpassing $3.4B, performance improvements continue through GPT-4o and the 'o' series. Their deep integration with Microsoft gives them the widest distribution channel in the enterprise market.

 

Google Gemini: Rebounded stronger than expected. The 1M token context window was a structural leap ahead of competitors. The vertical integration of their proprietary TPU infrastructure, YouTube data, and Search traffic is actually working. They might be more than just a runner-up.

 

Meta: The biggest wild card. Their Llama open-source strategy is genuinely driving ecosystem standardization. They've secured a physical-world touchpoint by integrating AI into Ray-Ban smart glasses and proved ROI by fully deploying AI into their core ad systems. Top-tier AI researchers are migrating to Meta at an impressive pace.

 

Anthropic: Enterprise contracts are growing steeply. Their research on Constitutional AI and Interpretability is emerging as a strong differentiator in certain corporate procurement standards.

 

Conclusion: Focusing on the "Foundation Model Gate" early was the right move. However, determining the ultimate winner among them has become even more uncertain. The competition itself has grown far more complex.

 

Physical AI — Right Direction, Longer Timeline

 

The asymmetric direction of the two waves we proposed in April 2024 remains valid. Tesla's FSD v13.2 achieved over 700 miles of zero-intervention highway driving, and footage of Optimus Gen 2 working in factories has been released.

 

Frankly speaking, however, the full-scale economic impact of Physical AI will likely take 2 to 3 years longer than we initially thought. The direction is correct, but the timeline to reach commercial scale is being pushed back.

 


 

Part 3. Where We Are Looking Next — Physical Bottlenecks

 

Heat · Power · Memory

 

Inference compute is exploding and AI clusters are expanding rapidly, yet GPUs are starting to sit idle because of power shortages and inadequate cooling. This is the primary signal we have been watching closely since the second half of 2025 began.

 

Heat (The Cooling Bottleneck): Traditional air cooling has reached its physical limits. We are approaching a phase where transitioning to liquid and immersion cooling becomes inevitable.

 

Power (Energy Sovereignty): The pace of data center construction outstrips the speed of grid connection. Renewable energy is too intermittent. SMRs (Small Modular Reactors) are the right direction, but commercialization is still 5 to 10 years away. We need bridging power solutions. This is the exact context behind the Amogy deal (ammonia-to-hydrogen-to-power technology) we reviewed in September.

 

Memory (The Interconnect Bottleneck): If the HBM demand explosion was the first wave, the second wave will be memory pooling via high-speed interconnects (like CXL and other next-gen memory bus standards) and inter-server memory sharing. As the number of AI agents skyrockets and context windows keep expanding, demand for memory and bandwidth will explode yet again.

 

The pace of computing expansion is at an all-time high, but the electricity and cooling required to actually run that compute are lagging behind. For the first time, we're seeing phases where silicon isn't the primary constraint. The principle that "value accrues at the bottleneck" applies here as well.

 


 

Conclusion: A Summary for First-Time Readers

 

Oligopoly convergence was correct: Most small and mid-sized players either failed or were absorbed by larger tech companies.

 

The shift to inference compute was correct: Post-DeepSeek, HBM demand reached an all-time high.

 

The winner's circle is more complex than expected: OpenAI, Google Gemini, Meta, and Anthropic are all building valid positions along entirely different trajectories. It’s hard to pick a single definitive winner right now.

 

The next bottleneck is physical infrastructure: Power, heat, and memory/interconnects. The next major constraints are forming across these three layers.

 

The "Foundation Model Gate" premium remains valid: Companies that entered early retain their premium, but their internal rankings require further observation.

 

Observation continues. The next theme is physical bottlenecks.

 


 

[Subsequent Developments]

 

In March 2026, the TurboQuant incident (when Google's announcement of a 6x KV cache compression algorithm sent SK Hynix and Samsung shares plunging) reaffirmed the logic of this piece. Efficiency breakthroughs do not eliminate memory demand. The expansion of context windows and the explosion of agents quickly absorbed the efficiency savings.
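
The arithmetic behind that argument can be sketched in a few lines. The 6x compression figure is the one cited above and the 8x context growth corresponds to the move from 128K to 1M tokens discussed earlier; the 3x growth in concurrent sessions is a purely hypothetical input, not a forecast.

```python
def net_memory_demand_multiplier(compression, context_growth, session_growth):
    """Net change in KV cache memory demand relative to the starting point.

    Demand scales up with context length and concurrent sessions,
    and down with the compression ratio. Values above 1.0 mean
    demand grew despite the efficiency gain.
    """
    return context_growth * session_growth / compression


# (context growth, session growth) scenarios; session growth is hypothetical.
for ctx, sess in ((1, 1), (8, 1), (8, 3)):
    multiplier = net_memory_demand_multiplier(6, ctx, sess)
    print(f"context x{ctx}, sessions x{sess}: demand x{multiplier:.2f}")
```

If nothing else changes, 6x compression cuts demand to about a sixth. Once context windows grow 8x, demand is already back above the starting level, and any growth in the number of agents multiplies it further.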

 


 

This article documents the perspective of Gorilla PE based on public information as of August 2025. It does not constitute investment advice.