Yongkwon Lee | Partner, Gorilla PE
Every time an AI efficiency breakthrough hits the market, investors sell first and ask questions later.
DeepSeek. SaaSpocalypse. Last week's TurboQuant. Same pattern, different ticker symbols.
The question markets keep missing is not whether efficiency improved. It is where that efficiency relocates demand — and where it creates the next bottleneck.
Technology does not destroy demand. It relocates it. When costs fall in one layer, usage explodes. When usage explodes, a new layer becomes the constraint. And whoever owns that constraint captures the value.
Three events. One structure.
DeepSeek & Kimi K2 — Scaling Didn't End. It Distributed.
In January 2025, DeepSeek erased roughly $590 billion from Nvidia's market cap in a single day. A Chinese startup appeared to build a frontier-class model at a fraction of the cost. Markets drew the obvious conclusion: GPU demand must be over.
Seven months later, Moonshot AI's Kimi K2 outperformed GPT-class models on coding and reasoning benchmarks. Nature called it "another DeepSeek moment." Markets sold again.
The logic was straightforward: efficiency improved → GPU demand falls. Markets stopped at the first link in the chain.
But DeepSeek and K2 did not prove the end of scaling laws. They proved that the location of scaling had changed.
Compute that had been concentrated in pre-training — building large models once — shifted toward inference-time reinforcement learning, iterative refinement, tool use, and agentic workflows. The principle itself remained intact: more parameters, more data, and more compute still produce capability jumps. What changed was where that compute began to be consumed.
Cheaper model development does not necessarily reduce the economic value of compute. It often broadens the number of products, services, and workflows that can consume it. Lower cost per model run can translate into more models in production, more real-time inference, and much more downstream usage.
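The arithmetic behind that claim is worth making explicit. A minimal sketch, with purely illustrative numbers (the 10x cost drop and 30x usage response are assumptions, not estimates):

```python
# Toy Jevons-style arithmetic: if unit cost falls 10x but usage
# grows more than 10x in response, total compute spend rises.
# All figures below are illustrative assumptions, not forecasts.
cost_per_run = 1.0 / 10   # unit cost after a 10x efficiency gain
runs = 30                 # usage response (assumed demand elasticity)

total_spend = cost_per_run * runs
print(f"total spend vs before: {total_spend:.1f}x")  # 3.0x despite cheaper runs
```

Whether total spend rises or falls hinges entirely on whether usage grows faster than cost falls. The market's sell-first reaction implicitly assumes it does not.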
Nvidia recovered most of its losses within roughly a month. What markets initially read as demand destruction turned out to be demand migration.
SaaSpocalypse — Agents Automate Judgment. They Cannot Automate Accountability.
In February 2026, fears around Anthropic-driven workflow automation erased roughly $285 billion of market value across software and financial names in a single day. Similar concerns resurfaced in March. Workday, ADP, Intuit, and ServiceNow were all hit as investors priced in a world where AI agents would replace enterprise software altogether.
Again, markets stopped too early: agents perform software functions → software becomes unnecessary.
But enterprise software sits behind at least three layers of lock-in.
First, architectural lock-in. A payroll engine like Workday is not just a UI. It is deeply embedded server-side business logic: tax rules, labor regulations, exception handling, approvals, and country-specific workflows across dozens of jurisdictions. An agent may replace the human at the input layer. It does not automatically replace the system executing the rules.
Second, legal and institutional lock-in. When an AI agent makes a payroll or compliance mistake, who is accountable? Enterprises still need controls, approvals, audit trails, and traceability. SOX environments still require evidence of review and internal control. The EU AI Act pushes in the same direction for high-risk systems. The insurance market is moving similarly — drawing clearer lines around AI-related liability and reinforcing the need for deterministic systems in core processes. Probabilistic AI outputs do not remove those requirements.
Third, structural demand expansion. The more agents enterprises deploy, the more probabilistic processes they introduce into workflows that were previously human-driven. That does not reduce the need for deterministic systems at the core. It increases it. The more agentic the front end becomes, the more valuable the system becomes that can validate, record, authorize, reconcile, and defend the result.
Agents automate judgment. They cannot automate accountability. Markets imagined the death of enterprise software. What may actually strengthen is the value of the system of record underneath it.
TurboQuant — 6x Compression, 10x–15x Demand Growth
Last week, Google Research released TurboQuant, an algorithm that compresses KV-cache memory by 6x. The headline alone was enough to pressure memory-related semiconductor names, including Samsung Electronics and SK Hynix.
Once again: memory efficiency improved → memory demand falls → sell memory.
Start with the technical distinction. LLM inference has two broad phases. Prefill processes the input context and is compute-heavy. Decode generates tokens one by one while repeatedly referencing the previously computed KV cache; it is memory-heavy. TurboQuant targets that decode-side KV cache.
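Why the decode-side KV cache is the pressure point becomes clear from a back-of-envelope size estimate. A minimal sketch, using illustrative parameters loosely modeled on a large dense transformer (layer count, head count, and head dimension are assumptions, not a specific model's config):

```python
# Back-of-envelope KV-cache size for a single request.
# Parameters are illustrative, not a specific model's config.
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # Two tensors (K and V) per layer, each of shape
    # [kv_heads, seq_len, head_dim], stored at fp16/bf16 (2 bytes).
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

gb = 1024 ** 3
short_chat = kv_cache_bytes(80, 8, 128, 4_000) / gb        # ~4K-token chat turn
long_context = kv_cache_bytes(80, 8, 128, 1_000_000) / gb  # 1M-token context

print(f"4K-token chat:    {short_chat:.2f} GB of KV cache")
print(f"1M-token context: {long_context:.0f} GB of KV cache")
```

Under these assumptions, a short chat holds about 1.2 GB of KV cache, while a 1M-token context holds roughly 305 GB — and every generated token re-reads that cache. The cache scales linearly with context length, which is exactly why decode becomes memory-bound rather than compute-bound.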
What it does not directly touch matters just as much. It does not address model weights in training. It does not make training workloads disappear. It does not cut total memory demand across all workloads to one-sixth. It addresses a specific bottleneck inside a specific stage of inference.
Markets saw what was compressed. They missed what was expanding much faster.
Over the past two decades, server compute performance has grown by roughly 3x every two years, while memory bandwidth has increased by only about 1.6x over the same period. In decode-heavy inference, a large share of time is often spent waiting on memory rather than compute. This is not a trivial software problem. It is a structural imbalance.
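Compounding those two headline rates shows just how structural the imbalance is. A sketch using the growth figures from the text (3x and 1.6x per two-year period, taken at face value):

```python
# Compound the text's headline rates over 20 years:
# compute ~3x and memory bandwidth ~1.6x per two-year period.
periods = 20 // 2                    # ten two-year periods
compute_growth = 3.0 ** periods      # ~59,000x over two decades
bandwidth_growth = 1.6 ** periods    # ~110x over the same span
gap = compute_growth / bandwidth_growth

print(f"compute:   {compute_growth:,.0f}x")
print(f"bandwidth: {bandwidth_growth:,.0f}x")
print(f"gap:       {gap:,.0f}x")     # compute outran bandwidth by ~537x
```

At those rates, compute capability pulled away from memory bandwidth by more than 500x over two decades. No single software optimization closes a gap that compounds like that.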
Now add the demand side. Context windows have expanded from roughly 128K tokens to the 1M–2M range in commercial models over the past one to two years. Google DeepMind has demonstrated research experiments extending this toward 10M tokens. Larger context windows drive KV-cache demand sharply higher.
Meanwhile, agentic workflows generate KV cache at a fundamentally different scale than simple conversations. An agent executing a multi-step workflow — calling tools, tracking intermediate state, revisiting context, and coordinating multiple decisions — produces orders of magnitude more KV-cache footprint than a single chat exchange. If agent volumes scale by orders of magnitude, a 6x efficiency gain can be absorbed surprisingly quickly.
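The absorption argument reduces to one line of arithmetic. A sketch with deliberately conservative, illustrative multipliers (the 10x context growth and 5x agent-step figures are assumptions, not forecasts):

```python
# Toy absorption arithmetic: a fixed compression gain against
# compounding demand growth. All multipliers are illustrative.
baseline = 1.0           # today's KV-cache demand, normalized
compression = 6.0        # TurboQuant-style KV-cache compression
context_growth = 10.0    # e.g. ~128K -> ~1M token contexts (assumed)
agent_multiplier = 5.0   # multi-step agent vs single chat turn (assumed)

net_demand = baseline * context_growth * agent_multiplier / compression
print(f"net KV-cache demand vs today: {net_demand:.1f}x")  # ~8.3x despite 6x compression
```

Even granting the full 6x savings, demand ends up roughly 8x higher than today under these assumptions. A one-time compression gain divides; usage growth multiplies, and it keeps multiplying.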
The market's mistake was not misreading the compression ratio. It was ignoring the expansion rate on the other side.
One Structure Behind Three Shocks
Efficiency improves → unit costs fall → usage explodes → bottlenecks shift → whoever owns the new bottleneck captures the value.
Markets stop at link one. Investors need to get to link five.
DeepSeek moved compute demand from training toward inference. SaaSpocalypse changed who uses enterprise software — from humans to agents — while making systems of record more important, not less. TurboQuant compressed one layer of inference memory, even as longer context windows and agentic workloads threatened to absorb those savings and then some.
In all three cases, the bottleneck did not disappear. It moved.
To be clear, not every asset survives. Some UI-layer SaaS products will come under real pressure. Some hardware categories may face genuine short-term compression. But markets keep making the same category error: they extrapolate product-level weakness into industry-level extinction.
Technology relocates demand far more often than it destroys it. AI lowers the cost of intelligence. It does not eliminate the scarcity of physical infrastructure, institutional frameworks, or accountability systems.
That is where value concentrates.
The winners of the AI era will not be the companies that sell intelligence itself. They will be the companies that own the bottlenecks intelligence must pass through.
What Comes Next
That leaves the next question: where, exactly, are those bottlenecks now?
I'm a Partner at Gorilla PE, a Korea-based private equity fund focused on pre-IPO investments in US technology companies. In this newsletter, I write about where value actually accrues as AI reshapes industry structure.
Next essay: "The bottleneck intelligence must pass through — the Memory Wall."