Chapter 0

Executive Summary

What This Report Is

This is a supply chain research report on AI infrastructure. It answers one question: of the $725 billion that the world’s largest technology companies plan to spend building AI systems in 2026, where does the money actually go, and where does it get stuck?

The answer is a map. The following chapters trace the physical supply chain from raw quartz in North Carolina to operating GPU clusters in Virginia, covering every layer in between: semiconductor materials, lithography equipment, chip design, foundries, memory, packaging, networking, photonics, power generation, cooling, construction, and deployment. Each bottleneck is scored using failure mode and effects analysis (FMEA), a standard engineering risk methodology.
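The FMEA scoring referenced above can be sketched in miniature. The 1-10 scales and the example scores below are illustrative assumptions for the sake of the sketch, not the report's actual figures or its exact scoring rubric.

```python
# Minimal FMEA scoring sketch. Assumes the standard 1-10 scales for
# severity, occurrence, and detection, with risk priority number
# RPN = severity * occurrence * detection.
from dataclasses import dataclass


@dataclass
class Bottleneck:
    name: str
    severity: int    # 1-10: impact if this step of the chain fails
    occurrence: int  # 1-10: likelihood of a disruption
    detection: int   # 1-10: 10 = a failure is hard to see coming

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection


# Illustrative scores only -- not the report's figures.
chain = [
    Bottleneck("EUV optics (single supplier)", 10, 4, 8),
    Bottleneck("Advanced-node foundry",        10, 3, 5),
    Bottleneck("HBM memory (3 suppliers)",      8, 4, 4),
]

for b in sorted(chain, key=lambda b: b.rpn, reverse=True):
    print(f"{b.name:32s} RPN={b.rpn}")
```

The point of the RPN is that it ranks risks by combined impact, likelihood, and opacity, which is how a privately held single-source supplier can outscore a larger, better-watched company.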

The Core Argument

The AI buildout is constrained by physics, not capital. Money is abundant. What is scarce is the physical capacity to convert that money into operational AI infrastructure. At every layer of the supply chain, a small number of companies control irreplaceable steps. Removing any one of four companies (TSMC, ASML, Carl Zeiss SMT, Ajinomoto) would halt global AI chip production with no workaround on any relevant timeframe. A fifth dependency, Spruce Pine high-purity quartz (operated by Sibelco and The Quartz Corp, 70-90% of semiconductor-grade supply), has synthetic alternatives at 5-10x the cost that would take years to scale; its disruption would cause a severe supply shock rather than a permanent halt, but the cost and timeline implications would cascade through the entire semiconductor chain. These are not competitive advantages; they are physical monopolies rooted in materials science, precision engineering, and decades of accumulated process knowledge.

The Supply Chain in Four Pillars

For readers new to the subject, the AI infrastructure supply chain can be understood as three enabling supply chains that converge at a fourth integration point.

Semiconductors (Chapters 2-9) Raw silicon, specialty chemicals, lithography equipment, chip design, foundry fabrication, memory, and advanced packaging. This is where sand becomes a GPU. The critical chokepoints are ASML (sole EUV lithography supplier), TSMC (sole high-volume advanced node foundry), and Ajinomoto (sole manufacturer of ABF substrate film): in each case, a single company. Three companies (SK Hynix, Samsung, Micron) produce all of the world's HBM memory; supply is fully allocated through 2026.

Networking & Connectivity (Chapters 10-12) Switch silicon, optical transceivers, fiber optic cables, and connectors that link thousands of GPUs into a single training cluster. A 100,000-GPU cluster requires over 100,000 optical transceivers at the server-to-switch layer alone, with additional transceivers at every spine and inter-cluster link. Broadcom supplies the switch silicon in virtually every AI data center regardless of which networking vendor wins the contract.
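The transceiver arithmetic above can be sketched as a back-of-envelope calculation. The topology parameters (a non-blocking leaf-spine fabric, three switch tiers, one optical port per GPU, two transceivers per link) are illustrative assumptions, not figures from the report.

```python
# Back-of-envelope optical transceiver count for a GPU training
# cluster, assuming a non-blocking leaf-spine (Clos) fabric with one
# optical NIC port per GPU and a transceiver at each end of every link.
def transceiver_count(gpus: int, tiers: int = 3) -> int:
    # A non-blocking fabric carries the same aggregate bandwidth at
    # every switch tier, so each tier contributes 'gpus' links, and
    # each link needs two transceivers (one per end).
    return 2 * gpus * tiers


# Server-to-switch tier alone: 200,000 transceivers for 100,000 GPUs.
print(transceiver_count(100_000, tiers=1))
# Full three-tier fabric under these assumptions.
print(transceiver_count(100_000))
```

Under these assumptions the server-to-switch layer alone consumes two transceivers per GPU, consistent with the "over 100,000" figure cited above; adding spine and inter-cluster tiers multiplies the total.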

Power & Cooling (Chapters 13-16) Electricity generation, transformers, switchgear, UPS systems, and thermal management. This is where the supply chain hits its hardest wall. Transformer lead times have reached 128-144 weeks (2.5-3 years). The US manufactures only 20% of its power transformers domestically. Cleveland-Cliffs operates the only US facility producing the specialty steel (GOES) that transformer cores require. Liquid cooling is now mandatory for AI racks above 40 kW; air cooling is physically insufficient.

Data Centers (Chapters 17-20) The integration point where all three supply chains converge. Physical construction, operators, servers, storage, and system software. $156 billion in data center projects are delayed or blocked by permitting, power, and community opposition.

Edge inference (Chapter 21) partially bypasses the data center stack entirely. Smartphones, autonomous vehicles, and defense systems consume chips from the same semiconductor supply chain but deploy them outside data centers, compounding demand on the upstream bottlenecks without passing through the integration point.

Five Findings

1. The real fragility is in companies nobody watches. Carl Zeiss SMT and Trumpf (both private, both in Germany) scored the highest risk in our analysis, higher than TSMC or NVIDIA. They make components inside ASML’s EUV lithography machines that nobody else can make. Zeiss, Trumpf, and ASML are clustered within 200 km of each other. Three points of failure in one geographic region controlling all leading-edge chip manufacturing on earth.

2. Apparent redundancy is illusory. Three companies make HBM memory. That sounds diversified. But all three depend on the same TSMC packaging, the same Ajinomoto substrate film, the same ASML lithography. A disruption at the packaging layer disables all three memory suppliers simultaneously. The surface structure suggests resilience. The deep structure reveals fragility.

3. Bottlenecks migrate over time. In 2025-2026, the binding constraint is HBM memory and CoWoS packaging. By 2027-2028, those ease as capacity expands. But transformer lead times and grid interconnection queues (4-10 years) persist. The constraint migrates from semiconductor scarcity to power scarcity. Positioning for the 2025 bottleneck is different from positioning for the 2027 bottleneck.

4. The capex cycle is self-reinforcing until it hits a wall. Cheaper AI compute opens new use cases, which drive more demand, which justifies more spending. This cycle breaks at one of three walls: financing (negative free cash flow at hyperscalers, $1.5T projected debt), permitting (200+ opposition groups, 14 state moratoriums), or revenue (if AI fails to deliver returns justifying $725B/year in spending).

5. Edge inference amplifies the same bottlenecks. Smartphones, autonomous vehicles, and defense systems all require chips from the same TSMC fabs, using the same ASML lithography, consuming the same materials. TSMC's non-data-center revenue (smartphones 29%, IoT 5%, automotive 5%) represents roughly 40% of advanced node demand competing for the same constrained capacity. Edge AI does not have its own supply chain. It shares this one.