Chapter 6: Chip Design, GPUs, CPUs & Custom Accelerators
6.1 Overview
This chapter covers the companies that design the silicon at the center of the AI buildout. The chip designers are the direct customers of the EDA vendors (Chapter 5), the foundries (Chapter 7), the memory makers (Chapter 8), and the packaging houses (Chapter 9). They consume the equipment and materials described in Chapters 3 and 4. Their products go into the servers and compute platforms described in Chapter 18.
The AI accelerator market has two structural layers. The first is the merchant GPU market, dominated by NVIDIA with an estimated 80-92% share of data center AI accelerators 121, followed by AMD’s Instinct line. The second is the custom ASIC market, where hyperscalers design their own chips (Google TPU, Amazon Trainium, Microsoft Maia, Meta MTIA) with the help of ASIC design partners, primarily Broadcom (~70% share of custom AI accelerators) 1467 and Marvell (~25%) 714. These two layers are complements rather than substitutes: hyperscalers buy NVIDIA GPUs for frontier training while deploying custom ASICs for cost-optimized inference and specific workloads.
The AI buildout creates demand for chip design in four ways. First, training frontier models requires the most powerful GPUs available (NVIDIA Blackwell, AMD MI350). Second, inference at scale requires cost-efficient accelerators (custom ASICs, inference-optimized GPUs). Third, the explosion of AI workloads drives demand for high-performance server CPUs (AMD EPYC, Intel Xeon, Arm Neoverse-based custom CPUs). Fourth, every hyperscaler is investing in custom silicon to reduce dependence on a single GPU vendor and optimize total cost of ownership.
6.2 Market Sizing & Growth
AI accelerator market: NVIDIA’s data center revenue, the best proxy for the AI GPU market, reached $197.3 billion for full FY2026 (ending January 2026), up 71% year-over-year. Total NVIDIA revenue for FY2026 was $215.9 billion (+65% YoY). Q4 FY2026 alone was $68.1 billion in total revenue ($62.3 billion data center, +75% YoY), and Q1 FY2027 guidance is $78.0 billion 123.
Custom AI ASIC market: Broadcom’s AI semiconductor revenue reached $8.4 billion in Q1 FY2026 (up 106% YoY), with Q2 guidance of $10.7 billion 4. Analysts project Broadcom’s AI semiconductor revenue at $46 billion in calendar 2026, driven by a $73 billion backlog from hyperscalers 5. TrendForce projects custom chip sales will increase 45% in 2026, compared with 16% growth in GPU shipments 6. Broadcom and Marvell together control approximately 95% of the custom ASIC co-design market 7.
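The differential growth rates above imply a gradual share shift toward custom silicon. A rough sketch of the arithmetic, using only the TrendForce growth rates from the text; the 2025 base-year market sizes are hypothetical assumptions, and both rates are treated as revenue growth for simplicity:

```python
# Illustrative share-shift arithmetic. The 45% (custom ASIC) and 16% (GPU)
# growth rates come from the text; the 2025 base sizes are hypothetical
# assumptions, and both rates are treated as revenue growth for simplicity.

custom_2025, gpu_2025 = 30.0, 180.0      # hypothetical 2025 market sizes, $B
custom_growth, gpu_growth = 0.45, 0.16   # growth rates cited in the text

custom_2026 = custom_2025 * (1 + custom_growth)
gpu_2026 = gpu_2025 * (1 + gpu_growth)

share_2025 = custom_2025 / (custom_2025 + gpu_2025)
share_2026 = custom_2026 / (custom_2026 + gpu_2026)

print(f"custom ASIC share of accelerator market: {share_2025:.0%} -> {share_2026:.0%}")
```

Even with custom silicon growing nearly three times as fast, the single-year share shift is modest, which is consistent with the chapter's framing of the two layers as complementary rather than substitutional.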
Server CPU market (agentic AI inflection): AMD’s Data Center segment revenue was $16.6 billion in FY2025 (+32% YoY), driven by EPYC CPU and Instinct GPU sales 89. In Q1 2026, AMD posted data center revenue of $5.8 billion, surpassing Intel’s Data Center and AI group ($5.1 billion) for the first time 17. AMD’s server CPU market share has grown to approximately 29%, up from single digits five years ago 10. Intel remains the volume leader in server CPUs but is losing share steadily.
The structural driver is agentic AI. As AI workloads shift from monolithic training runs to multi-step agentic execution (planning, tool calling, memory management, parallel sub-agent orchestration), the CPU:GPU ratio is changing fundamentally. Historically, AI data centers operated at approximately 1 CPU per 12 GPUs. AMD CEO Lisa Su stated the ratio is “changing from 1:4 or 1:8 to getting closer to 1:1” for agentic configurations 1718. NVIDIA’s Rubin platform approaches a 1:2 CPU:GPU ratio, and the aggressive “Rubin Ultra” configuration could invert it entirely to 2 CPUs per GPU 20. Morgan Stanley (April 2026) projects agentic AI will create $32.5-60 billion of incremental CPU TAM by 2030, bringing the total data center CPU market to $82.5-110 billion. This represents a structural value shift: CPU-side orchestration accounts for 50-90% of total task execution latency in agentic systems, making the CPU the performance bottleneck rather than the GPU 20. Without additional foundry capacity, Morgan Stanley estimates the incremental agentic CPU demand will widen the 2030 supply gap from 7% to 15% 20.
AMD raised its long-term server CPU market outlook to $120 billion by 2030 (35% CAGR, up from prior 18% forecast) explicitly citing agentic AI demand 17. Lisa Su described the mechanism on AMD’s Q1 2026 call: “As inferencing scales and you do more — you have more agents and Agentic AI, they all require CPUs for all of the orchestration and the data processing and these other tasks” 23. Server CPU prices have risen 10-20% since March 2026, with lead times stretching to six months as Intel shifts production from consumer to Xeon and TSMC prioritizes GPU/ASIC capacity over CPU wafers 1819. AMD stock rose over 88% from March 2026 lows; Intel surged over 160% from the same trough, reflecting the market’s repricing of CPU relevance in the agentic era 20.
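The ratio shift described above translates directly into unit demand. A minimal sketch, assuming a hypothetical 100,000-GPU fleet (the ratios are taken from the text; the fleet size is an assumption chosen only for illustration):

```python
# Back-of-envelope: how the CPU:GPU ratio shift changes server CPU demand.
# Fleet size is a hypothetical assumption; ratios are those cited in the text.

GPU_FLEET = 100_000  # hypothetical GPU count for one deployment

# (label, CPUs per GPU)
ratios = [
    ("legacy 1:12", 1 / 12),
    ("current 1:8", 1 / 8),
    ("agentic 1:2 (Rubin-class)", 1 / 2),
    ("inverted 2:1 (Rubin Ultra-class)", 2.0),
]

for label, cpus_per_gpu in ratios:
    cpus = round(GPU_FLEET * cpus_per_gpu)
    print(f"{label:>34}: {cpus:>8,} server CPUs")
```

Moving from 1:12 to 2:1 is a roughly 24x increase in CPUs per fleet, which is the mechanism behind the incremental CPU TAM and the supply-gap estimates cited above.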
6.3 Supply Chain Flowchart
CHIP DESIGN COMPANIES
|
|---> MERCHANT GPU / ACCELERATOR DESIGNERS
| NVIDIA: Blackwell (B200, GB200), Rubin (2026), Grace CPU
| AMD: Instinct MI350/MI400, EPYC CPU, Pensando DPU
| Intel: Gaudi accelerator, Xeon CPU, Falcon Shores (planned)
|
|---> CUSTOM ASIC DESIGN (Hyperscaler In-House + Partners)
| Google TPU ---- designed with Broadcom, fab at TSMC
| Amazon Trainium/Inferentia ---- designed with Marvell (Annapurna Labs)
| Microsoft Maia ---- designed with Marvell, fab at TSMC
| Meta MTIA ---- designed with Broadcom (4 generations planned)
| OpenAI custom chip ---- designed with Broadcom (rumored), fab at TSMC
| ByteDance custom accelerator ---- designed with Broadcom
|
|---> CUSTOM ASIC DESIGN PARTNERS
| Broadcom: ~70% share of custom AI ASIC market
| Marvell: ~25% share (Amazon Trainium, Microsoft Maia, Google Axion)
| MediaTek: emerging (Google partnership for custom chips)
|
|---> AI CHIP STARTUPS
| Cerebras Systems (wafer-scale engine) -- IPO filing April 2026
| Groq (LPU for inference) -- Acquired by NVIDIA (~$20B, Dec 2025)
| SambaNova (RDU for enterprise AI) -- Private (~$5B+, $350M raise)
| Tenstorrent (RISC-V based AI) -- Private (~$2.6B, Jim Keller-led)
| SiFive (RISC-V CPU + AI core IP) -- Private ($3.65B, Apr 2026)
| Graphcore -- acquired by SoftBank (2024)
| d-Matrix (digital in-memory compute) -- Private
|
+---> CPU/SOC DESIGNERS (for AI-adjacent server workloads)
Arm Holdings: CPU architecture IP [See Chapter 5]
Qualcomm: Arm-based server CPUs (Qualcomm Cloud AI 100)
Ampere Computing: Arm-based cloud-native CPUs
|
v
EDA TOOLS (Chapter 5) --> FOUNDRIES (Chapter 7: TSMC, Samsung, Intel)
| |
v v
PACKAGING (Chapter 9) MEMORY (Chapter 8: HBM, DDR5)
|
v
SERVER INTEGRATION (Chapter 18: Dell, HPE, Supermicro, ODMs)
6.4 Key Companies
6.4.1 Merchant GPU / AI Accelerator Companies
| Company | Ticker | Exchange | Approx. Mkt Cap | Role | Key Metric |
|---|---|---|---|---|---|
| NVIDIA | NVDA | NASDAQ | ~$5.2T | Dominant AI GPU maker; 80-92% AI accelerator market share | Q3 FY2026 revenue $57.0B (DC $51.2B, +66% YoY); FY2026 DC revenue $197.3B (+71% YoY); Blackwell “sold out”; $500B+ order backlog for Blackwell+Rubin 123 |
| AMD | AMD | NASDAQ | ~$742B | #2 AI accelerator (Instinct MI350/MI400); #2 server CPU (EPYC); DPU (Pensando) | FY2025 revenue $34.6B (+34%); DC segment $16.6B (+32%); MI350 with 288GB HBM3E; OpenAI 6GW AMD GPU deal; acquired ZT Systems 8910 |
| Intel | INTC | NASDAQ | ~$628B | Server CPU (Xeon); AI accelerator (Gaudi); foundry services (Intel Foundry) | Gaudi accelerators struggled to gain traction (<$500M revenue target missed in 2024); Xeon losing share to EPYC; Intel 18A foundry process in pilot 1011 |
NVIDIA is the central company of the entire AI buildout thesis. Its data center revenue has grown from roughly $15 billion in FY2024 to $197.3 billion in FY2026, a more than 13x increase in two years 123. The Blackwell architecture (GB200 NVL72, B200) is ramping as the primary AI training platform, while the Rubin architecture (unveiled at CES 2026) features a rack-scale design with 72 GPUs optimized for energy efficiency (40% better performance per watt) 12. NVIDIA’s moat extends beyond hardware. The CUDA software ecosystem (20+ years of developer tools, libraries, frameworks) creates switching costs that AMD’s ROCm has struggled to overcome. NVIDIA has also expanded into networking (NVLink, InfiniBand, Spectrum-X Ethernet) and CPU (Grace, based on Arm Neoverse).
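As a quick sanity check on the growth claim, the two-year multiple and implied annual growth rate follow directly from the revenue figures in the text:

```python
# Sanity check on the "more than 13x in two years" figure, using the
# data center revenue numbers cited in the text ($B).
fy2024_dc = 15.0     # approximate FY2024 data center revenue
fy2026_dc = 197.3    # FY2026 data center revenue

multiple = fy2026_dc / fy2024_dc
annual_growth = multiple ** 0.5 - 1   # geometric mean over the two-year span

print(f"multiple: {multiple:.1f}x; implied annual growth: {annual_growth:.0%}")
```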
Strategic investments: NVIDIA invested $2B each in Coherent and Lumentum (photonics, Chapter 11) in early 2026 (see Chapter 3). Partnership with OpenAI for 10+ GW of NVIDIA infrastructure. Partnership with Anthropic for 1 GW of Grace Blackwell and Vera Rubin systems. Collaboration with Intel on NVLink-based custom products 2.
AMD has emerged as the credible #2 in AI accelerators. The MI350 series (CDNA 4 architecture, 288GB HBM3E, 8 TB/s bandwidth, 35x inference improvement over prior generation) launched mid-2025 and drove strong Data Center revenue growth 89. Eight of the world’s top ten AI companies now use AMD Instinct accelerators for production workloads 9. OpenAI announced AMD as a “core preferred partner” to deploy 6 GW of AMD GPUs, with the first 1 GW of MI450 GPUs beginning H2 2026 11. AMD’s acquisition of ZT Systems (closed 2025) transforms it from a component vendor into a full-stack data center solution provider 10.
Intel’s position in AI accelerators is the weakest of the three major chip companies. The Gaudi series has not gained significant traction, falling short of its modest $500 million revenue target in 2024 due to software maturity issues 1011. Intel’s primary relevance to the AI buildout is through its Xeon server CPU line (still the volume leader despite share losses) and Intel Foundry Services (which provides manufacturing alternatives to TSMC). Intel’s 18A process node is in pilot production with customers like NVIDIA and Arm-based designers (see Chapter 7).
6.4.2 Custom ASIC Design Partners
| Company | Ticker | Exchange | Approx. Mkt Cap | Role | Key Metric |
|---|---|---|---|---|---|
| Broadcom | AVGO | NASDAQ | ~$2.0T | Dominant custom AI ASIC partner (~70% share); networking ASICs; VMware | Q1 FY2026 AI revenue $8.4B (+106% YoY); Q2 guide $10.7B; $73B AI backlog; partners: Google (TPU), Meta (MTIA, 4 gens), ByteDance, OpenAI (rumored), Apple (rumored) 4513 |
| Marvell Technology | MRVL | NASDAQ | ~$149B | #2 custom AI ASIC partner (~25% share); networking, storage controllers | FY2026 (Feb) data center revenue $6.1B (record); total revenue $8.2B (+42%); partners: Amazon (Trainium), Microsoft (Maia), Google (Axion CPU, new inference TPU), Meta (DPU) 614 |
| MediaTek | 2454 | TWSE | ~$60.0B | Emerging ASIC design partner for Google; dominant in smartphone AP | Google partnership for custom chip design; strong in advanced node design (TSMC N3) |
Broadcom is the “arms dealer” of the hyperscaler custom silicon race. It holds approximately 70% of the custom AI accelerator co-design market 713. Google alone spends an estimated $8B/year with Broadcom on TPU development 7. Meta announced an expanded partnership for four new generations of MTIA chips with a 1GW+ deployment commitment 13. Broadcom’s CEO Hock Tan transitioned from Meta’s board to an advisor role given the scale of the relationship 13. The commercial pattern is now established: hyperscalers use NVIDIA GPUs for frontier training and Broadcom-designed custom ASICs for high-volume inference.
Marvell is the emerging #2, with a fast-growing custom AI silicon business. Its partners include Amazon (Trainium/Inferentia), Microsoft (Maia), and Google (Axion Arm CPU, plus new inference TPU in discussions) 614. Marvell’s data center revenue reached a record $6.1 billion in FY2026 (Feb), with total revenue of $8.2 billion (+42% YoY) 6. The key strategic question is whether Marvell can capture enough hyperscaler design wins to narrow the gap with Broadcom.
6.4.3 Hyperscaler Custom Silicon Programs
| Company | Program | Status | Key Metric | ASIC Partner |
|---|---|---|---|---|
| Google (Alphabet) | TPU (Tensor Processing Unit) | Production; v7p latest; >75% of Gemini inference on TPUs | Anthropic committed to ~1M TPUs; deployed at massive scale | Broadcom (primary), Marvell (new inference TPU), MediaTek |
| Amazon (AWS) | Trainium / Inferentia | Production; Trainium3 ramping (3nm, 144GB HBM3E) | 500K+ Trainium chips deployed; “UltraServer” 144-chip rack | Marvell (Annapurna Labs, AWS subsidiary) |
| Microsoft | Maia | Early production; Maia 100 deployed | Custom for Azure AI workloads | Marvell |
| Meta | MTIA | Production (v2 “Artemis” for inference); training chip in 2026 roadmap | 4 new generations with Broadcom; 1GW+ deployment | Broadcom |
| OpenAI | Custom accelerator | In development | Reported $10B commitment; design partner Broadcom (rumored) | Broadcom (rumored) |
| ByteDance | Custom accelerator | In development | Broadcom partner | Broadcom |
This table reveals the real dependency graph. Every major hyperscaler is simultaneously buying NVIDIA GPUs and developing custom silicon with Broadcom or Marvell. The custom chip programs are not replacing NVIDIA; they are supplementing it for cost-optimized inference. TSMC fabricates all of these custom chips at advanced nodes (3nm and below); no hyperscaler custom ASIC uses an alternative foundry 15.
6.4.4 AI Chip Startups
| Company | Ticker | Exchange | Approx. Mkt Cap | Role | Key Metric |
|---|---|---|---|---|---|
| Cerebras Systems | CBRS (pending) | NASDAQ (IPO filing) | ~$22-25B (S-1 April 2026) | Wafer-scale AI engine (entire wafer as single chip) | S-1 filed Apr 17, 2026; ~$510M revenue; $10B+ OpenAI partnership |
| Groq | N/A | Acquired | ~$20.0B (NVIDIA, Dec 2025) | LPU (Language Processing Unit) for ultra-low-latency inference | Acquired by NVIDIA Dec 2025; IP and talent absorbed into NVIDIA inference stack |
| Tenstorrent | Private | Private | ~$2.6B (Series D, Dec 2024) | RISC-V based AI accelerator and CPU IP; led by Jim Keller | TT-Ascalon RISC-V CPU IP; Samsung/LG/Hyundai designing on platform; acquired Blue Cheetah Analog (UCIe/BoW) Jul 2025 |
| SambaNova | Private | Private | ~$5.0B (2025 raise) | RDU for enterprise AI; full-stack AI platform | SN50 chip (Feb 2026, 5x compute vs SN40L); $350M raise 2025 |
| SiFive | Private | Private | ~$3.6B (Series G, Apr 2026) | RISC-V CPU + AI core IP licensing; XM matrix accelerator | $400M Series G; Google TPU partnership; pre-IPO stage |
| Astera Labs | ALAB | NASDAQ | ~$34.2B | CXL smart memory controllers; PCIe retimers; AI fabric switches | Revenue $852M (2025, +115% YoY); Scorpio 320-lane fabric switch |
| d-Matrix | Private | Private | ~$500M | Digital in-memory compute for inference | Pre-revenue; targeting inference efficiency |
6.4.5 Server CPU Designers
| Company | Ticker | Exchange | Approx. Mkt Cap | Role | Key Metric |
|---|---|---|---|---|---|
| AMD | AMD | NASDAQ | ~$742B | EPYC server CPUs; ~29% server market share and growing | 5th Gen EPYC “Turin”; cloud instances nearly doubled in 2 years |
| Intel | INTC | NASDAQ | ~$628B | Xeon server CPUs; still volume leader despite share losses | Xeon losing share; 18A process node in pilot; Gaudi AI accelerator underperforming |
| Ampere Computing | Private | Private (acquired by SoftBank) | ~$6.5B (acquisition price) | Arm-based cloud-native CPUs; Oracle partnership | AmpereOne with 192 cores; designed for cloud workloads. SoftBank acquired Ampere (Nov 2025, $6.5B all-cash) to complement its Arm Holdings stake. |
| Qualcomm | QCOM | NASDAQ | ~$200B | Snapdragon server/AI chips; Arm-based. Acquired Alphawave Semi (~$2.4B 15, Dec 2025) for connectivity IP. | Cloud AI 100 Ultra for inference; primary revenue from mobile |
6.4.6 Chinese AI Chip Ecosystem
| Company | Ticker | Exchange | Approx. Mkt Cap | Role | Key Metric |
|---|---|---|---|---|---|
| Huawei (HiSilicon) | Private | Private | Private (Huawei group) | Ascend 910C/910D AI accelerators; China’s primary NVIDIA alternative | Sole-source for Chinese hyperscalers under US export controls. Ascend 910C reportedly used in Huawei CloudMatrix 384 system. Performance lags H100 but improving. |
| Cambricon Technologies | 688256 | SSE STAR | ~$15.0B | AI inference and training chips (MLU370, MLU590) | Founded by former Chinese Academy of Sciences researchers. Originally designed Huawei’s first neural processing unit. Revenue growing but profitability remains elusive. |
| Biren Technology | Private | Private | ~$2.0B (est.) | BR100/BR104 GPUs targeting training workloads | Added to US Entity List. Claimed performance comparable to A100 but independent verification limited. Fab access constrained by export controls. |
| Moore Threads | Private | Private | ~$3.0B (est.) | MTT S4000 GPUs for inference and graphics | Added to US Entity List. Focused on inference and domestic cloud deployment. Limited to mature process nodes. |
| MetaX Integrated Circuits | Private | Private | ~$2.0B (est.) | C-Series GPUs for AI training | Shenzhen-based. Claims CUDA compatibility layer. Limited public benchmarks. |
| Enflame Technology | Private | Private | ~$2.0B (est.) | CloudBlazer DTU training accelerators | Backed by Tencent and state funds. Deployed in Tencent Cloud. Pre-revenue scale. |
The Chinese AI chip ecosystem exists because US export controls have cut Chinese hyperscalers off from NVIDIA’s most advanced GPUs (H100, A100, and now H20 restricted). Huawei’s Ascend line is the most credible domestic alternative, backed by the resources of the broader Huawei organization. The others (Cambricon, Biren, Moore Threads, MetaX, Enflame) are smaller and constrained by their inability to access leading-edge foundry capacity (TSMC is off-limits; SMIC can fabricate at 7nm but with yield penalties). The collective significance of these companies is less about their individual competitiveness and more about the pace at which China can build a parallel, self-contained AI chip supply chain. That pace remains slow by global standards, but the direction of investment is clear.
6.5 Bottleneck Analysis
NVIDIA GPU supply (EXTREME, moderated by packaging): NVIDIA’s AI accelerator supply is the most sought-after resource in the global technology industry. Jensen Huang stated “Blackwell sales are off the charts, and cloud GPUs are sold out” 2. CFO Colette Kress reinforced the persistence of this constraint: “While we expect tightness in the supply for our advanced architectures to persist, we remain confident in our ability to capitalize on the growth opportunity ahead” 21. The binding constraint, however, is not NVIDIA’s chip design capacity but TSMC’s manufacturing and CoWoS packaging capacity (see Chapters 7, 9). NVIDIA’s $500B+ backlog for Blackwell and Rubin exceeds what can be delivered in the near term. This creates a pricing power dynamic where NVIDIA can charge premium prices and allocate supply preferentially.
CUDA software ecosystem (SEVERE for competitors): NVIDIA’s CUDA platform represents 20+ years of developer tools, optimized libraries (cuDNN, TensorRT, NCCL), and framework integration (PyTorch, TensorFlow). AMD’s ROCm is making progress (ROCm 7.0 delivered ~3.5x inference uplift on MI300X 10) but remains behind in maturity and ecosystem breadth. For chip designers competing with NVIDIA, the software moat is arguably harder to breach than the hardware gap.
Custom ASIC design capacity (MODERATE-HIGH): Only two companies (Broadcom and Marvell) can design custom AI accelerators at scale for hyperscalers. The combined 95% market share means hyperscalers have limited options 7. Broadcom CEO Hock Tan confirmed the demand surge: “We have never seen bookings of the nature that what we have seen over the past three months” 22. Design cycles for custom ASICs take 18-24 months from specification to tapeout, creating pipeline constraints. However, the existence of two credible providers and the hyperscalers’ own growing internal design teams (Annapurna Labs at AWS, Google’s chip design group) prevents monopoly-level pricing.
Advanced node access at TSMC (SEVERE): All of the chips described in this chapter are fabricated at TSMC (3nm, 5nm). NVIDIA, AMD, Broadcom, Marvell, Google, Amazon, and Apple all compete for TSMC’s limited leading-edge wafer capacity. TSMC allocates capacity based on long-term relationships and prepayment agreements. A new chip designer (startup or hyperscaler) cannot simply order 3nm wafers without a multi-year capacity agreement (see Chapter 7).
6.6 Risks
NVIDIA concentration risk: An estimated 80-92% market share in any technology segment is unusual and potentially fragile. AMD’s Instinct line is gaining share. Custom ASICs from hyperscalers offer 40-65% TCO advantages over GPUs for certain inference workloads 7. If inference becomes a larger share of AI compute than training (which is widely expected), the addressable market shifts toward cheaper, specialized chips and away from NVIDIA’s premium GPUs. This does not eliminate NVIDIA’s dominance but could compress its market share from roughly 90% toward 60-70% over time.
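The TCO argument can be made concrete with a toy calculation. Every figure below is a hypothetical assumption chosen only to illustrate the arithmetic; the sole input taken from the text is the 40-65% TCO advantage range (midpoint used here):

```python
# Toy inference-TCO comparison: merchant GPU vs custom ASIC.
# All inputs are hypothetical assumptions except the 40-65% TCO
# advantage range cited in the text (midpoint used).

gpu_cost_per_hour = 4.00        # hypothetical all-in $/hr (amortized capex + power)
asic_tco_advantage = 0.50       # midpoint of the cited 40-65% range
tokens_per_hour = 50_000_000    # hypothetical throughput, assumed equal for both

asic_cost_per_hour = gpu_cost_per_hour * (1 - asic_tco_advantage)

gpu_cost_per_m = gpu_cost_per_hour / (tokens_per_hour / 1_000_000)
asic_cost_per_m = asic_cost_per_hour / (tokens_per_hour / 1_000_000)

print(f"GPU : ${gpu_cost_per_m:.3f} per million tokens")
print(f"ASIC: ${asic_cost_per_m:.3f} per million tokens")
```

At hyperscaler inference volumes, a 2x cost-per-token gap compounds into the billions of dollars per year, which is why every hyperscaler funds a custom program despite the 2-3 year design cycle.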
DeepSeek efficiency thesis (see Chapter 1): If efficiency improvements reduce the compute required per AI task faster than demand grows, the total number of GPUs needed could plateau or decline. This is the central bear case for the entire chip design layer.
Custom ASIC cannibalization of GPU demand: As hyperscaler custom silicon programs mature (Google TPU v7p, Amazon Trainium3, Meta MTIA training chip), they could absorb an increasing share of AI compute demand, reducing the need for merchant GPUs. The counterargument: custom ASICs take 2-3 years to design and are optimized for specific workloads, while GPUs offer flexibility. Most hyperscalers will maintain a portfolio approach (GPUs for general-purpose and training, ASICs for high-volume inference).
Intel’s potential recovery: Intel’s 18A process node, if successful, could make Intel Foundry a credible alternative to TSMC for AI chip fabrication. Intel’s Falcon Shores accelerator (combining GPU and HPC capabilities) could improve its AI accelerator position. This is speculative; Intel has a long history of promising turnarounds that underdeliver.
RISC-V disruption of Arm-based server CPUs: If RISC-V processors achieve competitive performance in server workloads, the Arm-based custom CPU trend (Graviton, Grace, Cobalt) could face open-source competition that eliminates licensing costs. This is a 3-5 year horizon risk.
First principles check: Does NVIDIA’s dominance make sense? Yes, on two dimensions. First, the hardware (CUDA cores, Tensor Cores, NVLink interconnect, HBM integration) represents billions in cumulative R&D. Second, and more importantly, the CUDA software ecosystem is a network effect: more developers write for CUDA because more GPUs run CUDA, and more GPUs are deployed because more software is CUDA-optimized. Breaking this cycle requires better hardware and a critical mass of software adoption, both of which take years to build.