Chapter 6: Chip Design, GPUs, CPUs & Custom Accelerators
6.1 Overview
This chapter covers the companies that design the silicon at the center of the AI buildout. The chip designers are the direct customers of the EDA vendors (Chapter 5), the foundries (Chapter 7), the memory makers (Chapter 8), and the packaging houses (Chapter 9). They consume the equipment and materials described in Chapters 3 and 4. Their products go into the servers and compute platforms described in Chapter 18.
The AI accelerator market has two structural layers. The first is the merchant GPU market, dominated by NVIDIA with an estimated 80-92% share of data center AI accelerators 121, followed by AMD’s Instinct line. The second is the custom ASIC market, where hyperscalers design their own chips (Google TPU, Amazon Trainium, Microsoft Maia, Meta MTIA) with the help of ASIC design partners, primarily Broadcom (~70% share of custom AI accelerators) 1467 and Marvell (~25%) 714. These two layers are complements rather than substitutes: hyperscalers buy NVIDIA GPUs for frontier training while deploying custom ASICs for cost-optimized inference and specific workloads.
The AI buildout creates demand for chip design in four ways. First, training frontier models requires the most powerful GPUs available (NVIDIA Blackwell, AMD MI350). Second, inference at scale requires cost-efficient accelerators (custom ASICs, inference-optimized GPUs). Third, the explosion of AI workloads drives demand for high-performance server CPUs (AMD EPYC, Intel Xeon, Arm Neoverse-based custom CPUs). Fourth, every hyperscaler is investing in custom silicon to reduce dependence on a single GPU vendor and optimize total cost of ownership.
6.2 Market Sizing & Growth
AI accelerator market: NVIDIA’s data center revenue, the best proxy for the AI GPU market, reached $197.3 billion for full FY2026 (ending January 2026), up 71% year-over-year. Total NVIDIA revenue for FY2026 was $215.9 billion (+65% YoY). Q4 FY2026 alone was $68.1 billion in total revenue ($62.3 billion data center, +75% YoY), and Q1 FY2027 guidance is $78.0 billion 123.
Custom AI ASIC market: Broadcom’s AI semiconductor revenue reached $8.4 billion in Q1 FY2026 (up 106% YoY), with Q2 guidance of $10.7 billion 4. Analysts project Broadcom’s AI semiconductor revenue at $46 billion in calendar 2026, driven by a $73 billion backlog from hyperscalers 5. TrendForce projects custom chip sales will increase 45% in 2026, compared with 16% growth in GPU shipments 6. Broadcom and Marvell together control approximately 95% of the custom ASIC co-design market 7.
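The differential growth rates above imply a gradual share shift toward custom silicon. A rough sketch of the arithmetic, using only the TrendForce growth rates from the text; the 2025 base-year market sizes are hypothetical assumptions, and both rates are treated as revenue growth for simplicity:

```python
# Illustrative share-shift arithmetic. The 45% (custom ASIC) and 16% (GPU)
# growth rates come from the text; the 2025 base sizes are hypothetical
# assumptions, and both rates are treated as revenue growth for simplicity.

custom_2025, gpu_2025 = 30.0, 180.0      # hypothetical 2025 market sizes, $B
custom_growth, gpu_growth = 0.45, 0.16   # growth rates cited in the text

custom_2026 = custom_2025 * (1 + custom_growth)
gpu_2026 = gpu_2025 * (1 + gpu_growth)

share_2025 = custom_2025 / (custom_2025 + gpu_2025)
share_2026 = custom_2026 / (custom_2026 + gpu_2026)

print(f"custom ASIC share of accelerator market: {share_2025:.0%} -> {share_2026:.0%}")
```

Even with custom silicon growing nearly three times as fast, the single-year share shift is modest, which is consistent with the chapter's framing of the two layers as complementary rather than substitutional.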
Server CPU market (agentic AI inflection): AMD’s Data Center segment revenue was $16.6 billion in FY2025 (+32% YoY), driven by EPYC CPU and Instinct GPU sales 89. In Q1 2026, AMD posted data center revenue of $5.8 billion, surpassing Intel’s Data Center and AI group ($5.1 billion) for the first time 17. AMD’s server CPU market share has grown to approximately 29%, up from single digits five years ago 10. Intel remains the volume leader in server CPUs but is losing share steadily.
The structural driver is agentic AI. As AI workloads shift from monolithic training runs to multi-step agentic execution (planning, tool calling, memory management, parallel sub-agent orchestration), the CPU:GPU ratio is changing fundamentally. Historically, AI data centers operated at approximately 1 CPU per 12 GPUs. AMD CEO Lisa Su stated the ratio is “changing from 1:4 or 1:8 to getting closer to 1:1” for agentic configurations 1718. NVIDIA’s Rubin platform approaches a 1:2 CPU:GPU ratio, and the aggressive “Rubin Ultra” configuration could invert it entirely to 2 CPUs per GPU 20. Morgan Stanley (April 2026) projects agentic AI will create $32.5-60 billion of incremental CPU TAM by 2030, bringing the total data center CPU market to $82.5-110 billion. This represents a structural value shift: CPU-side orchestration accounts for 50-90% of total task execution latency in agentic systems, making the CPU the performance bottleneck rather than the GPU 20. Without additional foundry capacity, Morgan Stanley estimates the incremental agentic CPU demand will widen the 2030 supply gap from 7% to 15% 20.
AMD raised its long-term server CPU market outlook to $120 billion by 2030 (35% CAGR, up from prior 18% forecast) explicitly citing agentic AI demand 17. Lisa Su described the mechanism on AMD’s Q1 2026 call: “As inferencing scales and you do more — you have more agents and Agentic AI, they all require CPUs for all of the orchestration and the data processing and these other tasks” 23. Server CPU prices have risen 10-20% since March 2026, with lead times stretching to six months as Intel shifts production from consumer to Xeon and TSMC prioritizes GPU/ASIC capacity over CPU wafers 1819. AMD stock rose over 88% from March 2026 lows; Intel surged over 160% from the same trough, reflecting the market’s repricing of CPU relevance in the agentic era 20.
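The ratio shift described above translates directly into unit demand. A minimal sketch, assuming a hypothetical 100,000-GPU fleet (the ratios are taken from the text; the fleet size is an assumption chosen only for illustration):

```python
# Back-of-envelope: how the CPU:GPU ratio shift changes server CPU demand.
# Fleet size is a hypothetical assumption; ratios are those cited in the text.

GPU_FLEET = 100_000  # hypothetical GPU count for one deployment

# (label, CPUs per GPU)
ratios = [
    ("legacy 1:12", 1 / 12),
    ("current 1:8", 1 / 8),
    ("agentic 1:2 (Rubin-class)", 1 / 2),
    ("inverted 2:1 (Rubin Ultra-class)", 2.0),
]

for label, cpus_per_gpu in ratios:
    cpus = round(GPU_FLEET * cpus_per_gpu)
    print(f"{label:>34}: {cpus:>8,} server CPUs")
```

Moving from 1:12 to 2:1 is a roughly 24x increase in CPUs per fleet, which is the mechanism behind the incremental CPU TAM and the supply-gap estimates cited above.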
6.3 Supply Chain Flowchart
CHIP DESIGN COMPANIES
|
|---> MERCHANT GPU / ACCELERATOR DESIGNERS
| NVIDIA: Blackwell (B200, GB200), Rubin (2026), Grace CPU
| AMD: Instinct MI350/MI400, EPYC CPU, Pensando DPU
| Intel: Gaudi accelerator, Xeon CPU, Falcon Shores (planned)
|
|---> CUSTOM ASIC DESIGN (Hyperscaler In-House + Partners)
| Google TPU ---- designed with Broadcom, fab at TSMC
| Amazon Trainium/Inferentia ---- designed with Marvell (Annapurna Labs)
| Microsoft Maia ---- designed with Marvell, fab at TSMC
| Meta MTIA ---- designed with Broadcom (4 generations planned)
| OpenAI custom chip ---- designed with Broadcom (rumored), fab at TSMC
| ByteDance custom accelerator ---- designed with Broadcom
|
|---> CUSTOM ASIC DESIGN PARTNERS
| Broadcom: ~70% share of custom AI ASIC market
| Marvell: ~25% share (Amazon Trainium, Microsoft Maia, Google Axion)
| MediaTek: emerging (Google partnership for custom chips)
|
|---> AI CHIP STARTUPS
| Cerebras Systems (wafer-scale engine) -- IPO filing April 2026
| Groq (LPU for inference) -- Acquired by NVIDIA (~$20B, Dec 2025)
| SambaNova (RDU for enterprise AI) -- Private (~$5B+, $350M raise)
| Tenstorrent (RISC-V based AI) -- Private (~$2.6B, Jim Keller-led)
| SiFive (RISC-V CPU + AI core IP) -- Private ($3.65B, Apr 2026)
| Graphcore -- acquired by SoftBank (2024)
| d-Matrix (digital in-memory compute) -- Private
|
+---> CPU/SOC DESIGNERS (for AI-adjacent server workloads)
Arm Holdings: CPU architecture IP [See Chapter 5]
Qualcomm: Arm-based server CPUs (Qualcomm Cloud AI 100)
Ampere Computing: Arm-based cloud-native CPUs
|
v
EDA TOOLS (Chapter 5) --> FOUNDRIES (Chapter 7: TSMC, Samsung, Intel)
| |
v v
PACKAGING (Chapter 9) MEMORY (Chapter 8: HBM, DDR5)
|
v
SERVER INTEGRATION (Chapter 18: Dell, HPE, Supermicro, ODMs)
6.4 Key Companies
6.4.1 Merchant GPU / AI Accelerator Companies
| Company | Ticker | Exchange | Approx. Mkt Cap | Role | Key Metric |
|---|---|---|---|---|---|
| NVIDIA | NVDA | NASDAQ | ~$5.2T | Dominant AI GPU maker; 80-92% AI accelerator market share | Q3 FY2026 revenue $57.0B (DC $51.2B, +66% YoY); FY2026 DC revenue $197.3B (+71% YoY); Blackwell “sold out”; $500B+ order backlog for Blackwell+Rubin 123 |
| AMD | AMD | NASDAQ | ~$742B | #2 AI accelerator (Instinct MI350/MI400); #2 server CPU (EPYC); DPU (Pensando) | FY2025 revenue $34.6B (+34%); DC segment $16.6B (+32%); MI350 with 288GB HBM3E; OpenAI 6GW AMD GPU deal; acquired ZT Systems 8910 |
| Intel | INTC | NASDAQ | ~$628B | Server CPU (Xeon); AI accelerator (Gaudi); foundry services (Intel Foundry) | Gaudi accelerators struggled to gain traction (<$500M revenue target missed in 2024); Xeon losing share to EPYC; Intel 18A foundry process in pilot 1011 |
NVIDIA is the central company of the entire AI buildout thesis. Its data center revenue has grown from roughly $15 billion in FY2024 to $197.3 billion in FY2026, a more than 13x increase in two years 123. The Blackwell architecture (GB200 NVL72, B200) is ramping as the primary AI training platform, while the Rubin architecture (unveiled at CES 2026) features a rack-scale design with 72 GPUs optimized for energy efficiency (40% better performance per watt) 12. NVIDIA’s moat extends beyond hardware. The CUDA software ecosystem (20+ years of developer tools, libraries, frameworks) creates switching costs that AMD’s ROCm has struggled to overcome. NVIDIA has also expanded into networking (NVLink, InfiniBand, Spectrum-X Ethernet) and CPU (Grace, based on Arm Neoverse).
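As a quick sanity check on the growth claim, the two-year multiple and implied annual growth rate follow directly from the revenue figures in the text:

```python
# Sanity check on the "more than 13x in two years" figure, using the
# data center revenue numbers cited in the text ($B).
fy2024_dc = 15.0     # approximate FY2024 data center revenue
fy2026_dc = 197.3    # FY2026 data center revenue

multiple = fy2026_dc / fy2024_dc
annual_growth = multiple ** 0.5 - 1   # geometric mean over the two-year span

print(f"multiple: {multiple:.1f}x; implied annual growth: {annual_growth:.0%}")
```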
Strategic investments: NVIDIA invested $2B each in Coherent and Lumentum (photonics, Chapter 11) in early 2026 (see Chapter 3). Partnership with OpenAI for 10+ GW of NVIDIA infrastructure. Partnership with Anthropic for 1 GW of Grace Blackwell and Vera Rubin systems. Collaboration with Intel on NVLink-based custom products 2.
AMD has emerged as the credible #2 in AI accelerators. The MI350 series (CDNA 4 architecture, 288GB HBM3E, 8 TB/s bandwidth, 35x inference improvement over prior generation) launched mid-2025 and drove strong Data Center revenue growth 89. Eight of the world’s top ten AI companies now use AMD Instinct accelerators for production workloads 9. OpenAI announced AMD as a “core preferred partner” to deploy 6 GW of AMD GPUs, with the first 1 GW of MI450 GPUs beginning H2 2026 11. AMD’s acquisition of ZT Systems (closed 2025) transforms it from a component vendor into a full-stack data center solution provider 10.
Intel’s position in AI accelerators is the weakest of the three major chip companies. The Gaudi series has not gained significant traction, falling short of its modest $500 million revenue target in 2024 due to software maturity issues 1011. Intel’s primary relevance to the AI buildout is through its Xeon server CPU line (still the volume leader despite share losses) and Intel Foundry Services (which provides manufacturing alternatives to TSMC). Intel’s 18A process node is in pilot production with customers like NVIDIA and Arm-based designers (see Chapter 7).
6.4.2 Custom ASIC Design Partners
| Company | Ticker | Exchange | Approx. Mkt Cap | Role | Key Metric |
|---|---|---|---|---|---|
| Broadcom | AVGO | NASDAQ | ~$2.0T | Dominant custom AI ASIC partner (~70% share); networking ASICs; VMware | Q1 FY2026 AI revenue $8.4B (+106% YoY); Q2 guide $10.7B; $73B AI backlog; partners: Google (TPU), Meta (MTIA, 4 gens), ByteDance, OpenAI (rumored), Apple (rumored) 4513 |
| Marvell Technology | MRVL | NASDAQ | ~$149B | #2 custom AI ASIC partner (~25% share); networking, storage controllers | FY2026 (Feb) data center revenue $6.1B (record); total revenue $8.2B (+42%); partners: Amazon (Trainium), Microsoft (Maia), Google (Axion CPU, new inference TPU), Meta (DPU) 614 |
| MediaTek | 2454 | TWSE | ~$60.0B | Emerging ASIC design partner for Google; dominant in smartphone AP | Google partnership for custom chip design; strong in advanced node design (TSMC N3) |
Broadcom is the “arms dealer” of the hyperscaler custom silicon race. It holds approximately 70% of the custom AI accelerator co-design market 713. Google alone spends an estimated $8B/year with Broadcom on TPU development 7. Meta announced an expanded partnership for four new generations of MTIA chips with a 1GW+ deployment commitment 13. Broadcom’s CEO Hock Tan transitioned from Meta’s board to an advisor role given the scale of the relationship 13. The commercial pattern is now established: hyperscalers use NVIDIA GPUs for frontier training and Broadcom-designed custom ASICs for high-volume inference.
Marvell is the emerging #2, with a fast-growing custom AI silicon business. Its partners include Amazon (Trainium/Inferentia), Microsoft (Maia), and Google (Axion Arm CPU, plus new inference TPU in discussions) 614. Marvell’s data center revenue reached a record $6.1 billion in FY2026 (Feb), with total revenue of $8.2 billion (+42% YoY) 6. The key strategic question is whether Marvell can capture enough hyperscaler design wins to narrow the gap with Broadcom.
6.4.3 Hyperscaler Custom Silicon Programs
| Company | Program | Status | Key Metric | ASIC Partner |
|---|---|---|---|---|
| Google (Alphabet) | TPU (Tensor Processing Unit) | Production; v7p latest; >75% of Gemini inference on TPUs | Anthropic committed to ~1M TPUs; deployed at massive scale | Broadcom (primary), Marvell (new inference TPU), MediaTek |
| Amazon (AWS) | Trainium / Inferentia | Production; Trainium3 ramping (3nm, 144GB HBM3E) | 500K+ Trainium chips deployed; “UltraServer” 144-chip rack | Marvell (Annapurna Labs, AWS subsidiary) |
| Microsoft | Maia | Early production; Maia 100 deployed | Custom for Azure AI workloads | Marvell |
| Meta | MTIA | Production (v2 “Artemis” for inference); training chip in 2026 roadmap | 4 new generations with Broadcom; 1GW+ deployment | Broadcom |
| OpenAI | Custom accelerator | In development | Reported $10B commitment; design partner Broadcom (rumored) | Broadcom (rumored) |
| ByteDance | Custom accelerator | In development | Broadcom partner | Broadcom |
This table reveals the real dependency graph. Every major hyperscaler is simultaneously buying NVIDIA GPUs and developing custom silicon with Broadcom or Marvell. The custom chip programs are not replacing NVIDIA; they are supplementing it for cost-optimized inference. TSMC fabricates all of these custom chips at advanced nodes (3nm and below); no hyperscaler custom ASIC uses an alternative foundry 15.
6.4.4 AI Chip Startups
| Company | Ticker | Exchange | Approx. Mkt Cap | Role | Key Metric |
|---|---|---|---|---|---|
| Cerebras Systems | CBRS (pending) | NASDAQ (IPO filing) | ~$22-25B (S-1 April 2026) | Wafer-scale AI engine (entire wafer as single chip) | S-1 filed Apr 17, 2026; ~$510M revenue; $10B+ OpenAI partnership |
| Groq | N/A | Acquired | ~$20.0B (NVIDIA, Dec 2025) | LPU (Language Processing Unit) for ultra-low-latency inference | Acquired by NVIDIA Dec 2025; IP and talent absorbed into NVIDIA inference stack |
| Tenstorrent | Private | Private | ~$2.6B (Series D, Dec 2024) | RISC-V based AI accelerator and CPU IP; led by Jim Keller | TT-Ascalon RISC-V CPU IP; Samsung/LG/Hyundai designing on platform; acquired Blue Cheetah Analog (UCIe/BoW) Jul 2025 |
| SambaNova | Private | Private | ~$5.0B (2025 raise) | RDU for enterprise AI; full-stack AI platform | SN50 chip (Feb 2026, 5x compute vs SN40L); $350M raise 2025 |
| SiFive | Private | Private | ~$3.6B (Series G, Apr 2026) | RISC-V CPU + AI core IP licensing; XM matrix accelerator | $400M Series G; Google TPU partnership; pre-IPO stage |
| Astera Labs | ALAB | NASDAQ | ~$34.2B | CXL smart memory controllers; PCIe retimers; AI fabric switches | Revenue $852M (2025, +115% YoY); Scorpio 320-lane fabric switch |
| d-Matrix | Private | Private | ~$500M | Digital in-memory compute for inference | Pre-revenue; targeting inference efficiency |
6.4.5 Server CPU Designers
| Company | Ticker | Exchange | Approx. Mkt Cap | Role | Key Metric |
|---|---|---|---|---|---|
| AMD | AMD | NASDAQ | ~$742B | EPYC server CPUs; ~29% server market share and growing | 5th Gen EPYC “Turin”; cloud instances nearly doubled in 2 years |
| Intel | INTC | NASDAQ | ~$628B | Xeon server CPUs; still volume leader despite share losses | Xeon losing share; 18A process node in pilot; Gaudi AI accelerator underperforming |
| Ampere Computing | Private | Private (acquired by SoftBank) | ~$6.5B (acquisition price) | Arm-based cloud-native CPUs; Oracle partnership | AmpereOne with 192 cores; designed for cloud workloads. SoftBank acquired Ampere (Nov 2025, $6.5B all-cash) to complement its Arm Holdings stake. |
| Qualcomm | QCOM | NASDAQ | ~$200B | Snapdragon server/AI chips; Arm-based. Acquired Alphawave Semi (~$2.4B 15, Dec 2025) for connectivity IP. | Cloud AI 100 Ultra for inference; primary revenue from mobile |
6.4.6 Chinese AI Chip Ecosystem
| Company | Ticker | Exchange | Approx. Mkt Cap | Role | Key Metric |
|---|---|---|---|---|---|
| Huawei (HiSilicon) | Private | Private | Private (Huawei group) | Ascend 910C/910D AI accelerators; China’s primary NVIDIA alternative | Sole-source for Chinese hyperscalers under US export controls. Ascend 910C reportedly used in Huawei CloudMatrix 384 system. Performance lags H100 but improving. |
| Cambricon Technologies | 688256 | SSE STAR | ~$15.0B | AI inference and training chips (MLU370, MLU590) | Founded by former Chinese Academy of Sciences researchers. Originally designed Huawei’s first neural processing unit. Revenue growing but profitability remains elusive. |
| Biren Technology | Private | Private | ~$2.0B (est.) | BR100/BR104 GPUs targeting training workloads | Added to US Entity List. Claimed performance comparable to A100 but independent verification limited. Fab access constrained by export controls. |
| Moore Threads | Private | Private | ~$3.0B (est.) | MTT S4000 GPUs for inference and graphics | Added to US Entity List. Focused on inference and domestic cloud deployment. Limited to mature process nodes. |
| MetaX Integrated Circuits | Private | Private | ~$2.0B (est.) | C-Series GPUs for AI training | Shenzhen-based. Claims CUDA compatibility layer. Limited public benchmarks. |
| Enflame Technology | Private | Private | ~$2.0B (est.) | CloudBlazer DTU training accelerators | Backed by Tencent and state funds. Deployed in Tencent Cloud. Pre-revenue scale. |
The Chinese AI chip ecosystem exists because US export controls have cut Chinese hyperscalers off from NVIDIA’s most advanced GPUs (H100, A100, and now H20 restricted). Huawei’s Ascend line is the most credible domestic alternative, backed by the resources of the broader Huawei organization. The others (Cambricon, Biren, Moore Threads, MetaX, Enflame) are smaller and constrained by their inability to access leading-edge foundry capacity (TSMC is off-limits; SMIC can fabricate at 7nm but with yield penalties). The collective significance of these companies is less about their individual competitiveness and more about the pace at which China can build a parallel, self-contained AI chip supply chain. That pace remains slow by global standards, but the direction of investment is clear.
6.5 Bottleneck Analysis
NVIDIA GPU supply (EXTREME, moderated by packaging): NVIDIA’s AI accelerator supply is the most sought-after resource in the global technology industry. Jensen Huang stated “Blackwell sales are off the charts, and cloud GPUs are sold out” 2. CFO Colette Kress reinforced the persistence of this constraint: “While we expect tightness in the supply for our advanced architectures to persist, we remain confident in our ability to capitalize on the growth opportunity ahead” 21. The binding constraint, however, is not NVIDIA’s chip design capacity but TSMC’s manufacturing and CoWoS packaging capacity (see Chapters 7, 9). NVIDIA’s $500B+ backlog for Blackwell and Rubin exceeds what can be delivered in the near term. This creates a pricing power dynamic where NVIDIA can charge premium prices and allocate supply preferentially.
CUDA software ecosystem (SEVERE for competitors): NVIDIA’s CUDA platform represents 20+ years of developer tools, optimized libraries (cuDNN, TensorRT, NCCL), and framework integration (PyTorch, TensorFlow). AMD’s ROCm is making progress (ROCm 7.0 delivered ~3.5x inference uplift on MI300X 10) but remains behind in maturity and ecosystem breadth. For chip designers competing with NVIDIA, the software moat is arguably harder to breach than the hardware gap.
Custom ASIC design capacity (MODERATE-HIGH): Only two companies (Broadcom and Marvell) can design custom AI accelerators at scale for hyperscalers. The combined 95% market share means hyperscalers have limited options 7. Broadcom CEO Hock Tan confirmed the demand surge: “We have never seen bookings of the nature that what we have seen over the past three months” 22. Design cycles for custom ASICs take 18-24 months from specification to tapeout, creating pipeline constraints. However, the existence of two credible providers and the hyperscalers’ own growing internal design teams (Annapurna Labs at AWS, Google’s chip design group) prevents monopoly-level pricing.
Advanced node access at TSMC (SEVERE): All of the chips described in this chapter are fabricated at TSMC (3nm, 5nm). NVIDIA, AMD, Broadcom, Marvell, Google, Amazon, and Apple all compete for TSMC’s limited leading-edge wafer capacity. TSMC allocates capacity based on long-term relationships and prepayment agreements. A new chip designer (startup or hyperscaler) cannot simply order 3nm wafers without a multi-year capacity agreement (see Chapter 7).
6.6 Risks
NVIDIA concentration risk: An estimated 80-92% market share in any technology segment is unusual and potentially fragile. AMD’s Instinct line is gaining share. Custom ASICs from hyperscalers offer 40-65% TCO advantages over GPUs for certain inference workloads 7. If inference becomes a larger share of AI compute than training (which is widely expected), the addressable market shifts toward cheaper, specialized chips and away from NVIDIA’s premium GPUs. This does not eliminate NVIDIA’s dominance but could compress its market share from roughly 90% toward 60-70% over time.
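The TCO argument can be made concrete with a toy calculation. Every figure below is a hypothetical assumption chosen only to illustrate the arithmetic; the sole input taken from the text is the 40-65% TCO advantage range (midpoint used here):

```python
# Toy inference-TCO comparison: merchant GPU vs custom ASIC.
# All inputs are hypothetical assumptions except the 40-65% TCO
# advantage range cited in the text (midpoint used).

gpu_cost_per_hour = 4.00        # hypothetical all-in $/hr (amortized capex + power)
asic_tco_advantage = 0.50       # midpoint of the cited 40-65% range
tokens_per_hour = 50_000_000    # hypothetical throughput, assumed equal for both

asic_cost_per_hour = gpu_cost_per_hour * (1 - asic_tco_advantage)

gpu_cost_per_m = gpu_cost_per_hour / (tokens_per_hour / 1_000_000)
asic_cost_per_m = asic_cost_per_hour / (tokens_per_hour / 1_000_000)

print(f"GPU : ${gpu_cost_per_m:.3f} per million tokens")
print(f"ASIC: ${asic_cost_per_m:.3f} per million tokens")
```

At hyperscaler inference volumes, a 2x cost-per-token gap compounds into the billions of dollars per year, which is why every hyperscaler funds a custom program despite the 2-3 year design cycle.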
DeepSeek efficiency thesis (see Chapter 1): If efficiency improvements reduce the compute required per AI task faster than demand grows, the total number of GPUs needed could plateau or decline. This is the central bear case for the entire chip design layer.
Custom ASIC cannibalization of GPU demand: As hyperscaler custom silicon programs mature (Google TPU v7p, Amazon Trainium3, Meta MTIA training chip), they could absorb an increasing share of AI compute demand, reducing the need for merchant GPUs. The counterargument: custom ASICs take 2-3 years to design and are optimized for specific workloads, while GPUs offer flexibility. Most hyperscalers will maintain a portfolio approach (GPUs for general-purpose and training, ASICs for high-volume inference).
Intel’s potential recovery: Intel’s 18A process node, if successful, could make Intel Foundry a credible alternative to TSMC for AI chip fabrication. Intel’s Falcon Shores accelerator (combining GPU and HPC capabilities) could improve its AI accelerator position. This is speculative; Intel has a long history of promising turnarounds that underdeliver.
RISC-V disruption of Arm-based server CPUs: If RISC-V processors achieve competitive performance in server workloads, the Arm-based custom CPU trend (Graviton, Grace, Cobalt) could face open-source competition that eliminates licensing costs. This is a 3-5 year horizon risk.
First principles check: Does NVIDIA’s dominance make sense? Yes, on two dimensions. First, the hardware (CUDA cores, Tensor Cores, NVLink interconnect, HBM integration) represents billions in cumulative R&D. Second, and more importantly, the CUDA software ecosystem is a network effect: more developers write for CUDA because more GPUs run CUDA, and more GPUs are deployed because more software is CUDA-optimized. Breaking this cycle requires better hardware and a critical mass of software adoption, both of which take years to build.