Chapter 10: Networking Silicon, Switches & DPUs
10.1 Overview
Every GPU cluster is only as fast as the network connecting its accelerators. A single NVIDIA GB200 NVL72 rack contains 72 GPUs that must exchange hundreds of terabytes of data per training step. Scale that to 100,000 GPUs across a hyperscaler campus and the networking fabric becomes the binding constraint on model training time, energy efficiency, and cost. This chapter covers the silicon, systems, and standards that move data between accelerators (scale-up), between servers (scale-out), and between data centers (scale-across).
The networking layer sits directly above the chip design layer (Chapter 6) and below the photonics layer (Chapter 11) and connectors/cables layer (Chapter 12). It consumes switch ASICs, NICs, and DPUs fabricated at leading-edge foundries (Chapter 7), packaged with advanced techniques (Chapter 9), and connected through optical transceivers and copper cables. Its products are integrated into the servers described in Chapter 18 and deployed in the data centers of Chapter 17.
There are three distinct interconnect domains inside an AI cluster. The first is the scale-up network, which connects GPUs within a single server or rack. NVIDIA’s proprietary NVLink dominates here, providing 1.8 TB/s of bidirectional bandwidth per GPU in the fifth-generation NVLink used in Blackwell systems. The UALink Consortium, backed by AMD, Intel, Google, Microsoft, Meta, AWS, and others, released its 200G 1.0 specification in April 2025 as an open alternative, targeting 1,024 accelerators per pod versus NVLink’s 576. Hardware based on UALink 1.0 is expected in the 2026-2027 timeframe [1].
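To make the scale-up numbers concrete, a back-of-the-envelope calculation (illustrative only, using the figures quoted above) shows the aggregate bandwidth inside a single NVL72 domain:

```python
# Back-of-the-envelope scale-up bandwidth, using figures quoted in this
# chapter (illustrative, not vendor specifications).

nvlink5_per_gpu_tb_s = 1.8      # TB/s bidirectional per GPU (NVLink 5)
gpus_per_nvl72 = 72

# Total injection bandwidth inside one NVL72 scale-up domain
aggregate_tb_s = nvlink5_per_gpu_tb_s * gpus_per_nvl72
print(f"NVL72 aggregate NVLink bandwidth: {aggregate_tb_s:.1f} TB/s")  # 129.6

# Domain-size ceilings cited above
print("Max accelerators per domain: NVLink 576 vs. UALink 1.0 1,024")
```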
The second domain is the scale-out network, which connects servers within a data center. This is the battleground between Ethernet and InfiniBand. InfiniBand commanded roughly 80% of AI back-end network deployments in 2023, but Ethernet surpassed it in 2025. According to Dell’Oro Group, Ethernet switch sales in AI back-end networks more than tripled in 2025, accounting for over two-thirds of AI cluster switch revenue for the full year. Amazon, Microsoft, Meta, Oracle, and xAI have all adopted Ethernet for their AI fabrics [2][3]. NVIDIA sells both sides of this fight: InfiniBand through its Quantum switches and Ethernet through its Spectrum-X platform.
The third domain is the scale-across network, which connects separate data centers into a unified logical compute cluster. As AI training runs grow beyond what a single campus can host, distributed multi-site fabrics are emerging. Broadcom’s Jericho 4 router with HyperPort technology and NVIDIA’s Spectrum-XGS are purpose-built for this use case, enabling lossless data transfer across distances of up to 60 miles [4][5].
The merchant switch silicon market is dominated by Broadcom, with an estimated 70-90% share of cloud data center Ethernet switches depending on the segment [10]. Its Tomahawk series (leaf/spine switches) and Jericho series (fabric routers) are the industry standard. NVIDIA’s Spectrum line, inherited from the Mellanox acquisition, is the primary challenger. Marvell’s Teralynx and Cisco’s Silicon One are credible but smaller players in merchant switch silicon. The competitive dynamic here extends beyond silicon performance to ecosystem control: Broadcom’s switches run in Arista, Cisco, and white-box platforms, giving it distribution leverage that pure silicon competitors lack.
Beyond switch ASICs, this chapter covers three adjacent silicon categories that are critical to AI networking. DPUs (Data Processing Units) offload network, storage, and security processing from the host CPU; NVIDIA’s BlueField and Marvell’s OCTEON are the main products. SmartNICs provide a lighter-weight version of the same function. And PCIe retimers, active electrical cables (AECs), and signal-conditioning chips from Astera Labs and Credo Technology ensure signal integrity across the increasingly complex PCIe and Ethernet links inside AI servers. These last two companies have been among the fastest-growing semiconductor firms in the buildout, with Astera Labs growing revenue from $116 million in 2023 to approximately $850 million in 2025, and Credo surging from $193 million to $437 million over the same period [6][7].
10.2 Market Sizing & Growth
Data center networking (total market): The data center networking market was valued at approximately $55.6 billion in 2025 and is projected to reach $139 billion by 2031, at a 16.5% CAGR [8].
AI back-end networking: 650 Group estimated the data center AI networking market (Ethernet + InfiniBand + optical transceivers for AI clusters) would reach nearly $20 billion in 2025. Dell’Oro Group projects cumulative data center switch revenue approaching $80 billion over the next five years, driven by AI infrastructure investments [3][9].
Ethernet switch silicon: The global Ethernet switch chip market was valued at approximately $5.5 billion in 2025, projected to reach $8.9 billion by 2032 at a 7.3% CAGR. The AI-optimized segment is growing far faster than the broader market [10].
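As a sanity check on these projections, the implied growth rates can be recomputed directly (a minimal sketch; the dollar figures are the ones cited above):

```python
# Recompute the CAGR figures quoted above (illustrative arithmetic).

def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two endpoint values."""
    return (end / start) ** (1 / years) - 1

# Data center networking: $55.6B (2025) -> $139B (2031)
print(f"DC networking CAGR:  {cagr(55.6, 139.0, 6):.1%}")  # ~16.5%

# Ethernet switch chips: $5.5B (2025) -> $8.9B (2032)
print(f"Switch silicon CAGR: {cagr(5.5, 8.9, 7):.1%}")     # ~7.1%, vs. 7.3% cited
```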
NVIDIA networking revenue: NVIDIA’s networking segment (NVLink + InfiniBand + Spectrum-X Ethernet) generated $11.0 billion in Q4 FY2026 (ending January 2026), up 263% year-over-year. Full-year FY2026 networking revenue was $31.4 billion, up 142% from $13.0 billion in FY2025. Spectrum-X alone exceeded a $10 billion annualized run rate by Q2 FY2026 [11][12][13].
Broadcom networking: Broadcom’s AI networking revenue surged over 170% year-over-year in Q2 FY2025 (ending May 2025), driven by Tomahawk and Jericho sales. The company’s AI switch backlog exceeds $10 billion. Broadcom’s total AI semiconductor revenue (custom ASICs + networking) reached $8.4 billion in Q1 FY2026 [14][15].
Arista Networks: Full-year 2025 revenue reached $9.0 billion, up approximately 29% year-over-year. Arista raised its 2025 AI data center revenue target from $1.5 billion to $2.75 billion and expects to exceed $10 billion in annual revenue in 2026 [16][17]. By Q1 2026, CEO Jayshree Ullal described conditions as unprecedented: “Our demand is actually the best I have ever seen in my Arista tenure,” adding that “demand is outstripping our supply this year.” Arista raised its 2026 AI networking target to $3.5 billion, more than doubling AI sales year-over-year [26].
Cisco AI networking: Cisco expects over $3 billion in AI infrastructure revenue from hyperscalers in FY2026 (ending July 2026) and $5 billion in AI orders booked. In Q3 FY2025 alone, Cisco booked over $600 million in AI-related product sales, more than double the year-ago quarter [18][19].
Marvell data center: Marvell’s data center segment generated $1.52 billion in Q3 FY2026 (ending November 2025), up 38% year-over-year. Data center now represents approximately 73-76% of Marvell’s total revenue. The company’s AI-related revenue exceeded $1.5 billion in FY2025 and is expected to significantly surpass $2.5 billion in FY2026 [20][21].
Astera Labs: FY2025 revenue was $852.5 million, up roughly 115% year-over-year from $396 million in FY2024. Q4 FY2025 revenue was $270.6 million, and guidance implies continued acceleration, with quarterly revenue projected to reach $355-365 million in early FY2026 [6][22].
Credo Technology: FY2025 revenue (ending May 2025) was $436.8 million, up 126% year-over-year, and initial FY2026 guidance called for more than $800 million [7]. Growth has outpaced that guidance: Q2 FY2026 (ending November 2025) revenue was $268.0 million, up 272% year-over-year, and by Q3 FY2026 the run rate exceeded $400 million per quarter, putting full-year FY2026 on track to surpass $1.3 billion [7][23].
10.3 Supply Chain Flowcharts
10.3a: Scale-Up Interconnect (GPU-to-GPU within server/rack)
SCALE-UP INTERCONNECT
|
|---> PROPRIETARY
| NVIDIA NVLink (Gen 5): 1.8 TB/s per GPU, up to 576 GPUs per domain
| +-- NVLink Fusion: extends NVLink to non-NVIDIA accelerators
| Partners: Astera Labs (NVLink connectivity solutions)
|
|---> OPEN STANDARD (emerging)
| UALink Consortium (200G 1.0 spec, April 2025)
| +-- 200 Gbps per lane, up to 1,024 accelerators per pod
| +-- Members: AMD, Intel, Meta, HPE, AWS, Apple, Cisco,
| Google, Lightmatter, Microsoft, Synopsys, Astera Labs
| +-- Hardware expected: 2026-2027
|
+---> PCIe (supporting interconnect, not primary GPU-to-GPU)
PCIe Gen 5 (current) → PCIe Gen 6 (ramping 2025-2026)
Retimers: Astera Labs (Aries family), Broadcom, Microchip
10.3b: Scale-Out Interconnect (Server-to-Server within data center)
SCALE-OUT INTERCONNECT
|
|---> ETHERNET (now >2/3 of AI back-end network, growing)
| |-- MERCHANT SWITCH SILICON
| | Broadcom Tomahawk 6 (102.4 Tbps) -- dominant, ~70-90% share
| | Broadcom Jericho 4 (51.2 Tbps fabric router)
| | NVIDIA Spectrum-4 (51.2 Tbps) -- in Spectrum-X systems
| | Marvell Teralynx 10 (51.2 Tbps)
| | Cisco Silicon One G200/P200 (51.2 Tbps)
| | |
| | v
| |-- SWITCH SYSTEM VENDORS (use merchant silicon above)
| | Arista Networks -- largest share in cloud/AI Ethernet
| | Cisco -- integrates own Silicon One + NVIDIA Spectrum
| | White-box/ODMs: Celestica, Edgecore, Delta Networks
| | NVIDIA -- sells complete Spectrum-X switch systems
| | |
| | v
| +-- NIC / SuperNIC / DPU (endpoint connectivity)
| NVIDIA ConnectX-8 (800G SuperNIC) / BlueField-3 DPU
| Marvell OCTEON 10 DPU (competes with ConnectX/BlueField)
| AMD Pensando (DPU for cloud infrastructure)
| Intel IPU (Infrastructure Processing Unit)
| Broadcom (SmartNICs)
|
|---> INFINIBAND (~1/3 of AI back-end, still growing but slower)
| NVIDIA Quantum (sole commercial supplier)
| Quantum-X800 InfiniBand switches (800G)
| ConnectX-8 HCAs (Host Channel Adapters)
| Primary users: frontier training clusters, HPC
|
+---> SIGNAL CONDITIONING / CONNECTIVITY SILICON
Astera Labs: Aries retimers, Taurus cable modules, Scorpio switches
Credo Technology: Active Electrical Cables (AECs), DSPs, SerDes
Marvell: PAM4 DSPs, AECs, retimers
Broadcom: SerDes IP, retimers
10.3c: Scale-Across Interconnect (Data center to data center)
SCALE-ACROSS INTERCONNECT (DCI for distributed AI)
|
|---> ETHERNET DCI ROUTERS
| Broadcom Jericho 4 -- HyperPort (3.2 Tbps logical ports, 60+ miles)
| Cisco 8223 -- 51.2 Tbps, powered by Silicon One P200
| Arista 7800R4 series
| NVIDIA Spectrum-XGS -- giga-scale AI factory interconnect
|
+---> OPTICAL TRANSPORT (covered in detail in Chapter 11)
Ciena -- long-haul coherent optical
Infinera (acquired by Nokia, Feb 2025, $2.3B) -- DCI optical transport
10.4 Key Companies
| Company | Ticker | Exchange | Approx. Mkt Cap | Role in Buildout | Key Metric |
|---|---|---|---|---|---|
| NVIDIA | NVDA | NASDAQ | ~$5.2T | NVLink scale-up, InfiniBand, Spectrum-X Ethernet, ConnectX/BlueField NICs/DPUs | Networking revenue $31.4B FY2026 (+142% YoY) |
| Broadcom | AVGO | NASDAQ | ~$2.0T | Dominant merchant switch silicon (Tomahawk, Jericho), SerDes IP, SmartNICs | AI switch backlog >$10B; ~70-90% DC switch share |
| Arista Networks | ANET | NYSE | ~$218B | Leading AI/cloud Ethernet switch systems vendor; EOS software platform | FY2025 revenue $9.0B; AI networking target $2.75B (2025), $3.5B (2026) |
| Cisco Systems | CSCO | NASDAQ | ~$381B | Silicon One ASICs, Nexus switches, AI PODs, optics; enterprise + hyperscaler | FY2026 AI infra revenue expected >$3B from hyperscalers |
| Marvell Technology | MRVL | NASDAQ | ~$149B | Teralynx switches, OCTEON DPUs, custom silicon, PAM4 DSPs, AECs | DC revenue $1.52B Q3 FY2026 (+38% YoY); 73% of total |
| Astera Labs | ALAB | NASDAQ | ~$34.2B | PCIe retimers (Aries), Ethernet cable modules (Taurus), fabric switches (Scorpio), CXL controllers (Leo) | FY2025 revenue $852.5M (+115% YoY) |
| Credo Technology | CRDO | NASDAQ | ~$14.0B | Active Electrical Cables (AECs), SerDes IP, optical DSPs | FY2025 revenue $437M (+126% YoY); FY2026 target ~$1.3B |
| Juniper Networks | — | — | Acquired by HPE ($14B) | HPE/Juniper: cloud-native switching, AI networking fabric | Acquisition announced Jan 2024, closed Jul 2025 |
| Celestica | CLS | NYSE/TSX | ~$43.2B | White-box/ODM switch systems for hyperscalers (AI back-end Ethernet) | Among top 3 AI networking vendors per 650 Group |
| Intel | INTC | NASDAQ | ~$628B | IPUs (Infrastructure Processing Units), Gaudi accelerators, Ethernet controllers | IPU program active but limited traction vs. NVIDIA DPUs |
| AMD | AMD | NASDAQ | ~$742B | Pensando DPU (acquired 2022), EPYC server CPUs, Instinct GPUs | Pensando DPU in AWS, Microsoft Azure deployments |
| Microchip Technology | MCHP | NASDAQ | ~$53.6B | PCIe switches, retimers, timing/synchronization for networking | Niche but critical PCIe switching silicon |
| Ciena | CIEN | NYSE | ~$77.5B | DCI coherent optical transport (scale-across networks) | FY2025 revenue $4.77B (+19% YoY) |
| Alphawave Semi | fmr. AWE | fmr. LSE | Acquired by Qualcomm (Dec 2025, ~$2.4B) | High-speed SerDes IP, connectivity silicon, multi-die chiplet interconnect | Now part of Qualcomm (QCOM, NASDAQ, ~$200B); 224G SerDes; 6x TSMC OIP Partner of the Year |
| SiTime | SITM | NASDAQ | ~$22.0B | MEMS precision timing oscillators; TimeFabric AI data center synchronization suite | Near-monopoly in MEMS-based AI cluster timing (IEEE 1588 PTP); GPU utilization can fall 20-40% without nanosecond-level sync; FY2025 revenue +45%, gross margins expanding toward 60% |
| MaxLinear | MXL | NASDAQ | ~$8.9B | PAM4 DSPs for optical transceivers, Ethernet PHYs | Serves transceiver module makers; smaller share vs. Broadcom/Marvell |
10.5 Bottleneck Analysis
Broadcom’s switch silicon dominance (SEVERE): Broadcom controls an estimated 70-90% of the cloud data center Ethernet switch ASIC market. The Tomahawk series is the de facto standard for leaf/spine switching in hyperscale AI clusters. CEO Hock Tan confirmed the demand intensity on the Q4 FY2025 call: “Our current order backlog for AI switches exceeds $10 billion… Tomahawk 6… continues to book at record rates. This is one of the fastest-growing products in terms of deployment that we have ever seen” [25]. When Broadcom launches a new generation (Tomahawk 6 at 102.4 Tbps shipped in volume in 2025), competitors are typically 12-18 months behind. NVIDIA’s Spectrum-X1600 at 102.4 Tbps is expected only in the second half of 2026 [4]. This gap gives Broadcom pricing power and allocation leverage. However, the bottleneck is partially mitigated by the fact that multiple system vendors (Arista, Cisco, Celestica, white-box ODMs) can build switches using Broadcom’s silicon, creating competition at the system level even if the chip layer is concentrated.
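To see what each capacity generation buys, consider the port counts implied by a 102.4 Tbps ASIC (a simple sketch; real products trade radix against per-port speed):

```python
# Port counts implied by a 102.4 Tbps switch ASIC (illustrative).
capacity_gbps = 102_400

for port_speed_gbps in (400, 800, 1600):
    ports = capacity_gbps // port_speed_gbps
    print(f"{port_speed_gbps}G ports: {ports}")
# -> 256x 400G, 128x 800G, or 64x 1.6T ports from a single chip. Higher
#    capacity means fewer chips and fewer switch tiers for a given cluster.
```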
NVLink lock-in (SEVERE for scale-up): NVIDIA’s NVLink is the only commercially deployed high-bandwidth scale-up interconnect for GPU clusters. The NVLink 5 fabric in Blackwell systems provides 14x the bandwidth of PCIe Gen 5. Any customer deploying NVIDIA GPUs for large-scale training is locked into the NVLink ecosystem for intra-rack connectivity. The UALink Consortium is developing an open alternative, but UALink 1.0 hardware is not expected until 2026-2027, and it must prove competitive with NVLink’s next generation (expected with NVIDIA’s Rubin architecture). This is a feature, not a bug, from NVIDIA’s perspective: it deepens the moat around GPU sales [1].
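The 14x figure can be roughly reproduced from public link rates (a sketch under standard PCIe encoding assumptions):

```python
# Rough check of the "14x PCIe Gen 5" claim for NVLink 5 (illustrative).

# PCIe Gen 5 x16: 32 GT/s per lane, 16 lanes, 128b/130b encoding,
# doubled to compare against NVLink's bidirectional figure.
pcie5_x16_gb_s = 2 * 16 * 32e9 * (128 / 130) / 8 / 1e9   # ~126 GB/s
nvlink5_gb_s = 1800.0                                    # 1.8 TB/s per GPU

print(f"PCIe Gen 5 x16 bidirectional: {pcie5_x16_gb_s:.0f} GB/s")
print(f"NVLink 5 advantage: ~{nvlink5_gb_s / pcie5_x16_gb_s:.0f}x")  # ~14x
```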
NVIDIA InfiniBand sole-source (MODERATE, declining): NVIDIA (via Mellanox) is the sole commercial supplier of InfiniBand networking equipment. There are no alternative InfiniBand vendors. This gave NVIDIA extraordinary pricing power when InfiniBand dominated AI back-end networks. The bottleneck is easing as Ethernet rapidly displaces InfiniBand. By 2025, Ethernet accounted for over two-thirds of AI back-end switch sales [2][3]. The shift to Ethernet is driven by hyperscalers wanting vendor diversity and cost advantages. Ironically, NVIDIA sells Ethernet too (Spectrum-X), so it captures revenue either way.
Active Electrical Cable (AEC) concentration (MODERATE-HIGH): Credo Technology holds an estimated 88% share of the AEC market, which is critical for intra-rack Ethernet connectivity in AI clusters [24]. AECs are preferred over optical cables for short-reach connections because they are more reliable (zero link flaps) and cheaper. Customer concentration is extreme: Credo’s top customer represented 67% of FY2025 revenue, and its top three customers accounted for 39%, 32%, and 17% of revenue in Q3 FY2026. If a single hyperscaler slowed orders, Credo’s revenue would crater. Astera Labs and Marvell are entering the AEC market, which should diversify supply over time.
Retimer supply (MODERATE): Astera Labs dominates the PCIe retimer market through its Aries product family, which is designed into virtually every major AI server platform. As servers transition from PCIe Gen 5 to Gen 6, the retimer content per server increases (higher speeds require more signal conditioning). Astera Labs’ FY2025 revenue growth of 115% reflects this demand surge. Broadcom and Microchip offer competing retimers, but Astera Labs’ incumbency in NVIDIA and custom ASIC platforms gives it a structural advantage [6].
1.6 Tbps switch silicon transition (MODERATE, emerging): The industry is preparing for the jump from 800G to 1.6T port speeds, with volume shipments of 1.6T switches expected in H2 2026. Dell’Oro Group projects the 1.6T ramp will be faster than 800G, surpassing 5 million ports within one to two years of shipments [3]. The transition creates both opportunity and risk: Broadcom, NVIDIA, Marvell, and Cisco are all developing 1.6T switch ASICs. Share shifts are possible at generational transitions. The first vendor to ship in volume at 1.6T will capture significant design wins that lock in for 2-3 years.
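Radix and port speed together determine how large a flat fabric can grow before a third switch tier (with its extra optics, power, and latency) becomes necessary. A minimal sketch of the standard non-blocking two-tier Clos bound:

```python
# Endpoints reachable by a non-blocking 2-tier leaf-spine fabric built
# from radix-r switches: r leaves x (r/2) down-facing ports = r^2 / 2.
# Illustrative; real designs add oversubscription and rail-optimized wiring.

def two_tier_endpoints(radix: int) -> int:
    leaf_down = radix // 2   # leaf ports facing servers
    max_leaves = radix       # each spine has radix ports, one per leaf
    return max_leaves * leaf_down

for radix in (64, 128, 256):
    print(f"radix {radix:3d}: up to {two_tier_endpoints(radix):,} endpoints")
# radix 128 -> 8,192; radix 256 -> 32,768. Beyond that, a 100,000-GPU
# cluster needs a third tier -- or higher-radix silicon.
```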
10.6 Risks
Ethernet commoditization pressures margins: As Ethernet becomes the dominant AI networking fabric, the switch silicon market risks commoditization. Broadcom’s Tomahawk and Jericho chips are powerful, but hyperscalers are increasingly sophisticated buyers who can threaten to develop in-house alternatives or qualify Cisco/Marvell silicon to negotiate lower prices. The white-box switch market is growing at 12-13% CAGR, putting additional pressure on branded system vendors like Arista and Cisco [3]. If switch silicon margins compress toward commodity levels, the investment case for networking companies weakens.
NVLink Fusion and UALink fragment the scale-up market: NVIDIA’s NVLink Fusion initiative lets non-NVIDIA accelerators connect to NVLink fabrics, while UALink offers a fully open alternative. If both gain traction, the scale-up interconnect market could splinter into multiple standards, increasing complexity for system vendors and potentially slowing the adoption of any single standard. For networking silicon companies, fragmentation means more SKUs and lower volumes per product, which hurts margins.
Hyperscaler vertical integration in networking: Google already designs custom switches for its TPU clusters. Amazon uses custom networking silicon for its internal infrastructure. If more hyperscalers bring switch ASIC design in-house, the merchant silicon TAM shrinks. This is the same dynamic threatening merchant GPU sales (see Chapter 6). The counter-argument: networking silicon has lower margins than AI accelerators, so the incentive to vertically integrate is weaker. Most hyperscalers would rather buy switch silicon from Broadcom than invest in an internal team to replicate it.
AEC vs. optics competition: Credo’s AECs are gaining share against optical cables for short-reach connections (under 7 meters). But as AI clusters scale to campus-size deployments and rack power densities push servers farther apart, the relevant reach requirements increase. Optical cables, backed by Broadcom, Coherent, and others with co-packaged optics technology, may recapture volume as distances grow beyond AEC capabilities. Credo is hedging by acquiring Hyperlume (MicroLED-based optical interconnects) and developing its own optical DSPs, but the transition risk is real [24].
Astera Labs single-platform risk: Astera Labs’ growth has been heavily driven by NVIDIA’s Blackwell platform. If NVIDIA’s next-generation Rubin architecture changes the retimer architecture or brings signal conditioning in-house, Astera Labs would face a severe revenue hit. Astera is diversifying into custom ASIC platforms (Broadcom, Marvell customers), Ethernet connectivity (Taurus), CXL memory controllers (Leo), and fabric switches (Scorpio), but NVIDIA platform dependence remains the key risk through 2026 [6][22].
Arista customer concentration: Nearly half of Arista’s revenue comes from what the company calls the “cloud titans,” primarily Microsoft and Meta. This concentration creates binary risk: a single hyperscaler pausing its network buildout could materially impact Arista’s growth rate. The company is diversifying into campus networking and acquiring VeloCloud (SD-WAN) from Broadcom, but the cloud titan concentration will persist for the foreseeable future [16][17].
First principles check: Does the networking layer deserve this level of investment? Yes. Amdahl’s Law dictates that the speedup of a parallel system is bounded by its non-parallelizable fraction, and in distributed training the communication phase plays that role: GPUs sit idle while gradients synchronize across the fabric. In a 100,000-GPU cluster, even a 1% drop in network throughput translates into hours of wasted compute time per training run [illustrative derivation from Amdahl’s Law; see e.g. Hoefler et al. on collective communication scaling]. At GPU rental rates exceeding $2/hour per GPU, the cost of network inefficiency across a large cluster can reach millions of dollars per day. High-performance networking is not optional overhead; it is economically necessary.
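The economics are easy to sketch under the assumptions stated above; the 30% communication fraction below is an assumed figure for illustration, not a measurement:

```python
# Illustrative cost of network overhead for a large training cluster.
# Assumptions: 100,000 GPUs at $2/GPU-hour (from the text); communication
# occupies 30% of step time (assumed; varies by model and parallelism).

gpus = 100_000
usd_per_gpu_hour = 2.00
cluster_usd_per_day = gpus * usd_per_gpu_hour * 24
print(f"Cluster spend per day:        ${cluster_usd_per_day:,.0f}")  # $4.8M

comm_fraction = 0.30   # share of step time GPUs wait on the network
idle_usd_per_day = cluster_usd_per_day * comm_fraction
print(f"Spend during network waits:   ${idle_usd_per_day:,.0f}")     # $1.44M

# Marginal cost of a 1% drop in network throughput (stretches the
# communication phase by roughly 1%):
print(f"Cost of a 1% throughput drop: ${idle_usd_per_day * 0.01:,.0f}/day")
```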