Chapter 10: Networking Silicon, Switches & DPUs
10.1 Overview
Every GPU cluster is only as fast as the network connecting its accelerators. A single NVIDIA GB200 NVL72 rack contains 72 GPUs that must exchange hundreds of terabytes of data per training step. Scale that to 100,000 GPUs across a hyperscaler campus and the networking fabric becomes the binding constraint on model training time, energy efficiency, and cost. This chapter covers the silicon, systems, and standards that move data between accelerators (scale-up), between servers (scale-out), and between data centers (scale-across).
The networking layer sits directly above the chip design layer (Chapter 6) and below the photonics layer (Chapter 11) and connectors/cables layer (Chapter 12). It consumes switch ASICs, NICs, and DPUs fabricated at leading-edge foundries (Chapter 7), packaged with advanced techniques (Chapter 9), and connected through optical transceivers and copper cables. Its products are integrated into the servers described in Chapter 18 and deployed in the data centers of Chapter 17.
There are three distinct interconnect domains inside an AI cluster. The first is the scale-up network, which connects GPUs within a single server or rack. NVIDIA’s proprietary NVLink dominates here, providing 1.8 TB/s of bidirectional bandwidth per GPU in the fifth-generation NVLink used in Blackwell systems. The UALink Consortium, backed by AMD, Intel, Google, Microsoft, Meta, AWS, and others, released its 200G 1.0 specification in April 2025 as an open alternative, targeting 1,024 accelerators per pod versus NVLink’s 576. Hardware based on UALink 1.0 is expected in the 2026-2027 timeframe [1].
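To make the scale-up numbers concrete, a back-of-the-envelope calculation (illustrative only, using the figures quoted above) shows the aggregate bandwidth inside a single NVL72 domain:

```python
# Back-of-the-envelope scale-up bandwidth, using figures quoted in this
# chapter (illustrative, not vendor specifications).

nvlink5_per_gpu_tb_s = 1.8      # TB/s bidirectional per GPU (NVLink 5)
gpus_per_nvl72 = 72

# Total injection bandwidth inside one NVL72 scale-up domain
aggregate_tb_s = nvlink5_per_gpu_tb_s * gpus_per_nvl72
print(f"NVL72 aggregate NVLink bandwidth: {aggregate_tb_s:.1f} TB/s")  # 129.6

# Domain-size ceilings cited above
print("Max accelerators per domain: NVLink 576 vs. UALink 1.0 1,024")
```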
The second domain is the scale-out network, which connects servers within a data center. This is the battleground between Ethernet and InfiniBand. InfiniBand commanded roughly 80% of AI back-end network deployments in 2023, but Ethernet surpassed it in 2025. According to Dell’Oro Group, Ethernet switch sales in AI back-end networks more than tripled in 2025, accounting for over two-thirds of AI cluster switch revenue for the full year. Amazon, Microsoft, Meta, Oracle, and xAI have all adopted Ethernet for their AI fabrics [2][3]. NVIDIA sells both sides of this fight: InfiniBand through its Quantum switches and Ethernet through its Spectrum-X platform.
The third domain is the scale-across network, which connects separate data centers into a unified logical compute cluster. As AI training runs grow beyond what a single campus can host, distributed multi-site fabrics are emerging. Broadcom’s Jericho 4 router with HyperPort technology and NVIDIA’s Spectrum-XGS are purpose-built for this use case, enabling lossless data transfer across distances of up to 60 miles [4][5].
The merchant switch silicon market is dominated by Broadcom, with an estimated 70-90% share of cloud data center Ethernet switches depending on the segment [10]. Its Tomahawk series (leaf/spine switches) and Jericho series (fabric routers) are the industry standard. NVIDIA’s Spectrum line, inherited from the Mellanox acquisition, is the primary challenger. Marvell’s Teralynx and Cisco’s Silicon One are credible but smaller players in merchant switch silicon. The competitive dynamic here extends beyond silicon performance to ecosystem control: Broadcom’s switches run in Arista, Cisco, and white-box platforms, giving it distribution leverage that pure silicon competitors lack.
Beyond switch ASICs, this chapter covers three adjacent silicon categories that are critical to AI networking. DPUs (Data Processing Units) offload network, storage, and security processing from the host CPU; NVIDIA’s BlueField and Marvell’s OCTEON are the main products. SmartNICs provide a lighter-weight version of the same function. And PCIe retimers, active electrical cables (AECs), and signal-conditioning chips from Astera Labs and Credo Technology ensure signal integrity across the increasingly complex PCIe and Ethernet links inside AI servers. These last two companies have been among the fastest-growing semiconductor firms in the buildout, with Astera Labs growing revenue from $116 million in 2023 to approximately $850 million in 2025, and Credo surging from $193 million to $437 million over the same period [6][7].
10.2 Market Sizing & Growth
Data center networking (total market): The data center networking market was valued at approximately $55.6 billion in 2025 and is projected to reach $139 billion by 2031, at a 16.5% CAGR [8].
AI back-end networking: 650 Group estimated the data center AI networking market (Ethernet + InfiniBand + optical transceivers for AI clusters) would reach nearly $20 billion in 2025. Dell’Oro Group projects cumulative data center switch revenue approaching $80 billion over the next five years, driven by AI infrastructure investments [3][9].
Ethernet switch silicon: The global Ethernet switch chip market was valued at approximately $5.5 billion in 2025, projected to reach $8.9 billion by 2032 at a 7.3% CAGR. The AI-optimized segment is growing far faster than the broader market [10].
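As a sanity check on these projections, the implied growth rates can be recomputed directly (a minimal sketch; the dollar figures are the ones cited above):

```python
# Recompute the CAGR figures quoted above (illustrative arithmetic).

def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two endpoint values."""
    return (end / start) ** (1 / years) - 1

# Data center networking: $55.6B (2025) -> $139B (2031)
print(f"DC networking CAGR:  {cagr(55.6, 139.0, 6):.1%}")  # ~16.5%

# Ethernet switch chips: $5.5B (2025) -> $8.9B (2032)
print(f"Switch silicon CAGR: {cagr(5.5, 8.9, 7):.1%}")     # ~7.1%, vs. 7.3% cited
```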
NVIDIA networking revenue: NVIDIA’s networking segment (NVLink + InfiniBand + Spectrum-X Ethernet) generated $11.0 billion in Q4 FY2026 (ending January 2026), up 263% year-over-year. Full-year FY2026 networking revenue was $31.4 billion, up 142% from $13.0 billion in FY2025. Spectrum-X alone exceeded a $10 billion annualized run rate by Q2 FY2026 [11][12][13].
Broadcom networking: Broadcom’s AI networking revenue surged over 170% year-over-year in Q2 FY2025 (ending May 2025), driven by Tomahawk and Jericho sales. The company’s AI switch backlog exceeds $10 billion. Broadcom’s total AI semiconductor revenue (custom ASICs + networking) reached $8.4 billion in Q1 FY2026 [14][15].
Arista Networks: Full-year 2025 revenue reached $9.0 billion, up approximately 29% year-over-year. Arista raised its 2025 AI data center revenue target from $1.5 billion to $2.75 billion and expects to exceed $10 billion in annual revenue in 2026 [16][17]. By Q1 2026, CEO Jayshree Ullal described conditions as unprecedented: “Our demand is actually the best I have ever seen in my Arista tenure,” adding that “demand is outstripping our supply this year.” Arista raised its 2026 AI networking target to $3.5 billion, more than doubling AI sales year-over-year [26].
Cisco AI networking: Cisco expects over $3 billion in AI infrastructure revenue from hyperscalers in FY2026 (ending July 2026) and $5 billion in AI orders booked. In Q3 FY2025 alone, Cisco booked over $600 million in AI-related product sales, more than double the year-ago quarter [18][19].
Marvell data center: Marvell’s data center segment generated $1.52 billion in Q3 FY2026 (ending November 2025), up 38% year-over-year. Data center now represents approximately 73-76% of Marvell’s total revenue. The company’s AI-related revenue exceeded $1.5 billion in FY2025 and is expected to significantly surpass $2.5 billion in FY2026 [20][21].
Astera Labs: FY2025 revenue was $852.5 million, up roughly 115% year-over-year from $396 million in FY2024. Q4 FY2025 revenue was $270.6 million, and guidance implies continued acceleration, with quarterly revenue projected to reach $355-365 million in early FY2026 [6][22].
Credo Technology: FY2025 revenue (ending May 2025) was $436.8 million, up 126% year-over-year, and initial FY2026 guidance called for more than $800 million [7]. Growth has outpaced that guidance: Q2 FY2026 (ending November 2025) revenue was $268.0 million, up 272% year-over-year, and by Q3 FY2026 the run rate exceeded $400 million per quarter, putting full-year FY2026 on track to surpass $1.3 billion [7][23].
10.3 Supply Chain Flowcharts
10.3a: Scale-Up Interconnect (GPU-to-GPU within server/rack)
SCALE-UP INTERCONNECT
|
|---> PROPRIETARY
| NVIDIA NVLink (Gen 5): 1.8 TB/s per GPU, up to 576 GPUs per domain
| +-- NVLink Fusion: extends NVLink to non-NVIDIA accelerators
| Partners: Astera Labs (NVLink connectivity solutions)
|
|---> OPEN STANDARD (emerging)
| UALink Consortium (200G 1.0 spec, April 2025)
| +-- 200 Gbps per lane, up to 1,024 accelerators per pod
| +-- Members: AMD, Intel, Meta, HPE, AWS, Apple, Cisco,
| Google, Lightmatter, Microsoft, Synopsys, Astera Labs
| +-- Hardware expected: 2026-2027
|
+---> PCIe (supporting interconnect, not primary GPU-to-GPU)
PCIe Gen 5 (current) → PCIe Gen 6 (ramping 2025-2026)
Retimers: Astera Labs (Aries family), Broadcom, Microchip
10.3b: Scale-Out Interconnect (Server-to-Server within data center)
SCALE-OUT INTERCONNECT
|
|---> ETHERNET (now >2/3 of AI back-end network, growing)
| |-- MERCHANT SWITCH SILICON
| | Broadcom Tomahawk 6 (102.4 Tbps) -- dominant, ~70-90% share
| | Broadcom Jericho 4 (51.2 Tbps fabric router)
| | NVIDIA Spectrum-4 (51.2 Tbps) -- in Spectrum-X systems
| | Marvell Teralynx 10 (51.2 Tbps)
| | Cisco Silicon One G200/P200 (51.2 Tbps)
| | |
| | v
| |-- SWITCH SYSTEM VENDORS (use merchant silicon above)
| | Arista Networks -- largest share in cloud/AI Ethernet
| | Cisco -- integrates own Silicon One + NVIDIA Spectrum
| | White-box/ODMs: Celestica, Edgecore, Delta Networks
| | NVIDIA -- sells complete Spectrum-X switch systems
| | |
| | v
| +-- NIC / SuperNIC / DPU (endpoint connectivity)
| NVIDIA ConnectX-8 (800G SuperNIC) / BlueField-3 DPU
| Marvell OCTEON 10 DPU (competes with ConnectX/BlueField)
| AMD Pensando (DPU for cloud infrastructure)
| Intel IPU (Infrastructure Processing Unit)
| Broadcom (SmartNICs)
|
|---> INFINIBAND (~1/3 of AI back-end, still growing but slower)
| NVIDIA Quantum (sole commercial supplier)
| Quantum-X800 InfiniBand switches (800G)
| ConnectX-8 HCAs (Host Channel Adapters)
| Primary users: frontier training clusters, HPC
|
+---> SIGNAL CONDITIONING / CONNECTIVITY SILICON
Astera Labs: Aries retimers, Taurus cable modules, Scorpio switches
Credo Technology: Active Electrical Cables (AECs), DSPs, SerDes
Marvell: PAM4 DSPs, AECs, retimers
Broadcom: SerDes IP, retimers
10.3c: Scale-Across Interconnect (Data center to data center)
SCALE-ACROSS INTERCONNECT (DCI for distributed AI)
|
|---> ETHERNET DCI ROUTERS
| Broadcom Jericho 4 -- HyperPort (3.2 Tbps logical ports, 60+ miles)
| Cisco 8223 -- 51.2 Tbps, powered by Silicon One P200
| Arista 7800R4 series
| NVIDIA Spectrum-XGS -- giga-scale AI factory interconnect
|
+---> OPTICAL TRANSPORT (covered in detail in Chapter 11)
Ciena -- long-haul coherent optical
Infinera (acquired by Nokia, Feb 2025, $2.3B) -- DCI optical transport
10.4 Key Companies
| Company | Ticker | Exchange | Approx. Mkt Cap | Role in Buildout | Key Metric |
|---|---|---|---|---|---|
| NVIDIA | NVDA | NASDAQ | ~$5.2T | NVLink scale-up, InfiniBand, Spectrum-X Ethernet, ConnectX/BlueField NICs/DPUs | Networking revenue $31.4B FY2026 (+142% YoY) |
| Broadcom | AVGO | NASDAQ | ~$2.0T | Dominant merchant switch silicon (Tomahawk, Jericho), SerDes IP, SmartNICs | AI switch backlog >$10B; ~70-90% DC switch share |
| Arista Networks | ANET | NYSE | ~$218B | Leading AI/cloud Ethernet switch systems vendor; EOS software platform | FY2025 revenue $9.0B; AI networking target $2.75B (2025), $3.5B (2026) |
| Cisco Systems | CSCO | NASDAQ | ~$381B | Silicon One ASICs, Nexus switches, AI PODs, optics; enterprise + hyperscaler | FY2026 AI infra revenue expected >$3B from hyperscalers |
| Marvell Technology | MRVL | NASDAQ | ~$149B | Teralynx switches, OCTEON DPUs, custom silicon, PAM4 DSPs, AECs | DC revenue $1.52B Q3 FY2026 (+38% YoY); 73% of total |
| Astera Labs | ALAB | NASDAQ | ~$34.2B | PCIe retimers (Aries), Ethernet cable modules (Taurus), fabric switches (Scorpio), CXL controllers (Leo) | FY2025 revenue $852.5M (+115% YoY) |
| Credo Technology | CRDO | NASDAQ | ~$14.0B | Active Electrical Cables (AECs), SerDes IP, optical DSPs | FY2025 revenue $437M (+126% YoY); FY2026 target ~$1.3B |
| Juniper Networks | — | — | Acquired by HPE ($14B) | HPE/Juniper: cloud-native switching, AI networking fabric | Acquisition announced Jan 2024, closed Jul 2025 |
| Celestica | CLS | NYSE/TSX | ~$43.2B | White-box/ODM switch systems for hyperscalers (AI back-end Ethernet) | Among top 3 AI networking vendors per 650 Group |
| Intel | INTC | NASDAQ | ~$628B | IPUs (Infrastructure Processing Units), Gaudi accelerators, Ethernet controllers | IPU program active but limited traction vs. NVIDIA DPUs |
| AMD | AMD | NASDAQ | ~$742B | Pensando DPU (acquired 2022), EPYC server CPUs, Instinct GPUs | Pensando DPU in AWS, Microsoft Azure deployments |
| Microchip Technology | MCHP | NASDAQ | ~$53.6B | PCIe switches, retimers, timing/synchronization for networking | Niche but critical PCIe switching silicon |
| Ciena | CIEN | NYSE | ~$77.5B | DCI coherent optical transport (scale-across networks) | FY2025 revenue $4.77B (+19% YoY) |
| Alphawave Semi | fmr. AWE | fmr. LSE | Acquired by Qualcomm (Dec 2025, ~$2.4B) | High-speed SerDes IP, connectivity silicon, multi-die chiplet interconnect | Now part of Qualcomm (QCOM, NASDAQ, ~$200B); 224G SerDes; 6x TSMC OIP Partner of the Year |
| SiTime | SITM | NASDAQ | ~$22.0B | MEMS precision timing oscillators; TimeFabric AI data center synchronization suite | Near-monopoly in MEMS-based AI cluster timing (IEEE 1588 PTP); GPU utilization can fall 20-40% without nanosecond-level sync; FY2025 revenue +45%, gross margins expanding toward 60% |
| MaxLinear | MXL | NASDAQ | ~$8.9B | PAM4 DSPs for optical transceivers, Ethernet PHYs | Serves transceiver module makers; smaller share vs. Broadcom/Marvell |
10.5 Bottleneck Analysis
Broadcom’s switch silicon dominance (SEVERE): Broadcom controls an estimated 70-90% of the cloud data center Ethernet switch ASIC market. The Tomahawk series is the de facto standard for leaf/spine switching in hyperscale AI clusters. CEO Hock Tan confirmed the demand intensity on the Q4 FY2025 call: “Our current order backlog for AI switches exceeds $10 billion… Tomahawk 6… continues to book at record rates. This is one of the fastest-growing products in terms of deployment that we have ever seen” [25]. When Broadcom launches a new generation (Tomahawk 6 at 102.4 Tbps shipped in volume in 2025), competitors are typically 12-18 months behind. NVIDIA’s Spectrum-X1600 at 102.4 Tbps is expected only in the second half of 2026 [4]. This gap gives Broadcom pricing power and allocation leverage. However, the bottleneck is partially mitigated by the fact that multiple system vendors (Arista, Cisco, Celestica, white-box ODMs) can build switches using Broadcom’s silicon, creating competition at the system level even if the chip layer is concentrated.
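To see what each capacity generation buys, consider the port counts implied by a 102.4 Tbps ASIC (a simple sketch; real products trade radix against per-port speed):

```python
# Port counts implied by a 102.4 Tbps switch ASIC (illustrative).
capacity_gbps = 102_400

for port_speed_gbps in (400, 800, 1600):
    ports = capacity_gbps // port_speed_gbps
    print(f"{port_speed_gbps}G ports: {ports}")
# -> 256x 400G, 128x 800G, or 64x 1.6T ports from a single chip. Higher
#    capacity means fewer chips and fewer switch tiers for a given cluster.
```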
NVLink lock-in (SEVERE for scale-up): NVIDIA’s NVLink is the only commercially deployed high-bandwidth scale-up interconnect for GPU clusters. The NVLink 5 fabric in Blackwell systems provides 14x the bandwidth of PCIe Gen 5. Any customer deploying NVIDIA GPUs for large-scale training is locked into the NVLink ecosystem for intra-rack connectivity. The UALink Consortium is developing an open alternative, but UALink 1.0 hardware is not expected until 2026-2027, and it must prove competitive with NVLink’s next generation (expected with NVIDIA’s Rubin architecture). This is a feature, not a bug, from NVIDIA’s perspective: it deepens the moat around GPU sales [1].
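The 14x figure can be roughly reproduced from public link rates (a sketch under standard PCIe encoding assumptions):

```python
# Rough check of the "14x PCIe Gen 5" claim for NVLink 5 (illustrative).

# PCIe Gen 5 x16: 32 GT/s per lane, 16 lanes, 128b/130b encoding,
# doubled to compare against NVLink's bidirectional figure.
pcie5_x16_gb_s = 2 * 16 * 32e9 * (128 / 130) / 8 / 1e9   # ~126 GB/s
nvlink5_gb_s = 1800.0                                    # 1.8 TB/s per GPU

print(f"PCIe Gen 5 x16 bidirectional: {pcie5_x16_gb_s:.0f} GB/s")
print(f"NVLink 5 advantage: ~{nvlink5_gb_s / pcie5_x16_gb_s:.0f}x")  # ~14x
```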
NVIDIA InfiniBand sole-source (MODERATE, declining): NVIDIA (via Mellanox) is the sole commercial supplier of InfiniBand networking equipment. There are no alternative InfiniBand vendors. This gave NVIDIA extraordinary pricing power when InfiniBand dominated AI back-end networks. The bottleneck is easing as Ethernet rapidly displaces InfiniBand. By 2025, Ethernet accounted for over two-thirds of AI back-end switch sales [2][3]. The shift to Ethernet is driven by hyperscalers wanting vendor diversity and cost advantages. Ironically, NVIDIA sells Ethernet too (Spectrum-X), so it captures revenue either way.
Active Electrical Cable (AEC) concentration (MODERATE-HIGH): Credo Technology holds an estimated 88% share of the AEC market, which is critical for intra-rack Ethernet connectivity in AI clusters [24]. AECs are preferred over optical cables for short-reach connections because they are more reliable (zero link flaps) and cheaper. Customer concentration is extreme: Credo’s top customer represented 67% of FY2025 revenue, and its top three customers accounted for 39%, 32%, and 17% of revenue in Q3 FY2026. If a single hyperscaler slowed orders, Credo’s revenue would crater. Astera Labs and Marvell are entering the AEC market, which should diversify supply over time.
Retimer supply (MODERATE): Astera Labs dominates the PCIe retimer market through its Aries product family, which is designed into virtually every major AI server platform. As servers transition from PCIe Gen 5 to Gen 6, the retimer content per server increases (higher speeds require more signal conditioning). Astera Labs’ FY2025 revenue growth of 115% reflects this demand surge. Broadcom and Microchip offer competing retimers, but Astera Labs’ incumbency in NVIDIA and custom ASIC platforms gives it a structural advantage [6].
1.6 Tbps switch silicon transition (MODERATE, emerging): The industry is preparing for the jump from 800G to 1.6T port speeds, with volume shipments of 1.6T switches expected in H2 2026. Dell’Oro Group projects the 1.6T ramp will be faster than 800G, surpassing 5 million ports within one to two years of shipments [3]. The transition creates both opportunity and risk: Broadcom, NVIDIA, Marvell, and Cisco are all developing 1.6T switch ASICs. Share shifts are possible at generational transitions. The first vendor to ship in volume at 1.6T will capture significant design wins that lock in for 2-3 years.
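Radix and port speed together determine how large a flat fabric can grow before a third switch tier (with its extra optics, power, and latency) becomes necessary. A minimal sketch of the standard non-blocking two-tier Clos bound:

```python
# Endpoints reachable by a non-blocking 2-tier leaf-spine fabric built
# from radix-r switches: r leaves x (r/2) down-facing ports = r^2 / 2.
# Illustrative; real designs add oversubscription and rail-optimized wiring.

def two_tier_endpoints(radix: int) -> int:
    leaf_down = radix // 2   # leaf ports facing servers
    max_leaves = radix       # each spine has radix ports, one per leaf
    return max_leaves * leaf_down

for radix in (64, 128, 256):
    print(f"radix {radix:3d}: up to {two_tier_endpoints(radix):,} endpoints")
# radix 128 -> 8,192; radix 256 -> 32,768. Beyond that, a 100,000-GPU
# cluster needs a third tier -- or higher-radix silicon.
```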
10.6 Risks
Ethernet commoditization pressures margins: As Ethernet becomes the dominant AI networking fabric, the switch silicon market risks commoditization. Broadcom’s Tomahawk and Jericho chips are powerful, but hyperscalers are increasingly sophisticated buyers who can threaten to develop in-house alternatives or qualify Cisco/Marvell silicon to negotiate lower prices. The white-box switch market is growing at 12-13% CAGR, putting additional pressure on branded system vendors like Arista and Cisco [3]. If switch silicon margins compress toward commodity levels, the investment case for networking companies weakens.
NVLink Fusion and UALink fragment the scale-up market: NVIDIA’s NVLink Fusion initiative lets non-NVIDIA accelerators connect to NVLink fabrics, while UALink offers a fully open alternative. If both gain traction, the scale-up interconnect market could splinter into multiple standards, increasing complexity for system vendors and potentially slowing the adoption of any single standard. For networking silicon companies, fragmentation means more SKUs and lower volumes per product, which hurts margins.
Hyperscaler vertical integration in networking: Google already designs custom switches for its TPU clusters. Amazon uses custom networking silicon for its internal infrastructure. If more hyperscalers bring switch ASIC design in-house, the merchant silicon TAM shrinks. This is the same dynamic threatening merchant GPU sales (see Chapter 6). The counter-argument: networking silicon has lower margins than AI accelerators, so the incentive to vertically integrate is weaker. Most hyperscalers would rather buy switch silicon from Broadcom than invest in an internal team to replicate it.
AEC vs. optics competition: Credo’s AECs are gaining share against optical cables for short-reach connections (under 7 meters). But as AI clusters scale to campus-size deployments and rack power densities push servers farther apart, the relevant reach requirements increase. Optical cables, backed by Broadcom, Coherent, and others with co-packaged optics technology, may recapture volume as distances grow beyond AEC capabilities. Credo is hedging by acquiring Hyperlume (MicroLED-based optical interconnects) and developing its own optical DSPs, but the transition risk is real [24].
Astera Labs single-platform risk: Astera Labs’ growth has been heavily driven by NVIDIA’s Blackwell platform. If NVIDIA’s next-generation Rubin architecture changes the retimer architecture or brings signal conditioning in-house, Astera Labs would face a severe revenue hit. Astera is diversifying into custom ASIC platforms (Broadcom, Marvell customers), Ethernet connectivity (Taurus), CXL memory controllers (Leo), and fabric switches (Scorpio), but NVIDIA platform dependence remains the key risk through 2026 [6][22].
Arista customer concentration: Nearly half of Arista’s revenue comes from what the company calls the “cloud titans,” primarily Microsoft and Meta. This concentration creates binary risk: a single hyperscaler pausing its network buildout could materially impact Arista’s growth rate. The company is diversifying into campus networking and acquiring VeloCloud (SD-WAN) from Broadcom, but the cloud titan concentration will persist for the foreseeable future [16][17].
First principles check: Does the networking layer deserve this level of investment? Yes. Amdahl’s Law dictates that the speedup of a parallel system is bounded by its non-parallelizable fraction, and in distributed training the communication phase plays that role: GPUs sit idle while gradients synchronize across the fabric. In a 100,000-GPU cluster, even a 1% drop in network throughput translates into hours of wasted compute time per training run [illustrative derivation from Amdahl’s Law; see e.g. Hoefler et al. on collective communication scaling]. At GPU rental rates exceeding $2/hour per GPU, the cost of network inefficiency across a large cluster can reach millions of dollars per day. High-performance networking is not optional overhead; it is economically necessary.
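The economics are easy to sketch under the assumptions stated above; the 30% communication fraction below is an assumed figure for illustration, not a measurement:

```python
# Illustrative cost of network overhead for a large training cluster.
# Assumptions: 100,000 GPUs at $2/GPU-hour (from the text); communication
# occupies 30% of step time (assumed; varies by model and parallelism).

gpus = 100_000
usd_per_gpu_hour = 2.00
cluster_usd_per_day = gpus * usd_per_gpu_hour * 24
print(f"Cluster spend per day:        ${cluster_usd_per_day:,.0f}")  # $4.8M

comm_fraction = 0.30   # share of step time GPUs wait on the network
idle_usd_per_day = cluster_usd_per_day * comm_fraction
print(f"Spend during network waits:   ${idle_usd_per_day:,.0f}")     # $1.44M

# Marginal cost of a 1% drop in network throughput (stretches the
# communication phase by roughly 1%):
print(f"Cost of a 1% throughput drop: ${idle_usd_per_day * 0.01:,.0f}/day")
```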