Spectrum-X Networking Platform Administration
Private, instructor-led remote training
Course Duration: 3 sessions of 4 hours each
Our hands-on training course explores the architecture, deployment, configuration, operation, and management of NVIDIA Spectrum-X networking platforms for AI factories. Participants will gain practical experience provisioning and monitoring AI clusters using Spectrum-X, NetQ, and Cumulus Linux through instructor-led sessions and labs in the NVIDIA Air environment.
• Explain the fundamentals of the NVIDIA Spectrum-X Networking Platform, including its architecture, key components, and reference design for AI environments.
• Gain hands-on experience with the NVIDIA Air environment for simulating and testing Spectrum-X deployments.
• Deploy the Spectrum-X platform, including IP addressing, QoS configuration, routing policies, and a virtualized network setup for multi-tenancy.
• Apply advanced networking concepts such as RoCE, Adaptive Routing, and Congestion Control in the context of AI workloads.
• Monitor and troubleshoot the Spectrum-X fabric using NVIDIA NetQ and the Cumulus Linux CLI.
Day 1: Introduction to the Spectrum-X Networking Platform
• Unit 1 - Spectrum-X Networking Platform Overview
• Unit 2 - Architecture Overview
• Unit 3 - Reference Architecture
• Unit 4 - NVIDIA Digital Twins with the Air Environment
• Practice 1: Accessing the Air Environment
Day 2: Spectrum-X Platform Deployment
• Unit 5 - Deployment Guide
  o IP Addressing Overview
  o QoS: RoCE, Adaptive Routing, and Congestion Control
  o Routing Policies
  o Underlay Network
  o Virtualized Network and Multi-tenancy
• Practice 2: Deploying the Spectrum-X Platform
Day 3: Monitoring and Troubleshooting
• Unit 6 - Spectrum-X Fabric Telemetry with NetQ
  o NetQ features
  o Installing and configuring the NetQ agent
  o Validation checks for network health
  o Fabric monitoring methods:
    ▪ ASIC monitoring tools
    ▪ OTLP (OpenTelemetry Protocol)
    ▪ DTS (DOCA Telemetry Service)
• Practice 3: Managing Fabric Telemetry with NetQ
• Practice 4: Troubleshooting a Spectrum-X Platform Deployment
Refer to the course outline or Learning Objectives for more details.
The course is designed for network administrators, DevOps professionals, and other IT professionals who want the knowledge and skills needed to deploy and maintain AI data centers built on the Spectrum-X networking platform.
• Knowledge of networking concepts and principles, including technologies used in data centers and high-performance computing environments.
• Basic understanding of artificial intelligence (AI) concepts and terminology, such as machine learning, deep learning, neural networks, and common AI applications.
• Practical experience configuring and managing Cumulus Linux-based network environments, equivalent to the knowledge covered in the “Cumulus Linux Professional” course.
• Familiarity with installing DOCA OFED on hosts.
• Knowledge equivalent to the “AI for All: From Basics to GenAI Practice” course.