# SGLang at NVIDIA GTC 2026: Five Events in Three Days Showcase Open-Source AI Infrastructure

- Source: LMSYS Blog (Chatbot Arena team)
- Published: 2026-03-25 00:00
- AIHOT link: https://aihot.virxact.com/items/cmnxbjke5006asln0g07v83rw
- Original link: https://www.lmsys.org/blog/2026-03-25-gtc2026

## AI Summary

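SGLang appeared on the AI ecosystem slide in Jensen Huang's keynote and ran five events in three days. The team co-hosted a happy hour with RadixArk, hosted a roughly 200-person meetup at LinkedIn's headquarters on LLM systems for search and recommendation, and took part in Novita's industry event, which drew over 700 attendees. The official training lab demonstrated the Miles RL framework, which targets training-inference mismatch in production. LinkedIn engineers shared a prefill-only serving optimization that delivers 2–3× throughput gains on H100s and has been upstreamed to SGLang.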

## Full Text

SGLang came to NVIDIA GTC 2026 with panels, a happy hour, a 200-person meetup, and a hands-on training lab. Over three days and five events, we spent a packed week at the center of the LLM ecosystem and left with a lot to share. If you missed it, here's the full recap.

SGLang at GTC 2026: five events, three days.

### At the Main Conference

#### SGLang Featured in the GTC Keynote

SGLang was featured on the NVIDIA AI ecosystem slide during Jensen Huang's GTC keynote. We are honored to be recognized as part of the infrastructure stack behind AI-native applications.

SGLang on NVIDIA's AI ecosystem slide during the GTC 2026 keynote.

📝 X recap post

#### Open-Source AI Panel at GTC

On Tuesday, **Ying Sheng** joined the GTC panel **"The State of Open-Source AI"** alongside Vartika Singh (Strategic AI Lead, NVIDIA), Jonathan Cohen (VP of Applied Research, NVIDIA), Ion Stoica (Professor, EECS, UC Berkeley), Jeff Boudier (VP of Product, Hugging Face), and Ranjay Krishna (Director of Multimodal and Embodied AI, Ai2).

The panel examined open-source AI's growing role as the primary R&D engine for sophisticated AI systems: what makes open ecosystems trustworthy, scalable, and production-ready, and the community infrastructure enabling reproducible, auditable research.

Ying Sheng (second from left) on the "The State of Open-Source AI" panel at GTC 2026.

🎬 Watch the recording on NVIDIA On-Demand

#### SGLang Training Lab at GTC 2026

On Thursday morning, the **RadixArk team** led an official GTC training lab: **"High-Performance LLM Serving and Training with SGLang"**.

The lab covered three areas:

1. **Performance tuning with the SGLang Cookbook**: practical techniques for improving serving throughput and latency in real deployments
2. **Profiling and bottleneck analysis**: a developer-oriented walkthrough of identifying and resolving performance bottlenecks in LLM serving systems
3. **SGLang × Miles RL integration**: a live demonstration of running SGLang as the inference backend inside a real RL training loop using the Miles framework

The SGLang Training Lab at GTC 2026: hands-on LLM performance tuning and RL training.

🎬 Watch full recording on NVIDIA On-Demand

📁 Download the training lab materials

### Side Events

#### SGLang × RadixArk GTC Happy Hour

On Tuesday evening, SGLang and RadixArk co-hosted a GTC Happy Hour that brought together builders, researchers, and founders from across the inference and training ecosystem, including friends from OpenAI, xAI, DeepMind, Meta, NVIDIA, Ollama, and more.

SGLang × RadixArk Happy Hour.

The evening featured two technical spotlights: **Banghua Zhu (RadixArk)** introduced RadixArk and **Miles**, SGLang's native RL training framework purpose-built for large-scale MoE post-training workloads. **Jason Zhao (ScitiX)** presented **SiMM**, an open-source in-memory KV cache engine integrated with SGLang for long-context serving.

Banghua introducing RadixArk and the Miles RL framework.

Thank you to **Z Potentials** and **ScitiX** for sponsoring the event and making it possible.

📝 X recap post

#### Banghua at Novita's GTC Event

**Banghua Zhu** joined Novita's GTC event with over 700 attendees. The discussion covered Jensen Huang's remarks on the inflection point between inference cost and demand, the key drivers behind the agentic AI movement, and what it takes for AI products to deliver real value. Banghua shared his perspective on how SGLang is shaping the future of inference infrastructure, enabling next-generation use cases from OpenClaw to agentic inference, and driving the evolution of open models and open infrastructure.

Banghua presenting at Novita's GTC event.

Partners represented included NVIDIA, RadixArk, OpenRouter, Google DeepMind, Kimi (Moonshot AI), Alibaba Cloud, MiniMax, Z.ai, Hugging Face, and Kilo Code.

#### LinkedIn × SGLang Meetup: LLMs for Search & Recommendation

On Wednesday evening, we hosted approximately 200 engineers at LinkedIn's Mountain View headquarters alongside teams from LinkedIn, TikTok, Meta, and NVIDIA for a deep dive into production LLM systems for search and recommendation.

SGLang swag at the LinkedIn meetup.

##### LinkedIn Engineering Talks

LinkedIn opened with three engineering presentations:

- **Fedor Borisyuk**: Semantic search at scale
- **Zhipeng Wang**: Modeling optimizations for LLM-driven ranking
- **Sundara Raman Ramachandran**: LLM inference infrastructure optimizations, including a prefill-only serving path delivering **2–3× throughput gains on H100s**, upstreamed back to SGLang

LinkedIn engineers presenting on semantic search, ranking, and inference infrastructure.

Relevant work from LinkedIn's engineering team: [[1]](https://arxiv.org/abs/2502.14305)[[2]](https://arxiv.org/abs/2602.07309)[[3]](https://arxiv.org/abs/2510.22101)[[4]](https://arxiv.org/abs/2512.07846)[[5]](https://github.com/linkedin/fmchisel)[[6]](https://openreview.net/forum?id=tyGfwG6xTh)[[7]](https://www.linkedin.com/blog/engineering/ai/scaling-llm-based-ranking-systems-with-sglang-at-linkedin)

##### SGLang: Roadmap and Miles Framework

SGLang core developer **Liangsheng Yin** walked through SGLang's H1 2026 roadmap.

**Mao Cheng** then presented the **Miles RL framework**, addressing training–inference mismatch in production through three core techniques:

1. **Importance sampling corrections**: compensating for distribution shift between training and inference
2. **Inference-training alignment**: ensuring consistency between rollout behavior and gradient updates
3. **Rollout Routing Replay (R3)**: replay-based routing for efficient use of generated rollout data
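To make the first item concrete, here is a minimal sketch of a generic truncated importance sampling correction. It is not taken from Miles: the function names, the clip threshold, and the plain-list inputs are all assumptions made for illustration. The idea is that each sampled token's loss term is reweighted by the ratio between the probability the training engine assigns to it and the probability the inference engine assigned when the rollout was generated. The ratio is capped so that a few badly mismatched tokens cannot dominate the update.

```python
import math
from typing import List, Sequence


def truncated_is_weights(
    train_logprobs: Sequence[float],
    rollout_logprobs: Sequence[float],
    clip: float = 2.0,  # assumed cap; a real system would tune this
) -> List[float]:
    """Per-token weights w_t = min(exp(logp_train_t - logp_rollout_t), clip)."""
    if len(train_logprobs) != len(rollout_logprobs):
        raise ValueError("log-probability sequences must have equal length")
    return [
        min(math.exp(t - r), clip)
        for t, r in zip(train_logprobs, rollout_logprobs)
    ]


def weighted_pg_loss(
    train_logprobs: Sequence[float],
    rollout_logprobs: Sequence[float],
    advantages: Sequence[float],
    clip: float = 2.0,
) -> float:
    """Mean policy-gradient loss with per-token importance weights.

    In an autodiff framework the weights would be treated as constants
    (no gradient flows through them); here everything is plain floats.
    """
    if len(advantages) != len(train_logprobs):
        raise ValueError("advantages must match the number of tokens")
    weights = truncated_is_weights(train_logprobs, rollout_logprobs, clip)
    total = sum(
        w * a * lp for w, a, lp in zip(weights, advantages, train_logprobs)
    )
    return -total / len(weights)
```

When the two engines agree exactly, every weight is 1 and the loss reduces to the ordinary policy-gradient loss. The weights only change the update on tokens where the training and inference probabilities diverge.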

Mao Cheng presenting the Miles RL framework and its approach to training–inference alignment.

##### Industry Speakers

- **Hongyu Lu (TikTok)**: LLM search at scale
- **Luke Simon and Xi Liu (Meta)**: Generative Reasoning Reranker [[paper link]](https://lnkd.in/gGFwdkJw)
- **Anish Maddipoti (NVIDIA)**: Dynamo + NeMoRL

##### Panel Discussion

The closing panel, hosted by Qing Lan, featured Wenfeng Zhuo, Fedor Borisyuk, Luke Simon, and Mao Cheng. Topics included:

- Semantic ID vs. embedding retrieval
- Whether unified retrieval + ranking (OneRec-style systems) is production-ready
- Inference and training challenges in LLM recsys
- Recent breakthroughs accelerating LLM adoption for recommendations
- The role of continuous learning in production recommendation systems

The closing panel: Wenfeng Zhuo, Fedor Borisyuk, Luke Simon, and Mao Cheng, moderated by Qing Lan.

This is exactly the kind of collaboration that will define the next generation of recommendation systems: production teams and open-source infrastructure evolving together.

📝 LinkedIn recap post

### Looking Ahead

GTC 2026 made clear how much the production ecosystem is converging around open-source infrastructure. From semantic search at LinkedIn scale to RL post-training for frontier MoE models, SGLang is increasingly the shared layer underneath.

We'll keep building in the open. Follow our Luma calendar for future meetups, office hours, and community events.
