OME: Revolutionizing LLM Infrastructure with Model-Driven Architecture

2025-07-08 00:00

AI Summary

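Oracle Cloud Infrastructure has introduced OME (Open Model Engine), a Kubernetes-native model serving framework. The system adopts a model-driven architecture: custom resources such as BaseModel and ServingRuntime treat models as first-class citizens, which bridges the gap between ML engineers and production teams. OME shortens the model onboarding cycle from months to days and significantly reduces configuration errors. It natively supports enterprise features including multi-node inference, prefill-decode disaggregation, serverless autoscaling, and Multi-LoRA, and it integrates the SGLang runtime so that complex deployment strategies can be encoded once, reused, and deployed in a single step.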


Contents

The Tale of Two Teams: Why Model Serving Is Broken

The Birth of OME

The OME Architecture: Models at the Center

Layer 1: Kubernetes API Layer

Custom Resources - The Foundation of Model-Driven Architecture

BaseModel/ClusterBaseModel: Models as First-Class Citizens

ServingRuntime: The Brain of Runtime Selection

InferenceService: Orchestrating Model Deployments and Ingress

BenchmarkJob: Performance Testing as a First-Class Operation

Admission Webhooks: Validation and Mutation

Layer 2: Control Plane - The Orchestrator

OME Controller Manager: The Orchestration Brain

Layer 3: Data Plane - Where Models Come to Life

Model Agent: Model Distribution

Inference Workloads

Layer 4: External Integrations - Ecosystem Power

SGLang: First-Class Runtime Support

Native Router Integration

Load Balancing Capabilities

Deployment Flexibility

Production-Grade Features: Built for Scale

Native Benchmarking

Multi-LoRA Serving: One Model, Many Adapters

High-Performance Serving at Scale

Enterprise Security: Defense in Depth

Real-World Impact: From Months to Days

Operational Transformation at Scale

Closing the Gap: How OME Bridges Two Worlds

The Path Forward: Challenges Ahead

Accelerator-Aware Runtime Selection

LMSYS: Blog (Chatbot Arena team)
