GLM-5.2 (max): Performance, Pricing, and Open-Weights Release
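Source: artificialanalysis.ai

Zhipu AI released GLM-5.2 (max), an open-weights reasoning model, in June 2026. It has 753B total parameters and 40B active parameters, supports text input and output, and has a 1M-token context window. It scores 51 on the Artificial Analysis Intelligence Index, ranking first among the 92 models in its class. Output speed is 111 tokens/s (#15 of 92). Pricing is $1.40 per 1M input tokens and $4.40 per 1M output tokens, which places it in the more expensive tier; cache hits cost $0.26 per 1M tokens (81% cheaper than regular input). The model weights are released on Hugging Face under the MIT license.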
GLM-5.2 (max) - Intelligence, Performance & Price Analysis
• Open weights model
• Released June 2026
Model summary

| Metric | Value | Rank (of 92) | Rating (units of 4) |
| --- | --- | --- | --- |
| Intelligence (Artificial Analysis Intelligence Index) | 51 | #1 | 4 of 4 |
| Speed (output tokens per second) | 110.7 | #15 | 4 of 4 |
| Price (USD per 1M tokens) | $1.40 input, $4.40 output | #77 | 4 of 4 |
| Cache hit price (USD per 1M tokens) | $0.26 (-81%) | #29 | 3 of 4 |
| Verbosity (output tokens from Intelligence Index) | 140M | #9 | 3 of 4 |
Comparison Summary
GLM-5.2 (max) is among the leading models in intelligence, but it is particularly expensive compared with other open weights models of similar size. It is also notably fast, though somewhat verbose. The model supports text input and output and has a 1M-token context window.
GLM-5.2 (max) scores 51 on the Artificial Analysis Intelligence Index, placing it well above average among comparable models (average: 24). When evaluated on the Intelligence Index, it generated 140M tokens, which is somewhat verbose compared with the average of 110M.
Pricing for GLM-5.2 (max) is $1.40 per 1M input tokens (expensive; average: $0.42) and $4.40 per 1M output tokens (expensive; average: $1.25). In total, it cost $867.88 to evaluate GLM-5.2 (max) on the Intelligence Index.
At 111 tokens per second, GLM-5.2 (max) is notably fast compared with a median of 59 tokens per second for comparable models.
Technical specifications
| Specification | Value |
| --- | --- |
| Reasoning | Yes. This page shows the reasoning version of this model; a non-reasoning variant may also exist. |
| Input modality | Text |
| Output modality | Text |
| Context window | 1M tokens (~1,500 A4 pages of 12-pt Arial) |
| Total parameters | 753B |
| Active parameters | 40B (number of parameters active per token during inference) |
| License | MIT |
| Model weights | Hugging Face |
92 models in this class
Metrics are compared against models of the same class:
Non-reasoning models → compared only with other non-reasoning models
Reasoning models → compared with both reasoning and non-reasoning models
Open weights models → compared only with other open weights models of the same size class:
- Tiny: ≤4B parameters
- Small: 4B–40B parameters
- Medium: 40B–150B parameters
- Large: >150B parameters
Proprietary models → compared across proprietary and open weights models of the same price range, using a blended 3:1 input/output price ratio:
- <$0.15 per 1M tokens
- $0.15–$1 per 1M tokens
- >$1 per 1M tokens
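To make the class rules above concrete, here is a minimal Python sketch. The function names are hypothetical, and because the listed size buckets share their edges (4B appears in both Tiny and Small), it assumes each upper bound is inclusive; the page does not say how boundary cases are assigned. The 3:1 blend is taken to mean three parts input price to one part output price.

```python
def size_class(total_params_b: float) -> str:
    """Open-weights size class from total parameters (in billions).

    Assumption: bucket edges overlap on the page, so each upper bound
    is treated as inclusive.
    """
    if total_params_b <= 4:
        return "Tiny"
    if total_params_b <= 40:
        return "Small"
    if total_params_b <= 150:
        return "Medium"
    return "Large"


def blended_price_3_to_1(input_usd: float, output_usd: float) -> float:
    """Blend USD-per-1M-token prices with a 3:1 input/output weighting."""
    return (3 * input_usd + output_usd) / 4


def price_tier(blended_usd: float) -> str:
    """Price-range bucket used for proprietary models (USD per 1M tokens)."""
    if blended_usd < 0.15:
        return "<$0.15"
    if blended_usd <= 1:
        return "$0.15-$1"
    return ">$1"


print(size_class(753))
print(round(blended_price_3_to_1(1.40, 4.40), 2))
```

For GLM-5.2 (max), 753B total parameters falls in the Large bucket, the class it is ranked against. Its 3:1 blended price works out to about $2.15 per 1M tokens; price tiers only apply to proprietary models, so that figure is purely illustrative here.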
Intelligence
Artificial Analysis Intelligence Index
Artificial Analysis Intelligence Index v4.1 incorporates 9 evaluations: GDPval-AA v2, 𝜏³-Banking, Terminal-Bench v2.1, SciCode, Humanity's Last Exam, GPQA Diamond, CritPt, AA-Omniscience, AA-LCR. See the Intelligence Index methodology for further details, including a breakdown of each evaluation and how we run them.
Artificial Analysis Intelligence Index by Open Weights / Proprietary
Chart categories: Proprietary, Open Weights, and Open Weights (Commercial Use Restricted).
Open Weights
Indicates whether the model weights are available. Models are labelled as 'Commercial Use Restricted' if the weights are available but commercial use is limited (typically requires obtaining a paid license).
Intelligence Evaluations
Intelligence evaluations measured independently by Artificial Analysis · Higher is better
Evaluation panels (14 of 18 quality evaluations shown):
- GDPval-AA v2: Agentic real-world work tasks, scored as (Elo-500)/2000
- Agentic coding & terminal use
- 𝜏³-Banking: Agentic tool use
- Long context reasoning
- Knowledge
- AA-Omniscience Non-Hallucination Rate: 1 - hallucination rate
- Reasoning & knowledge
- Scientific reasoning
- Coding
- Instruction following
- Physics reasoning
- Long-horizon agentic tasks
- Kubernetes incident root-cause analysis
- Visual reasoning
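The GDPval-AA v2 panel is labelled with the rescaling (Elo-500)/2000. A minimal sketch of that transform and its inverse, assuming it is a plain linear mapping with no clipping (the page does not say how out-of-range ratings are handled); the function names are hypothetical:

```python
def gdpval_chart_score(elo: float) -> float:
    """Rescale a GDPval-AA v2 Elo rating with (Elo - 500) / 2000."""
    return (elo - 500) / 2000


def elo_from_chart_score(score: float) -> float:
    """Inverse mapping: recover the Elo rating from a chart score."""
    return score * 2000 + 500


print(gdpval_chart_score(1500))
print(elo_from_chart_score(0.5))
```

Under this mapping an Elo of 1500 shows as 0.5 and an Elo of 500 shows as 0, so each 0.1 on the chart corresponds to 200 Elo points.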
Intelligence Evaluation Relevance
While model intelligence generally translates across use cases, specific evaluations may be more relevant for certain use cases.
Openness
Artificial Analysis Openness Index: Score
Openness Index assesses model openness on a 0 to 100 normalized scale (higher is more open)
Intelligence Index Comparisons
Intelligence vs. Cost per Intelligence Index Task
Chart: Artificial Analysis Intelligence Index vs. weighted average cost (USD) per Intelligence Index task, with the most attractive quadrant highlighted. Model creators shown: Z AI, Google, Anthropic, OpenAI, DeepSeek, xAI, Kimi, NVIDIA, MiniMax, Xiaomi, Alibaba, InclusionAI, StepFun.
Cost per Intelligence Index Task
Weighted average cost per Intelligence Index task. Each evaluation’s cost is calculated from input, cache hit, cache write, reasoning, and answer token prices, divided by task count, and weighted by its Intelligence Index weight.
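A minimal sketch of that calculation follows. The data layout is hypothetical, the two evaluations and their token counts are made up for illustration (they are not GLM-5.2 (max) results), reasoning and answer tokens are both assumed to be billed at the output price, and the cache write price is set to 0 because this page does not list one.

```python
from dataclasses import dataclass


@dataclass
class Prices:
    """Token prices in USD per 1M tokens."""
    input: float
    cache_hit: float
    cache_write: float
    output: float  # assumed to apply to both reasoning and answer tokens


@dataclass
class EvalUsage:
    """Token totals and weighting for one evaluation (hypothetical layout)."""
    input_tokens: int
    cache_hit_tokens: int
    cache_write_tokens: int
    reasoning_tokens: int
    answer_tokens: int
    task_count: int
    weight: float  # Intelligence Index weight; weights assumed to sum to 1


def eval_cost_usd(u: EvalUsage, p: Prices) -> float:
    """Total USD cost of one evaluation across all token types."""
    usd_times_million = (
        u.input_tokens * p.input
        + u.cache_hit_tokens * p.cache_hit
        + u.cache_write_tokens * p.cache_write
        + (u.reasoning_tokens + u.answer_tokens) * p.output
    )
    return usd_times_million / 1_000_000


def cost_per_task_usd(evals: list[EvalUsage], p: Prices) -> float:
    """Sum over evaluations of weight * (evaluation cost / task count)."""
    return sum(e.weight * eval_cost_usd(e, p) / e.task_count for e in evals)


glm = Prices(input=1.40, cache_hit=0.26, cache_write=0.0, output=4.40)
evals = [  # hypothetical token counts, not measured data
    EvalUsage(2_000_000, 0, 0, 1_000_000, 200_000, task_count=100, weight=0.5),
    EvalUsage(1_000_000, 4_000_000, 0, 500_000, 100_000, task_count=50, weight=0.5),
]
print(round(cost_per_task_usd(evals, glm), 4))
```

Dividing by task count before weighting keeps evaluations with many tasks from dominating the average; the weight alone decides each evaluation's share.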
Token Use
Output Tokens per Intelligence Index Task
Weighted average number of output tokens (answer and reasoning) used to run one task in the Artificial Analysis Intelligence Index
Output Tokens per Intelligence Index Task
The number of tokens required per Intelligence Index task. This is calculated by multiplying the output tokens per eval by the relative weights of each benchmark in the Intelligence Index, then dividing by task count (excluding repeats).
Price and Cost
Cost per Intelligence Index Task
Weighted average cost (USD) per Artificial Analysis Intelligence Index task, segmented by token type (answer, reasoning, cache write, cache hit, input). Lower is better
Cost to Run Artificial Analysis Intelligence Index
Cost (USD) to run all evaluations in the Artificial Analysis Intelligence Index, segmented by output, reasoning, cache write, cache read, and non-cache input tokens. The cost is calculated using the model's input, cache hit, cache write, reasoning, and answer token prices and the number of tokens used across evaluations (excluding repeats).
Pricing: Cache Hit, Input, and Output
Price (USD per 1M tokens)
Cache Hit
Price per token for cached prompts (previously processed), typically offering a significant discount compared to regular input price, represented as USD per million tokens. The values shown here are the cache hit price; cache write and cache storage are billed separately and vary by provider — see "Cache pricing by provider" for detail.
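As a worked example of the discount, the sketch below computes the saving for GLM-5.2 (max)'s listed prices and the cost of a single request whose prompt is partly served from cache. The function names and request sizes are hypothetical, output tokens are assumed to include any reasoning tokens, and cache write and storage fees are ignored because they vary by provider.

```python
def cache_discount(input_price: float, cache_hit_price: float) -> float:
    """Fractional saving of a cache hit versus the regular input price."""
    return 1 - cache_hit_price / input_price


def request_cost_usd(
    prompt_tokens: int,
    cached_tokens: int,
    output_tokens: int,
    input_price: float,
    cache_hit_price: float,
    output_price: float,
) -> float:
    """Cost of one request, with all prices in USD per 1M tokens.

    Ignores cache write and cache storage fees.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached = prompt_tokens - cached_tokens
    usd_times_million = (
        uncached * input_price
        + cached_tokens * cache_hit_price
        + output_tokens * output_price
    )
    return usd_times_million / 1_000_000


print(f"{cache_discount(1.40, 0.26):.0%}")
print(round(request_cost_usd(100_000, 80_000, 2_000, 1.40, 0.26, 4.40), 4))
```

With these inputs the discount prints as 81%, and a 100K-token prompt with 80K cached tokens plus 2K output tokens costs about $0.0576, versus about $0.1488 if none of the prompt were cached.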
Input Price
Price per token included in the request/message sent to the API, represented as USD per million tokens.
Cache Pricing by Provider
The blended cache price shown here uses cache hit price only. Other caching costs differ by provider:
- Anthropic: charges a separate cache write fee, with different rates for 5-minute and 1-hour TTLs (1-hour TTL is more expensive).
- Google (Vertex/Gemini): charges a per-hour cache storage fee in addition to cache hit pricing. Some providers also use tiered pricing for prompts above 200K tokens.
- OpenAI, DeepSeek, others: typically charge only cache hit pricing with no write or storage fee.
See Prompt Caching for the full breakdown.
Output Price
Price per token generated by the model (received from the API), represented as USD per million tokens.
Model Performance Representation
Figures represent performance of the model's first-party API (e.g. OpenAI for o1) or the median across providers where a first-party API is not available (e.g. Meta's Llama models).
Context Window
Context window: token limit · Higher is better
Context Window for RAG
Larger context windows are relevant to RAG (Retrieval-Augmented Generation) LLM workflows, which typically involve reasoning over and retrieving information from large amounts of data.
Context Window
Maximum number of combined input & output tokens. Output tokens commonly have a significantly lower limit (varied by model).
Speed
Measured by output speed (tokens per second) · Higher is better
Output Speed
Tokens per second received while the model is generating tokens (i.e., after the first chunk has been received from the API, for models that support streaming).
Time per Intelligence Index Task
Weighted average wall clock time (minutes) per task; excludes TTFT and execution time · Lower is better
Time per Intelligence Index Task
The weighted average time (seconds) per Artificial Analysis Intelligence Index task. This is calculated by dividing output tokens per task by output speed, weighted by the relative weights of each benchmark in the Intelligence Index.
Latency
Measured by Time (seconds) to First Token
Latency: Time To First Answer Token
Seconds to first answer token received, split into 'thinking' time (reasoning models, when applicable) and input processing time · Lower is better
Time to First Answer Token
Time to first answer token received, in seconds, after API request sent. For reasoning models, this includes the 'thinking' time of the model before providing an answer. For models which do not support streaming, this represents time to receive the completion.
End-to-End Response Time
Seconds to output 500 tokens, calculated based on time to first token, 'thinking' time for reasoning models, and output speed
End-to-End Response Time
Seconds to receive a 500 token response. Key components:
- Input time: Time to receive the first response token
- Thinking time (only for reasoning models): Time reasoning models spend outputting reasoning tokens before providing an answer. The number of thinking tokens is based on the average reasoning tokens across a diverse set of 60 prompts (see methodology details).
- Answer time: Time to generate 500 output tokens, based on output speed
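The three components above add up to the end-to-end time. A minimal sketch, assuming thinking tokens stream at the same output speed as answer tokens and using this page's median TTFT (2.38 s) as the input time and its median output speed (110.7 tokens/s). The function name is hypothetical, and the thinking-token count in the usage line is also hypothetical, since the page does not report GLM-5.2 (max)'s average reasoning tokens.

```python
def end_to_end_seconds(
    input_time_s: float,
    output_tokens_per_s: float,
    thinking_tokens: float = 0.0,
    answer_tokens: int = 500,
) -> float:
    """Input time + thinking time + answer time, in seconds.

    Assumption: thinking tokens are generated at the same output speed
    as answer tokens.
    """
    if output_tokens_per_s <= 0:
        raise ValueError("output speed must be positive")
    thinking_s = thinking_tokens / output_tokens_per_s  # reasoning models only
    answer_s = answer_tokens / output_tokens_per_s      # 500 tokens by default
    return input_time_s + thinking_s + answer_s


# Hypothetical: 2,000 thinking tokens (not a figure reported on this page).
print(round(end_to_end_seconds(2.38, 110.7, thinking_tokens=2_000), 1))
```

The answer portion alone takes about 4.5 s at 110.7 tokens/s; with the hypothetical 2,000 thinking tokens the total comes to roughly 25 s, which shows how strongly reasoning length dominates end-to-end time for reasoning models.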
Model Size (Open Weights Models Only)
Model Size: Total and Active Parameters
Comparison between total model parameters and parameters active during inference (split into passive and active parameters)
Total Parameters
The total number of trainable weights and biases in the model, expressed in billions. These parameters are learned during training and determine the model's ability to process and generate responses.
Active Parameters at Inference Time
The number of parameters actually executed during each inference forward pass, expressed in billions. For Mixture of Experts (MoE) models, a routing mechanism selects a subset of experts per token, resulting in fewer active than total parameters. Dense models use all parameters, so active equals total.
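The routing idea above can be illustrated with a toy top-k gate. This is a generic sketch, not GLM-5.2 (max)'s actual router: the gate logits, the choice of k, the function names, and the softmax renormalized over only the selected experts are all assumptions for illustration. The second helper computes the active-parameter share from the figures on this page (40B of 753B).

```python
import math


def route_token(gate_logits: list[float], k: int) -> list[tuple[int, float]]:
    """Toy top-k MoE router for one token.

    Picks the k highest-scoring experts and softmax-normalizes their gate
    logits into mixing weights; the other experts are not executed.
    """
    if not 1 <= k <= len(gate_logits):
        raise ValueError("k must be between 1 and the number of experts")
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    m = max(gate_logits[i] for i in top)  # subtract max for numerical stability
    exps = [math.exp(gate_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def active_share(active_params_b: float, total_params_b: float) -> float:
    """Fraction of parameters executed per token."""
    return active_params_b / total_params_b


print(route_token([0.1, 2.0, -1.0, 1.5], k=2))  # experts 1 and 3 are selected
print(f"{active_share(40, 753):.1%}")
```

For GLM-5.2 (max), roughly 5.3% of the parameters run per token, which is why its per-token compute is closer to a 40B dense model even though all 753B parameters must be stored.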
Frequently Asked Questions
Common questions about GLM-5.2 (max)
When was GLM-5.2 (max) released?
GLM-5.2 (max) was released on June 16, 2026.
Who created GLM-5.2 (max)?
GLM-5.2 (max) was created by Z AI.
How intelligent is GLM-5.2 (max)?
GLM-5.2 (max) scores 51 on the Artificial Analysis Intelligence Index, placing it well above average among other open weight models of similar size (median: 24).
How fast is GLM-5.2 (max)?
GLM-5.2 (max) generates output at 110.7 tokens per second (based on the median across providers serving the model), which is well above average compared to other open weight models of similar size (median: 59.4 t/s).
What is the latency of GLM-5.2 (max)?
GLM-5.2 (max) has a time to first token (TTFT) of 2.38s (based on the median across providers serving the model), which is essentially in line with other open weight models of similar size (median: 2.37s).
How much does GLM-5.2 (max) cost?
GLM-5.2 (max) costs $1.40 per 1M input tokens (at the higher end, median: $0.55) and $4.40 per 1M output tokens (at the higher end, median: $1.85), based on the median across providers serving the model.
What is GLM-5.2 (max) API pricing?
GLM-5.2 (max) costs $1.40 per 1M input tokens and $4.40 per 1M output tokens (based on the median across providers serving the model). At a blended rate (7:2:1 cache hit/input/output ratio), this is $0.90 per 1M tokens. Pricing may vary by provider.
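The 7:2:1 blend is a weighted average of the three per-token prices: (7 × $0.26 + 2 × $1.40 + 1 × $4.40) / 10 = $9.02 / 10 ≈ $0.90. A small sketch that reproduces the figure (the function name is illustrative):

```python
def blended_price(
    cache_hit: float,
    input_price: float,
    output_price: float,
    ratio: tuple[int, int, int] = (7, 2, 1),
) -> float:
    """Weighted average USD per 1M tokens for a cache-hit:input:output mix."""
    w_hit, w_in, w_out = ratio
    total = w_hit + w_in + w_out
    return (w_hit * cache_hit + w_in * input_price + w_out * output_price) / total


print(f"${blended_price(0.26, 1.40, 4.40):.2f}")
```

Changing `ratio` shows how sensitive the blend is to cache reuse; for example, `ratio=(0, 3, 1)` ignores caching entirely and gives the 3:1 input/output blend.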
How verbose is GLM-5.2 (max)?
When evaluated on the Intelligence Index, GLM-5.2 (max) generated 140M output tokens, which is somewhat higher than average compared to other open weight models of similar size (median: 110M).
Is GLM-5.2 (max) a reasoning model?
Yes, GLM-5.2 (max) is a reasoning model. It uses extended thinking or chain-of-thought reasoning to work through complex problems before providing an answer.
What input modalities does GLM-5.2 (max) support?
GLM-5.2 (max) supports text input.
What output modalities does GLM-5.2 (max) support?
GLM-5.2 (max) supports text output.
Can GLM-5.2 (max) process images?
No, GLM-5.2 (max) does not support image input. It can only process text.
Is GLM-5.2 (max) multimodal?
No, GLM-5.2 (max) is not multimodal. It only supports text input.
What is the context window of GLM-5.2 (max)?
GLM-5.2 (max) has a context window of 1.0M tokens. This determines how much text and conversation history the model can process in a single request.
Is GLM-5.2 (max) open source?
Yes, GLM-5.2 (max) is open weights. The model weights are publicly available and can be downloaded for self-hosting.
How many parameters does GLM-5.2 (max) have?
GLM-5.2 (max) has 753 billion parameters (40 billion active).
What are the active parameters of GLM-5.2 (max)?
GLM-5.2 (max) is a Mixture of Experts (MoE) model with 753 billion total parameters, but only 40 billion active parameters are used during inference.
What is the license for GLM-5.2 (max)?
GLM-5.2 (max) is released under the MIT license, which allows commercial use.
How does GLM-5.2 (max) perform on benchmarks?
GLM-5.2 (max) achieves a score of 51 on the Artificial Analysis Intelligence Index. This composite benchmark evaluates models across reasoning, knowledge, mathematics, and coding.
Is GLM-5.2 (max) available via API?
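Yes. Besides being available for self-hosting as an open weights model, GLM-5.2 (max) is served via API by multiple providers; the speed, latency, and pricing figures on this page are medians across those providers.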
Where can I use GLM-5.2 (max)?
GLM-5.2 (max) is an open weights model that can be downloaded and self-hosted.