# GLM-5.2 (max): Performance, Pricing, and Open-Source Release

- Source: Hacker News trending (Chinese translation via buzzing.cc)
- Author: theanonymousone
- Published: 2026-06-18 00:12
- AIHOT score: 61
- AIHOT link: https://aihot.virxact.com/items/cmqiahrrm0749slf0eaa9twpo
- Original link: https://artificialanalysis.ai/models/glm-5-2

## AI Summary

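In June 2026, Zhipu AI (Z AI) released GLM-5.2 (max), an open-weights reasoning model with 753B total parameters and 40B active parameters. It takes text input, produces text output, and has a 1M-token context window. It scores 51 on the Artificial Analysis Intelligence Index, ranking first among 92 comparable models. Its output speed is 111 tokens/s (15th of 92). Pricing is $1.40 per 1M input tokens and $4.40 per 1M output tokens, which puts it in the more expensive tier; cache hits cost $0.26 per 1M tokens (81% cheaper than regular input). The model weights are released on Hugging Face under the MIT license.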

## Full Text

GLM-5.2 (max) - Intelligence, Performance & Price Analysis

Artificial Analysis

Z AI

- Open weights model
- Released June 2026


### Model summary

| Metric | Value | Rank (of 92) | Units (of 4) |
| --- | --- | --- | --- |
| Intelligence (Artificial Analysis Intelligence Index) | 51 | #1 | 4 |
| Speed (output tokens per second) | 110.7 | #15 | 4 |
| Price (USD per 1M tokens) | Input $1.40, output $4.40 | #77 | 4 |
| Cache hit price (USD per 1M tokens) | $0.26 (-81%) | #29 | 3 |
| Verbosity (output tokens from Intelligence Index) | 140M | #9 | 3 |

### Comparison Summary

GLM-5.2 (max) is among the leading models in intelligence, but it is particularly expensive compared with other open weight models of similar size. It is also notably fast, though somewhat verbose. The model accepts text input, produces text output, and has a 1M-token context window.

GLM-5.2 (max) scores 51 on the Artificial Analysis Intelligence Index, placing it well above average among comparable models (average: 24). When evaluated on the Intelligence Index, it generated 140M tokens, which is somewhat verbose compared with the average of 110M.

Pricing for GLM-5.2 (max) is $1.40 per 1M input tokens (expensive; average: $0.42) and $4.40 per 1M output tokens (expensive; average: $1.25). In total, it cost $867.88 to evaluate GLM-5.2 (max) on the Intelligence Index.
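As a rough illustration of how these per-token prices add up, the sketch below prices a single request at the listed rates. It assumes reasoning tokens are billed at the output rate and ignores cache-write or storage fees that some providers charge; the function name and token counts are hypothetical.

```python
# Listed GLM-5.2 (max) prices, in USD per 1M tokens (median across providers).
INPUT_PRICE = 1.40
OUTPUT_PRICE = 4.40
CACHE_HIT_PRICE = 0.26


def request_cost_usd(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the cost of one request (hypothetical helper).

    `cached_tokens` is the part of the input served from the prompt cache and
    billed at the cache-hit rate. Reasoning tokens are assumed to be counted
    in `output_tokens` and billed at the output rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    return (
        uncached * INPUT_PRICE
        + cached_tokens * CACHE_HIT_PRICE
        + output_tokens * OUTPUT_PRICE
    ) / 1_000_000


# Hypothetical request: 10,000 prompt tokens, 2,000 output tokens.
print(round(request_cost_usd(10_000, 2_000), 6))         # 0.0228
# Same request with 8,000 of the prompt tokens served from cache.
print(round(request_cost_usd(10_000, 2_000, 8_000), 6))  # 0.01368
```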

At 111 tokens per second, GLM-5.2 (max) is notably fast (average: 59 tokens per second).

### Technical specifications

| Specification | Value |
| --- | --- |
| Reasoning | Yes. This page shows the reasoning version of this model; a non-reasoning variant may also exist. |
| Input modality | Text |
| Output modality | Text |
| Context window | 1M tokens (~1,500 A4 pages in 12-point Arial) |
| Total parameters | 753B |
| Active parameters | 40B (number of parameters active per token during inference) |
| License | MIT |
| Model weights | Hugging Face |

92 models in this class.

Metrics are compared against models of the same class:

- Non-reasoning models → compared only with other non-reasoning models
- Reasoning models → compared across both reasoning and non-reasoning models
- Open weights models → compared only with other open weights models of the same size class:
  - Tiny: ≤4B parameters
  - Small: 4B–40B parameters
  - Medium: 40B–150B parameters
  - Large: >150B parameters
- Proprietary models → compared across proprietary and open weights models in the same price range, using a blended 3:1 input/output price ratio:
  - <$0.15 per 1M tokens
  - $0.15–$1 per 1M tokens
  - >$1 per 1M tokens
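The classification rules above can be sketched as follows. This is a minimal illustration that assumes the size classes are keyed on total parameters (which would place GLM-5.2 (max), at 753B total, in the Large class) and that a value on a shared boundary falls into the lower class; the function names are hypothetical. The 3:1 blend uses GLM-5.2 (max)'s prices only to show the arithmetic, since open weights models are grouped by size rather than by price.

```python
def size_class(total_params_b: float) -> str:
    """Open weights size class from a parameter count in billions.

    Assumptions: the class uses total (not active) parameters, and a value
    on a shared boundary (4B, 40B, 150B) falls into the lower class.
    """
    if total_params_b <= 4:
        return "Tiny"
    if total_params_b <= 40:
        return "Small"
    if total_params_b <= 150:
        return "Medium"
    return "Large"


def blended_price_3_1(input_price: float, output_price: float) -> float:
    """Blended USD per 1M tokens at a 3:1 input/output token ratio."""
    return (3 * input_price + output_price) / 4


def price_bucket(blended: float) -> str:
    """Price range used to group proprietary models (boundary handling assumed)."""
    if blended < 0.15:
        return "<$0.15"
    if blended <= 1:
        return "$0.15-$1"
    return ">$1"


print(size_class(753))                           # Large
blended = blended_price_3_1(1.40, 4.40)
print(round(blended, 2), price_bucket(blended))  # 2.15 >$1
```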

### Highlights

*Charts: Intelligence (Artificial Analysis Intelligence Index, higher is better); Speed (output tokens per second, higher is better); Cost per Task (weighted average cost in USD per Intelligence Index task, lower is better)*

### Intelligence

*Chart: Artificial Analysis Intelligence Index (reasoning models are marked with a lightbulb icon)*

Artificial Analysis Intelligence Index

Artificial Analysis Intelligence Index v4.1 incorporates 9 evaluations: GDPval-AA v2, 𝜏³-Banking, Terminal-Bench v2.1, SciCode, Humanity's Last Exam, GPQA Diamond, CritPt, AA-Omniscience, and AA-LCR. See the Intelligence Index methodology for further details, including a breakdown of each evaluation and how we run them.

*Chart: Artificial Analysis Intelligence Index by Open Weights / Proprietary (categories: Proprietary, Open Weights, Open Weights (Commercial Use Restricted))*

Open Weights

Indicates whether the model weights are available. Models are labelled as 'Commercial Use Restricted' if the weights are available but commercial use is limited (typically requires obtaining a paid license).

### Intelligence Evaluations

Intelligence evaluations measured independently by Artificial Analysis (higher is better); 14 of 18 quality evaluations shown.

| Evaluation | What it measures |
| --- | --- |
| GDPval-AA v2 | Agentic real-world work tasks, reported as (Elo - 500) / 2000 |
| Terminal-Bench v2.1 | Agentic coding and terminal use |
| 𝜏³-Banking | Agentic tool use |
| AA-LCR | Long context reasoning |
| AA-Omniscience Accuracy | Knowledge |
| AA-Omniscience Non-Hallucination Rate | 1 - hallucination rate |
| Humanity's Last Exam | Reasoning and knowledge |
| GPQA Diamond | Scientific reasoning |
| SciCode | Coding |
| IFBench | Instruction following |
| CritPt | Physics reasoning |
| APEX-Agents-AA | Long-horizon agentic tasks |
| ITBench-AA | Kubernetes incident root-cause analysis |
| MMMU-Pro | Visual reasoning |
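The GDPval-AA entry rescales an Elo rating. A minimal sketch of that mapping, assuming the formula is applied exactly as written with no clipping (the function name is hypothetical):

```python
def gdpval_index_score(elo: float) -> float:
    """Map a GDPval-AA Elo rating onto the reported scale: (Elo - 500) / 2000.

    Assumption: no clipping, so Elo 500 maps to 0.0 and Elo 2500 maps to 1.0.
    """
    return (elo - 500) / 2000


print(gdpval_index_score(1500))  # 0.5
```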

Intelligence Evaluation Relevance

While model intelligence generally translates across use cases, specific evaluations may be more relevant for certain use cases.


### Openness

*Chart: Artificial Analysis Openness Index: Score (other views: Openness Index Components; Openness vs. Intelligence)*

The Openness Index assesses model openness on a 0 to 100 normalized scale (higher is more open).

### Intelligence Index Comparisons

*Chart: Intelligence vs. Cost per Intelligence Index Task (Artificial Analysis Intelligence Index against weighted average cost in USD per task; labs shown: Z AI, Google, Anthropic, OpenAI, DeepSeek, xAI, Kimi, NVIDIA, MiniMax, Xiaomi, Alibaba, InclusionAI, StepFun)*

Cost per Intelligence Index Task

Weighted average cost per Intelligence Index task. Each evaluation’s cost is calculated from input, cache hit, cache write, reasoning, and answer token prices, divided by task count, and weighted by its Intelligence Index weight.
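The weighted-average definition above can be sketched as follows. The `EvalRun` structure, the example numbers, and the cache-write price are hypothetical; reasoning and answer tokens are assumed to be billed at the output rate. Weights are normalized by their sum, which changes nothing if they already sum to 1.

```python
from dataclasses import dataclass


@dataclass
class EvalRun:
    """Token usage for one evaluation (hypothetical structure)."""
    name: str
    weight: float        # the evaluation's weight in the Intelligence Index
    task_count: int
    input_tokens: int
    cache_hit_tokens: int
    cache_write_tokens: int
    reasoning_tokens: int
    answer_tokens: int


# USD per 1M tokens. The cache-write price is an assumption.
PRICES = {"input": 1.40, "cache_hit": 0.26, "cache_write": 1.40,
          "reasoning": 4.40, "answer": 4.40}


def eval_cost(run: EvalRun) -> float:
    """Total USD cost of one evaluation across all token types."""
    return (run.input_tokens * PRICES["input"]
            + run.cache_hit_tokens * PRICES["cache_hit"]
            + run.cache_write_tokens * PRICES["cache_write"]
            + run.reasoning_tokens * PRICES["reasoning"]
            + run.answer_tokens * PRICES["answer"]) / 1_000_000


def cost_per_task(runs: list[EvalRun]) -> float:
    """Weighted average of each evaluation's cost divided by its task count."""
    total_weight = sum(r.weight for r in runs)
    return sum(r.weight * eval_cost(r) / r.task_count for r in runs) / total_weight


# Two hypothetical evaluations with equal weight.
runs = [
    EvalRun("A", 0.5, 100, 0, 0, 0, 1_000_000, 0),
    EvalRun("B", 0.5, 50, 1_000_000, 0, 0, 0, 0),
]
print(round(cost_per_task(runs), 4))  # 0.036
```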


### Token Use

*Chart: Output Tokens per Intelligence Index Task (weighted average number of output tokens used to run one task, split into answer and reasoning tokens)*

Output Tokens per Intelligence Index Task

The number of tokens required per Intelligence Index task. This is calculated by multiplying the output tokens per eval by the relative weights of each benchmark in the Intelligence Index, then dividing by task count (excluding repeats).
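The per-task token figure can be sketched as a weighted sum of each evaluation's tokens per task. The evaluation names, weights, and token counts below are hypothetical, and weights are normalized by their sum as an assumption.

```python
def output_tokens_per_task(evals: dict[str, tuple[float, int, int]]) -> float:
    """Weighted average output tokens per task.

    `evals` maps an evaluation name to
    (relative weight, output tokens for that eval, task count excluding repeats).
    """
    total_weight = sum(w for w, _, _ in evals.values())
    return sum(w / total_weight * tokens / tasks for w, tokens, tasks in evals.values())


# Hypothetical: eval A uses 20,000 tokens per task, eval B uses 12,000.
example = {"A": (0.5, 2_000_000, 100), "B": (0.5, 600_000, 50)}
print(output_tokens_per_task(example))  # 16000.0
```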

### Price and Cost

*Chart: Cost per Intelligence Index Task (weighted average cost in USD per task, segmented by token type: answer, reasoning, cache write, cache hit, input; lower is better)*


*Chart: Cost to Run Artificial Analysis Intelligence Index (USD to run all evaluations, segmented into output, reasoning, cache write, cache read, and non-cache input)*

Cost to Run Artificial Analysis Intelligence Index

The cost to run the evaluations in the Artificial Analysis Intelligence Index, calculated using the model's input, cache hit, cache write, reasoning, and answer token prices and the number of tokens used across evaluations (excluding repeats).
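Applying this definition to the figures reported earlier gives a rough breakdown. Assuming the reported 140M output tokens include reasoning tokens and are all billed at the $4.40 output rate, output alone accounts for about $616 of the $867.88 total; the remainder would then come from input, cache-hit, and cache-write tokens. This split is an inference from the page's numbers, not a figure the page reports, and 140M is itself a rounded value.

```python
OUTPUT_PRICE_PER_M = 4.40    # USD per 1M output tokens (listed price)
REPORTED_TOTAL_USD = 867.88  # reported cost to run the Intelligence Index
OUTPUT_TOKENS_M = 140        # reported output tokens, in millions (rounded)

# Assumption: reasoning tokens are included in the 140M and billed as output.
output_cost = OUTPUT_TOKENS_M * OUTPUT_PRICE_PER_M
other_cost = REPORTED_TOTAL_USD - output_cost  # input, cache hit, cache write

print(f"output: ${output_cost:.2f}, other: ${other_cost:.2f}")
# output: $616.00, other: $251.88
```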

*Chart: Pricing: Cache Hit, Input, and Output (USD per 1M tokens)*

Cache Hit

Price per token for cached prompts (previously processed), typically offering a significant discount compared to the regular input price, represented as USD per million tokens. The values shown here are the cache hit price; cache write and cache storage are billed separately and vary by provider (see "Cache Pricing by Provider" for details).
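The "-81%" cache discount listed for GLM-5.2 (max) follows directly from its cache-hit and input prices. The sketch below reproduces it and, as an illustration, computes an effective input price for a hypothetical 70% cache-hit rate; it deliberately ignores cache-write and storage fees, which vary by provider.

```python
INPUT_PRICE = 1.40      # USD per 1M regular input tokens
CACHE_HIT_PRICE = 0.26  # USD per 1M cached input tokens


def cache_discount(input_price: float, cache_hit_price: float) -> float:
    """Fractional saving of a cache hit relative to the regular input price."""
    return 1 - cache_hit_price / input_price


def effective_input_price(hit_rate: float,
                          input_price: float = INPUT_PRICE,
                          cache_hit_price: float = CACHE_HIT_PRICE) -> float:
    """Average USD per 1M input tokens when `hit_rate` of them hit the cache."""
    return hit_rate * cache_hit_price + (1 - hit_rate) * input_price


print(f"{cache_discount(INPUT_PRICE, CACHE_HIT_PRICE):.0%}")  # 81%
print(round(effective_input_price(0.7), 3))                   # 0.602
```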

Input Price

Price per token included in the request/message sent to the API, represented as USD per million tokens.

Cache Pricing by Provider

The blended cache price shown here uses cache hit price only. Other caching costs differ by provider:

- Anthropic: charges a separate cache write fee, with different rates for 5-minute and 1-hour TTLs (1-hour TTL is more expensive).
- Google (Vertex/Gemini): charges a per-hour cache storage fee in addition to cache hit pricing. Some providers also use tiered pricing for prompts above 200K tokens.
- OpenAI, DeepSeek, others: typically charge only cache hit pricing with no write or storage fee.

See Prompt Caching for the full breakdown.

Output Price

Price per token generated by the model (received from the API), represented as USD per million tokens.

Model Performance Representation

Figures represent performance of the model's first-party API (e.g. OpenAI for o1) or the median across providers where a first-party API is not available (e.g. Meta's Llama models).

### Context Window

*Chart: Context Window (token limit; higher is better)*

Context Window for RAG

Larger context windows are relevant to RAG (Retrieval-Augmented Generation) LLM workflows, which typically involve reasoning over and retrieving information from large amounts of data.

Context Window

Maximum number of combined input and output tokens. Output tokens commonly have a significantly lower limit (varies by model).
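Because the limit covers input and output together, a client has to leave room for the response when sizing a prompt. A minimal budget check, assuming "1M" means exactly 1,000,000 tokens and using a hypothetical 64,000-token output request (the page does not state GLM-5.2 (max)'s separate output cap); for a reasoning model, thinking tokens would presumably count toward the output budget as well:

```python
CONTEXT_WINDOW = 1_000_000  # assumption: "1M" taken as exactly 1,000,000 tokens


def fits_in_context(prompt_tokens: int, max_output_tokens: int,
                    context_window: int = CONTEXT_WINDOW) -> bool:
    """True if the prompt plus the requested output stays within the combined limit."""
    return prompt_tokens + max_output_tokens <= context_window


def max_prompt_tokens(max_output_tokens: int,
                      context_window: int = CONTEXT_WINDOW) -> int:
    """Largest prompt that still leaves room for the requested output."""
    return max(0, context_window - max_output_tokens)


print(fits_in_context(900_000, 64_000))  # True
print(fits_in_context(990_000, 64_000))  # False
print(max_prompt_tokens(64_000))         # 936000
```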

### Speed

Measured by output speed (tokens per second).

*Chart: Output Speed (output tokens per second; higher is better)*

Output Speed

Tokens per second received while the model is generating tokens (i.e., after the first chunk has been received from the API, for models that support streaming).
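A client-side sketch of this measurement from streaming timestamps follows. The data layout and function names are hypothetical, and excluding the first chunk's tokens (because timing starts when it arrives) is an assumption about the method, not a documented detail.

```python
def time_to_first_token(request_sent: float, chunk_times: list[float]) -> float:
    """Seconds from sending the request to receiving the first chunk."""
    return chunk_times[0] - request_sent


def output_speed(chunk_times: list[float], chunk_tokens: list[int]) -> float:
    """Tokens per second measured after the first chunk arrives.

    chunk_times[i] is the arrival time (seconds) of chunk i and chunk_tokens[i]
    the number of tokens it carried. Timing starts at the first chunk, so that
    chunk's tokens are excluded from the count (assumption).
    """
    if len(chunk_times) < 2 or len(chunk_times) != len(chunk_tokens):
        raise ValueError("need at least two chunks with matching token counts")
    elapsed = chunk_times[-1] - chunk_times[0]
    return sum(chunk_tokens[1:]) / elapsed


# Hypothetical stream: request sent at t=0, chunks arrive at t=2, 3, 4 seconds.
times, tokens = [2.0, 3.0, 4.0], [10, 110, 110]
print(time_to_first_token(0.0, times))  # 2.0
print(output_speed(times, tokens))      # 110.0
```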


*Chart: Time per Intelligence Index Task (weighted average wall-clock time in minutes per task, excluding TTFT and execution time; lower is better)*

Time per Intelligence Index Task

The weighted average time (seconds) per Artificial Analysis Intelligence Index task. This is calculated by dividing output tokens per task by output speed, weighted by the relative weights of each benchmark in the Intelligence Index.
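A sketch of that calculation, using GLM-5.2 (max)'s reported 110.7 tokens/s and two hypothetical evaluations. The chart label says minutes while this definition says seconds; the sketch returns seconds (divide by 60 for minutes). Like the chart, it ignores TTFT and execution time.

```python
def time_per_task_seconds(evals: dict[str, tuple[float, float]],
                          output_speed_tps: float) -> float:
    """Weighted average generation time per task, in seconds.

    `evals` maps an evaluation name to
    (relative weight, average output tokens per task). TTFT and
    execution time are not included.
    """
    total_weight = sum(w for w, _ in evals.values())
    return sum(w / total_weight * tokens / output_speed_tps
               for w, tokens in evals.values())


# Hypothetical per-task token counts; speed is the reported median.
example = {"A": (0.5, 22_140), "B": (0.5, 11_070)}
seconds = time_per_task_seconds(example, 110.7)
print(round(seconds, 1), round(seconds / 60, 2))  # 150.0 2.5
```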

### Latency

Measured by time to first token (seconds).

*Chart: Latency: Time To First Answer Token (seconds to first answer token received, including reasoning models' 'thinking' time; split into thinking and input processing)*

Time to First Answer Token

Time to first answer token received, in seconds, after API request sent. For reasoning models, this includes the 'thinking' time of the model before providing an answer. For models which do not support streaming, this represents time to receive the completion.

### End-to-End Response Time

Seconds to output 500 tokens, calculated from time to first token, 'thinking' time for reasoning models, and output speed.

*Chart: End-to-End Response Time (seconds to output 500 tokens, including reasoning models' 'thinking' time; split into input processing, thinking, and outputting time; lower is better)*

End-to-End Response Time

Seconds to receive a 500-token response. Key components:

- Input time: time to receive the first response token.
- Thinking time (reasoning models only): time reasoning models spend outputting reasoning tokens before providing an answer. The number of tokens is based on the average reasoning tokens across a diverse set of 60 prompts (see methodology details).
- Answer time: time to generate 500 output tokens, based on output speed.
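The three components can be combined as in the sketch below, using the TTFT (2.38 s) and output speed (110.7 tokens/s) reported for GLM-5.2 (max) in the FAQ. The 2,000 thinking tokens are hypothetical; the page does not give this model's average reasoning-token count.

```python
def end_to_end_seconds(ttft_s: float, output_speed_tps: float,
                       thinking_tokens: int = 0, answer_tokens: int = 500) -> float:
    """Input time + thinking time + answer time, in seconds."""
    input_time = ttft_s                                # time to first response token
    thinking_time = thinking_tokens / output_speed_tps  # reasoning before the answer
    answer_time = answer_tokens / output_speed_tps      # 500 answer tokens by default
    return input_time + thinking_time + answer_time


print(round(end_to_end_seconds(2.38, 110.7), 2))                         # 6.9
print(round(end_to_end_seconds(2.38, 110.7, thinking_tokens=2_000), 2))  # 24.96
```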


### Model Size (Open Weights Models Only)

*Chart: Model Size: Total and Active Parameters (comparison between total model parameters and parameters active during inference; split into passive and active parameters)*

Total Parameters

The total number of trainable weights and biases in the model, expressed in billions. These parameters are learned during training and determine the model's ability to process and generate responses.

Active Parameters at Inference Time

The number of parameters actually executed during each inference forward pass, expressed in billions. For Mixture of Experts (MoE) models, a routing mechanism selects a subset of experts per token, resulting in fewer active than total parameters. Dense models use all parameters, so active equals total.
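To make the total-versus-active distinction concrete, the sketch below computes GLM-5.2 (max)'s active fraction from the listed 753B/40B figures and shows generic top-k expert routing. The routing function is a textbook illustration with a hypothetical name, not GLM-5.2's actual router; the page does not describe its expert count or gating scheme.

```python
import math


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of parameters executed per token."""
    return active_b / total_b


def top_k_route(router_logits: list[float], k: int) -> list[tuple[int, float]]:
    """Generic top-k MoE routing for one token.

    Picks the k experts with the highest router scores and renormalizes their
    softmax weights so the selected experts' weights sum to 1. Only those k
    experts would run for this token, which is why active < total parameters.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)  # subtract max for numerical stability
    exps = {i: math.exp(router_logits[i] - m) for i in top}
    total = sum(exps.values())
    return [(i, exps[i] / total) for i in top]


print(f"{active_fraction(40, 753):.1%}")  # 5.3%
# Hypothetical router scores for 5 experts; experts 1 and 3 are selected.
print(top_k_route([0.1, 2.0, -1.0, 1.0, 0.5], k=2))
```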

### Frequently Asked Questions

Common questions about GLM-5.2 (max)

When was GLM-5.2 (max) released?

GLM-5.2 (max) was released on June 16, 2026.

Who created GLM-5.2 (max)?

GLM-5.2 (max) was created by Z AI.

How intelligent is GLM-5.2 (max)?

GLM-5.2 (max) scores 51 on the Artificial Analysis Intelligence Index, placing it well above average among other open weight models of similar size (median: 24).

How fast is GLM-5.2 (max)?

GLM-5.2 (max) generates output at 110.7 tokens per second (based on the median across providers serving the model), which is well above average compared to other open weight models of similar size (median: 59.4 t/s).

What is the latency of GLM-5.2 (max)?

GLM-5.2 (max) has a time to first token (TTFT) of 2.38s (based on the median across providers serving the model), which is somewhat higher than average compared to other open weight models of similar size (median: 2.37s).

How much does GLM-5.2 (max) cost?

GLM-5.2 (max) costs $1.40 per 1M input tokens (at the higher end, median: $0.55) and $4.40 per 1M output tokens (at the higher end, median: $1.85), based on the median across providers serving the model.

What is GLM-5.2 (max) API pricing?

GLM-5.2 (max) costs $1.40 per 1M input tokens and $4.40 per 1M output tokens (based on the median across providers serving the model). At a blended rate (7:2:1 cache hit/input/output ratio), this is $0.90 per 1M tokens. Pricing may vary by provider.
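The $0.90 blended figure can be reproduced from the listed prices: (7 × $0.26 + 2 × $1.40 + 1 × $4.40) / 10 = $0.902. A minimal sketch of that weighted average (the function name is hypothetical):

```python
def blended_price(cache_hit: float, input_: float, output: float,
                  ratio: tuple[int, int, int] = (7, 2, 1)) -> float:
    """Weighted average USD per 1M tokens over cache-hit, input, and output tokens."""
    a, b, c = ratio
    return (a * cache_hit + b * input_ + c * output) / (a + b + c)


print(f"${blended_price(0.26, 1.40, 4.40):.2f}")  # $0.90
```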

How verbose is GLM-5.2 (max)?

When evaluated on the Intelligence Index, GLM-5.2 (max) generated 140M output tokens, which is somewhat higher than average compared to other open weight models of similar size (median: 110M).

Is GLM-5.2 (max) a reasoning model?

Yes, GLM-5.2 (max) is a reasoning model. It uses extended thinking or chain-of-thought reasoning to work through complex problems before providing an answer.

What input modalities does GLM-5.2 (max) support?

GLM-5.2 (max) supports text input.

What output modalities does GLM-5.2 (max) support?

GLM-5.2 (max) supports text output.

Can GLM-5.2 (max) process images?

No, GLM-5.2 (max) does not support image input. It can only process text.

Is GLM-5.2 (max) multimodal?

No, GLM-5.2 (max) is not multimodal. It only supports text input.

What is the context window of GLM-5.2 (max)?

GLM-5.2 (max) has a context window of 1.0M tokens. This determines how much text and conversation history the model can process in a single request.

Is GLM-5.2 (max) open source?

Yes, GLM-5.2 (max) is open weights. The model weights are publicly available and can be downloaded for self-hosting.

How many parameters does GLM-5.2 (max) have?

GLM-5.2 (max) has 753 billion parameters (40 billion active).

What are the active parameters of GLM-5.2 (max)?

GLM-5.2 (max) is a Mixture of Experts (MoE) model with 753 billion total parameters, but only 40 billion active parameters are used during inference.

What is the license for GLM-5.2 (max)?

GLM-5.2 (max) is released under the MIT license, which allows commercial use.

How does GLM-5.2 (max) perform on benchmarks?

GLM-5.2 (max) achieves a score of 51 on the Artificial Analysis Intelligence Index. This composite benchmark evaluates models across reasoning, knowledge, mathematics, and coding.

Is GLM-5.2 (max) available via API?

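GLM-5.2 (max) is an open weights model that can be self-hosted. It is also served by API providers; the speed, latency, and price figures on this page are medians across those providers.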

Where can I use GLM-5.2 (max)?

GLM-5.2 (max) is an open weights model that can be downloaded and self-hosted.

