# Pro Max 5x quota exhausted in 1.5 hours despite moderate usage

- Source: Hacker News trending (buzzing.cc Chinese translation)
- Author: cmaster11
- Published: 2026-04-12 21:55
- AIHOT link: https://aihot.virxact.com/items/cmnw1z0fk020aslc38vmc9pod
- Original link: https://github.com/anthropics/claude-code/issues/45756

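## AI Summary

A Claude Code Pro Max 5x user reports that their quota ran out in just 1.5 hours under moderate usage. The problem was filed as a GitHub issue and has prompted questions about whether the quota limits are reasonable.

## Full Text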

[BUG] Pro Max 5x Quota Exhausted in 1.5 Hours Despite Moderate Usage #45756

Description

Preflight Checklist

I have searched existing issues and this hasn't been reported yet

This is a single bug report (please file separate reports for different bugs)

I am using the latest version of Claude Code

What's Wrong?

Pro Max 5x Quota Exhausted in 1.5 Hours Despite Moderate Usage

Summary

On a Pro Max 5x (Opus) plan, quota resets at a fixed interval. After reset, with moderate usage (mostly Q&A, light development), quota was exhausted within 1.5 hours. Prior to reset, 5 hours of heavy development (multi-file implementation, graphify pipeline, multi-agent spawns) consumed the previous quota window — but that was expected given the workload. The post-reset exhaustion was not.

Investigation reveals the likely root cause: cache_read tokens appear to count at full rate against the rate limit, negating the cost benefit of prompt caching for quota purposes.

Environment

Plan: Pro Max 5x

Model: claude-opus-4-6 (1M context)

Platform: Claude Code CLI on WSL2

Session: Single continued session with 2 auto-compacts

Data Collection Method

All data extracted from ~/.claude/projects/*//*.jsonl session files, specifically the usage object on each API response:

{ "cache_read_input_tokens": ..., "cache_creation_input_tokens": ..., "input_tokens": ..., "output_tokens": ... }
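The collection step described above can be sketched in Python. This is a minimal sketch, not the reporter's actual script: the helper names `find_usage` and `sum_usage` are hypothetical, and it assumes the `usage` object sits either at the top level of each JSONL record or under a `message` key, which the report does not specify.

```python
import json
from pathlib import Path

USAGE_KEYS = (
    "cache_read_input_tokens",
    "cache_creation_input_tokens",
    "input_tokens",
    "output_tokens",
)


def find_usage(record):
    """Return the usage dict of one JSONL record, or None if absent.

    Assumption: usage is at the top level or nested under "message".
    """
    if not isinstance(record, dict):
        return None
    candidates = [record.get("usage")]
    message = record.get("message")
    if isinstance(message, dict):
        candidates.append(message.get("usage"))
    for candidate in candidates:
        if isinstance(candidate, dict):
            return candidate
    return None


def sum_usage(paths):
    """Sum the four usage counters over every record that carries usage."""
    totals = dict.fromkeys(USAGE_KEYS, 0)
    calls = 0
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip lines that are not valid JSON
                usage = find_usage(record)
                if usage is None:
                    continue
                calls += 1
                for key in USAGE_KEYS:
                    value = usage.get(key)
                    if isinstance(value, (int, float)):
                        totals[key] += value
    return calls, totals
```

A call such as `sum_usage((Path.home() / ".claude" / "projects").glob("*/*.jsonl"))` would then return the number of records with usage data and the per-counter totals, assuming the session files live one directory below `projects`.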

Measured Token Consumption

Window 1: 15:00-20:00 (5 hours, heavy development)

| Metric | Value |
| --- | --- |
| API calls | 2,715 |
| Cache read | 1,044M tokens |
| Cache create | 16.8M tokens |
| Input tokens | 8.9k tokens |
| Output tokens | 1.15M tokens |
| Peak context | 966,078 tokens |
| Effective input (cache_read at 1/10) | 121.8M tokens |

Workload: Full feature implementation (Express server + iOS app), graphify knowledge graph pipeline, SPEC-driven multi-agent coordination. 2 auto-compacts as context hit ~960k.

Window 2: 20:00-21:30 (1.5 hours, moderate usage)

Main session (vibehq):

| Metric | Value |
| --- | --- |
| API calls | 222 |
| Cache read | 23.2M tokens |
| Cache create | 1.4M tokens |
| Input tokens | 304 tokens |
| Output tokens | 91k tokens |
| Peak context | 182,302 tokens |
| Effective input (cache_read at 1/10) | 2.8M tokens |

Other sessions running (background, not actively used by user):

| Session | API Calls | Cache Read | Eff Input | Output |
| --- | --- | --- | --- | --- |
| token-analysis | 296 | 57.6M | 6.5M | 145k |
| career-ops | 173 | 23.1M | 3.8M | 148k |
| Total (all sessions) | 691 | 103.9M | 13.1M | 387k |

The Problem

If cache_read counts at 1/10 rate (expected):

Window 2 total: 13.1M effective tokens in 1.5 hours = 8.7M/hr

This should NOT exhaust a Pro Max 5x quota. For comparison, Window 1 consumed 24.4M effective tokens/hour during heavy development and used up the previous quota window, but that workload was 2.8x more intense.

If cache_read counts at full rate (suspected actual behavior):

Window 2 total: 103.9M + 1.4M + 387k = 105.7M tokens in 1.5 hours = 70.5M/hr

This would explain quota exhaustion, but means prompt caching provides zero benefit for rate limiting.
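The two scenarios can be checked with a few lines of arithmetic. The sketch below only re-derives the hourly rates from the totals reported in the tables above; the variable names are illustrative, and nothing here is known about how the rate limiter actually weights tokens.

```python
WINDOW_1_HOURS = 5.0
WINDOW_2_HOURS = 1.5

# Totals as reported in the issue, in millions of tokens.
window_1_effective_m = 121.8
window_2_effective_m = 13.1
window_2_cache_read_m = 103.9
window_2_cache_create_m = 1.4
window_2_output_m = 0.387


def per_hour(total_m, hours):
    """Average consumption in millions of tokens per hour."""
    return total_m / hours


# Scenario A: cache_read weighted at 1/10 (the "effective" totals).
effective_rate = per_hour(window_2_effective_m, WINDOW_2_HOURS)

# Scenario B: cache_read counted at full rate (raw tokens).
raw_total_m = window_2_cache_read_m + window_2_cache_create_m + window_2_output_m
raw_rate = per_hour(raw_total_m, WINDOW_2_HOURS)

# Window 1 intensity relative to Window 2, both on the effective basis.
window_1_rate = per_hour(window_1_effective_m, WINDOW_1_HOURS)
intensity_ratio = window_1_rate / effective_rate

print(f"effective: {effective_rate:.1f}M/hr")
print(f"raw:       {raw_total_m:.1f}M total, {raw_rate:.1f}M/hr")
print(f"window 1 vs window 2: {intensity_ratio:.1f}x")
```

On these figures the raw rate is roughly eight times the effective rate, which is the gap the report attributes to cache_read accounting.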

Context Size Progression

The session file shows context growing and compacting cyclically:

- Segment 1: 32k → 783k (835 calls) → auto-compact
- Segment 2: 39k → 966k (1,842 calls) → auto-compact
- Segment 3: 55k → 182k (222 calls) → still active

Each API call sends the full context as input. With a 1M context window, calls near the compact threshold send ~960k tokens each. Even with prompt caching, if cache_read counts at full rate against quota, a single call costs ~960k quota tokens.
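To see how quickly resent context adds up, the segments can be approximated with a simple model. This is a rough sketch that assumes context grows linearly from the start size to the end size of each segment, which the session data does not guarantee; `estimated_input_tokens` is a hypothetical helper.

```python
def estimated_input_tokens(start_context, end_context, calls):
    """Total tokens sent across `calls` calls if context grows linearly.

    With evenly spaced growth, the sum equals the average context size
    multiplied by the number of calls.
    """
    average_context = (start_context + end_context) / 2
    return average_context * calls


# (label, start context, end context, calls) from the progression above.
segments = [
    ("Segment 1", 32_000, 783_000, 835),
    ("Segment 2", 39_000, 966_000, 1_842),
    ("Segment 3", 55_000, 182_000, 222),
]

for label, start, end, calls in segments:
    total = estimated_input_tokens(start, end, calls)
    print(f"{label}: ~{total / 1e6:.1f}M tokens sent over {calls} calls")
```

Under this assumption Segment 3 comes to about 26M tokens, the same order of magnitude as the 23.2M cache read reported for the main session in Window 2, so most of the volume is plausibly the same context being resent on every call.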

Specific Issues

1. Cache read token accounting against rate limits

Expected: cache_read tokens should count at reduced rate (1/10) against rate limits, matching the reduced cost.

Observed: Quota exhaustion rate is consistent with cache_read counting at full rate.

Impact: On a 1M context window, each API call sends ~100-960k tokens. With 200+ calls per hour (normal for tool-heavy Claude Code usage), quota depletes in minutes regardless of caching.

2. Background sessions consume shared quota

Sessions left open in other terminals continue making API calls (compacts, retros, hook processing) even when the user is not actively interacting. These consume from the same quota pool.

In this case, token-analysis (296 calls) and career-ops (173 calls) were running without active user interaction but still consuming significant quota.
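The share attributable to background sessions follows directly from the effective-input figures in the Window 2 tables. A small sketch, using only the reported numbers (the dictionary layout is illustrative):

```python
# Effective input per session in Window 2, in millions of tokens.
effective_input_m = {
    "vibehq": 2.8,          # main session, actively used
    "token-analysis": 6.5,  # background
    "career-ops": 3.8,      # background
}
background_sessions = {"token-analysis", "career-ops"}

total_m = sum(effective_input_m.values())
background_m = sum(
    value for name, value in effective_input_m.items()
    if name in background_sessions
)
background_share = background_m / total_m

print(f"background: {background_m:.1f}M of {total_m:.1f}M "
      f"({background_share:.1%})")
```

The result is a little under 79%, in line with the roughly 78% the report cites for background sessions.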

3. Auto-compact creates expensive spikes

Each auto-compact event results in one API call with the full pre-compact context (~966k tokens) as cache_creation, followed by a fresh start. This means the most expensive single call happens automatically, without user action.

4. 1M context window amplifies the problem

Larger context window = more tokens per call = faster quota depletion. The 1M window is marketed as a feature but becomes counterproductive when cache_read tokens count at full rate against quota.

Reproduction

1. Start Claude Code with Opus on Pro Max 5x
2. Have ~/.claude/rules/ with ~30 rule files (~19k tokens fixed overhead)
3. Work on a project with tool-heavy operations (file reads, builds, tests)
4. Observe context growing via /context command
5. After 200-300 tool calls, check quota; it will be significantly depleted
6. Leave 2-3 other Claude Code sessions open in other terminals
7. After quota reset, observe quota depleting even with minimal active usage

Expected Behavior

cache_read tokens should count at their reduced rate (1/10) against rate limits

Background/idle sessions should not consume significant quota

Auto-compact should not create outsized quota spikes

Pro Max 5x should sustain at least 2-3 hours of moderate Opus usage per quota window

Actual Behavior

Quota exhausted in 1.5 hours with moderate usage (8.7M effective tokens/hour)

Background sessions consumed 78% of post-reset quota

Total raw tokens sent (105.7M) is consistent with cache_read counting at full rate

Suggested Improvements

Clarify cache_read quota accounting: Document whether cache_read tokens count at full or reduced rate against rate limits

Rate limit by effective tokens: Count cache_read at 1/10 rate for rate limiting, matching the cost reduction

Session idle detection: Don't count idle session overhead against quota, or warn users about open sessions

Quota visibility: Show real-time token consumption breakdown in Claude Code (cache_read vs cache_create vs input vs output)

Context-aware quota estimates: Before operations, estimate quota cost based on current context size

What Should Happen?

Token quota should not be consumed at this speed.

Error Messages/Logs

Steps to Reproduce

It's hard to reproduce, but I can provide the log.

Claude Model

None

Is this a regression?

Yes, this worked in a previous version

Last Working Version

No response

Claude Code Version

v2.1.97

Platform

Anthropic API

Operating System

Ubuntu/Debian Linux

Terminal/Shell

WSL (Windows Subsystem for Linux)

Additional Information

No response

Hey all, Boris from the Claude Code team here. We've been investigating these reports, and a few of the top issues we've found are:

Prompt cache misses when using 1M token context window are expensive. Since Claude Code uses a 1 hour prompt cache window for the main agent, if you leave your computer for over an hour then continue a stale session, it's often a full cache miss. To improve this, we have shipped a few UX improvements (eg. to nudge you to /clear before continuing a long stale session), and are investigating defaulting to 400k context instead, with an option to configure your context window to up to 1M if preferred. To experiment with this now, try: CLAUDE_CODE_AUTO_COMPACT_WI…
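The stale-session condition described in this comment can be expressed as a simple idle-time check. This is only an illustrative sketch: the one-hour window comes from the comment above, while `likely_full_cache_miss` is a hypothetical helper and says nothing about how the cache is actually managed server-side.

```python
from datetime import datetime, timedelta

# One-hour prompt cache window for the main agent, per the comment above.
CACHE_WINDOW = timedelta(hours=1)


def likely_full_cache_miss(last_call_at, now, window=CACHE_WINDOW):
    """Return True if the session has been idle longer than the cache window."""
    return now - last_call_at > window


last_call = datetime(2026, 4, 12, 20, 0)
print(likely_full_cache_miss(last_call, datetime(2026, 4, 12, 20, 30)))
print(likely_full_cache_miss(last_call, datetime(2026, 4, 12, 21, 30)))
```

A session resumed after the window has elapsed would, by this reasoning, resend its whole context as uncached input, which is why a long idle session on a 1M context window is costly to continue.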

Metadata

Assignees

notitatall

