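A Claude Code Pro Max 5x user reports that, under moderate usage, their quota ran out after only 1.5 hours. The problem was filed as a GitHub issue and has raised questions about whether the quota limits are reasonable.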
Original text (untranslated):
[BUG] Pro Max 5x Quota Exhausted in 1.5 Hours Despite Moderate Usage #45756
Description
Preflight Checklist
I have searched existing issues and this hasn't been reported yet
This is a single bug report (please file separate reports for different bugs)
I am using the latest version of Claude Code
What's Wrong?
Pro Max 5x Quota Exhausted in 1.5 Hours Despite Moderate Usage
Summary
On a Pro Max 5x (Opus) plan, quota resets at a fixed interval. After reset, with moderate usage (mostly Q&A, light development), quota was exhausted within 1.5 hours. Prior to reset, 5 hours of heavy development (multi-file implementation, graphify pipeline, multi-agent spawns) consumed the previous quota window — but that was expected given the workload. The post-reset exhaustion was not.
Investigation reveals the likely root cause: cache_read tokens appear to count at full rate against the rate limit, negating the cost benefit of prompt caching for quota purposes.
Environment
Plan: Pro Max 5x
Model: claude-opus-4-6 (1M context)
Platform: Claude Code CLI on WSL2
Session: Single continued session with 2 auto-compacts
Data Collection Method
All data was extracted from ~/.claude/projects/*//*.jsonl session files, specifically from the usage object on each API response.
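A minimal Python sketch of this kind of extraction follows. It assumes each JSONL line is a JSON object and that assistant responses carry the Messages API token counts (input_tokens, cache_creation_input_tokens, cache_read_input_tokens, output_tokens) under message.usage. That record layout, the de-duplication by message id, and the default glob pattern in sum_usage_for_files are assumptions about Claude Code's log format, not documented behavior.

```python
import glob
import json
import os
from collections import Counter
from typing import Iterable

# Token-count fields of the Anthropic Messages API usage object.
USAGE_FIELDS = (
    "input_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
    "output_tokens",
)


def sum_usage(lines: Iterable[str]) -> Counter:
    """Sum usage fields across JSONL lines and count the API calls seen."""
    totals: Counter = Counter()
    seen_ids = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(record, dict):
            continue
        # Assumption: the usage object lives at record["message"]["usage"].
        message = record.get("message")
        if not isinstance(message, dict) or not isinstance(message.get("usage"), dict):
            continue
        # Assumption: skip a response whose message id was already counted.
        msg_id = message.get("id")
        if msg_id is not None:
            if msg_id in seen_ids:
                continue
            seen_ids.add(msg_id)
        usage = message["usage"]
        totals["calls"] += 1
        for field in USAGE_FIELDS:
            totals[field] += usage.get(field) or 0
    return totals


def sum_usage_for_files(pattern: str = "~/.claude/projects/*/*.jsonl") -> Counter:
    """Aggregate usage over every file matching a (hypothetical) glob pattern."""
    totals: Counter = Counter()
    for path in glob.glob(os.path.expanduser(pattern)):
        with open(path, encoding="utf-8", errors="replace") as fh:
            totals.update(sum_usage(fh))
    return totals
```

The resulting Counter separates cache_read from cache_creation, input, and output. That split is what the rest of the report reasons about.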
Other sessions running (in the background, not actively used by the user):
This should NOT exhaust a Pro Max 5x quota. For comparison, Window 1 consumed 24.4M effective tokens/hour during heavy development and used up the previous quota window, but that workload was 2.8x more intense.
If cache_read counts at full rate (suspected actual behavior):
Each API call sends the full context as input. With a 1M context window, calls near the compact threshold send ~960k tokens each. Even with prompt caching, if cache_read counts at full rate against quota, a single call costs ~960k quota tokens.
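The arithmetic in the last sentence can be made concrete with a small Python function. It charges cache reads at a configurable weight (1.0 for the suspected behavior, 0.1 for the expected one) and counts every other token type at weight 1.0. That simplification and the per-field split of the example call are assumptions, because the actual quota formula is not public.

```python
def call_quota_cost(usage: dict, cache_read_weight: float) -> float:
    """Quota units for one API call under a hypothetical weighting scheme.

    Only cache reads are discounted; input, cache-creation and output tokens
    are all counted at weight 1.0 (a simplification, not the real formula).
    """
    return (
        usage.get("input_tokens", 0)
        + usage.get("cache_creation_input_tokens", 0)
        + usage.get("output_tokens", 0)
        + cache_read_weight * usage.get("cache_read_input_tokens", 0)
    )


# Illustrative call near the compact threshold: ~960k tokens, nearly all cached.
near_threshold_call = {
    "input_tokens": 2_000,
    "cache_read_input_tokens": 958_000,
    "output_tokens": 1_000,
}

full_rate = call_quota_cost(near_threshold_call, cache_read_weight=1.0)
tenth_rate = call_quota_cost(near_threshold_call, cache_read_weight=0.1)
print(f"full rate: {full_rate:,.0f}  1/10 rate: {tenth_rate:,.0f}")
```

With these illustrative numbers, the same call costs 961,000 units at full rate and 98,800 units at the 1/10 rate. That is close to a 10x difference in how quickly each call drains the quota.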
Specific Issues
Cache read token accounting against rate limits
Expected: cache_read tokens should count at a reduced rate (1/10) against rate limits, matching their reduced cost.
Observed: Quota exhaustion rate is consistent with cache_read counting at full rate.
Impact: On a 1M context window, each API call sends ~100-960k tokens. With 200+ calls per hour (normal for tool-heavy Claude Code usage), quota depletes in minutes regardless of caching.
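The following rough Python estimate shows how the call rate and per-call size in the Impact line translate into time to exhaustion. The quota budget, the cache-read fraction, and the flat per-call size are all hypothetical inputs. The useful output is the ratio between the two weightings, not the absolute hours.

```python
def hours_until_exhausted(
    budget: float,
    calls_per_hour: float,
    tokens_per_call: float,
    cache_read_fraction: float,
    cache_read_weight: float,
) -> float:
    """Hours until a hypothetical quota budget runs out.

    Each call sends tokens_per_call tokens; cache_read_fraction of them are
    cache reads charged at cache_read_weight, the rest at full weight.
    """
    if not 0.0 <= cache_read_fraction <= 1.0:
        raise ValueError("cache_read_fraction must be in [0, 1]")
    weighted_per_call = tokens_per_call * (
        cache_read_fraction * cache_read_weight + (1.0 - cache_read_fraction)
    )
    return budget / (calls_per_hour * weighted_per_call)


# Hypothetical scenario: 200 calls/hour, 500k tokens/call, 95% served from cache.
args = dict(budget=1e9, calls_per_hour=200, tokens_per_call=500_000, cache_read_fraction=0.95)
full = hours_until_exhausted(**args, cache_read_weight=1.0)
tenth = hours_until_exhausted(**args, cache_read_weight=0.1)
print(f"full-rate accounting: {full:.1f} h, 1/10 accounting: {tenth:.1f} h")
```

In this made-up scenario the same budget lasts about 10 hours under full-rate accounting and about 69 hours under 1/10 accounting.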
Background sessions consume shared quota
Sessions left open in other terminals continue making API calls (compacts, retros, hook processing) even when the user is not actively interacting. These consume from the same quota pool.
In this case, token-analysis (296 calls) and career-ops (173 calls) were running without active user interaction but still consuming significant quota.
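You can check whether idle sessions dominate a window by grouping weighted token totals per session and computing the share that falls outside the session the user is actively working in. The following is a minimal Python sketch. It assumes per-session totals have already been extracted from the session logs. The name main-project and all the numbers are hypothetical; only token-analysis and career-ops come from the report.

```python
def background_share(per_session_tokens: dict[str, float], active: set[str]) -> float:
    """Fraction of all tokens consumed by sessions not listed in `active`."""
    total = sum(per_session_tokens.values())
    if total == 0:
        return 0.0
    background = sum(
        tokens for name, tokens in per_session_tokens.items() if name not in active
    )
    return background / total


def rank_sessions(per_session_tokens: dict[str, float]) -> list[tuple[str, float]]:
    """Sessions sorted by consumption, largest first."""
    return sorted(per_session_tokens.items(), key=lambda kv: kv[1], reverse=True)


# Made-up totals for illustration only; "main-project" is a hypothetical name.
totals = {"main-project": 2.0e6, "token-analysis": 5.0e6, "career-ops": 1.0e6}
print(rank_sessions(totals))
print(f"background share: {background_share(totals, active={'main-project'}):.0%}")
```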
Auto-compact creates expensive spikes
Each auto-compact event results in one API call with the full pre-compact context (~966k tokens) as cache_creation, followed by a fresh start. This means the most expensive single call happens automatically, without user action.
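These spikes can be located in the per-call usage records. As described above, a compaction appears as a single call whose cache_creation_input_tokens is close to the full pre-compact context. The following is a Python sketch; the 500k default threshold is an arbitrary assumption that you would tune against your own logs.

```python
from typing import Iterable, List, Tuple


def find_cache_creation_spikes(
    usages: Iterable[dict], threshold: int = 500_000
) -> List[Tuple[int, int]]:
    """Return (call_index, cache_creation_tokens) for calls at or above threshold."""
    spikes = []
    for index, usage in enumerate(usages):
        created = usage.get("cache_creation_input_tokens", 0) or 0
        if created >= threshold:
            spikes.append((index, created))
    return spikes


calls = [
    {"cache_read_input_tokens": 900_000, "cache_creation_input_tokens": 3_000},
    {"cache_creation_input_tokens": 966_000},  # compaction-sized cache write
    {"cache_read_input_tokens": 40_000, "cache_creation_input_tokens": 25_000},
]
print(find_cache_creation_spikes(calls))  # [(1, 966000)]
```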
1M context window amplifies the problem
Larger context window = more tokens per call = faster quota depletion. The 1M window is marketed as a feature but becomes counterproductive when cache_read tokens count at full rate against quota.
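The following shows why a larger window costs more. If every call resends the whole context and the context grows steadily, the total tokens sent before compaction grow roughly quadratically with the number of calls. The Python model below is a deliberate simplification, and all its parameters are hypothetical:

- growth per call is constant;
- compaction instantly resets the context to a fixed size;
- the compaction call itself is not counted.

```python
def tokens_sent(
    num_calls: int,
    base: int,
    growth_per_call: int,
    compact_at: int,
    reset_to: int,
) -> tuple[int, int]:
    """Total tokens sent over a session and the number of compactions.

    Each call resends the current context; afterwards the context grows by
    growth_per_call. When it reaches compact_at it is reset to reset_to.
    """
    context, total, compacts = base, 0, 0
    for _ in range(num_calls):
        total += context
        context += growth_per_call
        if context >= compact_at:
            context = reset_to
            compacts += 1
    return total, compacts


# Hypothetical session: ~20k fixed overhead, +3k tokens per tool call, 300 calls.
for window in (1_000_000, 400_000):
    total, compacts = tokens_sent(300, 20_000, 3_000, compact_at=window, reset_to=20_000)
    print(f"compact at {window:>9,}: {total:>12,} tokens sent, {compacts} compactions")
```

With these parameters the 1M threshold never triggers, and the session sends 140,550,000 tokens. Compacting at 400k sends 57,111,000 tokens with two compactions. The model ignores the cost of the compaction calls and any loss of useful context.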
Reproduction
Start Claude Code with Opus on Pro Max 5x
Have ~/.claude/rules/ with ~30 rule files (~19k tokens of fixed overhead)
Work on a project with tool-heavy operations (file reads, builds, tests)
Observe the context growing via the /context command
After 200-300 tool calls, check quota — it will be significantly depleted
Leave 2-3 other Claude Code sessions open in other terminals
After quota reset, observe quota depleting even with minimal active usage
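You can roughly estimate the fixed overhead of a rules directory before starting. The following Python sketch makes two assumptions. First, it uses the common rule of thumb of about four characters per token for English text. This is only an approximation; a real tokenizer or the /context command will give different numbers. Second, it assumes every file under ~/.claude/rules/ is loaded.

```python
from pathlib import Path
from typing import Tuple

CHARS_PER_TOKEN = 4  # rough heuristic for English prose, not a tokenizer


def estimate_rules_overhead(rules_dir: str = "~/.claude/rules") -> Tuple[int, int]:
    """Return (file_count, approximate_token_count) for all files under rules_dir."""
    root = Path(rules_dir).expanduser()
    if not root.is_dir():
        return 0, 0
    files = [path for path in root.rglob("*") if path.is_file()]
    chars = sum(
        len(path.read_text(encoding="utf-8", errors="ignore")) for path in files
    )
    return len(files), chars // CHARS_PER_TOKEN


if __name__ == "__main__":
    count, tokens = estimate_rules_overhead()
    print(f"{count} rule files, roughly {tokens:,} tokens of fixed overhead")
```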
Expected Behavior
cache_read tokens should count at their reduced rate (1/10) against rate limits
Background/idle sessions should not consume significant quota
Auto-compact should not create outsized quota spikes
Pro Max 5x should sustain at least 2-3 hours of moderate Opus usage per quota window
Actual Behavior
Quota exhausted in 1.5 hours with moderate usage (8.7M effective tokens/hour)
Background sessions consumed 78% of post-reset quota
Total raw tokens sent (105.7M) is consistent with cache_read counting at full rate
Suggested Improvements
Clarify cache_read quota accounting: Document whether cache_read tokens count at full or reduced rate against rate limits
Rate limit by effective tokens: Count cache_read at 1/10 rate for rate limiting, matching the cost reduction
Session idle detection: Don't count idle session overhead against quota, or warn users about open sessions
Quota visibility: Show real-time token consumption breakdown in Claude Code (cache_read vs cache_create vs input vs output)
Context-aware quota estimates: Before operations, estimate quota cost based on current context size
What Should Happen?
Token usage should not be consumed this quickly.
Error Messages/Logs
Steps to Reproduce
It's hard to reproduce, but I can provide the logs.
Claude Model
None
Is this a regression?
Yes, this worked in a previous version
Last Working Version
No response
Claude Code Version
v2.1.97
Platform
Anthropic API
Operating System
Ubuntu/Debian Linux
Terminal/Shell
WSL (Windows Subsystem for Linux)
Additional Information
No response
Hey all, Boris from the Claude Code team here. We've been investigating these reports, and a few of the top issues we've found are:
Prompt cache misses when using 1M token context window are expensive. Since Claude Code uses a 1 hour prompt cache window for the main agent, if you leave your computer for over an hour then continue a stale session, it's often a full cache miss. To improve this, we have shipped a few UX improvements (e.g., to nudge you to /clear before continuing a long stale session), and are investigating defaulting to 400k context instead, with an option to configure your context window to up to 1M if preferred. To experiment with this now, try: CLAUDE_CODE_AUTO_COMPACT_WI…
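You can check for the stale-session case described in this comment using the timestamp of the last API call. The following is a minimal Python sketch with two assumptions: the 1-hour main-agent cache window stated above, and that the window is measured from the most recent call. The helper name is hypothetical. A likely miss means the full context is written to the cache again rather than read from it.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Main-agent prompt cache window, as stated in the comment above.
CACHE_TTL = timedelta(hours=1)


def likely_cache_miss(
    last_call_at: datetime,
    now: Optional[datetime] = None,
    ttl: timedelta = CACHE_TTL,
) -> bool:
    """True if more than ttl has elapsed since the last API call."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_call_at > ttl


last = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
print(likely_cache_miss(last, now=last + timedelta(minutes=45)))  # False
print(likely_cache_miss(last, now=last + timedelta(minutes=75)))  # True
```

When this returns True for a session near the 1M limit, the /clear nudge mentioned in the comment avoids paying for that full context again.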
Assignees: notitatall