SEVRA:面向预算感知推理的选择性验证服务层控制器
SEVRA是一种服务层控制器,使用冻结的Qwen3-4B求解器,通过训练可恢复性感知门控决定是否保留初始答案或调用主动验证。在MathFive基准上,选择性验证达76.3%准确率,高于始终验证的75.5%,后生成token减少26.8%,有害翻转从2.2%降至1.0%。但8192 token初始求解以76.0%准确率和28%更少总token胜出。在GSM上,选择性策略仅验证3.0%样本,准确率从93.4%提升至94.5%,验证token减少91.2%。部署规则:先调整初始预算,再在需要显式检查、有限重试、可审计或风险控制时使用选择性恢复。
Test-time reasoning is increasingly used as a serving-time control knob, but extra reasoning is not uniformly valuable: it can repair failed attempts, waste compute on already-correct answers, or introduce harmful answer changes. We study this as a deployment allocation problem rather than a new-verifier problem. We introduce \sevra, Selective Verification for Reasoning Allocation, a serving-layer controller that decides whether to preserve a frozen solver's initial answer or invoke active verification. Using a frozen Qwen3-4B solver, we log intervention outcomes and train recoverability-aware gates from serving-visible attempt state. On \mathfive, selective verification reaches 76.3\% accuracy, compared with 75.5\% for always verifying, while reducing post-generation tokens by 26.8\% and harmful flips from 2.2\% to 1.0\%. However, an 8,192-token initial solve reaches 76.0\% accuracy with 28\% fewer total model tokens, showing that selective recovery is useful but not the best tested cost frontier. In frozen transfer to \gsm, the selective policy verifies only 3.0\% of examples, improves accuracy from 93.4\% to 94.5\%, and reduces verification tokens by 91.2\% relative to always verifying; again, a longer initial solve matches its accuracy with fewer realized tokens. On CommonsenseQA, always-on verification hurts, while Self-Consistency@5 improves accuracy at about five times the realized token cost. The resulting deployment rule is: tune the initial budget first, then use selective recovery when explicit checks, bounded retries, auditability, or regression-risk control matter.