# Squeeze-Release: Iterative Pruning with Exact Structural Minimization

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-12 08:00
- AIHOT score: 49
- AIHOT link: https://aihot.virxact.com/items/cmqfazg60002vsl4r5ndswnl0
- Original link: https://arxiv.org/abs/2606.14346

## AI Summary

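Squeeze-Release proposes an exact structural rewrite (minimization) that converts a masked network into a smaller dense network whose forward function is identical up to floating-point rounding error. It cycles between pruning and minimization; an intermediate release step resets the formerly zeroed positions in the compacted tensors to small calibrated noise, making that capacity trainable again, so that later cycles find structural redundancy that a single pruning pass cannot reach. It also introduces a function-preserving CompensatedLayerNorm, which extends channel reduction to residual streams that use LayerNorm. Fully connected networks are compressed to 1/39 of their original size and ConvNeXt-Tiny to 1/14.8, at comparable accuracy, and the approach extends to Transformer architectures.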

## Main Text

Unstructured pruning produces sparse weight tensors, but the standard implementation keeps tensor shapes unchanged, so the deployed model is no smaller than it was before pruning. We present an exact structural rewrite, which we call minimization, that converts a masked network into a smaller dense network with the same forward function up to floating-point rounding. The Squeeze-Release cycle iterates pruning and minimization with an intermediate release step that re-enables the exact-zero positions inside the compacted tensors as small calibrated noise, turning otherwise wasted capacity back into trainable parameters. Successive cycles use that capacity to find structural redundancy that a single pass cannot reach. We additionally introduce CompensatedLayerNorm, a function-preserving replacement for LayerNorm that extends minimization to channel reduction across LayerNorm-equipped residual streams. Squeeze-Release compresses the deployable network to 39x smaller than the unpruned model on a fully connected network and 14.8x smaller on a modern CNN (ConvNeXt-Tiny), at comparable accuracy. In addition, we prove that the rewrite can be extended to transformer architectures.
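To make the idea of minimization and release concrete, here is a minimal pure-Python sketch on a two-layer ReLU MLP. It is an illustration under stated assumptions, not the paper's algorithm: the helper names (`forward`, `minimize`, `release`), the dead-unit criterion, and the `noise_scale` parameter are all hypothetical. The sketch drops hidden units whose outgoing weights were all masked to zero, folds units whose incoming weights were all masked into the next layer's bias as a constant, and then refills the remaining exact zeros in the compacted tensor with small Gaussian noise. How the abstract's noise is actually calibrated is not specified there, so a fixed standard deviation stands in for it.

```python
import random

def relu(v):
    return v if v > 0.0 else 0.0

def forward(W1, b1, W2, b2, x):
    # Two-layer MLP: y = W2 @ relu(W1 @ x + b1) + b2, on plain lists.
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(W2, b2)]

def minimize(W1, b1, W2, b2):
    # Hypothetical minimization rule (an assumption, not the paper's):
    #  - a hidden unit with all-zero outgoing weights never affects the output;
    #  - a hidden unit with all-zero incoming weights outputs the constant
    #    relu(bias), which can be folded into the next layer's bias.
    keep = []
    new_b2 = list(b2)
    for j in range(len(W1)):
        if all(row[j] == 0.0 for row in W2):
            continue
        if all(w == 0.0 for w in W1[j]):
            const = relu(b1[j])
            for i in range(len(W2)):
                new_b2[i] += W2[i][j] * const
            continue
        keep.append(j)
    W1s = [list(W1[j]) for j in keep]
    b1s = [b1[j] for j in keep]
    W2s = [[row[j] for j in keep] for row in W2]
    return W1s, b1s, W2s, new_b2

def release(W, noise_scale=1e-3, seed=0):
    # Refill exact zeros with small Gaussian noise; noise_scale is a
    # placeholder for whatever calibration the method actually uses.
    rng = random.Random(seed)
    return [[w if w != 0.0 else rng.gauss(0.0, noise_scale) for w in row]
            for row in W]

# Masked network with 4 hidden units; zeros are pruned weights.
W1 = [[0.5, 0.0, -1.0],   # live unit
      [0.0, 0.0, 0.0],    # all inputs pruned: constant unit
      [0.3, 0.2, 0.0],    # all outputs pruned (column 2 of W2 is zero)
      [0.0, 1.5, 0.0]]    # live unit
b1 = [0.1, 0.5, -0.2, 0.0]
W2 = [[1.0, 2.0, 0.0, 0.0],
      [0.0, -1.0, 0.0, 0.7]]
b2 = [0.0, 0.3]

W1s, b1s, W2s, b2s = minimize(W1, b1, W2, b2)
x = [2.0, 1.0, 0.5]
masked_out = forward(W1, b1, W2, b2, x)
small_out = forward(W1s, b1s, W2s, b2s, x)
max_diff = max(abs(a - b) for a, b in zip(masked_out, small_out))

W1_released = release(W1s)
```

In this toy case the hidden layer shrinks from four units to two while `max_diff` stays at the level of floating-point rounding, which mirrors the "same forward function up to floating-point rounding" claim. The release step then changes the function slightly by design, since the zeros left inside the compacted `W1s` become small trainable values.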