# Stable-Layers: A Fine-Tuning Framework for Image Layer Decomposition Models Using Reinforcement Learning with VLM Scoring

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-05-28 08:00
- AIHOT score: 46
- AIHOT link: https://aihot.virxact.com/items/cmpzlv8qz04qhslkp3qi5wo60
- Original paper: https://arxiv.org/abs/2605.30257

## AI Summary

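Stable-Layers is a reinforcement learning framework that fine-tunes a pretrained layer decomposition model without paired supervision, using only feedback from a vision-language model (VLM). Starting from Qwen-Image-Layered as the initial model, it applies Flow-GRPO with LoRA adaptation: multiple candidate decompositions are sampled per image, scored by the VLM, and the policy is optimised from group-relative advantages. When a VLM scores samples in isolation, its judgements cluster together, which makes it hard for GRPO to learn. To address this, the authors design a two-stage evaluation pipeline: per-sample scoring against five edit-centric criteria, followed by a grid-based calibration step in which the VLM re-scores all candidates side by side. Compared with the base model, Stable-Layers achieves stronger layer separation, fewer blank or artifact-heavy layers, and lower per-layer reconstruction error on the Crello dataset.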

## Main Text

We present Stable-Layers, a reinforcement learning framework that eliminates the need for paired supervision by fine-tuning a pretrained layer decomposition model using only feedback from a vision-language model (VLM). Starting from Qwen-Image-Layered, we apply Flow-GRPO with LoRA adaptation, sampling multiple candidate decompositions per image, scoring them with a VLM, and optimising the policy from group-relative advantages. The key challenge lies in designing a reliable reward signal: VLMs scoring samples in isolation tend to compress their judgements into a narrow band, leaving GRPO with little within-group variance to learn from. We address this with a two-stage evaluation pipeline that pairs structured per-sample scoring across five edit-centric criteria with a grid-based calibration step in which the VLM re-scores all candidates side-by-side. Stable-Layers produces decompositions with stronger layer separation, fewer blank or artifact-heavy layers, and lower per-layer reconstruction error on the Crello dataset compared to the base model.
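To illustrate why compressed scores starve GRPO of signal, here is a minimal sketch of a group-relative advantage computation. It assumes the commonly used GRPO form, in which each score is centred on the group mean and divided by the group standard deviation; the abstract does not state the exact formula used in Stable-Layers. The function name, the `eps` constant, and the example scores are all hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(scores, eps=1e-6):
    """Centre each score on its group mean and scale by the group std.

    Assumed form (not confirmed by the abstract):
        A_i = (r_i - mean(r)) / (std(r) + eps)
    `eps` is a hypothetical stabiliser that avoids division by zero.
    """
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in scores]


# Hypothetical VLM scores for four candidate decompositions of one image.
tied_scores = [7.0, 7.0, 7.0, 7.0]    # isolated scoring collapsed to one value
spread_scores = [2.0, 4.0, 6.0, 8.0]  # side-by-side scoring separates candidates

tied_adv = group_relative_advantages(tied_scores)
spread_adv = group_relative_advantages(spread_scores)

print(tied_adv)    # every advantage is zero, so no candidate is preferred
print(spread_adv)  # advantages are ordered, giving the policy a ranking signal
```

In the extreme case where every candidate receives the same score, all advantages are zero and the policy update carries no preference. This is the failure mode the grid-based calibration step is meant to avoid, by prompting the VLM to differentiate candidates viewed together.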