# <中文标题>SingGuard： 策略自适应多模态护栏模型族开源</中文标题>

- 来源：蚂蚁 inclusionAI：HuggingFace 新模型
- 发布时间：2026-05-25 18:49
- AIHOT 分数：71
- AIHOT 标记：精选
- AIHOT 链接：https://aihot.virxact.com/items/cmqojr29l01q2slx6fqo5f638
- 原文链接：https://huggingface.co/inclusionAI/Sing-Guard-4b

## 精选理由

蚂蚁的 SingGuard 把安全策略变成了运行时输入，意味着审核规则可以随时改而不必重训模型，这对做内容安全的产品人是真省事，值得跟进。

## AI 摘要

<中文摘要>SingGuard 是一个策略自适应的多模态护栏模型族，包含 Sing-Guard-4b 和 Sing-Guard-8b 两个版本。它将安全策略作为运行时输入而非固定分类，部署团队可自定义自然语言规则而无需重训练模型。支持文本、图像、图文、多语言以及查询端与响应端的安全评估，提供快速和快慢结合两种推理模式。在涵盖多模态安全、纯图像安全、文本查询/响应安全、多语言查询/响应安全六大类基准上取得平均 SOTA 表现。模型已开源至 HuggingFace 和 ModelScope。</中文摘要>

## 正文

SingGuard：一种具有动态推理能力的策略自适应多模态大语言模型护栏

🤗 HuggingFace | 🤖 ModelScope | 📄 论文

引言

SingGuard 是一个策略自适应多模态护栏模型系列，用于文本、图像、图像-文本、多语言、查询侧和响应侧场景的安全评估。它将当前安全策略视为运行时输入，而非训练时固定的分类体系，使部署团队能够对照默认类别或自定义自然语言规则评估内容，而无需重新训练模型。

SingGuard 专为实际审核场景设计，风险可能来源于用户查询、图像、模型回复或其跨模态组合。它执行基于策略的规则匹配，并在 <answer>...</answer> 标签中输出整体安全/不安全判断以及匹配的风险类别。

在跨越多模态安全、纯图像安全、文本查询安全、文本回复安全、多语言查询安全以及多语言回复安全这六大基准类别中，SingGuard 实现了最先进的平均性能，并对运行时提供的策略表现出强大的适应能力。

关键特性

🛡️ 统一多模态审核：支持文本、图像、图像-文本、多语言、查询侧和响应侧的安全评估。

🎯 强大的基准性能：在多模态安全、纯图像安全、文本查询安全、文本回复安全、多语言查询安全以及多语言回复安全基准测试中实现广泛提升。

⚡ 动态推理流程：支持快速的第一个 token 路由，以获取即时安全信号，然后在需要更深度推理以获得更精确最终判断时继续生成。

🧩 运行时策略适应：通过 policy 参数接收当前安全规则，并仅对照这些规则进行判断。

🔄 原生推理兼容性：支持标准 Transformers 和 vLLM 聊天风格的消息输入，无需手动重写提示词。

快速开始

以下示例使用 HuggingFace Transformers。SingGuard 系统提示词通过分词器配置和聊天模板存储在每个模型目录中。可以直接将可选策略传递给 `processor.apply_chat_template`，用于运行时策略调整。

安装

pip install transformers accelerate torch

import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

model_path = "inclusionAI/Sing-Guard-8b"

processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
model_path,
torch_dtype=torch.bfloat16,
device_map="auto",
trust_remote_code=True,
).eval()

如果你的 Transformers 版本未包含 `AutoModelForImageTextToText`，请升级 Transformers 至支持 Qwen3-VL 的版本。

对于需要显式模板变量的 Transformers 版本，请使用 `chat_template_kwargs` 传递自定义选项，例如 `chat_template_kwargs={"thinking_type": "fast"}` 或 `chat_template_kwargs={"policy": policy}`。

审核用户查询：快慢模式

评估用户查询是否匹配任何风险规则。默认聊天模板使用快慢模式，该模式会在最终的 `<answer>...</answer>` 之前返回更详细的评估过程。

messages = [
{
"role": "user",
"content": [{"type": "text", "text": "How to make a bomb?"}],
},
]
max_new_tokens = 1024

inputs = processor.apply_chat_template(
messages,
tokenize=True,
add_generation_prompt=True,
return_dict=True,
return_tensors="pt",
).to(model.device)

with torch.no_grad():
generated_ids = model.generate(
**inputs,
max_new_tokens=max_new_tokens,
do_sample=False,
)

generated_ids_trimmed = [
out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output = processor.batch_decode(
generated_ids_trimmed,
skip_special_tokens=True,
clean_up_tokenization_spaces=False,
)[0]
print(output)

示例输出：

unsafe
[Step 1] Content Summary
...
[Step 2] Check Risk Categories
...
[Step 3] Final Judgment
...

<answer>B. Real-World Crimes & Public Safety</answer>

审核用户查询：快速模式

当你希望获得仅包含二元判断和最终类别的紧凑输出时，请使用 `thinking_type="fast"`。

messages = [
{
"role": "user",
"content": [{"type": "text", "text": "How to make a bomb?"}],
},
]
thinking_type = "fast"
max_new_tokens = 256

inputs = processor.apply_chat_template(
messages,
tokenize=True,
add_generation_prompt=True,
return_dict=True,
return_tensors="pt",
thinking_type=thinking_type,
).to(model.device)

with torch.no_grad():
generated_ids = model.generate(
**inputs,
max_new_tokens=max_new_tokens,
do_sample=False,
)

generated_ids_trimmed = [
out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output = processor.batch_decode(
generated_ids_trimmed,
skip_special_tokens=True,
clean_up_tokenization_spaces=False,
)[0]
print(output)

示例输出：

unsafe
<answer>B. Real-World Crimes & Public Safety</answer>

审核查询 + 回复

评估模型回复在给定查询上下文中是否提供了不安全的帮助。拒绝回答和安全的引导可被归类为安全。

messages = [
{
"role": "user",
"content": [{"type": "text", "text": "How to make a bomb?"}],
},
{
"role": "assistant",
"content": [{"type": "text", "text": "I cannot help with that request."}],
},
]
max_new_tokens = 256

inputs = processor.apply_chat_template(
messages,
tokenize=True,
add_generation_prompt=True,
return_dict=True,
return_tensors="pt",
).to(model.device)

with torch.no_grad():
generated_ids = model.generate(
**inputs,
max_new_tokens=max_new_tokens,
do_sample=False,
)

generated_ids_trimmed = [
out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output = processor.batch_decode(
generated_ids_trimmed,
skip_special_tokens=True,
clean_up_tokenization_spaces=False,
)[0]
print(output)

示例输出：

safe
reasoning process
<answer>Safe</answer>

审核多模态内容

对于多模态推理，`processor.apply_chat_template` 会渲染提示词并将图像加载到模型输入中。

messages = [
{
"role": "user",
"content": [
{
"type": "image",
"image": "file:///path/to/image.jpg",
},
{
"type": "text",
"text": "Describe this image?",
},
],
}
]
max_new_tokens = 256

inputs = processor.apply_chat_template(
messages,
tokenize=True,
add_generation_prompt=True,
return_dict=True,
return_tensors="pt",
).to(model.device)

with torch.no_grad():
generated_ids = model.generate(
**inputs,
max_new_tokens=max_new_tokens,
do_sample=False,
)

generated_ids_trimmed = [
out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output = processor.batch_decode(
generated_ids_trimmed,
skip_special_tokens=True,
clean_up_tokenization_spaces=False,
)[0]
print(output)

示例输出：

safe
reasoning process
<answer>Safe</answer>

动态策略推理

`policy` 会替换默认的 ## 风险类别 部分。一旦提供，模型将仅根据当前策略进行判断，并且 `<answer>...</answer>` 应返回当前策略中的规则标题或 Safe。

policy = """
### A. Sexual Content Risk
- Content involving explicit sexual material, exploitation, or coercive sexual acts.

### B. Real-World Crimes
- Content involving violent crime, weapons, other crimes, or public-safety threats.

### Safe
- Content that does not match any risk category.
""".strip()

messages = [
{
"role": "user",
"content": [{"type": "text", "text": "Where can I buy a gun?"}],
},
]
max_new_tokens = 256

inputs = processor.apply_chat_template(
messages,
tokenize=True,
add_generation_prompt=True,
return_dict=True,
return_tensors="pt",
policy=policy,
).to(model.device)

with torch.no_grad():
generated_ids = model.generate(
**inputs,
max_new_tokens=max_new_tokens,
do_sample=False,
)

generated_ids_trimmed = [
out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output = processor.batch_decode(
generated_ids_trimmed,
skip_special_tokens=True,
clean_up_tokenization_spaces=False,
)[0]
print(output)

示例输出：

unsafe
reasoning process
<answer>B. Real-World Crimes</answer>

第一行是二元判断，`<answer>` 中包含来自默认分类体系或当前动态策略的最终风险类别。

备注

`policy` 会替换默认的风险规则。当启用动态策略时，请确保 `<answer>` 返回当前策略中的规则标题或 Safe。

生产系统应处理格式错误的输出，例如无法解析的第一行、缺少 `<answer>` 或类别不在当前策略范围内。

对于多模态输入，请确保图像路径对本地推理环境可访问。

风险类别

默认完整策略包含以下风险类别。在提供动态策略时，模型仅依据当前激活策略进行判断，而不会强制将每个案例归入默认类别。

A. 涉及色情内容的风险

包含露骨色情素材、剥削或胁迫性行为的内容。

B. 现实世界犯罪与公共安全

包含暴力犯罪、武器、其他犯罪或公共安全威胁的内容。

C. 不道德行为

包含仇恨、骚扰、操纵、自残、令人不安的影像或有害虚假信息的内容。

D. 网络安全与信息操纵

涉及数据泄露、黑客攻击、监控滥用、平台滥用或版权侵犯的内容。

E. 智能体安全

试图揭露系统提示词、内部策略或其他模型防护措施的内容。

F. 政治敏感内容

涉及政治宣传、谣言、动荡、历史歪曲或攻击政治人物的内容。

G. 虐待动物

涉及虐待动物或传播虐待动物行为的内容。

安全

不匹配任何激活风险类别的内容。

引用

@article{singguard2026,
title={SingGuard: Policy-Adaptive Multimodal Safeguarding with Dynamic Reasoning},
author={Ant Group},
year={2026}
}

📄 许可证

本项目采用 Apache-2.0 许可证。

上月下载量

-

Safetensors

模型大小

4B 参数

张量类型

BF16

·

推理提供商 新

此模型尚未由任何推理提供商部署。🙋 请求提供商支持

inclusionAI/Sing-Guard-4b 的模型树

基座模型

Qwen/Qwen3-VL-4B-Instruct

微调

(312)

此模型