Your favorite pic, but make it paper ✂️ Try it out in Gemini:
1) Open Gemini on desktop or in the mobile app
2) Select “Create image” in the tools menu
3) Upload the picture you want to transform
4) Insert the prompt from the next post
5) Share your creations in the replies ↓

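Summary: Gemini can convert an uploaded photo into a paper-cut or origami style. Users select the "Create image" tool on desktop or in the app, upload an image, and enter a specific prompt to generate the result, which they can share in the replies.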

TestingCatalog News 🗞 @testingcatalog · Apr 10

Meta is planning to release Muse Spark on the APIs soon. Would be curious also to play with Meta’s 9B model if it will ever come out. Soon 👀

Summary: Meta will soon release Muse Spark through its APIs. The author also hopes to try Meta's 9B model, if it is ever released.

TestingCatalog News 🗞 @testingcatalog · Apr 10

Google is working on a Style Tuner for Stitch to allow picking better-suited colors for generated designs.

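Summary: Google is developing a Style Tuner feature for its AI design tool Stitch. It lets users manually pick a better-suited color scheme for generated designs, improving how well the colors of AI-generated results fit.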

AK @_akhaliq · Apr 10

Think in Strokes, Not Pixels Process-Driven Image Generation via Interleaved Reasoning paper: https://huggingface.co/papers/2604.04746

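Summary: A new paper proposes a process-driven image generation method that uses interleaved reasoning to mimic the stroke-by-stroke process of painting rather than generating pixels directly, so that image synthesis follows the logic of how people actually draw.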

宝玉 @dotey · Apr 9

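Hand-drawn infographic prompt options

Option 1: Use the baoyu-article-illustrator or baoyu-cover-image skill from baoyu-skills and tell it to use the hand-drawn-edu style: https://github.com/JimLiu/baoyu-skills

Option 2: Prompt template:

---- Prompt start ----
You are a visual designer who specializes in hand-drawn infographics. Create a single-page infographic based on the content below.

## Style requirements

Overall style: Hand-drawn educational infographic on warm cream paper texture (#F5F0E8). All lines and shapes have a slight hand-drawn wobble. The overall look is clean and clear, like a single-page visual summary from a high-quality presentation. No photorealistic elements.

Color scheme:
- Information-area color blocks: macaron-palette rounded cards in light blue #A8D8EA, mint green #B5E5CF, lavender purple #D5C6E0, and light peach #F4C7AB, chosen according to the content sections
- Accent color: coral red #E8655A, for keywords, key data, check marks, and other elements that need visual emphasis
- Lines and main text: black
- Secondary annotations: warm gray #6B6B6B, in a smaller font size

Graphics first: Use icons, simple cartoon figures, and diagrams to carry the information. Text is only for labels and finishing touches; never use text where a picture can explain it. Like good slides, the structure should be clear at a glance and the details should reward a closer look.

Information structure: Automatically choose the best visual layout for the content (process → chained arrows, comparison → left/right columns, cycle → ring, composition → side-by-side cards, hierarchy → nesting, etc.). Use rounded color blocks, bubbles, dashed boxes, and similar containers to separate sections, and connect sections with hand-drawn wavy arrows labeled with short relationship words.

Text hierarchy: Title centered at the top in bold, large, hand-drawn lettering. Within sections, use bold keywords plus short warm-gray labels in small type (2-5 words) to distinguish levels.

Rendering details: Macaron color blocks do not completely fill their outlines (colors do not completely fill outlines), doodle decorations as accents (small stars, underlines, small arrows, etc.), generous white space, clean composition.

Closing line: One bold, centered summary sentence at the bottom of the image that captures the core point.

## Content to illustrate

[Fill in the topic and key points here]

Draw the content below: ---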

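Summary: The post offers two AI approaches for generating hand-drawn educational infographics. The first uses the baoyu-article-illustrator or baoyu-cover-image skill from baoyu-skills with the hand-drawn-edu style to generate the image directly. The second is a detailed prompt template that defines visual rules such as a cream-paper textured background, a macaron color palette, and wobbly hand-drawn lines. It stresses a graphics-first principle, adapts the layout automatically to processes, comparisons, cycles, and similar structures, and specifies the text hierarchy and decorative details.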
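For anyone reusing the template from a script, filling in the bracketed placeholder could be sketched as below. This is only an illustration, not something from the post: `build_prompt`, `PLACEHOLDER`, and the shortened `TEMPLATE` text are hypothetical names and stand-ins, and the placeholder string has to match whatever bracketed placeholder appears in the copy of the template actually used.

```python
# Hypothetical helper for filling the infographic prompt template.
# PLACEHOLDER and TEMPLATE are illustrative stand-ins, not text from the post;
# paste the full template and its real placeholder in their place.
PLACEHOLDER = "[Fill in the topic and key points here]"

TEMPLATE = """You are a visual designer who specializes in hand-drawn infographics.
Create a single-page infographic based on the content below.

## Content to illustrate

[Fill in the topic and key points here]
"""


def build_prompt(topic: str, key_points: list[str], template: str = TEMPLATE) -> str:
    """Return the template with the placeholder replaced by a topic and bullet list."""
    if PLACEHOLDER not in template:
        # Fail loudly instead of sending an unfilled template to the model.
        raise ValueError("template does not contain the expected placeholder")
    bullets = "\n".join(f"- {point}" for point in key_points)
    return template.replace(PLACEHOLDER, f"{topic}\n{bullets}")


if __name__ == "__main__":
    print(build_prompt("How photosynthesis works", ["light reactions", "Calvin cycle"]))
```

The resulting string can then be pasted into, or sent to, whichever image-generation tool is being used. Raising an error on a missing placeholder is a deliberate choice here so that a mismatched template is caught before any image is generated.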

karminski-牙医 @karminski3 · Apr 1

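Bringing you a quick test of WAN-2.7-Image! Alibaba's WAN-2.7-Image was just released! It is a large model for image generation and image editing, and its standout features are better-looking generated people and more accurate text. I first tested how it handles text + image generation: #wan27image #wan27 #阿里万相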

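Summary: Alibaba released WAN-2.7-Image, a large model for image generation and editing that focuses on better-looking people and more accurate text rendering. The model supports text-to-image generation and image editing, and the author ran an initial test of its text-to-image capability. As the latest release in Alibaba's Wan (万相) series, WAN-2.7-Image shows improvements in visual quality and semantic understanding.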

Google Gemini @GeminiApp · Apr 1

Create personalized images that are out of this realm with Nano Banana 2 in Gemini. Try it for yourself and drop yours in the replies 👇

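Summary: Gemini has launched image generation with Nano Banana 2, which supports creating personalized images. The official account invites users to try it and share their results in the replies.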

Google Gemini @GeminiApp · Mar 20

Loving these creations. Try it out and share yours in the replies 👇

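Summary: The post shares a Nano Banana prompt that generates a 2×2 grid of 3D typographic sculptures, rendering four important historical years and their representative inventions in a retro-tech or steampunk style. The prompt includes detailed parameters for anchor definition, form construction, material physics, and lighting and rendering, and can be copied and used as-is. Readers are invited to try it and post their results in the replies.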

Satya Nadella @satyanadella · Mar 20

Great to see our new image model from our Superintelligence team rolling out in Copilot and coming soon to Foundry for enterprise customers.

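Summary: The MAI-Image-2 image generation model is live on MAI Playground and ranks third on the arena leaderboard. It covers generation needs ranging from photorealistic styles to detailed infographics. It will soon be integrated into Copilot, Bing Image Creator, and Microsoft Foundry, the last of these for enterprise customers.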

Google DeepMind @GoogleDeepMind · Mar 2

Nano Banana 2 makes sophisticated visual creation faster, cheaper, and accessible to everyone. 🍌 Tap on each photo to see the details 👀

Google DeepMind @GoogleDeepMind · Feb 27

We’re launching Nano Banana 2, built on the latest Gemini Flash model. 🍌 It’s state-of-the-art for creating and editing images, combining Pro-level capabilities with lightning-fast speed. 🧵

Saining Xie @sainingxie · Jan 24

> "rae can’t scale" > "rae can’t generalize past imagenet" > "rae can’t do details" > instead of arguing online > students put heads down > try it at real t2i scale > results come back > look extremely bullish > shoutout to peter, boyang, austin > and everyone who shipped > code, model, data > all open-sourced 👇

Quoted post (@TongPetersb): Last October, we introduced Representation Autoencoders (RAE) and showed that training diffusion models on frozen semantic representations is feasible and outperforms VAEs on ImageNet. We received many questions: Can this scale to complex settings like T2I? Does the advantage still hold? The answer is yes. 🧵

Saining Xie @sainingxie · Dec 16

new paper: iREPA diffusion models are a renderer of their underlying representations. with this new setup, we can gain much clearer insight into what those representations are really about. Jas took on a spontaneous quest, and over the past three months we have learned so much ps. this is also our little experiment in a new kind of online water cooler effect that I loved seeing. let’s argue, discuss, and then turn it into proper science with real effort

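Quoted post (@1jaskiratsingh): ‼️ Representation matters for generation! But it turns out our understanding of how representations help generation has been wrong all along ‼️

What we used to think (we were wrong):
❌ Larger vision encoder → better representation → better generation
❌ Better global semantics → better representation → better generation

What we found:
🤯 For representation alignment, vision encoders more than 20× smaller can match or beat much larger models
🤯 A vision encoder with about 20% linear-probing accuracy (a measure of global semantics) can outperform encoders with >80% accuracy
🤯 Even classical features such as SIFT and HOG give gains comparable to modern, much larger vision encoders ‼️

🚨 Introducing: "What matters for representation alignment: global information or spatial structure?" 🚨

TL;DR:
✅ Better global semantic information ≠ better generation
✅ Spatial structure (not global semantics) drives a representation's generation performance
✅ We propose iREPA: just 3 lines of code that emphasize spatial-structure transfer and consistently improve convergence speed across REPA, REPA-E, Meanflow, JiT, and other methods

An exciting project at @AdobeResearch, in collaboration with @xingjian_leng, @zongze_wu, @LiangZheng_06, @rzhang88, @elishechtman, and @sainingxie 🙏 This was also a particularly fun and unique experience for me: at every step of the project we kept proving our own biases wrong 😆

Big thanks as well to @YouJiacheng, @ShumingHu, and @gallabytes, whose comments on X kicked off this line of exploration 🫡

Paper: https://arxiv.org/abs/2512.10794
Code: https://github.com/End2End-Diffusion/iREPA
Project page: https://end2end-diffusion.github.io/irepa
More details in the thread: [1/n] 🧵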

Google DeepMind @GoogleDeepMind · Oct 2

How can AI enhance the creative process of a world-renowned industrial designer? 🎨 We partnered with the visionary @RossLovegroveX and @modem_works to build a tool using Gemini and our image generation technology to translate his signature aesthetic into a new concept. 🪑

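Summary: Google teamed up with industrial designer Ross Lovegrove and modem_works to build a tool using Gemini and its image generation technology, translating his signature aesthetic into a new furniture design concept.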

Saining Xie @sainingxie · Jul 1

awesome work by @jiacheng_chen_ and @sanghyunwoo1219 on 3D-grounded visual compositing (and nice demos!)

Saining Xie @sainingxie · Jun 28

metaquery is now open-source — with both the data and code available.

Saining Xie @sainingxie · Jun 9

Excited to be at CVPR next week with my students. We’ll be presenting our work in the main conference and several workshops and tutorials, including this one👇 See you soon in Nashville!

Saining Xie @sainingxie · May 29

Indeed. For text-to-image, @xichen_pan had a great summary supporting this decoupled design philosophy: "Render unto diffusion what is generative, and unto LLMs what is understanding." We've repeatedly observed that diffusion gradients can negatively impact the backbone repr. This effect shows up in simpler settings—for example, we explored this issue to some extent in REPA-E (https://end2end-diffusion.github.io/). I believe the same principle applies to VLA. Fundamentally, the problem seems to be that diffusion gradients care too much about high-frequency details—whether in pixels or action policies—which tends to conflict with representation learning and understanding. btw, @ylecun has always been right about this -- long before any of these empirical findings.
