Top 20 Startups by Web Traffic founded since 2020

1. DeepSeek
2. Perplexity
3. Suno
4. Polymarket
5. Gamma
6. ElevenLabs
7. Lovable
8. Arena
9. xAI
10. Supabase
11. Manus
12. Higgsfield
13. Cursor
14. Fanvue
15. OpenRouter
16. GPTZero
17. Genspark
18. ShopMy
19. Venice
20. Whop

Some interesting observations:
- Only 25% were not AI: Polymarket, Supabase, Fanvue, ShopMy, Whop
- 20% were acquired
- Startups that surprisingly didn't make the cut: Kalshi (founded 2018), Mistral (10M), OpenEvidence (11.4M), Cognition
- All but 2 are unicorns (GPTZero, Fanvue), 7 decacorns, but there's no clear correlation between traffic and valuation

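Summary: Deedy Das lists the top 20 startups founded since 2020 by web traffic, led by DeepSeek, Perplexity, and Suno. Only 25% (Polymarket, Supabase, Fanvue, ShopMy, Whop) are not AI companies; 20% have been acquired; notable companies that missed the list include Kalshi (founded 2018), Mistral (10M monthly visits), OpenEvidence (11.4M), and Cognition. All except GPTZero and Fanvue are unicorns, 7 of them decacorns, but traffic and valuation show no clear correlation.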

SemiAnalysis@SemiAnalysis_ · 23 hours ago · 57

This week the InferenceX team discusses what it took to get DeepSeek V4 on InferenceX, changes in the model architecture, what is a MegaKernel, and initial performance on various accelerators including Huawei Ascend NPUs.

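Summary: This week the InferenceX team discusses the work required to bring DeepSeek V4 onto InferenceX, changes in the model architecture, what a MegaKernel is, and initial performance on various accelerators, including Huawei Ascend NPUs.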

Rohan Paul@rohanpaul_ai · 2 days ago · 59

Reuters: Chinese models charge as little as 18 cents per million tokens versus a $4 average for top models, says CitiBank Research. Open-source processing on OpenRouter rose to 65% in June from 34% in January, while Chinese models such as DeepSeek gained attention by offering much lower token prices, according to CitiBank Research. Cheaper AI has become the new enterprise priority as usage-based bills turn model choice from a capability contest into a cost-control problem. Gartner estimates AI coding costs will pass the average developer salary by 2028. OpenAI and Anthropic now face price pressure because enterprise buyers can compare models task by task rather than treat the biggest model as the default choice. --- reuters.com/business/retail-consumer/cheaper-ai-is-better-soaring-bills-are-reshaping-how-businesses-choose-models-2026-06-29/

Summary: Citi Research data shows Chinese models charging as little as 18 cents per million tokens, against a $4 average for top models. The open-source share of processing on OpenRouter rose from 34% in January to 65% in June, with Chinese models such as DeepSeek drawing attention for their low prices. Gartner predicts AI coding costs will exceed the average developer salary by 2028. Usage-based billing is moving enterprises from "pick the strongest model" to cost control, and OpenAI and Anthropic face task-by-task price comparison. A former Meta PM and the Perplexity CEO note that China can build data centers faster, with power, permits, labor, and expertise all unconstrained, which pushes costs down further.

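karminski-牙医@karminski3 · 3 days ago · 57

DeepSeek really does set the kill line on both price-performance and technology... Some people couldn't work out what DSpark is, so here is a short tutorial.

Speculative decoding is a technique for speeding up LLM output. In essence, a small model continues the large model's text and the large model judges whether the small model got it right. Models today are mostly bound by memory bandwidth while GPU compute sits idle, so the large model's prefill speed (reading tokens) is much faster than its decode speed (emitting tokens). So let the small model write a stretch of text along the large model's line of thought and have the large model check it (which only requires reading). Whenever the small model guesses correctly, generation runs at prefill speed and output speed multiplies.

The catch: an external small model also has to prefill, also takes up VRAM, and also consumes memory bandwidth. Is there a better way? There is, and that is DSpark.

Look at my diagram (the DSv4 architecture diagram on the left is by @rasbt). DSpark attaches at the Final RMSNorm stage. It is not a complete small model but a 3-layer MTP (multi-token prediction) micro-Transformer stack. Once the large model has computed the preceding 60-plus layers and pushed the "highly condensed concept" of the current sentence (the feature vector / hidden state) to the Final RMSNorm exit, and before that has been translated into actual text, DSpark cuts in:

1. Semi-autoregressive fast drafting (MTP + Markov Head). DSpark has a small number of parameters of its own. It guesses 5 tokens (feature vectors) in parallel at once, then uses an internal sequential network to straighten out the logic. (Note the order: parallel first, then sequential to remove the incoherence that parallel guessing introduces.)
2. A confidence prediction head estimates how accurate the guesses are. If, say, the last 2 of the 5 tokens look unreliable, they are cut so the large model does not waste compute on them.
3. The 3 remaining tokens go through the vocabulary projection layer, which translates the vectors into tokens.

At this point DSpark's job is done. The large model then scans DSpark's output to check it (prefill only, no decode). If it is correct, the tokens are emitted directly, so a model that used to emit one token at a time can now emit 3.

Finally, speculative decoding does not degrade model quality, and speed improves by 60%-85%. Before, you hired a small model to write drafts; now the chip is implanted straight into the brain.

SGLang already has a PR for this feature (29538), and DeepSeek has just posted a batch of DSpark-modified small models on its HuggingFace page. A bold guess: will future model releases ship with DSpark as standard?

#dspark #deepseek #投机解码 #推测性解码

Summary: DSpark, released by DeepSeek, is a speculative decoding technique. A 3-layer MTP micro-Transformer stack attached after Final RMSNorm guesses 5 tokens in parallel before the large model emits output. After trimming by a confidence head, the guesses go back to the large model for prefill verification, and correct ones are emitted several at a time. It is more efficient than an external small model, does not degrade quality, and improves speed by 60%-85%. SGLang already has a related PR (#29538), and DeepSeek has published several DSpark-modified small models on HuggingFace.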
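The draft-then-verify loop described in the tutorial above can be sketched as follows. This is a minimal illustration of generic greedy speculative decoding, not DSpark or any DeepSeek code: `speculative_step`, `target`, and `draft` are hypothetical names, and both "models" are toy functions over integers. In a real system the verification loop would be a single batched forward pass.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]


def speculative_step(target: NextToken, draft: NextToken,
                     context: List[Token], k: int) -> List[Token]:
    """One draft-then-verify round; returns the tokens actually emitted."""
    # Draft phase: the cheap model proposes k tokens one after another.
    proposed: List[Token] = []
    for _ in range(k):
        proposed.append(draft(context + proposed))

    # Verify phase: compare each proposal with the target's own choice.
    emitted: List[Token] = []
    for tok in proposed:
        expected = target(context + emitted)
        if tok != expected:
            # First miss: emit the target's token and drop the rest.
            emitted.append(expected)
            return emitted
        emitted.append(tok)

    # Every proposal was accepted, so the target adds one extra token.
    emitted.append(target(context + emitted))
    return emitted


# Toy models (assumptions for illustration): the target counts upward
# modulo 10; the draft agrees except right after the token 3.
def target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10


print(speculative_step(target, draft, [0], k=4))
```

Because every emitted token is either confirmed or supplied by the target, the output matches what the target alone would have produced; only the number of tokens emitted per round changes.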

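karminski-牙医@karminski3 · 3 days ago · 61

Here is a head-to-head review of the Flash-series models! Besides their flagship models, every vendor also has a Flash-tier model, and these are positioned mainly as driver models for multi-agent systems and RAG systems. So how should you choose among the current Flash models? That is what this review covers.

This round evaluates three models, Gemini-3.5-Flash, Step-3.7-Flash, and DeepSeek-V4-Flash, on Agent Loop iteration ability, agent ability, frontend, backend, spatial understanding, aesthetics, price-performance, and more.

From the tests, Gemini-3.5-Flash is better suited to "pretty work" such as frontend pages and modeling. Step-3.7-Flash offers outstanding price-performance: in the agent tests it achieved higher token efficiency than flagship models (getting the most done with the fewest tokens). That makes it a particularly good fit for agent frameworks (such as OpenClaw or Hermes) or as the driver model in complex agent systems. DeepSeek-V4-Flash has strong backend ability and is well suited to writing scripts, or even to installing a DeepSeek-V4-Flash-driven ClaudeCode on a server for AI-Ops.

#flash模型 #step37flash #deepseekv4flash #gemini35flash #AgentLoop

Summary: The post reviews three Flash-tier models (Gemini-3.5-Flash, Step-3.7-Flash, DeepSeek-V4-Flash) side by side. These models are positioned as driver models for multi-agent and RAG systems. Evaluation dimensions include Agent Loop iteration ability, agent ability, frontend/backend, spatial understanding, aesthetics, and price-performance. Gemini-3.5-Flash is better suited to "pretty work" such as frontend pages and modeling. Step-3.7-Flash offers outstanding price-performance, with very high token efficiency in agent tests (the most tasks completed with the fewest tokens), and suits agent frameworks such as OpenClaw and Hermes. DeepSeek-V4-Flash has strong backend ability and suits script writing or driving ClaudeCode for AI-Ops.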

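ginobefun@hongming731 · 3 days ago · 50

BestBlogs Morning Brief · 06-29

# OpenAI Codex / Cloudflare bot traffic / companion robot 小伴 / taste and curation / speculative decoding DSpark

[1] ★ In depth | OpenAI Codex lead on the new map of product work: from implementation cost to taste, curation, and agent workflows [Video]
Andrew Ambrosino, who leads the OpenAI Codex desktop app, offers a grounded judgment on Lenny's Podcast: once the implementation cost of writing code approaches zero, the bottleneck in product work is no longer building but taste and curation, and what is truly expensive is judgment rather than implementation. He goes on to cover how role boundaries are merging, why long-horizon roadmaps turn into false precision, and an ambitious staged approach that deliberately leaves six-to-twelve-month goals vague. For anyone thinking about the boundary between product managers and engineers, it is a first-hand perspective worth comparing against. Source: Lenny's Podcast https://www.bestblogs.dev/video/6daf60e

[2] ★ In depth | #603. Cloudflare CEO: when bot traffic exceeds human traffic, the internet's business model will collapse [Podcast]
Cloudflare co-founder and CEO Matthew Prince gives hard numbers: in the first half of 2026, bot traffic on the platform exceeded human traffic for the first time, and in five years it could be a thousand times human traffic. His inference is that the ad-supported model the internet has relied on for 28 years cannot hold, because "bots don't click ads" and someone has to pay. He also discusses cutting more than 20% of the team, moving the span of control from 6:1 to 12:1, and using agents to review every code release. For anyone interested in infrastructure and organizational change in the AI era, the information density is high. Source: 跨国串门儿计划 https://www.bestblogs.dev/podcast/352bbef

[3] ★ In depth | I met the first companion robot I actually want to buy! | A conversation with 世博, founder of 越伴动力 [公路播客] [Podcast]
世博, founder of 越伴动力, has been called a "teenage 稚晖君" and has hand-built more than 30 robots since his first year of university. His companion robot "小伴" does not speak human language; it expresses emotion through sounds resembling an "alien language", and it can act cute, sulk, and refuse you. He offers three judgments: companionship is not flattery, vitality is not cuteness, and less is more. Technically, an on-device 1.7B "fast brain" plus a 7B "slow brain" keeps latency under 0.4 seconds, and more than 90% of the body is soft material. For anyone following embodied intelligence and emotional robots, it is a concrete record of product trade-offs. Source: 十字路口 Crossing https://www.bestblogs.dev/podcast/b29f231

[4] New! A 10,000-character survey of the evolution from Prompt to Loop
The article systematically traces how the AI development paradigm evolved from Prompt Engineering to Loop Engineering, explains the core ideas, tech stack, and engineering practice of each stage, and proposes the role of the loop designer. Source: Datawhale https://www.bestblogs.dev/article/a41eb439

[5] Grok 4.5 enters private testing at SpaceX and Tesla: performance approaches Opus
Elon Musk announced that Grok 4.5 has begun private testing at SpaceX and Tesla. The model is based on the 1.5T-parameter V9 base model and incorporates Cursor data; early evaluations show performance close to or even exceeding Opus. Source: Elon Musk(@elonmusk) https://www.bestblogs.dev/status/2071184354756477041

[6] DeepSeek releases the DSpark speculative decoding framework; DeepSeek-V4 single-user generation speed is 60-85% higher than with MTP-1
DeepSeek released the DSpark speculative decoding framework, which uses semi-autoregressive draft generation and load-aware scheduling to raise DeepSeek-V4's single-user generation speed by 60-85%. Source: MarkTechPost https://www.bestblogs.dev/article/04ce0133

[7] We built a routing layer to cut AI costs, and it wrecked the product
A team built a routing layer that cut AI inference costs by 60%, but the quality loss from cheaper models went unnoticed for months, causing lower customer satisfaction and churn. The eventual cost was 4-5 times the amount saved. Source: Towards Data Science https://www.bestblogs.dev/article/a676552d

[8] Do LLMs have desires? (LessWrong)
The article provides experimental evidence that the preferences LLMs report in pairwise choice tests do not drive their behavior the way "effort incentives" or "role-play" do, which suggests these preferences are not real desires. Source: LessWrong https://www.bestblogs.dev/article/6c941c48

[9] After Seedance, where do video agents go? | A conversation with OiiOii 闹闹 on the secrets of video models: data, ecosystem, and structuring intuition [Podcast]
A deep look at the AI video model battlefield from a product operator's perspective, covering Seedance's technical path, big-company data-ecosystem moats, and the distinctive value video agents need to break through. Source: 卫诗婕｜漫谈 Light the Star https://www.bestblogs.dev/podcast/edf5027

[10] What happened after 2,000 people tried to hack my AI assistant (Fernando Irarrázaval)
Under more than 6,000 prompt-injection attacks launched by more than 2,000 people, an AI assistant based on Claude Opus 4.6 kept its secrets.env file from leaking, demonstrating the effectiveness of model-level safety training combined with simple instructions. Source: Hacker News https://www.bestblogs.dev/article/4a6061ae

--- http://BestBlogs.dev · Discover high-quality content that truly suits you
BestBlogs is an AI-driven personal reading assistant that helps you discover high-quality content that truly suits you. Read online: https://www.bestblogs.dev/explore/brief/2026-06-29

Summary: The OpenAI Codex lead says that once code implementation cost approaches zero, the product bottleneck shifts to taste and curation. Cloudflare's CEO reports that bot traffic exceeded human traffic in the first half of 2026 and may reach a thousand times human traffic in five years, making the ad model unsustainable; the company has cut 20% of its team and widened its span of control to 12:1. The companion robot "小伴" uses on-device 1.7B + 7B models with latency under 0.4 seconds. Grok 4.5 is in private testing at SpaceX and Tesla, is based on the 1.5T-parameter V9 model, and performs close to Opus. DeepSeek released the DSpark speculative decoding framework, raising DeepSeek-V4 single-user generation speed by 60-85%. A routing layer cut AI costs by 60%, but the quality loss cost 4-5 times the savings. Under more than 6,000 prompt-injection attacks from some 2,000 people, Claude Opus 4.6 protected the secrets.env file.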

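Berryxia.AI@berryxia · 4 days ago · 50

Folks, DeepSeek has open-sourced DSpark! It is a speculative decoding framework: not a new model, but an inference optimization.

The core problem: in traditional speculative decoding, a small draft model first guesses a run of tokens and the large model then verifies them in one pass. The trouble is that the later a guess sits in the run, the more likely it is to be wrong, and verifying wrong guesses also wastes GPU compute.

DSpark's solution:

1. A hybrid of a parallel backbone and a sequential head. Purely parallel guessing is fast, but later tokens decay because each position guesses without knowing what was actually sampled before it. DSpark adds a small Markov head that adjusts the current guess using the previous token, which solves the suffix-decay problem.
2. Confidence scheduling. A confidence head estimates the survival probability of each draft token. It is paired with a load-aware scheduler that verifies more tokens when the GPU is idle and fewer when it is busy. Not every guessed token is worth checking; only the part that is likely to be correct is checked.

Results: in the DeepSeek-V4 production environment, single-user generation is 60-85% faster than the MTP-1 baseline. Throughput improves by 1.5x to 5x depending on the scenario.

What was open-sourced:
- Model checkpoints: `DeepSeek-V4-Pro-DSpark` and `DeepSeek-V4-Flash-DSpark`, which reuse the existing V4 weights and attach a draft module
- Training code: the MIT-licensed DeepSpec codebase
- Developed jointly with Peking University

Why it matters: speculative decoding has long been regarded as "good in theory, hard in practice". DSpark shows that in a real production system it can deliver a stable speedup of more than 60% without affecting output quality. DeepSeek already has it deployed in production.

Summary: DeepSeek has open-sourced DSpark, a speculative decoding framework aimed at production use. It targets the problem in traditional speculative decoding that a draft model's later tokens have a high error rate and waste compute. DSpark uses a hybrid architecture of a parallel backbone plus a sequential Markov head to eliminate suffix decay, and adds a confidence head and a load-aware scheduler to control dynamically how many tokens are verified. In the DeepSeek-V4 production system, single-user generation is 60-85% faster than the MTP-1 baseline and throughput improves by 1.5x to 5x. The release includes the `DeepSeek-V4-Pro-DSpark`/`Flash-DSpark` checkpoints built on V4 weights and the MIT-licensed DeepSpec training code, developed jointly with Peking University.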
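One way to picture the confidence-scheduling idea above is a rule that keeps the longest draft prefix whose estimated survival probability stays above a threshold, and raises that threshold when the GPU is busy. The sketch below is only an illustration under assumed names and numbers: `choose_verify_length`, `base_threshold`, the 0-to-1 `gpu_load` scale, and the threshold formula are hypothetical and are not taken from the DSpark paper or code.

```python
from typing import List


def choose_verify_length(accept_probs: List[float], gpu_load: float,
                         base_threshold: float = 0.3) -> int:
    """Return how many drafted tokens to send to the large model.

    accept_probs[i] is the (assumed) confidence-head estimate that draft
    token i is accepted given that all earlier draft tokens were accepted.
    gpu_load is a hypothetical utilisation figure between 0.0 and 1.0.
    """
    # A busier GPU demands a higher chance of payoff before verifying.
    threshold = base_threshold + (1.0 - base_threshold) * gpu_load * 0.5

    survival = 1.0  # probability that the whole prefix so far survives
    length = 0
    for p in accept_probs:
        survival *= p
        if survival < threshold:
            break
        length += 1
    return length


probs = [0.9, 0.8, 0.7, 0.4, 0.3]
print(choose_verify_length(probs, gpu_load=0.0))  # idle GPU: longer prefix
print(choose_verify_length(probs, gpu_load=1.0))  # busy GPU: shorter prefix
```

The cumulative product matters because a draft token can only be accepted if every token before it was accepted, so confidence in a prefix can only fall as the prefix grows.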

Chubby♨️@kimmonismus · 4 days ago · 45

The release of GLM-5.2 is the second DeepSeek moment.

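Summary: The release of GLM-5.2, a Chinese open-weights model, is being described as the second "DeepSeek moment". Some commentators say its performance is comparable to the models currently available from OpenAI and Anthropic. White House AI lead David Sacks warned in response that if the US keeps holding its own models in "purgatory" (meaning excessive regulation or restriction), the world will turn to Chinese technology and US companies will fall behind in the race. The remarks echo the earlier global impact of DeepSeek's open-source models and underline that US-China open-source AI competition has entered a new phase.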

Rohan Paul@rohanpaul_ai · 5 days ago · 69

🇨🇳🇺🇸 Chinese AI models are up to 50 times cheaper than their American counterparts on a per-token basis, especially Qwen, DeepSeek and Kimi, which pressures OpenAI and Anthropic pricing.

- From the J.P. Morgan report titled "Semiquincententacles: the US grip on global markets at 250". The report said Chinese firms accounted for over 45% of all traffic on the AI aggregation platform OpenRouter by April 2026, up from under 2% in late 2024.

Some other findings from the report:
- Enterprise AI tokens may become commoditized, because many business tasks do not need frontier models and can run on smaller open models.
- AI has driven 65%-80% of S&P 500 returns, profits and capex since ChatGPT, creating clear signs of investor overexcitement in semiconductors.
- NVIDIA still dominates AI accelerators, but custom chips from Google, Amazon, Microsoft and Meta are gaining because they can cut total cost by 30%-40%.
- China is catching up in AI, with better models, rising GPU self-sufficiency and possible chip-scaling workarounds despite export controls.
- Taiwan is the US AI system’s weak point, because TSMC supports much of the world’s advanced chip supply while Taiwan is highly exposed to energy and food blockades.

Summary: A J.P. Morgan report finds that Chinese AI models are up to 50 times cheaper per token than US models, with Qwen, DeepSeek, and Kimi putting pressure on OpenAI and Anthropic pricing. By April 2026, Chinese firms' share of OpenRouter traffic had risen from under 2% to over 45%. The report also says enterprise AI tokens will be commoditized because most tasks do not need frontier models; AI has driven 65%-80% of S&P 500 returns; NVIDIA still dominates AI accelerators, but custom chips can cut total cost by 30%-40%; and China's GPU self-sufficiency is rising. A UBS survey found that 60% of companies monitoring AI budgets have moved to cheaper models, using model routing to send simple tasks to open-source models such as Qwen, DeepSeek, and MiniMax in response to bills of up to $35K/month and teams exceeding quotas by 200%.

Rohan Paul@rohanpaul_ai · 5 days ago · 54

Fantastic, @deepseek_ai just published their new inference optimization method. It proposes DSpark, a semi-parallel speculative decoding system that gave DeepSeek-V4 about 60% to 85% faster per-user generation at matched throughput. The biggest idea in DSpark is that faster inference is not just about drafting more tokens, but about deciding which drafted tokens are worth checking. Speculative decoding already had the basic trick: a smaller draft model guesses several next tokens, then the real model checks them in 1 pass. The problem is that long draft blocks often waste work, because later guesses are more likely to be wrong, and checking bad guesses still uses GPU capacity. DSpark’s breakthrough is to make this process selective: it drafts a block, scores how likely each prefix is to survive, then verifies only the part that is likely to pay off. The mechanism has 2 linked parts: a strong parallel draft model makes many token guesses quickly, then a tiny Markov head adjusts each guess using the token right before it. That small sequential piece matters because pure parallel drafters are fast, but their later tokens decay because each position guesses without knowing what the earlier sampled token actually was. That is, fully parallel drafters guess every position too independently, which can create bad token combinations later in the block. Then the confidence scheduler estimates how many drafted tokens should be checked for each request, based on both acceptance chance and current GPU load.

Summary: DeepSeek proposes DSpark, a semi-parallel speculative decoding system that makes DeepSeek-V4 per-user generation about 60% to 85% faster at matched throughput. The core innovation is selective verification: the draft model generates several candidate tokens in parallel, and a small Markov head then adjusts each guess based on the previous token, compensating for the drop in token-combination quality toward the end of purely parallel drafts. The confidence scheduler uses acceptance probability and GPU load to decide dynamically how many tokens to verify per request, avoiding wasted computation.
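The effect of a small sequential correction on top of independent per-position guesses can be shown with a toy example. The sketch below is not DSpark's architecture: the vocabulary, the score values, and the names `draft_block` and `bigram_bias` are invented for illustration, and the "Markov head" is reduced to a lookup table of additive biases keyed on the previous drafted token.

```python
from typing import Dict, List

Scores = Dict[str, float]


def draft_block(parallel_scores: List[Scores],
                bigram_bias: Dict[str, Scores],
                use_markov: bool) -> List[str]:
    """Pick one token per position from independently computed scores.

    With use_markov=True, each position's scores are first shifted by a
    bias that depends only on the token chosen at the previous position.
    """
    block: List[str] = []
    for scores in parallel_scores:
        adjusted = dict(scores)
        if use_markov and block:
            for tok, bias in bigram_bias.get(block[-1], {}).items():
                if tok in adjusted:
                    adjusted[tok] += bias
        block.append(max(adjusted, key=adjusted.get))
    return block


# Hypothetical scores: position 2 slightly prefers "Francisco" when it
# does not know that position 1 ended up as "New".
parallel_scores = [
    {"New": 1.0, "San": 0.9},
    {"York": 1.0, "Francisco": 1.1},
]
bigram_bias = {"New": {"York": 0.5}, "San": {"Francisco": 0.5}}

print(draft_block(parallel_scores, bigram_bias, use_markov=False))
print(draft_block(parallel_scores, bigram_bias, use_markov=True))
```

Without the correction the two positions combine into an unlikely pair; with it, the second position is steered toward a token that fits what was actually chosen first, which is the kind of bad combination the post attributes to fully parallel drafters.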

Yuchen Jin@Yuchenj_UW · 5 days ago · 38

DeepSeek is the GOAT. 🐳 They just published DSpark, a new speculative decoding method that boosts throughput by 51% to 400%. They also open-sourced DeepSpec, the training framework behind it. This is the real open AI.

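Summary: Yuchen Jin calls DeepSeek the GOAT for publishing DSpark, a new speculative decoding method said to raise throughput by 51% to 400%, and for open-sourcing DeepSpec, the training framework behind it, describing this as the real open AI.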

凡人小北@frxiaobei · 5 days ago · 63

DeepSeek V4 has received an update: a new speculative decoding framework, DSpark, which raises inference speed by 80%. DSpark is already deployed on real production traffic for DeepSeek-V4 (Flash and Pro). Report: "DSpark: Confidence-Scheduled Speculative Decoding with Semi-Autoregressive Generation" https://github.com/deepseek-ai/DeepSpec/blob/main/DSpark_paper.pdf

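Orange AI@oran_ge · 6 days ago · 62

A few counterintuitive observations about models lately:
1. GLM 5.2 is replacing Claude Sonnet and Opus as the favorite model of paying users
2. DeepSeek v4 Pro is still the most popular model among the general public
3. GPT 5.5 is very capable, yet almost nobody uses it

The observations come from cola's token consumption statistics, which also suggests that cola users and codex (GPT 5.5) users have completely different profiles.

Summary: The post shares three counterintuitive observations about models: GLM 5.2 is replacing Claude Sonnet and Opus as paying users' favorite; DeepSeek v4 Pro remains the most popular model among the general public; GPT 5.5 is capable but almost unused. The data comes from cola's token consumption statistics, which suggests that cola users and codex (GPT 5.5) users have completely different profiles.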

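Rohan Paul@rohanpaul_ai · 6 days ago · 55

The Information reports that Anthropic’s Mythos preview spooked DeepSeek into fundraising, because CEO Liang Wenfeng realized the company needed a much bigger cash pile to compete. DeepSeek now plans to double its workforce too. AI competition is no longer just about model cleverness; it is about compute reserves, talent density, secure infrastructure, product surface area, and enough cash to survive several failed training runs. --- theinformation.com/articles/anthropics-mythos-spooked-deepseek-prompting-7-4-billion-fundraising

Summary: The Information reports that Anthropic's Mythos preview alarmed DeepSeek, and CEO Liang Wenfeng realized the company needed a larger cash reserve to compete. DeepSeek then launched a $7.4 billion fundraising and plans to double headcount in every department, hiring across AI core R&D, algorithms, deep learning, full-stack development, and product roles, a sign that DeepSeek is moving from only tuning models to building a complete system. AI competition has become a combined contest of compute reserves, talent density, infrastructure, product surface area, and cash reserves.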

Chubby♨️@kimmonismus · 6 days ago · 61

I think many people are not yet aware of the tectonic shift taking place. By preventing state-of-the-art capabilities - at least insofar as we are able to use them - open source becomes not only more attractive for one’s own applications, but more attractive overall. This also applies, for example, to entire states such as the European Union, provided it genuinely sets itself the goal of implementing AI. This not only attracts new investments and fresh capital, but also creates an opportunity for a PR coup that draws many idealists toward open-source companies. In that sense, I can well imagine that companies like OpenAI and Anthropic will suffer the most from regulation by the U.S. government - on two fronts - and that this indirectly does open source a tremendous service, especially given that it now primarily comes from China.

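Summary: Chubby (@kimmonismus) argues that US government limits on frontier AI capabilities (preventing state-of-the-art models from being used) make open-source models more attractive both for one's own applications and for the market overall, and that states such as the European Union could benefit too. This attracts new investment and idealistic talent; OpenAI and Anthropic will be hit hardest by the regulatory backlash, which indirectly helps open source (especially from China). The quoted post says that after Anthropic previewed Mythos in April, DeepSeek raised $7.4 billion because it could not compete; until then the lab had relied on CEO Liang Wenfeng's personal wealth. It has about 300 people today and plans to at least double.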

Chubby♨️@kimmonismus · 6 days ago · 68

Anthropic’s Mythos preview reportedly pushed DeepSeek into a $7.4B fundraising - because they could not compete with Mythos. Until now, the three-year-old Chinese AI lab had relied on CEO Liang Wenfeng’s personal wealth instead of outside capital. The Information reports the shift came after Anthropic’s April preview of Mythos. Liang reportedly concluded DeepSeek could not compete at that level without a much larger war chest. DeepSeek is now expanding aggressively: around 300 employees today, with plans to "at least double" headcount across AI systems, infrastructure, product and research.

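Summary: Anthropic's April preview of the Mythos model reportedly pushed DeepSeek toward outside capital, with a $7.4 billion raise. Until now the three-year-old Chinese AI lab had run on CEO Liang Wenfeng's personal wealth. The Information reports that Liang concluded DeepSeek could not compete with Mythos without a much larger war chest. DeepSeek is expanding aggressively: about 300 employees today, with plans to at least double headcount across AI systems, infrastructure, product, and research.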

Rohan Paul@rohanpaul_ai · 6 days ago · 61

Reuters: DeepSeek is going on a hiring sprint, aiming to double every department. The hiring spans AI core R&D, algorithms, deep learning research, full-stack development, and product roles, which means DeepSeek is no longer only tuning models but building the whole system around them. --- reuters.com/world/asia-pacific/chinas-deepseek-plans-least-double-staff-all-departments-2026-06-25/

Summary: Reuters reports that DeepSeek is on a hiring sprint with the goal of doubling headcount in every department. Openings cover AI core R&D, algorithms, deep learning research, full-stack development, and product roles, indicating that DeepSeek is building the whole system around its models rather than only tuning them.

Rohan Paul@rohanpaul_ai · 6 days ago · 64

UBS says 60% of companies now watching AI budgets are moving to cheaper models and open-source Chinese models. The pressure is coming from extreme bills, including users spending up to $35K/month, teams exceeding quotas by 200%, and companies cutting internal AI tools from 5 to 2. Companies are not abandoning AI; they are using model routing, which sends easy tasks to cheaper models and saves premium models for hard reasoning, code, and long-context work. Chinese open-source models such as Qwen, DeepSeek, MiniMax, GLM, and Kimi now fit the enterprise cost curve because they can be run locally or used through cloud catalogs. --- news.futunn.com/en/post/75068082/ubs-group-finds-60-have-already-started-curbing-ai-spending?level=2&data_ticket=1780870170397383

Summary: A UBS report says 60% of companies watching their AI budgets are moving to cheaper models and Chinese open-source models. Users spend up to $35K/month, teams exceed quotas by 200%, and companies have cut internal AI tools from 5 to 2. Companies are adopting model routing, assigning simple tasks to low-cost models and reserving complex reasoning, coding, and long-context tasks for premium models. Chinese open-source models such as Qwen, DeepSeek, MiniMax, GLM, and Kimi fit the enterprise cost curve because they can be deployed locally or used through cloud catalogs.
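The model-routing pattern described above (easy tasks to a cheap model, hard reasoning and long-context work to a premium one) can be sketched as a simple rule. Everything here is an assumption for illustration: the `Task` fields, the model names, and the 100,000-token cutoff are hypothetical, and a real router would more likely use a learned classifier or per-task evaluation than a hand-written rule.

```python
from dataclasses import dataclass

CHEAP_MODEL = "cheap-open-model"    # hypothetical model identifier
PREMIUM_MODEL = "premium-model"     # hypothetical model identifier
LONG_CONTEXT_TOKENS = 100_000       # assumed cutoff, not from the post


@dataclass(frozen=True)
class Task:
    prompt: str
    needs_reasoning: bool = False   # hard reasoning or coding work
    context_tokens: int = 0         # size of the attached context


def route(task: Task) -> str:
    """Return the model a task should be sent to."""
    if task.needs_reasoning or task.context_tokens > LONG_CONTEXT_TOKENS:
        return PREMIUM_MODEL
    return CHEAP_MODEL


print(route(Task("Summarise this email")))
print(route(Task("Refactor the billing service", needs_reasoning=True)))
print(route(Task("Answer from this archive", context_tokens=400_000)))
```

A rule like this is cheap to run but silent about quality, so the routed outputs would still need monitoring against the premium model on a sample of tasks.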

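宝玉@dotey · 7 days ago · 42

Signal boost: DeepSeek is hiring engineers and researchers for multimodal work.

Summary: DeepSeek is hiring for full-time and internship roles in multimodal work, including multimodal data engineers (pre-training data engineers) and multimodal understanding data/algorithm researchers (image and video). Applicants can get in touch by direct message or by sending a resume to talent@deepseek.com.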

X.PIN@thexpin · 7 days ago · 61

http://x.com/i/article/2069762663366975488

# Tokenmaxxing is dying, and Chinese open-source models fill the gap

Amazon, Meta, and Uber are capping token spend as GLM-5.2 and DeepSeek give their models away for free.

Over the past week, a new Chinese model called GLM-5.2 has set off another round of alarm in Silicon Valley. Released by the company Z.ai under a permissive open-source license, it takes direct aim at the coding and agentic-workflow business that Anthropic has built its reputation on, and running on a one-million-token context window, it lands surprisingly close to Claude Opus 4.8 and OpenAI’s GPT-5.5. The open-source community is ecstatic.

At the same moment, America’s “unlimited AI credits” mania is draining away. Amazon, Meta and others are killing their no-limits AI plans. After Uber’s engineers burned through a full year’s AI budget in four months, the company capped each employee at $1,500. Even Microsoft CEO Satya Nadella has warned that the industry can’t let a few AI giants swallow the whole economy.

The link between open-source models and what people now call “Tokenmaxxing” is simple enough: programmers burn too many tokens, the bills get too big, and faced with a mountain of invoices, people reach for the open-source option.

This is not the Tokenmaxxing takedown you’ve read on Substack, though. A few questions kept nagging at me. If open-source models can do the job, why is anyone still topping up their Claude account? And if everyone runs to open-source, how does anyone building a model make money?

It was only after GLM-5.2 shipped that I arrived at a first answer. Both of these waves, the rush to open-source and the rush to burn tokens, come down to the same thing: how we decide to think about a token.

## Born Out of Scarcity

Start with the open-source side, and start with GLM-5.2. Z.ai has released the core weights of GLM-5.2 under an unrestricted MIT license.
Any company can download it free from Hugging Face, customize or fine-tune it, and run it locally or on a virtual machine. Standing the thing up is still a slog, but next to the now-delisted Fable 5, it’s a genuinely good option. The model was built on Huawei’s Ascend chips, with no Nvidia hardware involved.

But GLM-5.2 is not another DeepSeek. DeepSeek’s Liang Wenfeng came out of a quant fund, is worth billions, and has chosen near-total seclusion. (He recently put about $2.8 billion of fresh money into DeepSeek.) Z.ai, by contrast, is an open-source model maker that’s already publicly listed in Hong Kong. It has no billionaire patron, and its road has been every bit as winding as DeepSeek’s.

In 2020, BAAI’s Tang Jie argued the language model still deserved the effort. Of BAAI’s 480 A100 cards, 400 went to Tang’s team. Tang also tried Huawei’s 910A and 920 chips. On large-model training, the 920’s operator efficiency was just 18% of an A100’s; after Tang’s team helped rewrite the operators, they pushed it to roughly 40%, and trained a 13B code model, CodeGeeX.

But Tang’s real goal was a 100B-parameter model, and even 2,000 910A cards weren’t enough. In the end, Tang turned to Z.ai, the company he’d founded back in 2018, and rented 1,000 cards. In July 2022, they finally had their hundred-billion model: GLM-130B.

I tell his story because he embodies the type. Most of China’s open-source AI companies grew out of academic projects; they incorporated mainly because they needed to buy compute, and they open-sourced their architecture to keep their academic visibility. Starved of chips, they learned to adapt to whatever domestic silicon they could get. Z.ai wasn’t placed on the U.S. entity list until 2025, but it was already optimizing for Huawei chips in 2020. Localized compute and open architecture became, almost by default, the signature of Chinese AI.

The open-source bet has its skeptics inside China, too.
In 2024, Baidu founder Robin Li argued that closed models were more powerful and cheaper to run than open ones. His point was that closed models came with more compute and bigger teams, and that ERNIE was nearly a match for ChatGPT. (A little ironic, isn’t it?) ERNIE was not, in fact, in ChatGPT’s league, and China never produced a closed model strong enough to make Li’s case.

Turning open-source into profit is a hard problem. In a 2025 interview, a Z.ai expert described the company’s three possible lanes (inference, agentic, and coding) and said Z.ai chose coding. MiniMax, by contrast, chose multimodal AI and AI companionship. At the time it wasn’t an obvious call: Z.ai’s business leaned on enterprise and government contracts, coding showed no clear path to profit, and multimodal could win consumers directly. Z.ai was not the favorite.

Then the AI-coding boom arrived. Z.ai’s latest results show a net loss of about ￥3.18B ($444M) against R&D spending of roughly ￥3.2B ($444M). Still in the red, but strip out the open-ended spend on compute, and Z.ai’s revenue can cover day-to-day operations. If it can get cheaper chips, or use its chips more efficiently, or land a wave of enterprise buyers, the losses could narrow. That would be good news.

In a sense, Z.ai may owe Anthropic a thank-you note: both for the AI-doom evangelism and for the AI-coding fervor. Anthropic’s strong models cultivated customers, and its incessant messaging then drove some of them away. One of the places those customers landed was Z.ai.

A first conclusion, then: going open-source is a passive choice, a Chinese model maker admitting, out loud, that it’s behind on both compute and model quality. But if closed-model progress stalls, users won’t keep paying premium prices for closed-model tokens; they’ll choose open-source on their own. The Chinese saying fits: just hold your plate steady, and the roast duck falls from the sky.

## Water, Electricity, and a Bad Analogy

Now the other wave: Tokenmaxxing. GLM-5.2, DeepSeek and Kimi are mostly catching customers who fled the bills. But if OpenAI and Anthropic were good enough, would open-source still persuade anyone?

Then Alibaba gave me a frame. In a March internal memo, CEO Wu Yongming argued that in the AI era, the token would become a basic factor of production, the way traffic was in the internet era. Alibaba set up the Alibaba Token Hub (ATH) around that idea.

Follow the logic. In the age of electrification, a country’s electricity output and its GDP growth tend to rise together; no nation ever went bankrupt building power plants. So I looked at U.S. electricity prices, consumption and GDP from the 1920s to the 1960s. As prices fell, total spending on electricity rose 6.2x, but nominal GDP rose 11.1x. Americans spent relatively less on power and got more output for it.

The pattern doesn’t always hold cleanly, though. Through the fast-industrializing decades in Japan, China, and West Germany, electricity spending actually outran GDP. But in West Germany and Japan, even during those high-growth years, the share of GDP eaten by electricity fell sharply to almost 2.0%. That suggests a kind of lag: a rising industrial economy takes roughly fifteen years to work through the adjustment and reach the point where cheap power finally translates into abundant output.

If Wu is right and tokens really are AI’s water and electricity, they ought to deliver something similar. But run the numbers and the story breaks. Over the past four years, the cost of a given unit of AI dropped more than 90 percent, while total token spending rose 70x. My god. If this is water and electricity, the bill is climbing far too fast.

A seventyfold jump in token spending over four years has not produced anything like a matching surge in what society actually makes. Yes, the data centers went up, and the chips are back-ordered for months.
But none of it has meaningfully improved the quality or efficiency of production outside the AI industry itself.

What breaks the “AI as utility” analogy is the reasoning model. Across coding and agentic tasks, a model now generates thousands of internal reasoning tokens before it answers, pushing single-task consumption 10 to 100 times higher than older models. So how much does all that buy you? In an NBER paper, DeMiller, Musolff and Yang measured the gains from AI coding tools across four stages of work:

- Writing a single file: +290%
- Bulk work: +150%
- A specific deliverable: +50%
- A shipped, delivered product: +30%

In other words, even in coding, the thing AI does best, the gains shrink fast as you zoom out from a single file to a finished product. Optimizing the whole pipeline is far harder than optimizing one slice of it.

## Three Months of Unlimited Tokens

As latecomers, Chinese firms tried to copy the Tokenmaxxing wave too. Per public reports in March, Tencent gave core R&D teams an annual token package worth about $31,700 each, plus $1,000 a month for outside tools; ByteDance opened its internal AI tools for unlimited use and reimbursed half of employees’ personal AI experiments, capping technical staff at $1,000 a year; Baidu handed engineers unlimited ERNIE access plus up to $800 a year for outside tokens; 360 simply loaded every employee with 100 million tokens.

The recalibration came fast. Three months later, Tencent’s Hunyuan team was capped at roughly $970 worth of outside models, and everyone moved onto quotas, though using Tencent’s own Hunyuan model stayed unlimited. ByteDance staff likewise faced no limit on its in-house TRAE tool. Internally, Tencent came out against usage rankings, refusing to treat token consumption as a single yardstick for output.

The reason was simple: Chinese companies wanted real output, and they weren’t seeing it.
One employee, speaking anonymously, described a team that built workflows across several different models, only to find the AI-generated pieces wouldn’t fit together, and to scrap the whole thing and start over. Twenty-odd people spent about $6,900 in tokens in a month and had nothing to show for it. At some firms, the free tokens got quietly repurposed (for analyzing stocks, say) and the company had no idea where they’d gone.

Meta is tightening what employees can spend on Anthropic and other providers, a sharp reversal from the scene a few months earlier, when staff competed to burn tokens. Bloomberg has reported that Uber and Walmart each capped AI coding-tool use; the Financial Times reported that Amazon scrapped the internal leaderboard that ranked employees by AI usage.

A June report from the consultancy Bain, titled “Your AI Budget Is Growing. Your Returns Aren’t. Here’s Why.”, found that among companies able to quantify AI’s cost savings, 40 percent saw actual savings of 10 percent or less. Of the 37 percent who’d targeted savings of 11 to 20 percent, only 31 percent actually got there.

The grassroots buying isn’t over, though. One ByteDance engineer pays for Claude Max ($100 a month, reimbursed) to write what he considers the cleanest code. Better than DeepSeek, by his lights, and GLM he can’t get. But one employee’s purchase doesn’t make the whole company better off. Tokenmaxxing shifts an individual’s cost onto the employer.

The irony is that the last firm into the water was the first one out. Tencent, a relative laggard in China’s AI race, quit Tokenmaxxing earlier than anyone. ByteDance is still touting its numbers: as of June, it says, daily token calls to its Doubao model topped 180 trillion, up more than tenfold in a year.

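Summary: Chinese company Z.ai has open-sourced the GLM-5.2 model under the MIT license. It has a one-million-token context window, was trained on Huawei Ascend chips, and performs close to Claude Opus 4.8 and GPT-5.5. Meanwhile, US companies such as Amazon, Meta, and Uber have begun limiting AI budgets because engineers burned through too many tokens (Uber caps each employee at $1,500), which drives demand for open-source models. The GLM team grew out of an academic project and has long adapted to domestic chips; DeepSeek received about $2.8 billion in fresh money from Liang Wenfeng, and together they have become alternatives to the "Tokenmaxxing" trend.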
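The article's "run the numbers" step can be made explicit with a few lines of arithmetic. The sketch below uses only figures quoted in the article (unit cost down 90 percent, token spending up 70x, electricity spending up 6.2x against nominal GDP up 11.1x); the function names are hypothetical, and since the article says "more than 90 percent", the volume figure is a lower bound.

```python
def implied_volume_growth(spend_multiple: float,
                          price_drop_fraction: float) -> float:
    """Spending = price x volume, so volume growth = spend / price ratio."""
    remaining_price = 1.0 - price_drop_fraction
    return spend_multiple / remaining_price


def share_of_output(spend_multiple: float, output_multiple: float) -> float:
    """How the spending-to-output ratio changed, relative to its start."""
    return spend_multiple / output_multiple


# Tokens: a 90% price drop with 70x spending implies about 700x the volume.
print(round(implied_volume_growth(70.0, 0.90)))

# US electricity, 1920s-1960s: spending grew slower than nominal GDP, so
# its share of output fell to roughly 56% of where it started.
print(round(share_of_output(6.2, 11.1), 2))
```

Put side by side, the electricity case shows spending shrinking relative to output, while the token case shows a roughly 700-fold rise in consumption with no comparable output figure to divide by, which is the gap the article is pointing at.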

swyx 🔜 @aiDotEngineer@swyx · Jun 24 · 41

btw Zai IPO'ed in Jan at HK$120 a share. when I first met @louszbd nobody really knew anyone using GLM's. now they have beat deepseek with the world's undisputed top open model and in some respects (see @ml_angelopoulos) say top model period, and are returning to SF @aidotengineer on top of the world and open for business! excited for @Thom_Wolf and @ZixuanLi_ to chat onstage!

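Summary: Zhipu AI (Zai) listed in Hong Kong in January at HK$120 a share. Its GLM model has beaten DeepSeek to become what the post calls the world's undisputed top open model and, in some respects, the top model overall. The team is returning to San Francisco for AI Engineer World's Fair, where @Thom_Wolf and @ZixuanLi_ will talk onstage.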

X.PIN@thexpin · Jun 23 · 62

We got early access to WeChat's new AI assistant "Xiaowei" and ran an initial test. Xiaowei says it's built by the WeChat team, runs on their in-house Chinese LLM WeLM, with DeepSeek handling some responses. Users activate it manually. From there, Xiaowei can set calendar events, send messages, make calls, generate playlists, and spin up mini-programs — WeChat's lightweight in-app tools. It can wake up Meituan for food delivery or http://JD.com for shopping, but the final payment — including transfers and red packets — requires the user to tap through manually. Privacy: chat messages are read for the current session only — not saved, not used for training. Context memory can be disabled manually. The stakes are high. Tencent's top LLM development trails ByteDance and Alibaba, making WeChat — 1B+ users — its most critical AI launch surface. Alipay is already testing AI agents with vehicle booking and food delivery.

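Summary: The author got early access to WeChat's AI assistant "Xiaowei". Xiaowei runs on Tencent's in-house Chinese LLM WeLM, with DeepSeek handling some responses. Once the user activates it manually, it can set calendar events, send messages, make calls, generate playlists, and launch mini-programs, and it can invoke Meituan for food delivery and JD.com for shopping, but final payments such as transfers and red packets require manual confirmation. On privacy: chat messages are read for the current session only, are not saved, and are not used for training, and context memory can be turned off manually. WeChat has more than 1 billion users, and Tencent trails ByteDance and Alibaba in large models, so WeChat is its most important AI launch surface. Alipay is also testing AI agents that can handle vehicle booking and food delivery.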

X.PIN@thexpin · Jun 23 · 38

DeepSeek's Harness team lead Cui Tianyi just posted on social media: his team is new, wildly understaffed, and he's personally interviewing candidates every single day while posting job ads across every platform he can find. Three roles open: Harness Researcher, Harness Engineer, Harness PM. One commenter asked if DeepSeek hires foreigners. His reply: "Just like US companies require employees to work in English, DeepSeek requires employees to work in Chinese. There's no rule against hiring non-Chinese nationals." So yes, foreigners can apply--you just need to be comfortable working in Mandarin.

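Summary: Cui Tianyi, who leads DeepSeek's Harness team, posted on social media that the team is new and severely understaffed, and that he is personally interviewing candidates every day and posting job ads on every platform. Three roles are open: Harness Researcher, Harness Engineer, and Harness PM. Asked whether DeepSeek hires foreigners, Cui replied that the company has no rule against hiring non-Chinese nationals but requires all employees to work in Chinese, so foreigners can apply provided they can use Mandarin as their working language.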

Berryxia.AI@berryxia · 6月23日63

卧槽，这一波有人直接把DeepSeek的“墙角挖倒了啊”？今天在HuggingFace刷到一个有意思的OCR开源模型和背后有趣的故事。这个OCR模型直接与传统的OCR模型完全不同！先说说背景，熟悉的朋友都知道，我最近做过几次OCR评测（可以翻阅我的前面文章），测过18个文档、6类场景，搭过本地工作流。对OCR的能力边界，算是有点体感。之前评测最头疼的并不是准确率，是多页文档的工作流。所有模型都是逐页处理。每一页清空一次记忆，再用外部调度器拼接结果。本质上是个for-loop （循环），并不是真正的长程理解。而百度这次开源的Unlimited OCR，解法完全不同。它不逐页处理。一次前向推理，几十页文档直接转录完。核心卖点就一句话：One-Shot Long-Horizon Parsing（单次长时解析），也就是说句话说：无需大规模标注数据，低成本实现长文本深度句法理解，适配大语言模型少样本能力。一张图或者一本多页PDF，直接扔进去就能一次性解析完。不用再切成小块反复跑。据说这个模型灵感来源很有意思，人类抄书的时候，不会把整本书都记在脑子里。只关注三个点：原文、刚写完的几个字、下一个要写的字。较早的内容自然淡出。近期的上下文用来追踪进度。这种日常行为揭示了一种与当前模型截然不同的注意力模式。 Unlimited OCR的核心机制R-SWA，参考滑动窗口注意力，就是模拟这个过程。每个token能看到完整图像。但输出端只维护前面128个状态。32K上下文，一次推理几十页。 KV Cache大小恒定，不随文档长度增长。这其实是把OCR从认字工具往文档理解引擎又推进了一步。以前大家觉得长文档处理必须分块。现在越来越清楚：只要上下文够长、模型够强，一镜到底反而更高效、更准确。技术报告的写法也很有意思。故事性极强，想法激进。有种探索者的气质。这种风格此前都是DeepSeek技术报告的专属标签。然后事情就开始变得有趣了。翻了下技术报告的核心贡献者。三位，两个人用真名。唯独技术总监挂了个两字母缩写YY。YY是谁？我顺着线索往回找了一下。您才怎么着？ GitHub致谢栏把DeepSeek-OCR和DeepSeek-OCR-2排在了前两位。 DeepEncoder最初就是在DeepSeek OCR中被引入的。这次Unlimited OCR恰恰完美融合了这一高压缩率编码器。里面提及DeepSeek OCR的部分，语气不像在对标竞品。更像在对自己之前的研究展开反思和优化。国内OCR圈不算大。能做出R-SWA这种级别突破、还对DeepSeek OCR架构有亲手做过级别熟悉的人，一只手数得过来。再看另一个细节。 2026年4月24日，DeepSeek-V4正式发布。58页技术报告末尾，近300个名字按字母顺序排列。其中有10个名字旁边标注了一个小小的星号：已离职。从2025年下半年到2026年初，不到半年，DeepSeek走了五个人。他们去了哪。YY是谁。报告没直说，但越读越觉得答案在字里行间。也明显看出来百度走最近的路子确实不一样了，你可要知道一直最强的OCR 莫属于他们啊，几乎没有什么对手啊！从PaddleOCR到这次的Unlimited OCR，能感觉到在往一个更前沿的方向走。这更新迭代速度，这人才储备的能力，以及发展方向，未来可期。不管八卦，单论技术。一镜到底的长文档OCR这个方向确实是对的。开源了。感兴趣的自己试试。我后面也会进行实测，顺手点个🌟。 GitHub：http://github.com/baidu/Unlimited-OCR Hugging Face：http://huggingface.co/baidu/Unlimited-OCR

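Summary: Baidu open-sourced the Unlimited OCR model on HuggingFace. Its core selling point is One-Shot Long-Horizon Parsing: a single forward pass can transcribe dozens of pages of PDF or images. Its new mechanism, R-SWA (Reference Sliding Window Attention), mimics how people attend while copying text: each token sees the full image, the output side keeps only the previous 128 states, the context is 32K, and the KV cache size stays constant regardless of document length. The technical report indicates that the design is closely tied to the DeepSeek-OCR architecture, and YY, the technical director among the core contributors, is suspected to be a researcher who recently left DeepSeek. The model is available on GitHub and HuggingFace.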
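The R-SWA description above (every token sees the full image, while the output side keeps only the previous 128 states) can be pictured as an attention mask. The following is a minimal sketch under stated assumptions, not Baidu's implementation: the function name `rswa_mask`, the layout of reference tokens followed by output tokens, and the choice that a token attends to itself plus up to `window` earlier output tokens are all illustrative guesses.

```python
def rswa_mask(num_ref: int, num_out: int, window: int = 128) -> list[list[bool]]:
    """Build a boolean attention mask for the output tokens.

    Row i describes output token i. Columns are laid out as
    [reference tokens..., output tokens...]. True means "may attend".
    """
    mask = []
    for i in range(num_out):
        # Assumption: every output token sees all reference (image) tokens.
        row = [True] * num_ref
        for j in range(num_out):
            # Assumption: causal sliding window over the output side,
            # covering the token itself and up to `window` earlier tokens.
            row.append(0 <= i - j <= window)
        mask.append(row)
    return mask


def visible_count(mask_row: list[bool]) -> int:
    """Number of positions a single output token can attend to."""
    return sum(mask_row)


if __name__ == "__main__":
    mask = rswa_mask(num_ref=4, num_out=300, window=128)
    print(visible_count(mask[0]), visible_count(mask[299]))
```

In this sketch the number of visible positions per row stops growing once `window` output tokens have been produced, which is the property the post ties to a constant KV cache.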

Artificial Analysis@ArtificialAnlys · June 23 · 60

Open weights models make up the majority of the cost-performance Pareto frontier on AA-Briefcase, our new agentic knowledge work benchmark. Last week we released AA-Briefcase, our proprietary agentic knowledge work benchmark testing models on long-horizon tasks built by industry experts. AA-Briefcase requires models to build deliverables such as financial models, board presentations, and design mock-ups in the context of realistic multi-week projects. The cost to run a single AA-Briefcase task varies by over 700x across the initial set of models we tested. With the highest performing model, Claude Fable 5, costing over $20 per task, cost efficiency is a key element in model selection for knowledge work. While the two highest performing models on the cost-performance Pareto frontier are proprietary models from @AnthropicAI, most of the remaining frontier is made up of open weights models. Notable cost-efficiency trade-offs: ➤ At $2.40 per task, GLM 5.2 (max) from @Zai_org scores within 90 Elo points of Claude Opus 4.8 (max) while costing 65% less ➤ At $0.08 per task, DeepSeek V4 Pro (max) from @deepseek_ai scores ~60 Elo points above Gemini 3.5 Flash while costing over 98% less

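Summary: Artificial Analysis released AA-Briefcase, an agentic knowledge-work benchmark that evaluates models on long-horizon tasks. Per-task cost varies by more than 700x, and the top-performing model, Claude Fable 5, costs over $20 per task. On the cost-performance Pareto frontier, apart from the two highest-scoring models from Anthropic, most of the rest is held by open-weights models. Key trade-offs: GLM 5.2 (max) costs $2.40 per task, scores within 90 Elo of Claude Opus 4.8, and costs 65% less; DeepSeek V4 Pro (max) costs $0.08 per task, scores about 60 Elo above Gemini 3.5 Flash, and costs over 98% less.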
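For readers unfamiliar with the term, a cost-performance Pareto frontier is the set of models that no other model beats on both cost and score at once. Below is a minimal sketch of how such a frontier could be computed. The function name `pareto_frontier` and every model name, cost, and score in the example are hypothetical placeholders, not AA-Briefcase data.

```python
def pareto_frontier(models: list[tuple[str, float, float]]) -> list[str]:
    """Return names of non-dominated models, ordered by increasing cost.

    Each entry is (name, cost_per_task, score). A model is dominated when
    another model is at least as cheap and at least as good, and strictly
    better on one of the two axes.
    """
    frontier = []
    best_score = float("-inf")
    # Sort by cost ascending; for equal cost, put the higher score first.
    for name, cost, score in sorted(models, key=lambda m: (m[1], -m[2])):
        # A model joins the frontier only if it beats every cheaper model.
        if score > best_score:
            frontier.append(name)
            best_score = score
    return frontier


if __name__ == "__main__":
    # Hypothetical (name, cost in dollars per task, Elo-style score) tuples.
    example = [
        ("model_a", 0.10, 1000.0),
        ("model_b", 2.00, 1200.0),
        ("model_c", 4.00, 940.0),   # costlier and weaker than model_b
        ("model_d", 7.00, 1290.0),
        ("model_e", 10.00, 1250.0), # costlier and weaker than model_d
        ("model_f", 20.00, 1400.0),
    ]
    print(pareto_frontier(example))
```

Sorting by cost first means a single pass is enough: any model that fails to improve on the best score seen so far is dominated by something cheaper.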

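Berryxia.AI@berryxia · June 23 · 66

This speed is downright insane! Whoa! The newly open-sourced Unlimited-OCR can process several hundred pages of documents in one go, and at a steady speed. The model was just released by Baidu on Hugging Face, and its core innovation is R-SWA (Reference Sliding Window Attention), which keeps the KV cache constant during decoding so that it does not blow up as the page count grows. The result: feed in a single image or a whole multi-page PDF and it parses everything in one pass, with much better speed and stability than the traditional page-by-page approach. It scored 93 on OmniDocBench, 6 percentage points higher than DeepSeek-OCR. This is no longer just an accuracy gain; it turns the long-document OCR workflow from "chunking plus stitching by an external scheduler" into a true end-to-end single pass. The biggest headaches with multi-page documents used to be broken context and inconsistent formatting. Now the model sees the structure, layout, and logical relationships of the entire document at once, so output quality naturally moves up a level. This pushes OCR another big step from a "character-recognition tool" toward a "long-document understanding engine". The technical route is clear and practical. Baidu really does stand apart in OCR now, far ahead of the pack. Model link in the comments 👇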

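Summary: Baidu PaddlePaddle released Unlimited-OCR on HuggingFace. Its core innovation, R-SWA (Reference Sliding Window Attention), keeps the KV cache constant during decoding so it does not blow up with page count. The model can process hundreds of pages in one pass, with better speed and stability than page-by-page processing. It scores 93% on OmniDocBench, 6 percentage points above DeepSeek-OCR. This moves long-document OCR from "chunk and stitch" to an end-to-end single pass that directly understands the structure and layout of the whole document.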
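The claim that the KV cache stays constant during decoding can be illustrated with a bounded buffer. This is a toy sketch, not the model's actual cache: the class name `BoundedKVCache` is hypothetical, the states are plain placeholders rather than key/value tensors, and the window of 128 is borrowed from the related post on the same model.

```python
from collections import deque


class BoundedKVCache:
    """Toy cache: fixed reference states plus a sliding window of outputs."""

    def __init__(self, reference_states, window: int = 128):
        # Assumption: reference (image) states are encoded once and kept.
        self.reference = list(reference_states)
        # A deque with maxlen drops its oldest entry when a new one arrives.
        self.recent = deque(maxlen=window)

    def append(self, state) -> None:
        """Record the state of a newly generated output token."""
        self.recent.append(state)

    def __len__(self) -> int:
        return len(self.reference) + len(self.recent)


if __name__ == "__main__":
    cache = BoundedKVCache(reference_states=range(256), window=128)
    for step in range(10_000):
        cache.append(step)
    # Size is bounded by 256 + 128 however many tokens are generated.
    print(len(cache))
```

The design choice shown here is eviction by age: once the window is full, each new output state replaces the oldest one, so memory use depends on the reference size and the window, not on how long the output grows.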

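AYi@AYi_AInotes · June 22 · 49

The window for using a top-tier large model for free is here: DeepSeek V4 Flash is completely free for a limited time, with 1M context to use as you please. DeepSeek V4 Flash has launched on the OpenModel platform with a limited-time free promotion. It uses a 284B MoE architecture, supports a 1M long context, and performs strongly on coding and agentic tasks. Input and output are both free, with no usage threshold. During the promotion, other models on the platform are also discounted by 20% to 80%. The window closes on June 28, so if you need it, jump in now!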

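Summary: DeepSeek V4 Flash has launched on the OpenModel platform with a limited-time free promotion. The model uses a 284B MoE architecture, supports a 1M long context, and is strong at coding and agentic tasks. During the promotion both input and output cost $0.00/M, with no usage threshold. Other models on the platform get 20%–80% discounts. The free window ends on June 28.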

向阳乔木@vista8 · June 22 · 53

Passing along some talent leads for DeepSeek; here's hoping domestic models grow bigger and stronger!

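Summary: The DeepSeek Harness group (a newly formed department) is still hiring heavily for three roles: Harness Researcher (intern or full-time), Harness Engineer (intern or full-time), and Harness Product Manager (full-time only). Candidates generally go through one written test and three interview rounds, with the final interview conducted by department head @tianyi. There are many open positions, and the hiring bar is the same as for other DeepSeek groups. An application link and a DM channel are attached.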

宝玉@dotey · June 21 · 45

Boosting a hiring post: DeepSeek Harness

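Summary: The new DeepSeek Harness department is hiring for three roles: Harness Researcher (intern or full-time), Harness Engineer (intern or full-time), and Harness Product Manager (full-time only). The hiring bar matches that of other DeepSeek groups, and the process is one written test plus three interview rounds, with the final interview handled by @tianyi. Interested candidates can send a résumé by DM; see the post for the specific link.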

meng shao@shao__meng · June 21 · 44

Boosting this: the DeepSeek Harness group has a lot of open positions. If you work on Agent Harness research or engineering, go for it!

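Summary: The DeepSeek Harness group (a newly formed department) is still hiring heavily. Roles include Harness Researcher (intern or full-time), Harness Engineer (intern or full-time), and Harness Product Manager (full-time only). The hiring process is the same as for other DeepSeek groups: one written test plus three interview rounds, with the final interview handled by @tianyi. Résumés can be sent by DM.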

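Berryxia.AI@berryxia · June 21 · 41

Great news, folks, haha! DeepSeek-V4-Flash is free until June 28, so go for it! 284B MoE, 1M context, solid coding and agent capabilities, and ready to use right away. The deadline is June 28. Link: https://www.openmodel.ai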

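AYi@AYi_AInotes · June 21 · 31

GLM 5.2 feels incredibly strong, a bit like a domestic Fable 5. Could this be the next DeepSeek moment, with GLM taking the baton from DeepSeek as the new leader and public face of Chinese large models?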

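AYi@AYi_AInotes · June 20 · 65

Microsoft is pulling off a slick move: selling OpenAI's models to Chinese companies while preparing to sell China's DeepSeek models to Western customers. It profits on both sides, and arguably no one else in the world is doing this. What makes it work is Microsoft's special contract with OpenAI: Microsoft obtained worldwide resale rights that OpenAI cannot restrict. In the Chinese market, its largest customer is ByteDance, which spends more than $1 billion a year on Azure and AI services. The most interesting part is the compliance setup: the models are not deployed inside China, and Chinese customers access them through data centers in Singapore. OpenAI has kept pressing Microsoft to tighten security, and is especially worried about Chinese customers distilling models, that is, training their own models on GPT outputs. Microsoft says publicly that it has automated monitoring and serves only enterprise customers, not individuals. But according to Bloomberg, the vetting in practice is not that strict. The other side is subtler: Microsoft is also testing DeepSeek's R1 and V4 with plans to sell them to Western customers. American models sold to China, Chinese models sold to the West: Microsoft has turned itself into a giant AI travel adapter. The signal behind this is interesting too. At the top level, the US and China are guarding against each other in AI, restricting exports, investment, and chips, but at the commercial level the arbitrage opportunity is large enough that even a company of Microsoft's size is willing to bet on both sides. So geopolitics is geopolitics, and business is business, hahaha.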

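Summary: Under its special contract with OpenAI, Microsoft holds worldwide resale rights and sells OpenAI models to Chinese customers (the largest, ByteDance, spends over $1 billion a year on Azure and AI services). The models are accessed through data centers in Singapore, with monitoring intended to prevent distillation. On the other side, Microsoft is testing DeepSeek-R1 and DeepSeek-V4 and preparing to sell them to Western customers. This "two-way AI model trade network" highlights how large the commercial arbitrage opportunity is despite US-China geopolitical barriers.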

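AYi@AYi_AInotes · June 20 · 75

Leave it to Microsoft to quietly rake in a fortune. It has become the world's largest AI middleman, oh no, I mean the largest relay station: it sells ChatGPT to Chinese enterprises and also sells DeepSeek back to Western customers 😁 This comes from Bloomberg's latest report, which is authoritative and credible. What surprised me was not only the part about Microsoft selling GPT to China but also the line that follows: Microsoft is also testing DeepSeek-R1 and DeepSeek-V4 and preparing to sell these Chinese models to Western customers. Wow: the left hand takes GPT and sells it into China, the right hand takes DeepSeek and sells it to the West. This is not an AI company selling models; it is a two-way trade network for AI models across the US and China taking shape.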

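Summary: Bloomberg reports that Microsoft has become the world's largest relay station for AI models, selling ChatGPT to Chinese enterprises and selling DeepSeek models back to Western customers. According to the report, Microsoft is testing DeepSeek-R1 and DeepSeek-V4 and plans to offer these Chinese models to Western customers. The arrangement amounts to a two-way trade network for AI models across the US and China.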

Chubby♨️@kimmonismus · June 19 · 47

Someone on Reddit built a WoW private server with 1,800 bots and AI chat via the DeepSeek API. Dead Internet Theory, but playable. An MMORPG with no real players, yet somehow it still feels human.

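Summary: A Reddit user set up a WoW private server with 1,800 bots and AI chat powered by the DeepSeek API. It is Dead Internet Theory in playable form: an MMORPG with no real players that somehow still feels human.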

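AYi@AYi_AInotes · June 19 · 76

This may be the most study-worthy open-source release of skills and engineering scaffolding I've seen recently; the 5 engineering ideas summarized at the end are ones you can take and use directly. DeepSeek researcher Deli Chen has open-sourced his AutoResearch protocol and also put out a survey on self-play (the fourth installment). The most striking part: for the first time, his agent ran a complete RL research loop on a 285B model fully autonomously, covering experiment design, writing code, submitting GPU jobs, debugging, and drawing conclusions, with zero human intervention throughout. Writing code and closing a research loop are two different things, just as learning to cook a dish differs from running a restaurant that turns out consistent food every day: the gap is not one dish but the whole back-of-house process. As for the paper's conclusions, I've put them in the comments.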

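Summary: DeepSeek researcher Deli Chen open-sourced the AutoResearch protocol and published a survey paper on self-play. For the first time, his AI agent fully autonomously completed a full RL research loop on DeepSeek's 285B model, from experiment design, coding, GPU job submission, and debugging through to conclusions, with zero human intervention. The system invoked GRPO tooling and is seen as the start of continual-learning research.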

🚨 AI News | TestingCatalog@testingcatalog · June 18 · 67

EXCLUSIVE 🔥: DeepSeek was just the beginning. Microsoft is evaluating "many" open models for Copilot Cowork. > This is adding internal pressure on MAI teams, as the GLM, MiniMax, and Kimi models are evolving more quickly > Microsoft is aiming to make models "interchangeable" and separate the harness from the underlying models themselves. > As smaller models evolve, it is also possible that some of the tasks will be executed locally in the future.

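Summary: Microsoft is evaluating several open models, including GLM, MiniMax, and Kimi, for Copilot Cowork, with the aim of lowering inference costs. According to Axios, Microsoft is considering hosting DeepSeek V4 as a cheaper option while moving Copilot Cowork from unlimited pricing to usage-based billing. Microsoft says users run hundreds of tasks per week, which can be costly. If DeepSeek is adopted, the model would be optional, fine-tuned, covered by safety guardrails, and hosted entirely on Azure. Microsoft is pushing an "interchangeable" model strategy, and some tasks may move to local execution in the future.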
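The idea of making models "interchangeable" by separating the harness from the underlying model can be sketched as a small interface. This is an illustration only and says nothing about how Microsoft builds Copilot Cowork: `ChatModel`, `EchoModel`, `Harness`, and `swap_model` are hypothetical names, and the stand-in model just echoes its prompt instead of calling any real service.

```python
from typing import Protocol


class ChatModel(Protocol):
    """Anything with a name and a complete() method can back the harness."""

    name: str

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Stand-in backend that echoes the prompt (no network calls)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


class Harness:
    """Owns task formatting; knows nothing about a specific model."""

    def __init__(self, model: ChatModel) -> None:
        self.model = model

    def swap_model(self, model: ChatModel) -> None:
        # Replacing the backend leaves the task logic untouched.
        self.model = model

    def run_task(self, task: str) -> str:
        prompt = f"Task: {task}"
        return self.model.complete(prompt)


if __name__ == "__main__":
    harness = Harness(EchoModel("model-a"))
    print(harness.run_task("summarize the report"))
    harness.swap_model(EchoModel("model-b"))
    print(harness.run_task("summarize the report"))
```

Because the harness depends only on the `complete` method, a hosted model, a cheaper open model, or a local model could in principle be substituted without changing the task code.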

X.PIN@thexpin · June 18 · 61

US and Chinese AI are quietly swapping places in the most awkward way possible. Per Bloomberg: Microsoft sells OpenAI services to ByteDance for $1B+/year. Ant Group, Meituan and Tencent are all on Azure. Azure's AI revenue in China grew ~3x in the fiscal year ending June 2025 — faster than anywhere else on earth. Meanwhile, US developers are quietly switching to Chinese models to cut costs. Cursor (the AI coding tool Musk reportedly wants to buy) has used Qwen and Kimi. One hour of coding on Claude: ~$10. Same job on DeepSeek: under 50 cents. Everyone thinks the neighbor's cooking smells better. The Chinese food is cheaper — and they deliver. https://restofworld.org/2026/when-americans-choose-chinese-ai/

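Summary: Microsoft sells OpenAI services to ByteDance (over $1 billion a year), and Ant Group, Meituan, and Tencent use Azure. Azure's AI revenue in China grew about 3x in the fiscal year ending June 2025. Meanwhile, US developers are switching to Chinese models to cut costs: Cursor (the AI coding tool Musk reportedly wants to buy) has used Qwen and Kimi, and an hour of coding costs about $10 on Claude versus under 50 cents on DeepSeek.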
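The hourly coding figures quoted above imply a rough cost gap, worked through in the sketch below. The helper names `cost_ratio` and `percent_savings` are hypothetical, and because the post gives the DeepSeek figure only as "under 50 cents", treating it as exactly $0.50 yields lower bounds rather than exact values.

```python
def cost_ratio(baseline: float, alternative: float) -> float:
    """How many times more expensive the baseline is than the alternative."""
    if alternative <= 0:
        raise ValueError("alternative cost must be positive")
    return baseline / alternative


def percent_savings(baseline: float, alternative: float) -> float:
    """Percentage saved by switching from the baseline to the alternative."""
    if baseline <= 0:
        raise ValueError("baseline cost must be positive")
    return 100 * (baseline - alternative) / baseline


if __name__ == "__main__":
    claude_per_hour = 10.00   # "~$10" per the post
    deepseek_per_hour = 0.50  # upper bound: the post says "under 50 cents"
    print(cost_ratio(claude_per_hour, deepseek_per_hour))
    print(percent_savings(claude_per_hour, deepseek_per_hour))
```

On these inputs the gap is at least 20x, or a saving of at least 95%, with the real gap larger by however far the DeepSeek cost falls below 50 cents.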

X.PIN@thexpin · June 17 · 35

Obviously, Microsoft can't afford commercial APIs anymore. They're looking to use a self-hosted version of DeepSeek-V4 to power Copilot's agentic AI. I think we all remember that the Trump administration has threatened to ban DeepSeek. Also, Anthropic just blocked foreign users from using Fable 5 and Mythos 5. With everything going on, I'm curious to hear what Americans think about this.

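Summary: The post argues that Microsoft can no longer afford commercial APIs and is looking at a self-hosted version of DeepSeek-V4 to power Copilot's agentic AI. It notes that the Trump administration has threatened to ban DeepSeek and that Anthropic just blocked foreign users from Fable 5 and Mythos 5, and asks what Americans think of all this.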

SemiAnalysis@SemiAnalysis_ · June 17 · 65

POV: @ohnePixel getting a platform for day 0 DeepSeek V4 deployment Find out more at: https://semianalysis.substack.com/p/deepseekv4-16t-day-0-to-day-43-performance

Summary: POV: @ohnePixel getting a platform for day-0 DeepSeek V4 deployment. Learn more: https://semianalysis.substack.com/p/deepseekv4-16t-day-0-to-day-43-performance