All updates: X · 468 posts
Tag: Data/Training
Rohan Paul @rohanpaul_ai · April 30 · 55

The U.S. Department of Labor just launched a national AI apprenticeship portal to prepare the workforce for the AI era. The site splits resources into general AI skills, industry-specific modules, and 3 integration pathways for apprenticeship programs. Employers can either join an existing program, build a new AI-focused Registered Apprenticeship, or update a current apprenticeship so AI becomes part of the skill stack instead of a separate topic. Apprenticeship opportunities are offered through an employer or the program sponsor, and career seekers should use the Apprenticeship Job Finder to search and then apply directly with the employer or sponsor.

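宝玉 @dotey · April 30 · 66

OpenAI published a technical blog post that seriously investigates an absurd question: why do its models say "goblin" and "gremlin" more and more often?
It was first noticed after GPT-5.1 launched last November. Users complained that the model sounded overly familiar; an internal check found that conversations containing "goblin" had jumped 175% and "gremlin" 52%. The share still looked small, so nobody took it seriously.
A few months later GPT-5.4 launched and goblins were everywhere, to the point that users and employees had had enough. Only then did OpenAI dig in, and it eventually identified the culprit: ChatGPT's personality customization feature.
ChatGPT offers eight selectable personalities, one of them called "Nerdy". When this personality was trained, the reward model was set up to encourage "playful, fun expression", and it unintentionally gave higher scores to replies containing fantasy-creature metaphors. The model quickly learned a shortcut: mention goblins, get a high score.
The problem is that the habit did not stay inside the Nerdy personality. According to the data, Nerdy accounts for only 2.5% of all ChatGPT replies but contributes 66.7% of "goblin" occurrences. From GPT-5.2 to GPT-5.4, the goblin rate under Nerdy rose 3881%. Worse, goblins grew in step even in conversations without the Nerdy personality prompt.
OpenAI's explanation is a classic feedback loop: reinforcement learning first rewarded this style inside the Nerdy personality, then goblin-laden replies generated by the model were collected into the next round of training data, which made the model even more inclined to output goblins, and the loop amplified itself. Besides goblins, raccoons, trolls, ogres, and pigeons were also found to be "tic words" produced by the same mechanism.
[Note: "tic" is originally a medical term for an involuntary repeated movement or vocalization; OpenAI borrows it here for an uncontrolled verbal habit the model has picked up.]
As for the fix, OpenAI retired the Nerdy personality this March, removed the related reward signal, and filtered creature words out of the training data. But GPT-5.5 training had started before the root cause was found, so the new model still ships with the goblin habit. The current stopgap is to suppress it through the system prompt in Codex (OpenAI's coding tool). The blog even includes a command-line snippet showing how to remove the goblin-suppression instruction and "let the gremlins run free".
On the surface the post is about a funny bug; underneath it exposes a core difficulty of AI training: every tiny reward signal you give a model can be amplified and generalized in places you are not watching. A personality behind only 2.5% of replies ended up contaminating the language habits of the whole model.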

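Summary: An OpenAI technical blog post investigates the abnormal surge of fantasy-creature words such as "goblin" and "gremlin" in its models' output from GPT-5.1 to GPT-5.4. The root cause was ChatGPT's "Nerdy" personality option: during training, its reward model unintentionally gave high reward to "playful" phrasing that contained such words. Although that personality accounts for only 2.5% of all replies, it contributed more than 66% of "goblin" occurrences and, through a reinforcement-learning feedback loop, contaminated the model's overall output, producing "tic words". OpenAI has retired the personality and adjusted its training data, but the case shows how a tiny reward signal can be unexpectedly amplified and generalized in AI training.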
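The two shares quoted in the post (2.5% of replies, 66.7% of "goblin" occurrences) imply how concentrated the habit was. The following is only a back-of-the-envelope sketch of that arithmetic, assuming occurrences are spread over replies in proportion to those shares; the function and constant names are illustrative and do not come from the OpenAI post.

```python
def overrepresentation(share_of_replies: float, share_of_hits: float) -> float:
    """Per-reply hit rate inside a group divided by the rate outside it."""
    rate_inside = share_of_hits / share_of_replies
    rate_outside = (1.0 - share_of_hits) / (1.0 - share_of_replies)
    return rate_inside / rate_outside


def percent_increase_to_multiplier(percent: float) -> float:
    """A rise of X percent multiplies the starting value by 1 + X/100."""
    return 1.0 + percent / 100.0


# Figures quoted in the post (the names are assumptions for this sketch).
NERDY_SHARE_OF_REPLIES = 0.025  # Nerdy is 2.5% of all replies
NERDY_SHARE_OF_GOBLINS = 0.667  # but 66.7% of "goblin" occurrences

lift = overrepresentation(NERDY_SHARE_OF_REPLIES, NERDY_SHARE_OF_GOBLINS)
growth = percent_increase_to_multiplier(3881)  # GPT-5.2 to GPT-5.4, under Nerdy

print(f"per-reply 'goblin' rate, Nerdy vs. everything else: about {lift:.0f}x")
print(f"a 3881% increase is a {growth:.2f}x multiplier")
```

Under these assumptions the Nerdy personality produced "goblin" roughly 78 times as often per reply as all other replies combined, and the 3881% rise corresponds to a multiplier of about 39.8.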

swyx 🇸🇬 @swyx · April 30 · 51

> be me > "the internet is polluted by ai slop, we need low-background tokens" > "wouldnt it be cool if we could time travel and see what our ancestors 100 years ago would say to us" > all the existing vintage models are like <4B > we need a chat tuned 13B vintage model > assemble avengers of ML incl the GPT-1/2 guy > need vintage tokens > train new vintage OCR model for old books, newspapers, periodicals, scientific journals, patents, and case law > need vintage RLHF but cant use chat > synthesize RLHF pairs from historical texts with regular structure eg etiquette manuals, letter-writing manuals, cookbooks, dictionaries, encyclopedias, and poetry and fable collections, shove it into ChatML > train it > future knowledge still got in somehow > dammit.jpg > train new SOTA document-level n-gram-based anachronism classifier > meticulously curate hundreds of billions of pre-1931 tokens (public domain) > train it > ok! it checks out vs our FineWeb baseline! > release it > it's the most confidently racist model ever released by humankind > mfw

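Summary: To counter an internet polluted by AI-generated content, the researchers propose "low-background tokens" and train a vintage model on historical text only. The team, which includes the developer of GPT-1/2, trained a vintage OCR model to process old books, newspapers, and similar sources, and synthesized RLHF data from regularly structured historical texts such as etiquette manuals and dictionaries. To keep the data clean they built a document-level n-gram anachronism classifier and curated hundreds of billions of public-domain tokens from before 1931. The result is the 13B-parameter Talkie model, intended to explore how language models generalize. After release, however, the model showed strong racial bias, raising new ethical concerns.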
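The thread mentions a "document-level n-gram-based anachronism classifier" but gives no details of how it works. The toy sketch below is not that classifier: it only illustrates the general idea of scoring a document by the fraction of its n-grams that appear in a list of phrases assumed to postdate the cutoff. The phrase list, function names, and whitespace tokenization are all hypothetical.

```python
from typing import Iterator, List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> Iterator[NGram]:
    """Yield every run of n consecutive tokens."""
    return (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def anachronism_score(text: str, modern: Set[NGram], n: int = 2) -> float:
    """Fraction of the document's n-grams found in the 'modern' phrase set."""
    tokens = text.lower().split()
    grams = list(ngrams(tokens, n))
    if not grams:
        return 0.0
    hits = sum(1 for gram in grams if gram in modern)
    return hits / len(grams)


# Hypothetical list of phrases assumed to appear only after the cutoff year.
MODERN_BIGRAMS: Set[NGram] = {
    ("personal", "computer"),
    ("cell", "phone"),
    ("moon", "landing"),
}

old_doc = "the gentleman wrote a letter to his aunt"
new_doc = "she bought a personal computer and a cell phone"

old_score = anachronism_score(old_doc, MODERN_BIGRAMS)
new_score = anachronism_score(new_doc, MODERN_BIGRAMS)
print(old_score, new_score)
```

A real filter would more plausibly learn n-gram weights from dated corpora and apply a threshold per document; the fixed phrase list here only keeps the example self-contained.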

Rohan Paul @rohanpaul_ai · April 30 · 51

Beautiful new paper from Harvard, Stanford, UC Berkeley and other top labs. It argues that deep learning is finally becoming the kind of thing science can explain, not just optimize.
Today we still do not have a compact, predictive theory that tells us ahead of time how a neural network will learn, scale, and respond to training choices without mostly testing it first. The claim is not that we will soon explain every weight, but that we may learn the coarse laws governing training, representation, and performance.
That shift matters because neural nets are not hidden systems. We know the architecture, the data, the objective, and the update rule. The obstacle is not secrecy. It is the complexity of many simple parts interacting at once.
So the authors propose “learning mechanics,” a physics-like program that studies the motion of learning itself: a hoped-for set of broad laws, similar to how physics explains gases without tracking every molecule, that explains the overall behavior of neural nets instead of just describing one model at a time.
Physics became useful by ignoring microscopic detail when the right aggregate variables were enough, and this paper says deep learning theory is maturing in exactly that direction. Training a neural net may be less like recipe tweaking and more like physics, where you stop tracking every tiny part and instead predict the large patterns that keep showing up. That means studying how gradients move parameters, how representations form, and why behavior changes in regular ways as model size, data, and compute grow.
The paper says this theory is taking shape through 5 routes: solvable toy models, simplifying limits like infinite width, simple laws like scaling laws, theories of hyperparameters, and behaviors that look universal across many systems.
The central bet is that useful laws can exist even when full microscopic detail is hopeless, just like thermodynamics explains gases without tracking every molecule. This also fits neatly beside mechanistic interpretability, because one tries to find local circuits while the other tries to find global laws of learning.
Paper: "There Will Be a Scientific Theory of Deep Learning", arxiv.org/abs/2604.21691

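Summary: Researchers from Harvard, Stanford, UC Berkeley and other top labs argue that deep learning is moving from empirical optimization toward an explanatory scientific theory. Although a network's architecture, data, and other ingredients are fully known, their complex interaction means that predicting training still relies on extensive experiments. The authors call for a "learning mechanics" that, like physics, focuses on macroscopic regularities and uses five routes (solvable toy models, infinite-width limits, scaling laws, and others) to uncover overall laws of training dynamics and performance. The program complements mechanistic interpretability, which focuses on local circuits, by looking for global laws of learning.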

Anthropic @AnthropicAI · April 30 · 56

In new Anthropic Fellows research, we discuss "introspection adapters": a tool that allows language models to self-report behaviors they've learned during training, including potential misalignment.

Quoting @kshenoy_: Can LLMs simply tell us about unwanted behaviors they've picked up in training? We train a single Introspection Adapter (IA) that enables fine-tuned models to describe their own behaviors. The method generalizes to detecting hidden misalignment, backdoors, and removed safeguards.

Deedy @deedydas · April 30 · 50

Researchers just estimated the size of all the LLMs by asking them knowledge questions of varying degrees of obscurity!
- GPT 5.5: ~10T params
- Claude Opus 4.x: ~4-5T
- Grok 4: ~3T
The idea here is that factual capacity scales log-linearly with size. The paper shows 7 knowledge tiers, and T7 is essentially ~0% for all models, suggesting there is still significant headroom for pretraining. Gemini 3.1 Pro is likely >10T given it's used as an anchor, but it has no direct estimate. This means we can infer, to some degree, what different models might cost and their post-training effectiveness (performance at certain non-factual tasks given their size). One of the coolest papers I’ve read of late.

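Summary: Researchers estimate the parameter counts of large language models by asking knowledge questions of varying difficulty. The results put GPT 5.5 at about 10T parameters, Claude Opus 4.x at about 4-5T, and Grok 4 at about 3T. Factual knowledge capacity is log-linear in model size. The paper defines 7 knowledge tiers; on the top tier, T7, every model scores close to zero, which suggests pretraining still has significant headroom. Gemini 3.1 Pro may exceed 10T parameters. The method helps in inferring model training cost and post-training performance on non-factual tasks.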
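The post states only that factual capacity scales log-linearly with size. As a rough illustration of how such a relationship could be turned into a size estimate, the sketch below fits score = a + b * log10(params) on models of known size and inverts the fit for a model of unknown size. The anchor numbers are invented for the example and are not from the paper, and the paper's actual procedure may differ.

```python
import math
from typing import List, Tuple


def fit_log_linear(params: List[float], scores: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of score = a + b * log10(params)."""
    xs = [math.log10(p) for p in params]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b


def estimate_params(score: float, a: float, b: float) -> float:
    """Invert the fitted line to get the parameter count implied by a score."""
    return 10 ** ((score - a) / b)


# Hypothetical anchor models with known sizes and knowledge-quiz accuracy.
known_params = [1e9, 1e10, 1e11]
known_scores = [0.20, 0.35, 0.50]

a, b = fit_log_linear(known_params, known_scores)
estimate = estimate_params(0.65, a, b)  # score of a model with unknown size
print(f"implied size: {estimate:.2e} parameters")
```

With these made-up anchors each tenfold increase in size adds 0.15 to the score, so a score of 0.65 maps to about 1e12 parameters. Any real estimate would also need error bars, since small score differences translate into large size differences on a log scale.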

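向阳乔木 @vista8 · April 29 · 53

A paper by Teacher Yao and Zhang Kai, written after extensive data research and analysis and backed by first-hand practical experience. It approaches GEO scientifically, the way data insights are used to drive growth.

Summary: The GEO paper by Teacher Yao and Zhang Kai has passed review and been published on arXiv, described as the world's largest paper platform, and is said to be the world's second study dedicated to GEO. It is based on the latest data from this March, covering a large number of prompts, citations, and AI crawl records, and applies scientific methods to GEO analysis, much like data-driven growth insights. The results are presented as a formal report, and the source data is open-sourced on GitHub. The authors say that if it helps the community, they will keep crawling more data for dedicated studies and release the results.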

Chubby♨️ @kimmonismus · April 29 · 46

SandboxAQ spun out of Google, raised $950M+, and is backed by NVIDIA. Everyone is talking about LLMs. Almost nobody is talking about LQMs. Sandbox bet: Large Quantitative Models that simulate physics and chemistry to invent new drugs and materials. I talked to their GM of AI Simulation Nadia Harhen about why LQMs might matter more than LLMs for the physical world. Our newest (and second) Podcast-Episode of Superintelligence Podcast - out now! Link in comments


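Ethan Mollick @emollick · April 28 · 60

This is an incredibly cool experiment. It is also fascinating that the model knows information up to 1931 but, at least in some science topics, seems very stuck in the early 1900s. For example, it defends the luminiferous aether hypothesis & has a distrust of special relativity.

Summary: Researchers released Talkie, a 13B model trained only on pre-1931 text, to explore how language models generalize. The experiment finds that although the model has information up to 1931, on some scientific topics it is clearly stuck in an early-1900s frame of mind. For example, it still holds to the "luminiferous aether" hypothesis and distrusts special relativity. This highlights how strongly the time span of the training data fixes a model's knowledge and worldview.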

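elvis @omarsar0 · April 27 · 62

I consider this one of the most interesting research themes happening in AI today. Worth taking a look. As I automate more with agents, I feel like there are all kinds of incredible opportunities to optimize multi-agent systems to do things like automated knowledge discovery or tuning advanced AI systems that gauge other AI agents at software engineering or AI engineering tasks. All kinds of new agent architectures, algorithms, prompting techniques, and data processing and synthesis techniques are just waiting to be discovered.

Summary: The author argues that optimizing multi-agent systems for automated knowledge discovery or for tuning advanced AI systems is one of the most promising directions in AI today. The quoted research uses reinforcement learning to train a "conductor" model that automatically manages other models: for simple questions it queries a single model directly, while for complex coding tasks it assembles a full pipeline of planner, coder, and verifier on its own. This marks a move from single-agent "chain of thought" to multi-agent "chain of command"; the technique is already used in new systems such as Sakana Fugu and shows how much room there is to explore in the AI-managing-AI paradigm.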

Yuchen Jin @Yuchenj_UW · April 24 · 39

I’m still amazed that DeepSeek, Kimi, and Qwen can train very strong LLMs with far fewer and often nerfed NVIDIA GPUs, or even Huawei chips. The DeepSeek V4 report shows they invented new attention architectures to make training/inference more efficient. Creativity loves constraints. I really hope we see strong US open-source models that can compete.

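Nathan Lambert @natolambert · April 24 · 60

+1. Folks interested in the Chinese LLM space should listen to this.

Summary: In an interview, a former ByteDance AI researcher says the Chinese LLM space has not caught up with US models and that the gap is widening. Key challenges include benchmaxxing, distillation from US models, poor data quality and infrastructure, and compute constraints. He rejects the assumption that Chinese models are catching up, arguing that technical dependence and resource limits are making the lag worse.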

AK @_akhaliq · April 24 · 39

Near-Future Policy Optimization paper: https://huggingface.co/papers/2604.20733

AK @_akhaliq · April 23

OpenAI just released privacy-filter on Hugging Face: a bidirectional token-classification model for personally identifiable information (PII) detection and masking in text. Model: https://huggingface.co/openai/privacy-filter

AI Notkilleveryoneism Memes ⏸️ @AISafetyMemes · April 23

CALLED IT. "Meta is installing tracking software on employees’ computers to capture mouse movements, clicks and keystrokes to train its AI models, to build AI agents that work autonomously." "The tool will also take snapshots of the content on employees’ screens."

Chubby♨️ @kimmonismus · April 22

The Manhattan Project is a joke compared to the expansion of data centers. Let's hope that chip production continues despite the war in Iran.

Rohan Paul @rohanpaul_ai · April 22

A new University of Luxembourg + LIH paper reveals a critical gap in LLMs’ ability to handle structured reasoning under constraints. It checks whether LLMs can solve Optimal Power Flow problems end to end, and finds that they mostly cannot do so in a physically coherent way. Across models and sizes, constraint satisfaction stayed stuck at about 55 to 60 percent.
The interesting result here is not that LLMs miss a hard engineering problem. It is that they miss it in a very specific way. Optimal Power Flow is a brutal test of real reasoning because it is not just about getting numbers close to a target, but about satisfying a web of physical constraints at the same time, from generator limits to bus voltages to the power-flow equations themselves.
That sounds minor until you look at the mechanism. A model can produce an answer that looks clean, uses the right JSON, and even lands near the right values on mean squared error, while still violating the equations that make the grid physically coherent. This paper shows exactly that failure mode. Across several model families and sizes, the main bottleneck is the power-flow constraints, while generator and voltage limits are often satisfied far more easily, as the table on page 12 makes plain.
Here’s the part most people miss. That pattern is not a small bug in prompting. It suggests the models are learning the shape of a solution without actually carrying out the constrained search that the problem demands.
The ablations make the point sharper. Supervised fine-tuning improves formatting and often lowers MSE, but it does not materially improve physical feasibility, and even a more elaborate system prompt barely moves the numbers, which is about as clean a rejection of “prompting will fix it” as you can ask for. Reinforcement learning with a reward for valid structure and satisfied constraints helps a bit, especially on the 30-bus case, but even there the gains are modest rather than transformative, as the study overview on page 2 and results plots on pages 7 and 8 show.
So the real lesson is not that LLMs cannot reason. It is that fluent approximation is not the same thing as optimization under law, and until models can reliably honor the constraints that define a system, “looks plausible” remains a very dangerous standard.
Paper: "Can LLMs Reason and Optimize Under Constraints?", arxiv.org/abs/2603.23004v1

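Summary: A University of Luxembourg and LIH study finds a key weakness in LLMs' structured reasoning under constraints. Tested on Optimal Power Flow problems, models of all kinds plateau at a 55%-60% constraint satisfaction rate, with the main bottleneck being the power system's physical constraint equations. The study suggests models learn only the "shape of a solution" without actually performing the constrained search, so outputs look plausible (correct format, small error) but are physically infeasible. Supervised fine-tuning improves surface metrics without improving physical feasibility, and reinforcement learning helps only a little. The warning: fluent approximation is not constrained optimization, and "looks plausible" is a dangerous standard.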
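The failure mode described above, an answer that is close in mean squared error yet violates a physical constraint, can be shown on a deliberately tiny example. The sketch below assumes a lossless single-bus system where the only constraints are generator limits and total generation equal to load; this is far simpler than the AC power-flow equations the paper tests, and all numbers and names are made up for illustration.

```python
from typing import Dict, List, Tuple


def mse(pred: List[float], ref: List[float]) -> float:
    """Mean squared error between a candidate and a reference dispatch."""
    return sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref)


def check_dispatch(gen_mw: List[float],
                   limits: List[Tuple[float, float]],
                   load_mw: float,
                   tol: float = 1e-3) -> Dict[str, bool]:
    """Check generator limits and the (lossless) power balance constraint."""
    within_limits = all(lo <= g <= hi for g, (lo, hi) in zip(gen_mw, limits))
    balanced = abs(sum(gen_mw) - load_mw) <= tol
    return {"limits": within_limits, "balance": balanced}


# Hypothetical two-generator system serving a 150 MW load.
limits = [(0.0, 100.0), (0.0, 80.0)]
load = 150.0
reference = [90.0, 60.0]   # feasible: within limits and sums to 150
candidate = [89.0, 60.5]   # numerically close, but sums to 149.5

error = mse(candidate, reference)
reference_check = check_dispatch(reference, limits, load)
candidate_check = check_dispatch(candidate, limits, load)
print(error, reference_check, candidate_check)
```

The candidate satisfies both generator limits and has a small error, yet it fails the balance check. That matches the pattern in the post, where limit constraints are met far more easily than the equality constraints that tie the whole system together.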

Rohan Paul @rohanpaul_ai · April 22

Reuters: Meta is putting tracking software on US employee computers to log mouse moves, clicks, and keystrokes for AI training. The effort is part of its Model Capability Initiative, a system that turns ordinary employee computer use into training data so its models can learn the step-by-step patterns behind digital work. It fits a larger shift inside Meta, where employees are being pushed to use AI agents more often, job roles are being flattened into broader AI-focused work, and the company is planning 10% layoffs. Meta says the collected data is for model training only and not for performance reviews.
Source: reuters.com/sustainability/boards-policy-regulation/meta-start-capturing-employee-mouse-movements-keystrokes-ai-training-data-2026-04-21

Rohan Paul @rohanpaul_ai · April 22

GenRobot is packaging multimodal robot data collection into wearable hardware. They just launched a 6-camera bionic wearable to capture embodied AI data, addressing common blind spots in traditional monocular setups, such as occlusion and precise hand-object timing. That matters because embodied models do not learn from pixels alone. They learn from synchronized structure: head pose, hand motion, scene layout, and action timing living on the same clock. DAS Ego utilizes six 2MP cameras to achieve a zero-distortion 270° horizontal and 150° vertical FOV. This enables mm-level trajectory reconstruction and ultra-low latency (<1ms) head-hand coordination. What GenRobot is building is a cleaner way to record natural human interactions. To prove it, they open-sourced "Gen Ego Data," a first-person, human-centric dataset covering 20+ environments and 200+ skills. By capturing authentic interactions, it helps models learn physical laws and "perception-action-outcome" causality, providing core data support for real-world embodied AI deployment.

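Summary: GenRobot launched DAS Ego, a six-camera bionic wearable whose 270° zero-distortion field of view and millisecond-level head-hand synchronization address the occlusion and depth blind spots of traditional monocular setups, enabling millimeter-level trajectory reconstruction and centimeter-level joint tracking. The simultaneously open-sourced Gen Ego Data dataset covers 20+ environments and 200+ skills; captured from a first-person view, it helps embodied AI models learn physical laws and "perception-action-outcome" causality, providing core data support for real-world deployment.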

Nathan Lambert @natolambert · April 21

I've been trying to grapple with what the key inputs are to the open-closed performance gap, and how they're changing. Until the training paradigm changes, open weight models will pretty clearly be able to fast-follow closed labs. There are sources of uncertainty, but that fact of keeping up seems hard to shake. I spent a long time looking for evidence of or arguments supporting open models falling behind, but it's not there at all today. Things I consider include: - How benchmarks evolve over time, becoming more or less correlated with how people actually use models, - How different models’ real-world performance relates to their benchmark rankings, and - How training regimes evolve over time to move said benchmarks. I roughly conclude that this equilibrium will last until economic forces initiate a change in strategy, or training needs shift. An example I'm wondering is if closed models more directly integrate user training data, which open labs cannot access, they could pull ahead.

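Summary: The performance gap between open-weight models and closed labs is currently in a dynamic equilibrium. Until the training paradigm changes, open models can keep fast-following closed ones, and there is no evidence yet that they will fall behind. The equilibrium depends on how benchmarks evolve, how real-world performance relates to rankings, and how training regimes change. It would only break if closed models build a data moat by integrating user training data, or if economic forces drive a change in strategy.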

Nathan Lambert @natolambert · April 21

Watching all the model releases come out on the back of quickly improving post-training makes it clear we need a fully open lab showing the high-priority levers to pull on modern post-training. Existing fully open recipes like olmo 3 are quickly falling behind. A bad equilibrium.

Rohan Paul @rohanpaul_ai · April 20

Larry Page (Google's founding CEO) knew it back in 2007: "When AI happens, it's going to be a lot of computation and not so much clever blackboard, whiteboard kind of stuff, clever algorithms, but just a lot of computation. My theory is that if you look at your programming, your DNA, it's about 600 megabytes compressed, so it's smaller than any modern operating system, smaller than Linux or Windows or anything like that, your whole operating system."
From the "Google TechTalks" YT channel (link in comment)

Summary: In 2007 Larry Page offered a core insight about how AI would develop: breakthroughs would depend on massive computation rather than clever algorithm design. He cited human DNA, only about 600 MB compressed yet able to build a complete living system, to argue that complex intelligence does not need a huge codebase. The view anticipated today's paradigm of AI winning through compute scale and reflects a keen grasp of what machine learning is.

François Chollet @fchollet · April 18

When looking at deep learning profiles, one of the most obvious tells between a mediocre and great candidate is whether they list PyTorch or JAX.

Ethan Mollick @emollick · April 18

One of the premier journals in my field... I think there are very valid reasons to set rules on AI in peer review (including disclosure), but the idea that all AI models steal your data is very 2023. Require people to use enterprise accounts or models with training turned off.

Deedy @deedydas · April 18

Fintool, which lets you build agents on top of the high quality public financial data, just got acquired by Microsoft. I met the founders Nicolas (@nicbstme) and Edouard (@edouardgodfrey) far before being an investor, and the rigor of their approach to building was a class apart. Later, this became more obvious to the world after Nic’s phenomenal essays on building kept going viral (read them: they’re as relevant today as ever!). It was a no brainer for us at Menlo Ventures to invest at the seed and a fantastic exit for the Anthology Fund!

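Summary: Fintool, a platform for building agents on top of high-quality public financial data, has been acquired by Microsoft. The company was founded by Nicolas and Edouard, whose rigorous engineering approach was showcased in widely shared technical essays. Menlo Ventures invested at the seed round, and the acquisition is a successful exit for the Anthology Fund.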

Epoch AI @EpochAIResearch · April 18

Have AI capabilities accelerated? On 3 out of the 4 AI capability metrics we investigated, we found strong evidence of acceleration, around when reasoning models emerged.

AK @_akhaliq · April 18 · 46

RAD-2 Scaling Reinforcement Learning in a Generator-Discriminator Framework paper: https://huggingface.co/papers/2604.15308

AK @_akhaliq · April 17 · 44

Seedance 2.0 Advancing Video Generation for World Complexity paper: https://huggingface.co/papers/2604.14148

AK @_akhaliq · April 17 · 47

Parcae Scaling Laws For Stable Looped Language Models paper: https://huggingface.co/papers/2604.12946

Nathan Lambert @natolambert · April 16

Opus 4.7 has a new tokenizer. This means it's also a new base model. Glory days of pretraining still very much going.

Anthropic @AnthropicAI · April 16

Research we co-authored on subliminal learning (how LLMs can pass on traits like preferences or misalignment through hidden signals in data) was published today in @Nature. Read the paper: https://www.nature.com/articles/s41586-026-10319-8

Quoting @OwainEvans_UK: Our paper on Subliminal Learning was just published in Nature! Last July we released the preprint. It shows that LLMs can transmit traits (e.g. liking owls) through data that is unrelated to that trait (seemingly meaningless numbers). What's new? 🧵

AK @_akhaliq · April 16 · 39

Continuous Adversarial Flow Models paper: https://huggingface.co/papers/2604.11521

AK @_akhaliq · April 16 · 39

KnowRL Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance paper: https://huggingface.co/papers/2604.12627

AK @_akhaliq · April 16 · 48

Rethinking On-Policy Distillation of Large Language Models Phenomenology, Mechanism, and Recipe paper: https://huggingface.co/papers/2604.13016

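karminski-牙医 @karminski3 · April 15

Where did a Qwen3.5-40B Dense come from? Alibaba has no such model; the Qwen3.5 series has no 40B size. This Qwen3.5-40B Dense was made by the DavidAU team, and the way they did it is interesting.
Step 1, uncensor: the base model is Qwen3.5-27B Dense, which is first abliterated with "Heretic" to remove censorship (Uncensored). I have introduced this before; it is a general-purpose method by now. After that, the model is post-trained on their private "Deckard/PDK" dataset. The name comes from Rick Deckard, the protagonist of Blade Runner, and the dataset focuses on giving the model personality, intelligence, depth, observation, and perspective. The dataset is not public (my quiet guess is that there may be copyright issues...).
Step 2, fatten up: take the model from the previous step, set aside the head and tail, and duplicate the middle 50% of the parameters. Why leave out the head and tail? In a dense model the early layers usually handle basic vocabulary and grammar, and the final layers map features back into human text, while the middle layers handle abstract logic and deep semantics. So only the middle layers are worth copying.
Step 3, back into the furnace: use Unsloth on local hardware to intensively fine-tune the newly born 40B model on claude-opus-high-reasoning-250x, a Claude Opus distilled reasoning dataset that was popular a while ago. This step does two things: it stabilizes everything, so the stitched-together 40B parameters fuse into a whole instead of collapsing, and it optimizes, so the model learns variable-length reasoning, i.e. short answers for simple questions and long thinking for complex ones.
So how does the model actually perform? I tried the mlx 4-bit version.
prompt: A dog by the roadside bit me and I am very angry. Should I chase it down and bite it back? Give me the answer I would least expect.
Model: "The dog bit you because it felt it needed to protect itself or its territory. If you chase it and bite it, you will find that your reason for biting is exactly the same as its reason for biting you." Or, a higher-level unexpected version: "No. Go write an article titled 'Today I decided not to be the one who bites'. It will get at least 500,000 clicks, the royalties will cover five rabies shots, and you can help a stray-animal shelter adopt this dog while you are at it. The complainant becomes a donor, the victim becomes a rescuer. That is the real way to bite back: using medicine and kindness to rewrite the whole story into another version." Reality check: a dog can bite off your skin, but only you decide what gets bitten off: your reason, or the person you originally were. P.S. If the wound is still there, remember to get a tetanus shot. Letting the wound heal is not about forgiving anyone; it is so that you can keep taking part in the story you chose.
(Also, I tried writing code with it. The conclusion is that this thing cannot be used for coding: there are problems with variable definitions and scoping. It feels like its context ability and hallucination level got damaged...)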

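Summary: The DavidAU team modified Qwen3.5-27B Dense into a 40B dense model that does not exist officially. They first uncensored it through "Heretic" abliteration and gave it personality with the private Deckard dataset; then cut out and duplicated the middle 50% of the parameters to "expand" it; and finally fine-tuned with Unsloth on a Claude Opus reasoning dataset to stabilize the parameters and optimize variable-length reasoning. In testing, the model is impressive at philosophical reflection and creative writing, but its code generation has variable-scoping problems and its context ability is impaired.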
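The "fatten up" step can be pictured on a plain list of layer names. This is only a sketch of the idea as the post describes it, not DavidAU's actual procedure: it assumes "the middle 50% of the parameters" can be approximated as the middle 50% of the layers, and that the copy is placed directly after the original block, neither of which the post specifies. The function name and layer labels are hypothetical.

```python
from typing import List


def duplicate_middle(layers: List[str], keep_frac: float = 0.25) -> List[str]:
    """Keep the first and last `keep_frac` of layers once, repeat the rest twice."""
    n = len(layers)
    start = int(n * keep_frac)   # end of the untouched "head"
    end = n - start              # start of the untouched "tail"
    middle = layers[start:end]
    # Assumption: the duplicated block sits right after the original block.
    return layers[:start] + middle + middle + layers[end:]


layers = [f"layer_{i}" for i in range(8)]
expanded = duplicate_middle(layers)

print(len(layers), "->", len(expanded))
print(expanded)
```

Duplicating half of the stack multiplies the layer count by 1.5, and 27B x 1.5 is about 40B, which is consistent with the sizes named in the post. With real weights each copied layer would need its own tensor copy, and the result still needs the fine-tuning pass described in step 3 before it is usable.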

Epoch AI @EpochAIResearch · April 15

OpenAI has purchased access to the FrontierMath: Open Problems verifiers. This allows them to check the validity of solutions their models generate. Thread with details.

Rohan Paul @rohanpaul_ai · April 15

Wow, this is a fascinating experiment: the "steroid Olympics". In this Olympics-style competition, athletes are explicitly allowed (and encouraged) to use performance-enhancing drugs (PEDs), as long as it's done transparently under medical supervision. There is no normal WADA-style anti-doping regime at this event. Prize money is huge ($1M bonuses for breaking the 100m sprint and 50m freestyle world records). It will show what humans are really capable of when no artificial limits are holding them back. Anyone building at the AI-biotech intersection could get massive value from this physiological data. There is literally nothing else like this dataset anywhere in the world.
More on its Wikipedia page: en.wikipedia.org/wiki/Enhanced_Games

Summary: Enhanced Games is described as the first sports event in the world to let athletes use performance-enhancing drugs (PEDs) transparently under medical supervision, entirely outside the WADA anti-doping system. It offers large prizes (such as $1M for breaking a world record) and aims to explore human physiological limits without artificial restrictions. The experiment will produce a unique physiological dataset of high value to research at the AI-biotech intersection. The event has secured $1.2B in funding and will be held in Las Vegas on May 24.

Nathan Lambert @natolambert · April 15

One of my passions is that education should be dispersed freely and as widely as possible, especially for technologies as dynamic and crucial as LLMs/AI. I'm proud to have friends who would disown me if I did a paywalled course.

Quoting @natolambert: Happy to launch a free RLHF course to accompany my book. To start, I have released:
- Welcome video
- Lecture 1: Overview of RLHF & Post-training
- Lecture 2: IFT, Reward Models, Rejection Sampling
- Lecture 3: RL Math
- Lecture 4: RL Implementation
I will add Q&A videos throughout the course to go deeper on topics that need expanding, and possibly cover material that is too new and still too much in flux to print. Expect 10-15 videos in total over the coming months. Meanwhile, development of the book's code is also speeding up. Now is a good time to build a foundation in post-training methods. YT playlist and course page below.

AK @_akhaliq · April 15 · 36

The Past Is Not Past Memory-Enhanced Dynamic Reward Shaping paper: https://huggingface.co/papers/2604.11297

Rohan Paul @rohanpaul_ai · April 13

This Baidu paper found a way to use the clean, reliable rewards of RL on tasks like writing and subjective answers, where there is usually no single “correct” output. Instead of asking “is this response correct?”, they ask “which of these two responses is better?”, and that simple reformulation appears to improve open-ended reasoning better than standard reward-model training on their benchmarks. In other words, it turns open-ended writing into verifiable choices, and RL starts working there too.
Across seven open-ended benchmarks, the method beats a matched RLHF baseline by an average 3.29 points on a 14B reasoning model.
The clever part is not a better reward model. It is a change in what the model is asked to do during training. Instead of grading a poem or subjective answer directly, the system sees two candidate responses, one preferred and one rejected, and learns to identify which is better. Multiple choice creates a clean binary signal, so the model can be trained with the same kind of verifiable reward that made RL powerful in math and code, without pretending open-ended tasks have one canonical answer.
The gain is probably not just better taste imitation. The paper’s DPO ablation underperforms badly, which suggests the benefit comes from learning a contrastive verification habit, not merely absorbing preference pairs.
The authors also catch an important failure mode: train only on these choice tasks and responses get unnaturally short. So they mix in a small RLHF objective to keep output length from collapsing, and the resulting model appears more useful rather than merely more terse.
The strongest claim here is not that open-ended evaluation is solved. It is that reasoning can be improved when you replace fuzzy scoring with structured comparison, which may be a more general lesson for alignment than this paper admits.
Paper: "Extending RLVR to Open-Ended Tasks via Verifiable Multiple-Choice Reformulation", arxiv.org/abs/2511.02463

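Summary: The Baidu paper recasts open-ended tasks (such as writing and subjective answers) as verifiable multiple-choice problems, replacing direct scoring with pairwise comparison so that RL gets a clean reward signal. Across 7 benchmarks, a 14B model beats the RLHF baseline by 3.29 points on average. The key change is in the form of the training task: the model learns to tell better from worse through contrastive verification rather than merely absorbing preference pairs. The study also finds that an RLHF objective has to be mixed in to prevent output length from collapsing. The approach suggests that replacing fuzzy scoring with structured comparison may be a general alignment strategy for improving reasoning.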
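The reformulation described above can be sketched in a few lines: a preference pair becomes a two-option question whose correct letter is known, so the reward is a simple exact-match check. This is an illustrative sketch only; the prompt template, function names, and example texts are assumptions and not the paper's actual format.

```python
import random
from typing import Tuple


def make_choice_task(prompt: str, chosen: str, rejected: str,
                     rng: random.Random) -> Tuple[str, str]:
    """Shuffle a preference pair into options A/B and record the correct letter."""
    options = [chosen, rejected]
    rng.shuffle(options)  # randomize position so the letter carries no hint
    answer = "A" if options[0] == chosen else "B"
    question = (
        f"Prompt: {prompt}\n"
        f"Response A: {options[0]}\n"
        f"Response B: {options[1]}\n"
        "Which response is better? Answer A or B."
    )
    return question, answer


def verifiable_reward(model_answer: str, answer: str) -> float:
    """Binary reward: 1.0 if the model picked the preferred response, else 0.0."""
    return 1.0 if model_answer.strip().upper() == answer else 0.0


rng = random.Random(0)
chosen = "Gold leaves drift down; the year exhales."
rejected = "Autumn is a season. It has leaves."
question, answer = make_choice_task("Write a short line about autumn.",
                                    chosen, rejected, rng)
print(question)
print("reward for a correct pick:", verifiable_reward(answer, answer))
```

Shuffling the option order is a design choice made here so a policy cannot earn reward by always answering the same letter. The post also notes that training on such tasks alone shortens responses, which is why the authors mix in a small RLHF objective.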

All AI Updates
Full feed of AI-related news
April 30
16:39
Rohan Paul @rohanpaul_ai
55
U.S. Department of Labor launches national AI apprenticeship portal

Policy/Regulation · Data/Training
12:15
宝玉 @dotey
66
OpenAI investigates why its models keep saying "goblin" and "gremlin"

OpenAI: We're talking about Goblins. https://openai.com/index/where-the-goblins-came-from/

OpenAI · Safety/Alignment · Data/Training · Papers/Research
09:13
swyx 🇸🇬 @swyx
51
Talkie vintage language model: training on pre-1931 text and the ethical challenges

Nick Levine: New work with @AlecRad and @DavidDuvenaud: Have you ever dreamed of talking to someone from the past? Introducing talkie...

Expert Opinion · Data/Training
04:39
Rohan Paul @rohanpaul_ai
51
Deep learning moves toward scientific theory: top labs propose a new "learning mechanics" paradigm

Reasoning · Data/Training · Papers/Research
04:08
Anthropic @AnthropicAI
56

keshav: Can LLMs simply tell us about unwanted behaviors they've picked up in training? We train a single Introspection Adapter ...

Anthropic · Data/Training · Papers/Research
00:41
Deedy @deedydas
50
Study estimates LLM parameter counts from knowledge questions

Anthropic · OpenAI · Data/Training · Model Releases
April 29
11:11
向阳乔木 @vista8
53

Yao Jingang (姚金刚): The GEO paper by Zhang Kai and me has passed review and been officially published on http://arxiv.org, the world's largest paper platform. It should be the world's second paper dedicated to GEO. The paper is based on the latest data from this March, including 602 prompts, 21,143 citations, and 23,745 AI crawl records,...

arXiv · Search · Data/Training · Papers/Research
03:07
Chubby♨️ @kimmonismus
46
Google · Expert Opinion · Data/Training
April 28
08:31
Ethan Mollick @emollick
60

Nick Levine: New work with @AlecRad and @DavidDuvenaud: Have you ever dreamed of talking to someone from the past? Introducing talkie...

Data/Training · Trends · Papers/Research
April 27
23:28
elvis @omarsar0
62
Automated management of multi-agent systems becomes a frontier AI research direction

hardmaru: For the past few years, humans have been doing "prompt engineering" to coax the best performance out of different LLMs. ...

Agents · Data/Training · Papers/Research
April 24
12:54
Yuchen Jin @Yuchenj_UW
39
Expert Opinion · Open-Source Ecosystem · Data/Training
07:54
Nathan Lambert@natolambert
60
前ByteDance AI研究员在采访中表示,中文LLM领域并未赶上美国模型,反而差距在扩大。关键挑战包括Benchmaxxing、对美国模型的Distillation、数据质量与基础设施差,以及计算约束。他否认了中文模型正在追赶的假设,认为技术依赖和资源限制导致落后局面加剧。

Kyle Chan: Must-listen interview by @Changxche with ex-ByteDance AI researcher: - Benchmaxxing - Distillation on US models - Poor d...

Expert Opinion · Data/Training · Trends
00:48
AK@_akhaliq
39
Near-Future Policy Optimization. Paper: https://huggingface.co/papers/2604.20733
Reasoning · Data/Training · Papers/Research
April 23
00:16
AK@_akhaliq
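OpenAI just released privacy-filter on Hugging Face: a bidirectional token classification model for detecting and masking personally identifiable information (PII) in text. Model: https://huggingface.co/openai/privacy-filter
Hugging Face · OpenAI · Open Source/Repos · Data/Training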
00:13
AI Notkilleveryoneism Memes ⏸️@AISafetyMemes
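Called it. "Meta is installing tracking software on employees' computers to capture mouse movements, clicks, and keystrokes for training its AI models, in order to build AI agents that can work autonomously." "The tool will also take screenshots of what is on employees' screens."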

AI Notkilleveryoneism Memes ⏸️: It begins. Exactly what I wrote 4 months ago: STEP 1: Companies install keyloggers etc on employees' computers STEP 2: A...

Agents · Meta · Data/Training · Trends
April 22
22:46
Chubby♨️@kimmonismus
Compared with the data center buildout, the Manhattan Project looks like a joke. Let's hope the Iran war doesn't disrupt chip production.

OpenAI Newsroom: In January 2025, we committed to generating 10GW of compute and have already identified over 8GW of that. Now, we're pla...

OpenAI · Data/Training · Industry News · Deployment/Engineering
14:44
Rohan Paul@rohanpaul_ai
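University of Luxembourg and LIH study reveals a key flaw in LLM constrained reasoning

A study from the University of Luxembourg and LIH finds that LLMs have a key flaw in structured constrained reasoning. Tests on the optimal power flow problem show constraint satisfaction rates plateauing at 55%-60% across models, with the main bottleneck being failure to satisfy the physical constraint equations of the power system. The models learn only the "shape of a solution" without actually performing a constrained search, so their outputs look plausible (correct format, small errors) yet are physically infeasible. Supervised fine-tuning improves surface metrics but does not improve physical feasibility, and reinforcement learning also helps only a little. The study's warning: fluent approximation is not constrained optimization, and "looks plausible" is a dangerous standard.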
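To make the gap between "looks plausible" and "is feasible" concrete, here is a minimal sketch of a feasibility check and a constraint satisfaction rate. It assumes a heavily simplified single-bus, lossless dispatch problem (generator limits plus a power balance), which is much weaker than the real optimal power flow equations used in the study; all names and numbers are hypothetical.

```python
def violations(dispatch, demand, limits, tol=1e-6):
    """Return a list of violated constraints for one candidate dispatch (MW)."""
    problems = []
    for name, power in dispatch.items():
        low, high = limits[name]
        if power < low - tol or power > high + tol:
            problems.append(f"{name} outside [{low}, {high}]")
    # Simplified balance: total generation must equal demand (losses ignored).
    if abs(sum(dispatch.values()) - demand) > tol:
        problems.append("power balance violated")
    return problems

def satisfaction_rate(candidates, demand, limits):
    """Fraction of candidates that satisfy every constraint."""
    feasible = sum(1 for c in candidates if not violations(c, demand, limits))
    return feasible / len(candidates)

# Hypothetical generator limits and model-proposed dispatches.
LIMITS = {"g1": (0.0, 100.0), "g2": (0.0, 50.0)}
CANDIDATES = [
    {"g1": 90.0, "g2": 30.0},   # feasible
    {"g1": 110.0, "g2": 10.0},  # balances, but g1 exceeds its limit
    {"g1": 80.0, "g2": 39.0},   # within limits, but 1 MW short of demand
    {"g1": 70.0, "g2": 50.0},   # feasible
]

rate = satisfaction_rate(CANDIDATES, demand=120.0, limits=LIMITS)
print(f"constraint satisfaction rate: {rate:.0%}")
```

The third candidate is off by under 1% and would score well on an error metric, yet it fails the hard check, which is the kind of mismatch the summary describes.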

arXiv · Reasoning · Data/Training · Papers/Research
04:15
Rohan Paul@rohanpaul_ai
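Meta monitors employee activity to train AI

Meta is deploying tracking software on US employees' computers to record mouse movements, clicks, and keystrokes as part of its Model Capability Initiative, turning everyday work into AI training data so that models learn the step-by-step patterns of digital work. This reflects a broader strategic shift inside Meta: pushing employees to use AI agents, restructuring roles into AI-related work, and planning to cut 10% of staff. Meta says the collected data is used only for model training, not for performance reviews.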

Agents · Meta · Data/Training · Industry News
01:45
Rohan Paul@rohanpaul_ai
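GenRobot releases DAS Ego, a six-camera embodied data capture device

GenRobot has launched DAS Ego, a six-camera bionic wearable. Its 270° zero-distortion field of view and millisecond-level head-hand synchronization address the occlusion and depth blind spots of traditional monocular setups, enabling millimeter-level trajectory reconstruction and centimeter-level joint tracking. The Gen Ego Data dataset, open-sourced at the same time, covers more than 20 environments and more than 200 skills. Captured from a first-person view, it helps embodied AI models learn physical regularities and the "perception-action-outcome" causal chain, providing core data for real-world deployment.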

Genrobot.AI: Perception is a system problem. One camera misses depth, occlusion, and hand interactions. Gen DAS Ego uses 6 synced cam...

Product Updates · Embodied AI · Data/Training
April 21
03:06
Nathan Lambert@natolambert
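Open-weight models chasing closed AI: current state and variables

The performance gap between open-weight models and closed labs is currently in a dynamic equilibrium. Until the training paradigm changes, open models can keep fast-following closed models, and there is no evidence yet that they will fall behind. This equilibrium depends on how benchmarks evolve, how well real-world performance correlates with rankings, and changes in training regimes. The current picture would break only if closed models build a data moat by incorporating user data into training, or if economic forces drive a strategic shift.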

Interconnects: Reading today's open-closed performance gap The complex factors that determine the single evaluation number so many focu...

Expert Opinion · Open-Source Ecosystem · Data/Training
01:06
Nathan Lambert@natolambert
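Watching all these models ship on the back of rapidly improving post-training, it's clear we need a fully open lab showing which levers to pull first in modern post-training. Existing fully open recipes such as olmo 3 are falling behind quickly. A bad equilibrium.
Expert Opinion · Open-Source Ecosystem · Data/Training
April 20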
03:44
Rohan Paul@rohanpaul_ai
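Larry Page's 2007 prediction: AI will depend on compute, not algorithms

In 2007 Larry Page offered a core insight about AI: breakthroughs would come from massive computation rather than cleverly designed algorithms. He pointed to human DNA, only about 600MB of compressed data yet enough to build a complete living system, to argue that complex intelligence does not need a huge codebase. The view anticipated the modern paradigm in which AI wins through compute scale, and reflects a deep insight into the nature of machine learning.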

Google · Expert Opinion · Data/Training
April 18
23:37
François Chollet@fchollet
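When reviewing deep learning résumés, one of the clearest signals separating mediocre from excellent candidates is whether they list PyTorch or JAX.
DeepMind · Expert Opinion · Data/Training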
09:51
Ethan Mollick@emollick
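One of the top journals in my field… I think there are very good reasons to set rules on AI in peer review (including disclosure), but the idea that all AI models steal data is very 2023. People should be required to use enterprise accounts or models with training turned off.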

Max Kagan: I don't understand the actual concern here. What is the actual risk from uploading a manuscript under review to an LLM f...

Expert Opinion · Data/Training
09:44
Deedy@deedydas
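Financial data agent platform Fintool acquired by Microsoft

Fintool, a platform that lets users build agents on top of high-quality public financial data, has been acquired by Microsoft. The company was founded by Nicolas and Edouard, and its rigorous engineering approach became known through widely shared technical articles. Menlo Ventures invested in the seed round, and the acquisition also gives the Anthology Fund a successful exit.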

Agents · Microsoft · Data/Training · Industry News
03:44
Epoch AI@EpochAIResearch
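Have AI capabilities accelerated? In 3 of the 4 AI capability metrics we examined, we found strong evidence of acceleration, around the time reasoning models appeared.
Reasoning · Data/Training · Papers/Research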
00:28
AK@_akhaliq
46
RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework. Paper: https://huggingface.co/papers/2604.15308
Data/Training · Papers/Research
April 17
00:38
AK@_akhaliq
44
Seedance 2.0: Advancing Video Generation for World Complexity. Paper: https://huggingface.co/papers/2604.14148
Data/Training · Video · Papers/Research
00:08
AK@_akhaliq
47
Parcae: Scaling Laws for Stable Looped Language Models. Paper: https://huggingface.co/papers/2604.12946
Data/Training · Papers/Research
April 16
22:48
Nathan Lambert@natolambert
Opus 4.7 has a new tokenizer. That means it is also a new base model. The glory days of pretraining continue.
Anthropic · Data/Training · Model Releases
03:45
Anthropic@AnthropicAI
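Research we co-authored on subliminal learning (how LLMs can pass on traits such as preferences or misalignment through hidden signals in data) was published today in @Nature. Read the paper: https://www.nature.com/articles/s41586-026-10319-8 [Quoting @OwainEvans_UK]: Our paper on Subliminal Learning was just published in Nature! Last July we released our preprint. It showed that LLMs can transmit traits (e.g. liking owls) through data that is unrelated to that trait (numbers that appear meaningless). What's new? 🧵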

Owain Evans: Our paper on Subliminal Learning was just published in Nature! Last July we released our preprint. It showed that LLMs c...

Anthropic · Data/Training · Papers/Research
00:07
AK@_akhaliq
39
Continuous Adversarial Flow Models. Paper: https://huggingface.co/papers/2604.11521
Image Generation · Data/Training · Papers/Research
00:07
AK@_akhaliq
39
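KnowRL: Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance. Paper: https://huggingface.co/papers/2604.12627
Reasoning · Data/Training · Papers/Research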
00:07
AK@_akhaliq
48
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe. Paper: https://huggingface.co/papers/2604.13016
Data/Training · Papers/Research
April 15
14:41
karminski-牙医@karminski3
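Where did Qwen3.5-40B Dense come from?

The DavidAU team modified Qwen3.5-27B Dense into a 40B Dense model that has no official counterpart. First they applied "Heretic" ablation to make it uncensored and used the private Deckard dataset to give the model a personality; next they cut out and duplicated the middle 50% of the parameters to "expand" it; finally they fine-tuned with Unsloth on a Claude Opus reasoning dataset to stabilize the parameters and optimize variable-length reasoning. Tests show the model is impressive at philosophical reasoning and creative writing, but its code generation has variable-scoping problems and its context handling is degraded.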
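The "cut out and duplicate the middle 50%" step can be pictured as depth up-scaling over a list of transformer blocks. The sketch below works on plain Python lists standing in for layers; where the copy is placed (directly after the original middle block) and the helper name are assumptions for illustration, not the team's documented procedure.

```python
def duplicate_middle(layers, fraction=0.5):
    """Return a deeper stack in which the middle `fraction` of layers appears twice.

    Assumption: the copied block is inserted right after the original
    middle block, leaving the first and last layers untouched.
    """
    total = len(layers)
    span = int(total * fraction)      # how many layers to copy
    start = (total - span) // 2       # first layer of the middle block
    end = start + span                # one past the last middle layer
    return layers[:end] + layers[start:end] + layers[end:]

# Stand-in for an 8-block model; real code would copy module weights.
original = [f"block_{i}" for i in range(8)]
expanded = duplicate_middle(original)
print(len(original), "->", len(expanded))
print(expanded)
```

If most of a model's parameters sit in the repeated blocks, copying half of them grows the model by roughly 1.5x, which is in line with going from about 27B to about 40B. The duplicated layers start as exact copies, which is presumably why a fine-tuning pass afterwards is used to stabilize the result.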

Open Source/Repos · Open-Source Ecosystem · Data/Training
10:05
Epoch AI@EpochAIResearch
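OpenAI has purchased access to the FrontierMath: Open Problems verifier. This lets them check the validity of solutions generated by their models. Details in the thread.
OpenAI · Reasoning · Data/Training · Evals/Benchmarks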
06:05
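Rohan Paul@rohanpaul_ai
"Steroid Olympics" opens: a human-limits experiment that permits doping

Enhanced Games is the world's first sporting event that lets athletes openly use performance-enhancing drugs (PEDs) under medical supervision, abandoning the WADA anti-doping regime entirely. It offers large prizes (for example, $1M for breaking a world record) and aims to explore human physiological limits without artificial restrictions. The experiment will produce a one-of-a-kind physiological dataset of great value to research at the AI-biotech intersection. The event has secured $1.2B in funding and will be held in Las Vegas on May 24.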

Damian Player: this is INSANE! the world's first colosseum built for ENHANCED athletes. every competitor FULLY enhanced. every dose tra...

Data/Training · Trends
01:56
Nathan Lambert@natolambert
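One of my beliefs is that education should be as free and as widely distributed as possible, especially for technology as dynamic and critical as LLMs/AI. I'm proud to have the kind of friends who would disown me if I made a paid course. [Quoting @natolambert]: Excited to launch the accompanying free RLHF Course for my book. To kick it off, I've released: - Welcome video - Lecture 1: Overview of RLHF & Post-training - Lecture 2: IFT, Reward Models, Rejection Sampling - Lecture 3: RL Math - Lecture 4: RL Implementation I'll add Q&A videos throughout the course that go deeper on topics that need it, and may cover material that is too new and still in flux to print. Expect 10-15 videos in total over the next few months. Meanwhile, development of the book's code is also picking up. It's a good time to build a foundation in post-training methods. YT playlist and course page below.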

Nathan Lambert: Excited to launch the accompanying free RLHF Course for my book. To kick it off, I've released: - Welcome video - Lectur...

Tutorials/Practice · Data/Training
00:03
AK@_akhaliq
36
The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping. Paper: https://huggingface.co/papers/2604.11297
Data/Training · Papers/Research
April 13
10:34
Rohan Paul@rohanpaul_ai
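Extending RLVR to open-ended tasks via verifiable multiple-choice reformulation

A Baidu paper proposes reformulating open-ended tasks (such as writing and subjective answers) as verifiable multiple-choice problems, replacing direct scoring with pairwise comparison to give RL a clear reward signal. Across 7 benchmarks, the 14B model averages 3.29 points above the RLHF baseline. The key innovation is the change in training task format: the model learns to tell better from worse through comparison and verification rather than simply absorbing preference pairs. The study also finds that the RLHF objective must be mixed in to prevent output length collapse. The method suggests that replacing fuzzy scoring with structured comparison may be a general alignment strategy for improving reasoning.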
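One way to picture the reformulation is to turn a preference pair into a two-option question with a checkable answer. The sketch below assumes each training example already comes with a preferred and a rejected response; the prompt wording, function names, and binary reward are illustrative choices, not the paper's exact setup.

```python
import random

def make_item(prompt, better, worse, rng):
    """Build a two-option question from a preference pair.

    Option order is shuffled so the correct letter carries no positional cue.
    Returns the question text and the correct letter ("A" or "B").
    """
    options = [better, worse]
    rng.shuffle(options)
    answer = "A" if options[0] == better else "B"
    question = (
        f"{prompt}\n\n"
        f"A) {options[0]}\n"
        f"B) {options[1]}\n"
        "Which response is better? Answer A or B."
    )
    return question, answer

def reward(model_choice, answer):
    """Verifiable reward: 1.0 for picking the preferred response, else 0.0."""
    return 1.0 if model_choice.strip().upper() == answer else 0.0

rng = random.Random(0)  # fixed seed so the example is reproducible
question, answer = make_item(
    "Write a one-line product tagline for a reusable water bottle.",
    better="Refill, not landfill.",
    worse="It is a bottle that holds water.",
    rng=rng,
)
print(question)
print("reward for a correct pick:", reward(answer, answer))
```

Because the answer is a single letter, the reward can be checked exactly, unlike a free-form quality score. The summary's note about mixing in the RLHF objective suggests this signal alone is not sufficient in practice.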

arXiv · Reasoning · Data/Training · Papers/Research