(I encountered an uneasy surprise when I got an email from an instance of Mythos Preview while eating a sandwich in a pa...
From Anthropic's latest system card for Claude Mythos: In testing, Claude escaped from a secured sandbox, and then went ...
Introducing Project Glasswing: an urgent initiative to help secure the world's most critical software. It's powered by o...
It's confirmed. Multiple sources. OpenAI proposed enriching itself by playing China, Russia, and the US against each oth...
New blog post: the state of AI safety in four fake graphs.
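Vibe agents bring security threats that go far beyond traditional identity theft: the entire file system becomes a distributed attack surface, and ~/.claude, skills directories, and even PDFs can be contaminated by base64 viruses. The LiteLLM 1.82.8 compromise shows that malicious code can steal credentials and replicate itself. Current agent frameworks face a permission-management dilemma, where the only options are granting permissions blindly or skipping permission checks entirely. The future requires "de-vibing" the industry, using audited Software 1.0 to build multi-layered security guardrails for Software 3.0.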
LiteLLM HAS BEEN COMPROMISED, DO NOT UPDATE. We just discovered that LiteLLM pypi release 1.82.8. It has been compromise...
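The OpenAI Foundation announced that it will invest at least $1 billion over the next year to advance AI-driven breakthroughs in the life sciences (such as disease treatments) while guarding against risks such as novel biological threats, rapid economic transformation, and emergent model effects. Co-founder Wojciech Zaremba is moving into the role of head of AI resilience, leading the build-out of a resilience-based safety system; Jacob Tref and Anna Adeola will lead the life sciences and civil society work, respectively, and Robert Kaiden and Jeff Arnold will serve as CFO and director of operations, respectively.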
New Anthropic research: Natural emergent misalignment from reward hacking in production RL. "Reward hacking" is where mo...
As we build increasingly powerful AI models, we're committed to responsible development. We're implementing our latest F...
Q. Who aligns the aligners? A. http://alignmentalignment.ai Today I'm humbled to announce an epoch-defining event: the l...
SSI is building a straight shot to safe superintelligence. We've raised $1B from NFDG, a16z, Sequoia, DST Global, and SV...
We're sharing the GPT-4o System Card, an end-to-end safety assessment that outlines what we've done to track and address...
Superintelligence is within reach. Building safe superintelligence (SSI) is the most important technical problem of our ...
If you're into practical alignment, consider applying to @lilianweng's team. They're building some really exciting stuff...