# Meta Reportedly Restricts Engineers' Use of Anthropic's Claude Code and OpenAI's Codex to Prevent Training-Data Contamination

- Source: Rohan Paul (@rohanpaul_ai)
- Published: 2026-06-30 02:00
- AIHOT score: 53
- AIHOT link: https://aihot.virxact.com/items/cmqzjwhg7000tslks05gzx0n3
- Original link: https://x.com/rohanpaul_ai/status/2071655065107284026

## AI Summary

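The Information reports that Meta has restricted engineers' use of Anthropic's Claude Code and OpenAI's Codex to keep rival model outputs from contaminating Meta's own AI training data and triggering contractual disputes. The terms of service of both OpenAI and Anthropic prohibit using their outputs to develop competing models. The distillation risk is that even accidental reuse of a rival's outputs could be seen as extracting capability from a competitor. The suggested strategy is ingredient tracking: use rival tools only when their outputs stay out of model-training pipelines, evaluation sets, benchmark generation, post-training data, reward-model data, and internal datasets. Typical safeguards include clean-room rules, approved enterprise accounts, training-data provenance logs, dataset quarantine, and automated scanning for "AI-generated" markers.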

## Full Text

The Information: Meta has reportedly limited engineers' use of Claude Code and Codex because rival model outputs could contaminate Meta's own AI training data and create contractual trouble with Anthropic and OpenAI.

Distillation risk starts when a new Meta model learns from another model's outputs (from OpenAI or Anthropic), so even accidental reuse of Claude or Codex answers could look as though Meta extracted capability from competitors rather than building it on its own.

OpenAI's terms bar using output to develop competing models, and Anthropic says its terms do not allow Claude outputs to be used to train models that compete with Anthropic's own systems.

IMO, the safest strategy could be ingredient tracking: use rival tools for ordinary productivity only when their outputs are barred from model-training pipelines, evaluation sets, benchmark generation, post-training data, reward-model data, and internal datasets that later feed model development.
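To make the ingredient-tracking idea concrete, here is a minimal sketch of a policy gate in Python. It is an illustration under assumptions, not anything Meta is reported to use: the `Origin` labels, the destination identifiers, and the `may_enter` helper are all hypothetical names, with the restricted destinations taken from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    """Hypothetical provenance labels attached to every artifact."""
    HUMAN = "human"
    INTERNAL_MODEL = "internal_model"
    RIVAL_TOOL = "rival_tool"  # e.g. output of an external coding agent


# Destinations named in the paragraph above; the identifiers are made up.
RESTRICTED_DESTINATIONS = frozenset({
    "training_pipeline",
    "evaluation_set",
    "benchmark_generation",
    "post_training_data",
    "reward_model_data",
    "internal_dataset",
})


@dataclass(frozen=True)
class Artifact:
    artifact_id: str
    origin: Origin


def may_enter(artifact: Artifact, destination: str) -> bool:
    """Return False when rival-tool output is headed for a model-development destination."""
    if artifact.origin is Origin.RIVAL_TOOL and destination in RESTRICTED_DESTINATIONS:
        return False
    return True
```

The gate only works if the provenance label is attached when the artifact is created, which is why the tracking has to start at the tool boundary rather than at the dataset.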

Of course, a strong lawsuit usually needs far uglier facts, such as mass scraping, fake accounts, rate-limit evasion, automated extraction, direct use of outputs as training labels, or internal records showing the buyer knew it was cloning a rival system.

In this situation, some of the typical safeguards are clean-room rules, approved enterprise accounts, no consumer accounts for sensitive work, training-data provenance logs, dataset quarantine, prompt and output retention, automated scanners for "AI-generated by vendor X" material, and access controls separating coding-agent work from model-training data.
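Two of these safeguards, the automated scanner and dataset quarantine, can be sketched together in a few lines of Python. This is a rough illustration only: the marker patterns in `VENDOR_MARKERS` are assumed examples rather than an authoritative list, and `find_markers` and `partition` are hypothetical helper names.

```python
import re

# Assumed example patterns; a real deployment would maintain its own list.
VENDOR_MARKERS = [
    re.compile(r"generated (?:by|with) (?:claude|codex)", re.IGNORECASE),
    re.compile(r"co-authored-by:\s*(?:claude|codex)", re.IGNORECASE),
]


def find_markers(text: str) -> list[str]:
    """Return every marker match found in the text."""
    hits = []
    for pattern in VENDOR_MARKERS:
        hits.extend(match.group(0) for match in pattern.finditer(text))
    return hits


def partition(records: dict[str, str]) -> tuple[list[str], list[dict]]:
    """Split records into clean IDs and a quarantine log of flagged records."""
    clean, quarantine_log = [], []
    for record_id, text in records.items():
        hits = find_markers(text)
        if hits:
            # Flagged records are logged with the evidence that triggered the flag.
            quarantine_log.append({"id": record_id, "markers": hits})
        else:
            clean.append(record_id)
    return clean, quarantine_log
```

A string scanner like this catches only output that still carries an explicit marker, so it complements provenance logs and access controls rather than replacing them.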