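OpenBMB and ModelBest, together with Tsinghua NLP, the Technical University of Munich, and others, have released FactNet, a billion-scale, open-source multilingual knowledge graph. It unifies 1.7B atomic assertions into 1.55B FactSynsets and attaches 3.01B byte-level traceable evidence pointers drawn from Wikipedia in 316 languages (page ID, revision ID, Unicode offsets), 99.63% of which relocate exactly. A human audit of 4,200 items puts design-weighted precision at 92.1% (88.5% for low-resource languages). FactNet-Bench covers three tasks (KGC, MKQA, and MFC) and explicitly penalizes information leakage, giving verifiable AI a structured factual foundation.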
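To make the evidence-pointer idea concrete, here is a minimal sketch of what a pointer of the form (page ID, revision ID, Unicode offsets) might look like and how an exact relocation check could work. The class, field names, and the stored `snippet` are assumptions made for illustration; they are not FactNet's actual schema or code, and the sample text is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidencePointer:
    """Hypothetical pointer layout; field names are illustrative, not FactNet's schema."""
    page_id: int       # Wikipedia page ID
    revision_id: int   # revision the offsets were computed against
    start: int         # Unicode code-point offset, inclusive
    end: int           # Unicode code-point offset, exclusive
    snippet: str       # evidence text recorded at extraction time (assumed field)


def relocates_exactly(pointer: EvidencePointer, revision_text: str) -> bool:
    """Return True if the stored offsets still select the recorded snippet."""
    # Python strings are indexed by code point, so slicing matches Unicode offsets.
    return revision_text[pointer.start:pointer.end] == pointer.snippet


def to_utf8_byte_span(pointer: EvidencePointer, revision_text: str) -> tuple[int, int]:
    """Convert code-point offsets to UTF-8 byte offsets."""
    byte_start = len(revision_text[:pointer.start].encode("utf-8"))
    byte_end = len(revision_text[:pointer.end].encode("utf-8"))
    return byte_start, byte_end


# Toy revision text, made up for illustration (not real Wikipedia content).
text = "München ist die Hauptstadt von Bayern."
ptr = EvidencePointer(page_id=1, revision_id=100, start=0, end=7, snippet="München")

print(relocates_exactly(ptr, text))        # offsets select the snippet in this revision
print(to_utf8_byte_span(ptr, text))        # byte span differs because "ü" is 2 bytes
print(relocates_exactly(ptr, "X" + text))  # any shift in the text breaks the match
```

Pinning the pointer to a specific revision ID is what makes an exact check like this meaningful: the offsets are only expected to hold against the text of that revision, not against later edits of the page.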
LLMs keep getting more fluent, but can you actually verify what they say? Structured KBs like Wikidata lack text grounding. Annotation-based datasets like FEVER are too small and monolingual. Synthetic expansion just produces hallucinations at scale. The trilemma between authenticity, scale, and structure has gone unsolved. ❓

Today, we dive into FactNet, a landmark contribution by @TsinghuaNLP (OpenBMB member) alongside researchers from TU Munich, Modelbest Inc., and Minzu University of China. FactNet constructs a billion-scale, open-source multilingual knowledge graph that unifies structured Wikidata assertions with auditable, byte-level evidence pointers from 316 native Wikipedia editions.

🤗 Paper: https://huggingface.co/papers/2602.03417
📄 arXiv: https://arxiv.org/abs/2602.03417
💻 Code & Data: https://github.com/yl-shen/factnet