# TRL-Bench: A Standardized Cross-Paradigm Benchmark for Representation-Level Evaluation of Tabular Encoders

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-08 08:00
- AIHOT score: 68
- AIHOT link: https://aihot.virxact.com/items/cmq8ws2lj06ieslld84wk6upm
- Original link: https://arxiv.org/abs/2606.09323

## AI Summary

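TRL-Bench is a multi-granular tabular representation learning benchmark that evaluates row-, column-, and table-level embeddings under a unified protocol. It comprises three suites: TRL-CTbench (column/table), TRL-Rbench (row), and TRL-DLTE (compositional Data-Lake Table Enrichment). The released data assets include 50 OpenML tables (123 verified targets), 16 row-pair linkage rewrite tasks, and a 47,772-table DLTE lake. Evaluation across 20 models and 16 tasks shows that, once downstream conditions are standardized, encoder quality is capability-specific: generic text encoders lead on tasks with strong surface-text signal, tabular specialists win when their pretraining objective aligns with the task, and the strongest DLTE pipelines combine capability-matched specialists.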

## Full Text

Tabular encoders are usually evaluated inside task-specific end-to-end pipelines, so models from different training paradigms are difficult to compare directly even when they operate on similar tabular signals. We introduce TRL-Bench, a multi-granular tabular representation learning (TRL) benchmark that standardizes cross-paradigm representation-level evaluation: each encoder exports row-, column-, or table embeddings through its supported wrapper, and shared lightweight heads probe them across three suites: TRL-CTbench (column/table), TRL-Rbench (row), and TRL-DLTE (compositional Data-Lake Table Enrichment spanning all three granularities). To support this standardized setting, we release curated benchmark assets and task reformulations, including 50 OpenML tables with 123 verified targets, 16 row-pair linkage rewrites, and a 47,772-table DLTE lake derived from 1,379 parent tables. Across 20 models and 16 tasks, TRL-Bench shows that once downstream conditions are standardized, encoder quality is capability-specific rather than captured by a single leaderboard. In TRL-CTbench, generic text encoders often lead on tasks with strong surface-text signal, while tabular specialists win where their pretraining objective aligns with the task. In TRL-Rbench, within-table prediction and cross-table linkage favor different training regimes, with atomic linkage performance correlating strongly with the row-matching stage of DLTE pipelines. In TRL-DLTE, the strongest pipelines combine capability-matched specialists rather than reuse a single encoder, and top end-to-end quality depends on non-additive compositional fit rather than per-stage marginal rank alone. TRL-Bench provides a common protocol for measuring reusable signal in exported tabular representations under shared downstream conditions. Code and data: https://github.com/LOGO-CUHKSZ/TRL-Bench
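
The representation-level protocol described above (frozen encoders export embeddings, and one shared lightweight head probes them under identical downstream conditions) can be illustrated with a minimal sketch. This is not the TRL-Bench code or API: the names `fit_ridge_probe`, `evaluate_encoder`, `informative_encoder`, and `uninformative_encoder` are hypothetical, the data is synthetic, and a closed-form ridge classifier stands in for whatever heads the benchmark actually uses.

```python
import numpy as np


def fit_ridge_probe(X, y, n_classes, l2=1.0):
    """Fit a linear probe by ridge regression onto one-hot labels (closed form)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    Y = np.eye(n_classes)[y]                       # one-hot targets
    # For simplicity the bias weight is regularized along with the others.
    A = Xb.T @ Xb + l2 * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ Y)


def probe_accuracy(W, X, y):
    """Accuracy of the linear probe W on embeddings X with labels y."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return float(np.mean(np.argmax(Xb @ W, axis=1) == y))


def evaluate_encoder(encode, rows, labels, train_idx, test_idx, n_classes):
    """Export embeddings from a frozen encoder, then fit and score the shared probe."""
    Z = encode(rows)  # the encoder is only called, never updated
    W = fit_ridge_probe(Z[train_idx], labels[train_idx], n_classes)
    return probe_accuracy(W, Z[test_idx], labels[test_idx])


# Synthetic stand-in for a table: 8 numeric features, label signal in feature 0.
rng = np.random.default_rng(0)
n = 400
labels = rng.integers(0, 2, size=n)
rows = rng.normal(size=(n, 8))
rows[:, 0] += 3.0 * labels


# Two hypothetical "encoders" (assumptions for illustration only).
def informative_encoder(rows):
    return rows[:, :4]  # keeps the feature that carries the label signal


def uninformative_encoder(rows):
    return rows[:, 4:]  # discards it


# Shared split and shared probe: the only thing that varies is the encoder.
perm = rng.permutation(n)
train_idx, test_idx = perm[:300], perm[300:]

scores = {
    name: evaluate_encoder(enc, rows, labels, train_idx, test_idx, n_classes=2)
    for name, enc in [
        ("informative", informative_encoder),
        ("uninformative", uninformative_encoder),
    ]
}
print(scores)
```

Because the split, head, and hyperparameters are held fixed, any score difference in this sketch is attributable to the exported representation, which is the comparison the abstract argues end-to-end pipelines obscure.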