何时、何地、如何:表格自监督学习的自适应分箱
阅读原文· arxiv.org针对医学表格数据标签获取成本高的问题,研究者提出训练自适应离散化预任务Adaptive Binning。该方法将离散化与学习过程耦合,通过特征级粗到细课程逐步细化分箱,并在检测到训练平台期时选择表征感知的分割点,同时优化值空间和表征空间一致性。异质性感知目标统一分类重建与有序监督。在公共医学表格数据集上,线性探测和微调均取得一致提升,无需数据集特定分箱调参。还引入标准化医学表格SSL基准。代码已开源。
Medical tabular data are ubiquitous in clinical research, but deep learning for tables remains underexplored because reliable labels often require costly expert adjudication, even though structured clinical variables are routinely available in tabular form. Self-supervised learning can leverage these unlabeled tables, and recent binning-based pretexts offer a promising inductive bias, but existing objectives fix a single global quantile discretization and apply feature-agnostic supervision. We propose Adaptive Binning, a training-adaptive discretization pretext for tabular SSL that couples discretization to learning through a feature-wise coarse-to-fine curriculum. Motivated by the spectral bias of neural networks and the principles of curriculum learning, our method progressively refines discretization per feature upon plateau detection and selects representation-aware splits to jointly improve value-space concentration and representation-space coherence. A heterogeneity-aware objective unifies categorical reconstruction with ordinal supervision for numerical features, and experiments on public medical tabular datasets under unified evaluation protocols show consistent gains for linear probing and fine-tuning without dataset-specific discretization tuning. We further introduce a medical tabular SSL benchmark with standardized protocols to support reproducible progress in this underexplored domain. Our code is available at https://github.com/labhai/Adaptive-Binning.