Predicting Integers from Continuous Parameters
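Source: arxiv.org. For integer-valued labels such as the number of up-votes on social media posts or the number of bicycles available at a public rental station, the study proposes modeling the labels directly with a discrete distribution, avoiding the drawback of traditional regression, which turns a discrete label distribution into a continuous one. To meet the requirements of neural-network backpropagation, the authors evaluate several discrete distributions with continuous parameters. Across tabular learning, sequential prediction and image generation tasks, two perform best overall: the Bitwise distribution (which decomposes the integer into bits and places a Bernoulli distribution on each) and a discrete variant of the Laplace distribution (a distribution with exponentially decaying tails around a continuous mean).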
We study the problem of predicting numeric labels that are constrained to the integers or to a subrange of the integers, for example the number of up-votes on social media posts, or the number of bicycles available at a public rental station. While it is possible to model these as continuous values and to apply traditional regression, this approach changes the underlying distribution on the labels from discrete to continuous. Discrete distributions have certain benefits, which leads us to the question of whether such integer labels can be modeled directly by a discrete distribution whose parameters are predicted from the features of a given instance. Moreover, we focus on the use case of output distributions of neural networks, which adds the requirement that the parameters of the distribution be continuous, so that backpropagation and gradient descent may be used to learn the weights of the network. We investigate several options for such distributions, some existing and some novel, and test them on a range of tasks, including tabular learning, sequential prediction and image generation. We find that overall the best performance comes from two distributions: Bitwise, which represents the target integer in bits and places a Bernoulli distribution on each bit, and a discrete analogue of the Laplace distribution, which uses a distribution with exponentially decaying tails around a continuous mean.
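To make the two best-performing distributions concrete, the following is a minimal sketch of their log-probabilities in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names, the least-significant-bit-first ordering, and the specific discrete Laplace form p(n) ∝ exp(-|n - mu| / scale) over all integers are choices made here, and the paper's exact parameterization may differ.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bitwise_log_prob(n, logits):
    """Log-probability of integer n under independent Bernoulli bits.

    logits[i] is the logit that bit i of n equals 1
    (least significant bit first; this ordering is an assumption).
    """
    k = len(logits)
    if not 0 <= n < 2 ** k:
        raise ValueError("n must be representable in len(logits) bits")
    total = 0.0
    for i, logit in enumerate(logits):
        bit = (n >> i) & 1
        # log sigmoid(x) = -softplus(-x); log(1 - sigmoid(x)) = -softplus(x)
        total += -softplus(-logit) if bit else -softplus(logit)
    return total


def discrete_laplace_log_prob(n, mu, scale):
    """Log-probability of integer n under p(n) proportional to
    exp(-|n - mu| / scale), normalized over all integers.

    mu is a continuous mean and scale > 0; this form is an assumed,
    simple discretization of the Laplace distribution.
    """
    q = math.exp(-1.0 / scale)
    frac = mu - math.floor(mu)  # fractional part of mu, in [0, 1)
    # Two geometric series, one on each side of mu:
    # Z = (q**frac + q**(1 - frac)) / (1 - q)
    log_z = math.log(q ** frac + q ** (1.0 - frac)) - math.log1p(-q)
    return -abs(n - mu) / scale - log_z


# Usage: negative log-likelihood of the label 5 under each distribution.
nll_bitwise = -bitwise_log_prob(5, [0.0, 0.0, 0.0])
nll_laplace = -discrete_laplace_log_prob(5, mu=4.3, scale=1.5)
```

In a network, the logits (or `mu` and `scale`) would be produced by the final layer and the negative log-probability of the observed integer used as the loss. Both expressions are continuous in their parameters, which is the property the abstract requires for gradient-based training.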