Hyperparameter
Weighting
Class (philosophy)
Computer science
Point (geometry)
Scale (ratio)
Sampling (signal processing)
Algorithm
Mathematics
Data mining
Artificial intelligence
Quantum mechanics
Medicine
Filter (signal processing)
Physics
Radiology
Computer vision
Geometry
Authors
Yin Cui,Menglin Jia,Tsung-Yi Lin,Yang Song,Serge Belongie
Identifier
DOI:10.1109/cvpr.2019.00949
Abstract
With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula (1 − β^n)/(1 − β), where n is the number of samples and β ∈ [0, 1) is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss. Comprehensive experiments are conducted on artificially induced long-tailed CIFAR datasets and large-scale datasets including ImageNet and iNaturalist. Our results show that when trained with the proposed class-balanced loss, the network is able to achieve significant performance gains on long-tailed datasets.
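A minimal sketch of how the effective-number re-weighting described in the abstract could be computed, assuming NumPy; the helper name `effective_num_weights`, the example class counts, and the normalization of the weights to sum to the number of classes are illustrative assumptions, not taken from the paper's released code.

```python
import numpy as np

def effective_num_weights(samples_per_class, beta=0.999):
    """Per-class weights from the effective number of samples.

    For a class with n samples, the effective number is
    (1 - beta**n) / (1 - beta); the class weight is its inverse.
    """
    samples_per_class = np.asarray(samples_per_class, dtype=np.float64)
    effective_num = (1.0 - np.power(beta, samples_per_class)) / (1.0 - beta)
    weights = 1.0 / effective_num
    # Assumption: normalize so the weights sum to the number of classes,
    # keeping the overall loss scale comparable to an unweighted loss.
    weights = weights / weights.sum() * len(samples_per_class)
    return weights

# Example: a long-tailed 4-class dataset (hypothetical counts).
counts = [5000, 500, 50, 5]
print(effective_num_weights(counts, beta=0.999))
# Head classes receive small weights, tail classes large ones;
# beta -> 0 recovers no re-weighting, beta -> 1 approaches
# inverse-frequency (1/n) weighting.
```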