尹志
(2022-10-27 20:44):
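#paper doi: https://doi.org/10.48550/arXiv.1708.02002, Focal Loss for Dense Object Detection (ICCV 2017). This is a classic paper in object detection. Detectors have long fallen into two families: one-stage and two-stage. The former is represented mainly by YOLO and SSD, while the latter largely derives from R-CNN. In general, one-stage detectors are faster than two-stage detectors but less accurate. In principle, a two-stage detector first generates a set of candidate boxes and then classifies those candidates precisely in a second step, whereas a one-stage detector directly produces a large, dense set of detection boxes. So can we design a one-stage detector that is fast and also matches the accuracy of two-stage detectors? The paper argues that one-stage detectors currently trail in accuracy because of class imbalance. In a two-stage detector, the first stage has already filtered out most background samples; a one-stage detector generates very dense candidates in a single pass, so the foreground-background imbalance is severe, and this holds accuracy down. The authors therefore introduce a scaling factor into the standard cross entropy loss. During training, this factor automatically down-weights easy examples so that the model can handle hard examples better. This is the well-known focal loss. Building on focal loss, the authors design a one-stage detection network, RetinaNet. In their experimental comparisons, RetinaNet reached SOTA performance in both speed and accuracy, achieving 39.1 AP on the COCO dataset (an excellent result for its time).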
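To make the "scaling factor" concrete, here is a minimal sketch of the idea for a single binary (foreground vs. background) prediction. It assumes the form FL(p_t) = -(1 - p_t)^γ · log(p_t) with γ = 2 as an illustrative default, and leaves out the class-balancing weight α. The function names and the two sample probabilities are made up for illustration; this is not the authors' implementation.

```python
import math


def cross_entropy(p, y):
    """Binary cross entropy for one example.

    p is the predicted foreground probability, y is the label (1 = foreground, 0 = background).
    """
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0):
    """Cross entropy scaled by (1 - p_t)^gamma (assumed form, alpha term omitted)."""
    p_t = p if y == 1 else 1.0 - p
    return (1.0 - p_t) ** gamma * -math.log(p_t)


# Two hypothetical background examples: one the model already gets right, one it gets wrong.
examples = {
    "easy": (0.01, 0),  # p_t = 0.99
    "hard": (0.70, 0),  # p_t = 0.30
}

for name, (p, y) in examples.items():
    ce = cross_entropy(p, y)
    fl = focal_loss(p, y)
    print(f"{name}: CE={ce:.4f}  FL={fl:.6f}  FL/CE={fl / ce:.4f}")
```

With γ = 2, the easy example's loss is scaled by roughly 0.0001 and the hard example's by roughly 0.49, so the many easy background boxes contribute far less to the total loss. Setting γ = 0 recovers plain cross entropy.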
arXiv,
2018.
DOI: 10.48550/arXiv.1708.02002
Focal Loss for Dense Object Detection
Abstract:
The highest accuracy object detectors to date are based on a two-stage approach popularized by R-CNN, where a classifier is applied to a sparse set of candidate object locations. In contrast, one-stage detectors that are applied over a regular, dense sampling of possible object locations have the potential to be faster and simpler, but have trailed the accuracy of two-stage detectors thus far. In this paper, we investigate why this is the case. We discover that the extreme foreground-background class imbalance encountered during training of dense detectors is the central cause. We propose to address this class imbalance by reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples. Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training. To evaluate the effectiveness of our loss, we design and train a simple dense detector we call RetinaNet. Our results show that when trained with the focal loss, RetinaNet is able to match the speed of previous one-stage detectors while surpassing the accuracy of all existing state-of-the-art two-stage detectors. Code is at: this https URL.