# ProSelfLC: Progressive Self Label Correction for Training Robust Deep Neural Networks (CVPR 2021)


For any specific discussion or potential future collaboration, please feel free to contact me.

As a young researcher, your interest and star (citation) will mean a lot to me and my collaborators.

Paper link: https://arxiv.org/abs/2005.03788

Table of Contents

## Feedback

## Storyline

## Open ML Research Questions

## Noticeable Findings

Comprehensive learning dynamics for a thorough understanding of learning behaviours.

## Literature Review

Target modification includes output regularisation (OR), covering label smoothing (LS) and confidence penalty (CP), and label correction (LC), covering Self LC and Non-self LC.

Self LC is the most appealing because it requires no extra learners to revise learning targets: it comes for free!

Summary of CCE, LS, CP and LC from the angle of target modification, entropy and KL divergence.
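As a sketch of this target-modification view, with \(\epsilon\) the modification strength, \(\mathbf{q}\) the one-hot annotation, \(\mathbf{u}\) the uniform distribution, and \(\mathbf{p}\) the model's prediction (a paraphrase from memory; consult the paper for the exact forms):

\[
\mathrm{CCE}: \tilde{\mathbf{q}} = \mathbf{q}, \qquad
\mathrm{LS}: \tilde{\mathbf{q}} = (1-\epsilon)\mathbf{q} + \epsilon\,\mathbf{u}, \qquad
\mathrm{LC}: \tilde{\mathbf{q}} = (1-\epsilon)\mathbf{q} + \epsilon\,\mathbf{p}.
\]

LS flattens every target toward the uniform distribution, while LC moves the target toward a (possibly learned) label distribution; CP instead acts as an entropy penalty on the output distribution.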

## In Self LC, a Core Question Is Not Well Answered

## Underlying Principle of ProSelfLC

## Mathematical Details of ProSelfLC

Beyond semantic class: the similarity structure defined by a label distribution.

Human annotations or predicted label distributions: which should we trust more?

## Design Reasons of ProSelfLC

Regarding \(g(t)\): in the earlier learning phase, i.e., \(t < \Gamma/2\), we have \(g(t) < 0.5 \Rightarrow \epsilon_{\mathrm{ProSelfLC}} < 0.5, \forall \mathbf{p}\), so that the human annotations dominate and ProSelfLC only modifies the similarity structure. The rationale is that when a learner has not yet seen the training data enough times, we assume it is not well trained; this is elementary in deep learning. Most importantly, more randomness exists in the earlier phase, so the learner may output a confidently wrong prediction. In our design, \(\epsilon_{\mathrm{ProSelfLC}} < 0.5, \forall \mathbf{p}\) mitigates the adverse impact of such unexpected cases. In the later learning phase, i.e., \(t > \Gamma/2\), we have \(g(t) > 0.5\), which means that overall we give the learner enough credit, as it has been trained for more than half of the total iterations.
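The behaviour of \(g(t)\) described above can be sketched as a logistic function of the normalised training progress, centred at \(t = \Gamma/2\) so that \(g(t) < 0.5\) before the midpoint and \(g(t) > 0.5\) after it. The sharpness parameter `b` below is a hypothetical choice for illustration; the paper's exact form may differ.

```python
import math

def g(t: int, total_iterations: int, b: float = 16.0) -> float:
    """Global trust in the learner, rising from near 0 to near 1 over training.

    A sigmoid of the normalised progress t / total_iterations, shifted so
    that g(t) = 0.5 exactly at the midpoint t = total_iterations / 2.
    """
    progress = t / total_iterations - 0.5
    return 1.0 / (1.0 + math.exp(-b * progress))

# Early phase: g(t) < 0.5, so human annotations dominate.
early = g(100, 1000)
# Midpoint: g(t) = 0.5 exactly.
mid = g(500, 1000)
# Late phase: g(t) > 0.5, so the learner's predictions gain credit.
late = g(900, 1000)
```

The midpoint crossing at \(t = \Gamma/2\) is the property the design argument above relies on; any monotonically increasing function with that crossing would serve the same role.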

Regarding \(l(\mathbf{p})\): we discuss its effect in the later learning phase, when it becomes more meaningful. If \(\mathbf{p}\) is not confident, \(l(\mathbf{p})\) will be large and \(\epsilon_{\mathrm{ProSelfLC}}\) will be small, which means we choose to trust a one-hot annotation more when the prediction is of high entropy, so that we can further reduce the entropy of the output distributions. In this case, ProSelfLC only modifies the similarity structure. Furthermore, when \(\mathbf{p}\) is highly confident, there are two finer cases: if \(\mathbf{p}\) is consistent with \(\mathbf{q}\) in the semantic class, ProSelfLC again only modifies the similarity structure; if they are inconsistent, ProSelfLC further corrects the semantic class of the human annotation.
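The case analysis above admits a compact sketch. Here \(l(\mathbf{p})\) is taken to be the normalised entropy \(H(\mathbf{p})/H(\mathbf{u})\), so that a non-confident \(\mathbf{p}\) gives a large \(l(\mathbf{p})\) and a small \(\epsilon_{\mathrm{ProSelfLC}} = g(t)\,(1 - l(\mathbf{p}))\); this is one instantiation consistent with the discussion, and the exact definitions are in the paper. The corrected target blends the annotation \(\mathbf{q}\) with the prediction \(\mathbf{p}\).

```python
import math

def normalised_entropy(p):
    """l(p) = H(p) / H(u): 1 for a uniform prediction, 0 for a one-hot one."""
    h = -sum(pi * math.log(pi) for pi in p if pi > 0)
    return h / math.log(len(p))

def proselflc_target(q, p, g_t):
    """Corrected target (1 - eps) * q + eps * p, with eps = g(t) * (1 - l(p))."""
    eps = g_t * (1.0 - normalised_entropy(p))
    return [(1.0 - eps) * qi + eps * pi for qi, pi in zip(q, p)]

q = [1.0, 0.0, 0.0]                 # one-hot human annotation (class 0)
p_uncertain = [0.34, 0.33, 0.33]    # high entropy: eps small, trust q
p_confident = [0.02, 0.96, 0.02]    # confident but inconsistent with q

# Late phase (g(t) close to 1): a confident, inconsistent prediction
# shifts the semantic class of the target from class 0 to class 1.
corrected = proselflc_target(q, p_confident, g_t=0.9)
```

With `p_uncertain`, `eps` stays near zero and the target remains essentially `q` (only the similarity structure is modified); with `p_confident`, the largest entry of the corrected target moves to class 1, i.e., the semantic class of the annotation is corrected.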

Ablation study on the design of ProSelfLC, where \(\epsilon_{\mathrm{ProSelfLC}}\) consistently performs best across multiple reported metrics.

Case analysis on the design of ProSelfLC.