Robust Deep Learning via Derivative Manipulation and IMAE

The source code is released for academic use only; please kindly cite our work: Derivative Manipulation and IMAE.
As a young researcher, your interest and kind citation (star) will definitely mean a lot to me and my collaborators.
For any specific discussion or potential future collaboration, please feel free to contact me.

What do you think of requesting kind citations?

  • SIGUA: Forgetting May Make Learning with Noisy Labels More Robust


    • Reason 1: Reducing the learning rate on ‘bad’ examples is intrinsically equivalent to reducing the weights (derivative magnitudes) of ‘bad’ data points. “SIGUA works in each mini-batch: it implements SGD on good data as usual, and if there are any bad data, it implements stochastic gradient ascent (SGA) on bad data with a reduced learning rate.”

    In [DM] and [IMAE], we have studied how to model example-level weighting from the perspective of the gradient/derivative. Concretely, we have claimed that those ‘bad’ examples are assigned smaller derivative magnitudes at the final layer. Mathematically, a point’s final gradient for back-propagation = its derivative × the learning rate. SIGUA does not modify the derivative; instead, it adjusts the learning rate. But fundamentally, the principle is the same. Therefore, our work [DM] and [IMAE] should be discussed.

    • Reason 2: Although [DM] and [IMAE] are unpublished in conferences or journals, they have been available on arXiv for more than a year by now. Therefore, it is improper to ignore them. Furthermore, [DM] and [IMAE] are included in my PhD thesis, which has passed examination. PhD Thesis: Example Weighting for Deep Representation Learning

    • I am looking forward to your thoughts. If I am wrong, please feel free to tell me. Otherwise, I would greatly appreciate it if you agree to discuss our work in your paper. Many thanks.
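The equivalence claimed in Reason 1 can be sketched in a few lines of numpy. The names below (`grads`, `weights`, `lr`) are illustrative, not from either paper; the point is only that scaling the learning rate per example and scaling the derivative per example produce the same parameter update:

```python
import numpy as np

# Hypothetical per-example derivatives at the final layer; the last two
# examples are treated as 'bad' (likely noisy) and get small weights.
rng = np.random.default_rng(0)
grads = rng.normal(size=(4, 3))           # one derivative vector per example
weights = np.array([1.0, 1.0, 0.1, 0.1])  # example-level weights
lr = 0.5

# View 1 (SIGUA-style): a reduced per-example learning rate.
step_lr = sum(w * lr * g for w, g in zip(weights, grads))

# View 2 (DM/IMAE-style): a reduced per-example derivative magnitude.
step_dm = lr * sum(w * g for w, g in zip(weights, grads))

# The two updates coincide: w * lr * g == lr * (w * g).
assert np.allclose(step_lr, step_dm)
```

The check is trivial by associativity of scalar multiplication, which is exactly why the two framings describe the same example-weighting principle.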

We really need to rethink robust losses and optimisation in deep learning!

  • In Normalized Loss Functions for Deep Learning with Noisy Labels, the abstract states: “we theoretically show by applying a simple normalization that: any loss can be made robust to noisy labels. However, in practice, simply being robust is not sufficient for a loss function to train accurate DNNs.”
    • This statement is quite contradictory: a robust loss is not sufficient (i.e., robust yet not accurate)? Then what is the value of saying whether a loss is robust or not?
  • For me, a trained robust model should be accurate on both training and testing datasets.

  • I remark that we are the first to thoroughly analyse robust losses, e.g., MAE’s underfitting and how it weights data points.
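The claim that MAE implicitly weights examples can be made concrete. With a softmax output, the L1 norm of the loss derivative with respect to the logits acts as an example-level weight: for cross entropy it is 2·(1 − p_y), while for MAE it is 4·p_y·(1 − p_y), where p_y is the probability of the labelled class. A minimal numpy check (function names and the finite-difference helper are mine, not from the papers) against these closed forms:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def l1_grad_norm(loss, z, y, eps=1e-5):
    """Numerical L1 norm of d loss / d logits via central differences."""
    g = np.zeros_like(z)
    for j in range(len(z)):
        zp, zm = z.copy(), z.copy()
        zp[j] += eps; zm[j] -= eps
        g[j] = (loss(zp, y) - loss(zm, y)) / (2 * eps)
    return np.abs(g).sum()

def cce(z, y):  # categorical cross entropy on softmax probabilities
    return -np.log(softmax(z)[y])

def mae(z, y):  # mean absolute error between probabilities and one-hot target
    p = softmax(z)
    t = np.zeros_like(p); t[y] = 1.0
    return np.abs(p - t).sum()

rng = np.random.default_rng(1)
z = rng.normal(size=5)
y = 0
p_y = softmax(z)[y]

# CCE weights an example by 2*(1 - p_y): harder examples get LARGER derivatives.
assert np.isclose(l1_grad_norm(cce, z, y), 2 * (1 - p_y), atol=1e-5)

# MAE weights an example by 4*p_y*(1 - p_y): both very hard (p_y -> 0, e.g. noisy)
# and very easy (p_y -> 1) examples get SMALL derivatives -- robust to noise,
# but prone to underfitting hard-but-clean examples.
assert np.isclose(l1_grad_norm(mae, z, y), 4 * p_y * (1 - p_y), atol=1e-5)
```

The second weighting curve peaks at p_y = 0.5 and vanishes at both extremes, which is precisely the non-uniform treatment of examples behind MAE’s underfitting.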

When talking about robustness/regularisation, our community tends to connect it merely to better test performance. I advocate caring about training performance as well because:

  • If noisy training examples are fitted well, the model has learned something wrong;
  • If clean ones are not fitted well, the model is not good enough.
  • There is a potential argument that the test dataset can be infinitely large theoretically, thus being the significant one.
    • Personal comment: Though true theoretically, in realistic deployment we obtain more test samples as time goes on, and accordingly we generally choose to retrain or fine-tune to keep the system adaptive. Therefore, this argument does not make much sense.

Other details

  1. IMAE for Noise-Robust Learning: Mean Absolute Error Does Not Treat Examples Equally and Gradient Magnitude’s Variance Matters
  2. Derivative Manipulation for General Example Weighting
  3. GitHub Pages
  4. Citation
    @article{wang2019derivative,
      title={Derivative Manipulation for General Example Weighting},
      author={Wang, Xinshao and Kodirov, Elyor and Hua, Yang and Robertson, Neil M},
      journal={arXiv preprint arXiv:1905.11233},
      year={2019}
    }
    
    @article{wang2019imae,
      title={ {IMAE} for Noise-Robust Learning: Mean Absolute Error Does Not Treat Examples Equally and Gradient Magnitude's Variance Matters},
      author={Wang, Xinshao and Hua, Yang and Kodirov, Elyor and Robertson, Neil M},
      journal={arXiv preprint arXiv:1903.12141},
      year={2019}
    }
    

© 2019-2020. All rights reserved.
