Robust Deep Learning via Derivative Manipulation and IMAE

The source code is released for academic use only; please kindly cite our work: Derivative Manipulation and IMAE.
As a young researcher, your interest and kind citation (star) will definitely mean a lot to me and my collaborators.
For any specific discussion or potential future collaboration, please feel free to contact me.

What do you think of requesting kind citations?

  • SIGUA: Forgetting May Make Learning with Noisy Labels More Robust


    • Reason 1: Reducing the learning rate on ‘bad’ examples is intrinsically equivalent to reducing the weights (derivative magnitudes) of ‘bad’ data points. “SIGUA works in each mini-batch: it implements SGD on good data as usual, and if there are any bad data, it implements stochastic gradient ascent (SGA) on bad data with a reduced learning rate.”

    In [DM] and [IMAE], we have studied how to model example-level weighting from the perspective of the gradient/derivative. Concretely, we have claimed that those ‘bad’ examples are assigned smaller derivative magnitudes at the final layer. Mathematically, a point’s final gradient for back-propagation = its derivative × the learning rate. SIGUA does not modify the derivative; instead, it adjusts the learning rate. But fundamentally, the principle is the same. Therefore, our work [DM] and [IMAE] should be discussed.

    • Reason 2: Although [DM] and [IMAE] are unpublished in conferences or journals, they have been available on arXiv for more than a year by now. Therefore, it is improper to ignore them. Furthermore, [DM] and [IMAE] are included in my PhD thesis, which has passed examination. PhD Thesis: Example Weighting for Deep Representation Learning

    • I am looking forward to your thoughts. If I am wrong, please feel free to tell me. Otherwise, I would greatly appreciate it if you agree to discuss our work in your paper. Many thanks.
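The equivalence claimed in Reason 1 can be sketched in a few lines of numpy. The names below (`grads`, `weights`, `lr`) are illustrative, not from either paper; the point is only that scaling the learning rate per example and scaling the derivative per example produce the same parameter update:

```python
import numpy as np

# Hypothetical per-example derivatives at the final layer; the last two
# examples are treated as 'bad' (likely noisy) and get small weights.
rng = np.random.default_rng(0)
grads = rng.normal(size=(4, 3))           # one derivative vector per example
weights = np.array([1.0, 1.0, 0.1, 0.1])  # example-level weights
lr = 0.5

# View 1 (SIGUA-style): a reduced per-example learning rate.
step_lr = sum(w * lr * g for w, g in zip(weights, grads))

# View 2 (DM/IMAE-style): a reduced per-example derivative magnitude.
step_dm = lr * sum(w * g for w, g in zip(weights, grads))

# The two updates coincide: w * lr * g == lr * (w * g).
assert np.allclose(step_lr, step_dm)
```

The check is trivial by associativity of scalar multiplication, which is exactly why the two framings describe the same example-weighting principle.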

We really need to rethink robust losses and optimisation in deep learning!

  • In Normalized Loss Functions for Deep Learning with Noisy Labels, the abstract states: “we theoretically show by applying a simple normalization that: any loss can be made robust to noisy labels. However, in practice, simply being robust is not sufficient for a loss function to train accurate DNNs.”
    • This statement is quite contradictory: a robust loss is not sufficient (i.e., robust yet not accurate)? Then what is the value of saying whether a loss is robust or not?
  • For me, a trained robust model should be accurate on both training and testing datasets.

  • I remark that we are the first to thoroughly analyse robust losses, e.g., MAE’s underfitting and how it weights data points.
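The claim that MAE implicitly weights examples can be made concrete. With a softmax output, the L1 norm of the loss derivative with respect to the logits acts as an example-level weight: for cross entropy it is 2·(1 − p_y), while for MAE it is 4·p_y·(1 − p_y), where p_y is the probability of the labelled class. A minimal numpy check (function names and the finite-difference helper are mine, not from the papers) against these closed forms:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def l1_grad_norm(loss, z, y, eps=1e-5):
    """Numerical L1 norm of d loss / d logits via central differences."""
    g = np.zeros_like(z)
    for j in range(len(z)):
        zp, zm = z.copy(), z.copy()
        zp[j] += eps; zm[j] -= eps
        g[j] = (loss(zp, y) - loss(zm, y)) / (2 * eps)
    return np.abs(g).sum()

def cce(z, y):  # categorical cross entropy on softmax probabilities
    return -np.log(softmax(z)[y])

def mae(z, y):  # mean absolute error between probabilities and one-hot target
    p = softmax(z)
    t = np.zeros_like(p); t[y] = 1.0
    return np.abs(p - t).sum()

rng = np.random.default_rng(1)
z = rng.normal(size=5)
y = 0
p_y = softmax(z)[y]

# CCE weights an example by 2*(1 - p_y): harder examples get LARGER derivatives.
assert np.isclose(l1_grad_norm(cce, z, y), 2 * (1 - p_y), atol=1e-5)

# MAE weights an example by 4*p_y*(1 - p_y): both very hard (p_y -> 0, e.g. noisy)
# and very easy (p_y -> 1) examples get SMALL derivatives -- robust to noise,
# but prone to underfitting hard-but-clean examples.
assert np.isclose(l1_grad_norm(mae, z, y), 4 * p_y * (1 - p_y), atol=1e-5)
```

The second weighting curve peaks at p_y = 0.5 and vanishes at both extremes, which is precisely the non-uniform treatment of examples behind MAE’s underfitting.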

When talking about robustness/regularisation, our community tends to connect it merely to better test performance. I advocate caring about training performance as well because:

  • If noisy training examples are fitted well, the model has learned something wrong;
  • If clean ones are not fitted well, the model is not good enough.
  • There is a potential argument that the test dataset can be infinitely large theoretically, thus being the significant one.
    • Personal comment: Though true theoretically, in realistic deployment we obtain more test samples as time goes on, and accordingly we generally choose to retrain or fine-tune to keep the system adaptive. Therefore, this argument does not make much sense.

Other details

  1. IMAE for Noise-Robust Learning: Mean Absolute Error Does Not Treat Examples Equally and Gradient Magnitude’s Variance Matters
  2. Derivative Manipulation for General Example Weighting
  3. GitHub Pages
  4. Citation
    @article{wang2019derivative,
      title={Derivative Manipulation for General Example Weighting},
      author={Wang, Xinshao and Kodirov, Elyor and Hua, Yang and Robertson, Neil M},
      journal={arXiv preprint arXiv:1905.11233},
      year={2019}
    }
    
    @article{wang2019imae,
      title={ {IMAE} for Noise-Robust Learning: Mean Absolute Error Does Not Treat Examples Equally and Gradient Magnitude's Variance Matters},
      author={Wang, Xinshao and Hua, Yang and Kodirov, Elyor and Robertson, Neil M},
      journal={arXiv preprint arXiv:1903.12141},
      year={2019}
    }
    

© 2019-2020. All rights reserved.
