ICML-2019
 means being highly related to my personal research interest.
Label Noise
- Unsupervised Label Noise Modeling and Loss Correction
  - A suitable two-component mixture model is used as an unsupervised generative model of sample loss values during training, allowing online estimation of the probability that a sample is mislabelled.
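The mixture-model idea can be sketched as follows. This is a minimal illustration, assuming a two-component 1-D Gaussian mixture fitted with EM; the note above only says "a suitable two-component mixture model", so the Gaussian choice, the function name `fit_two_component_gmm`, and the toy loss values are my assumptions rather than the paper's method.

```python
import math

def fit_two_component_gmm(losses, n_iter=50):
    """Fit a 2-component 1-D Gaussian mixture to per-sample losses with EM.

    Returns (posterior_noisy, means): posterior_noisy[i] is the posterior
    probability that sample i belongs to the higher-mean component, which is
    read here as "probably mislabelled" (an assumption of this sketch).
    """
    lo, hi = min(losses), max(losses)
    means = [lo, hi]                          # initialise at the extremes
    var0 = max((hi - lo) ** 2 / 4.0, 1e-6)
    variances = [var0, var0]
    weights = [0.5, 0.5]

    def pdf(x, m, v):
        return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

    resp = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each loss value.
        resp = []
        for x in losses:
            p = [weights[k] * pdf(x, means[k], variances[k]) for k in range(2)]
            total = p[0] + p[1] + 1e-300      # guard against underflow to 0/0
            resp.append([p[0] / total, p[1] / total])
        # M-step: re-estimate weight, mean and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            means[k] = sum(r[k] * x for r, x in zip(resp, losses)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, losses)) / nk
            variances[k] = max(var, 1e-6)     # floor to avoid collapse
            weights[k] = nk / len(losses)

    noisy = 0 if means[0] > means[1] else 1   # component with the larger mean
    return [r[noisy] for r in resp], means

# Toy example: five small losses and three large ones.
toy_losses = [0.05, 0.1, 0.08, 0.12, 0.07, 2.1, 2.4, 1.9]
posterior_noisy, component_means = fit_two_component_gmm(toy_losses)
```

In this toy setting the three large-loss samples receive a posterior close to 1 and the rest close to 0; on real training losses the two modes overlap, so the posteriors are softer.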
 
- Understanding and Utilizing Deep Neural Networks Trained with Noisy Labels (I am skeptical of their conclusion.)
  - We find that the test accuracy can be quantitatively characterized in terms of the noise ratio in datasets. The test accuracy is a quadratic function of the noise ratio in the case of symmetric noise, which explains the experimental findings previously published. (I am not convinced by this!)
  - DNNs tend to learn simple patterns first, then gradually memorize all samples, which justifies the widely used small-loss criteria: treating samples with small training loss as clean ones (Han et al., 2018; Jiang et al., 2018).
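The small-loss criterion mentioned above can be sketched in a few lines. This is an illustrative version only: the function name `select_small_loss` and the `keep_ratio` hyperparameter (for example, one minus an estimated noise ratio) are assumptions, not taken from any of the cited papers.

```python
def select_small_loss(losses, keep_ratio):
    """Return the indices of the keep_ratio fraction of samples with the
    smallest loss, treated as 'probably clean' under the small-loss criterion."""
    n_keep = int(round(keep_ratio * len(losses)))
    n_keep = max(0, min(len(losses), n_keep))
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(order[:n_keep])

# Toy example: keep the 60% of the mini-batch with the smallest loss.
batch_losses = [0.2, 3.1, 0.1, 2.5, 0.3]
clean_idx = select_small_loss(batch_losses, keep_ratio=0.6)  # [0, 2, 4]
```

Only the selected indices would then contribute to the gradient update for that mini-batch.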
 
- SELFIE: Refurbishing Unclean Samples for Robust Deep Learning (Song, Hwanjun; Kim, Minseok; Lee, Jae-Gil)
Robustness
Importance Weighting?
- What is the Effect of Importance Weighting in Deep Learning?
  - Across tasks, architectures and datasets, our results confirm that for standard neural networks, weighting has a significant effect early in training. However, as training progresses the effect dissipates and for most weight ratios considered (between 256:1 and 1:256) the effect of importance weighting is indistinguishable from unweighted risk minimization after sufficient training epochs.
  - While L2 regularization restores some of the impact of importance weighting, this has the perplexing consequence of expressing the amount by which importance weights affect the learned model in terms of a seemingly unrelated quantity (the degree of regularization), prompting the question: how does one appropriately choose the L2 regularization given importance weights? Interestingly, dropout regularization, which is often used interchangeably with L2 regularization, does not exhibit any such interaction with importance weighting. Batch normalization also appears to interact with importance weights, although as we will discuss later, the precise mechanism remains unclear.
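To make the objects in this discussion concrete, here is a minimal sketch of an importance-weighted risk with an L2 penalty for a linear logistic model. The function name, the normalisation by the sum of the weights, and the toy data are assumptions chosen for illustration; the paper studies deep networks, not this toy model.

```python
import math

def weighted_logistic_loss(w, xs, ys, sample_weights, l2=0.0):
    """Importance-weighted mean logistic loss plus 0.5 * l2 * ||w||^2.

    xs: list of feature vectors, ys: labels in {0, 1},
    sample_weights: one non-negative importance weight per sample.
    """
    total = 0.0
    for x, y, c in zip(xs, ys, sample_weights):
        z = sum(wi * xi for wi, xi in zip(w, x))
        margin = z if y == 1 else -z
        # Numerically stable log(1 + exp(-margin)).
        loss = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
        total += c * loss
    penalty = 0.5 * l2 * sum(wi * wi for wi in w)
    return total / sum(sample_weights) + penalty

# Toy example: two samples with identical features but different labels.
xs, ys, w = [[1.0], [1.0]], [1, 0], [1.0]
unweighted = weighted_logistic_loss(w, xs, ys, [1.0, 1.0])
upweighted = weighted_logistic_loss(w, xs, ys, [256.0, 1.0])   # 256:1 ratio
regularised = weighted_logistic_loss(w, xs, ys, [256.0, 1.0], l2=0.1)
```

With a 256:1 ratio the objective is dominated by the first sample, so the weighted value sits close to that sample's own loss. The quoted finding is about what happens to the learned model after long training, which this static objective does not show.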
 
- Related papers from other conferences
  - ICLR2019 - CRITICAL LEARNING PERIODS IN DEEP NETWORKS
    - Our findings, described in Section 2, indicate that the early transient is critical in determining the final solution of the optimization associated with training an artificial neural network.
    - To study this early phase, in Section 3, we use the Fisher Information to quantify the effective connectivity of a network during training, and introduce the notion of Information Plasticity in learning. Information Plasticity is maximal during the memorization phase, and decreases in the reorganization phase. We show that deficit sensitivity during critical periods correlates strongly with the effective connectivity.
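For intuition about the Fisher Information as a scalar measure, here is a small sketch that computes its trace for a linear softmax classifier. The model, the function names, and the use of the trace as a summary are assumptions made for illustration; this is not the paper's code and says nothing about how they estimate it for deep networks.

```python
import math

def softmax(logits):
    m = max(logits)                       # subtract the max for stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def fisher_trace(W, xs):
    """Trace of the Fisher Information of p(y|x) = softmax(W x), averaged over xs.

    For this model grad_W log p(y|x) = (e_y - p) x^T, so its squared norm is
    ||e_y - p||^2 * ||x||^2. The trace is the expectation of that squared norm
    over y ~ p(.|x), averaged over the inputs.
    """
    total = 0.0
    for x in xs:
        p = softmax([sum(wi * xi for wi, xi in zip(row, x)) for row in W])
        x_sq = sum(xi * xi for xi in x)
        for y, py in enumerate(p):
            diff_sq = sum(((1.0 if k == y else 0.0) - pk) ** 2
                          for k, pk in enumerate(p))
            total += py * diff_sq * x_sq
    return total / len(xs)

# An undecided model (uniform predictions) versus a saturated one.
inputs = [[1.0, 0.0]]
trace_uniform = fisher_trace([[0.0, 0.0], [0.0, 0.0]], inputs)       # 0.5
trace_saturated = fisher_trace([[20.0, 0.0], [-20.0, 0.0]], inputs)  # close to 0
```

In this toy model the trace is largest when predictions are uncertain and shrinks as the outputs saturate, which gives a rough feel for why such a quantity can track how sensitive a model still is to its weights.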
 
  - ICML2018 - Not All Samples Are Created Equal: Deep Learning with Importance Sampling
    - Deep Neural Network training spends most of the computation on examples that are properly handled, and could be ignored. We propose to mitigate this phenomenon with a principled importance sampling scheme that focuses computation on “informative” examples, and reduces the variance of the stochastic gradients during training.
    - Our contribution is twofold: first, we derive a tractable upper bound to the per-sample gradient norm, and second we derive an estimator of the variance reduction achieved with importance sampling, which enables us to switch it on when it will result in an actual speedup.
    - Recently, researchers have shifted their focus to using importance sampling to improve and accelerate the training of neural networks (Alain et al., 2015; Loshchilov & Hutter, 2015; Schaul et al., 2015). Those works employ either the gradient norm or the loss to compute each sample’s importance. However, the former is prohibitively expensive to compute and the latter is not a particularly good approximation of the gradient norm.
    - Firstly, we provide an intuitive metric to predict how useful importance sampling is going to be, so that we are able to decide when to switch on importance sampling during training. Secondly, we also provide theoretical guarantees for speedup when variance reduction is above a threshold.
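The general mechanism behind importance sampling for SGD can be sketched as below: sample indices in proportion to a per-sample score and reweight each draw by 1/(N p_i) so the estimate stays unbiased. The scores here are hypothetical placeholders (scalars standing in for per-sample gradient norms); the paper's specific upper bound and its switching criterion are not reproduced.

```python
import random

def sampling_probs(scores, smoothing=0.0):
    """Normalise non-negative per-sample scores into a sampling distribution.
    A positive smoothing keeps every probability strictly above zero."""
    shifted = [s + smoothing for s in scores]
    total = sum(shifted)
    return [s / total for s in shifted]

def sample_weighted_batch(probs, batch_size, rng=random):
    """Draw indices with replacement according to probs.
    Returns (indices, weights) with weight 1 / (N * p_i), which keeps the
    weighted average of per-sample gradients an unbiased estimate."""
    n = len(probs)
    idx = rng.choices(range(n), weights=probs, k=batch_size)
    return idx, [1.0 / (n * probs[i]) for i in idx]

def estimator_variance(values, probs):
    """Variance of the single-draw estimate values[i] / (N * p_i), i ~ probs,
    whose expectation is the plain mean of values."""
    n = len(values)
    mean = sum(values) / n
    second_moment = sum(p * (v / (n * p)) ** 2 for v, p in zip(values, probs))
    return second_moment - mean ** 2

# Toy example with scalar "gradients".
grads = [1.0, 2.0, 3.0, 4.0]
var_uniform = estimator_variance(grads, [0.25] * 4)               # 1.25
var_importance = estimator_variance(grads, sampling_probs(grads)) # ~0
indices, weights = sample_weighted_batch(sampling_probs(grads), 3, random.Random(0))
```

For positive scalars, sampling exactly in proportion to the values drives the variance to zero; with real gradient vectors and approximate scores the reduction is partial, which is presumably why a check on whether it pays off is useful.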
 
 
