:+1: means being highly related to my personal research interest.

Label Noise


Importance Weighting?

  • What is the Effect of Importance Weighting in Deep Learning?
    • Across tasks, architectures and datasets, our results confirm that for standard neural networks, weighting has a significant effect early in training. However, as training progresses the effect dissipates and for most weight ratios considered (between 256:1 and 1:256) the effect of importance weighting is indistinguishable from unweighted risk minimization after sufficient training epochs.
    • While L2 regularization restores some of the impact of importance weighting, this has the perplexing consequence of expressing the amount by which importance weights affect the learned model in terms of a seemingly unrelated quantity—the degree of regularization—prompting the question: how does one appropriately choose the L2 regularization given importance weights? Interestingly, dropout regularization, which is often used interchangeably with L2 regularization, does not exhibit any such interaction with importance weighting. Batch normalization also appears to interact with importance weights, although as we will discuss later, the precise mechanism remains unclear.
  • Related Papers from other conferences
      • Our findings, described in Section 2, indicate that the early transient is critical in determining the final solution of the optimization associated with training an artificial neural network.
      • To study this early phase, in Section 3, we use the Fisher Information to quantify the effective connectivity of a network during training, and introduce the notion of Information Plasticity in learning. Information Plasticity is maximal during the memorization phase, and decreases in the reorganization phase. We show that deficit sensitivity during critical periods correlates strongly with the effective connectivity.
    • ICML2018-Not All Samples Are Created Equal: Deep Learning with Importance Sampling
      • Deep Neural Network training spends most of the computation on examples that are properly handled, and could be ignored. We propose to mitigate this phenomenon with a principled importance sampling scheme that focuses computation on “informative” examples, and reduces the variance of the stochastic gradients during training.
      • Our contribution is twofold: first, we derive a tractable upper bound to the per-sample gradient norm, and second we derive an estimator of the variance reduction achieved with importance sampling, which enables us to switch it on when it will result in an actual speedup.
      • Recently, researchers have shifted their focus on using importance sampling to improve and accelerate the training of neural networks (Alain et al., 2015; Loshchilov & Hutter, 2015; Schaul et al., 2015). Those works, employ either the gradient norm or the loss to compute each sample’s importance. However, the former is prohibitively expensive to compute and the latter is not a particularly good approximation of the gradient norm.
      • Firstly we provide an intuitive metric to predict how useful importance sampling is going to be, thus we are able to decide when to switch on importance sampling during training. Secondly, we also provide theoretical guarantees for speedup, when variance reduction is above a threshold.
# #

© 2019-2020. All rights reserved.

Welcome to Xinshao Wang's Personal Website