CVPR 2019
means being highly related to my personal research interest.
Deep Metric Learning

Divide and Conquer the Embedding Space for Metric Learning
Each learner will learn a separate distance metric using only a subspace of the original embedding space and a part of the data.
 Natural hard negative mining: the splitting and sampling connect to hard negative mining, which the authors verify. (I appreciate this ablation study in Table 6.)
 Divide means: 1) Splitting the training data into K Clusters; 2) Splitting the embedding into K Slices.
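
The two "divide" steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the use of plain k-means for clustering, contiguous slices of the embedding dimensions, and the names `divide`, `num_learners`, and `num_iters` are all assumptions made for this example.

```python
import numpy as np

def divide(embeddings, num_learners, num_iters=10, seed=0):
    """Split the data into K clusters and the embedding dimensions into K slices."""
    rng = np.random.default_rng(seed)
    n, dim = embeddings.shape

    # 1) Cluster the data with a few k-means iterations (assumed clustering method).
    centers = embeddings[rng.choice(n, size=num_learners, replace=False)].astype(float)

    def nearest_center(points):
        sq = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        return sq.argmin(axis=1)

    for _ in range(num_iters):
        assign = nearest_center(embeddings)
        for k in range(num_learners):
            members = embeddings[assign == k]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[k] = members.mean(axis=0)
    assign = nearest_center(embeddings)

    # 2) Slice the embedding dimensions into K contiguous chunks.
    slices = np.array_split(np.arange(dim), num_learners)
    return assign, slices
```

Under these assumptions, learner `k` would train only on `embeddings[assign == k][:, slices[k]]`, that is, on one cluster of the data and one slice of the embedding.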

Deep Metric Learning to Rank
 Our main contribution is a novel solution to optimizing Average Precision under the Euclidean metric, based on the probabilistic interpretation of AP as the area under the precision-recall curve, as well as distance quantization.
 We also propose a category-based mini-batch sampling strategy and a large-batch training heuristic.
 On three few-shot image retrieval datasets, FastAP consistently outperforms competing methods, which often involve complex optimization heuristics or costly model ensembles.
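
To make the "distance quantization" idea concrete, here is a rough sketch that approximates AP for a single query from a histogram of distances. It uses hard (non-differentiable) binning purely for intuition; a trainable loss would need a soft histogram. The function name `histogram_ap`, the bin count, and the `max_dist=2.0` range (which presumes L2-normalized embeddings) are assumptions of this example.

```python
import numpy as np

def histogram_ap(dists, is_positive, num_bins=10, max_dist=2.0):
    """Approximate AP for one query from quantized query-to-item distances."""
    dists = np.asarray(dists, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    edges = np.linspace(0.0, max_dist, num_bins + 1)

    # Per-bin counts of positives and of all items.
    h_pos, _ = np.histogram(dists[is_positive], bins=edges)
    h_all, _ = np.histogram(dists, bins=edges)

    # Cumulative counts: items retrieved up to and including each bin.
    cum_pos = np.cumsum(h_pos)
    cum_all = np.cumsum(h_all)

    # Precision at each bin, defined as 0 where nothing has been retrieved yet.
    precision = np.divide(cum_pos, cum_all,
                          out=np.zeros(num_bins), where=cum_all > 0)

    n_pos = h_pos.sum()
    if n_pos == 0:
        return 0.0
    # Weight precision by the share of positives falling into each bin.
    return float((precision * h_pos).sum() / n_pos)
```

With all positives closer than all negatives the estimate is 1; as negatives move ahead of positives in the ranking it drops, and finer bins bring it closer to the exact AP.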

Multi-Similarity Loss With General Pair Weighting for Deep Metric Learning
 The proposed multi-similarity loss aims to collect informative pairs and to weight these pairs through their own and relative similarities.

Stochastic Class-Based Hard Example Mining for Deep Metric Learning
 Scales linearly with the number of classes.
 The methods proposed by Movshovitz-Attias et al. [14] and Wen et al. [34] are related to ours in the sense that class representatives are jointly trained with the feature extractor. However, their goal is to formulate new losses using the class representatives, whereas we use them for hard negative mining.
 Given an anchor instance, our algorithm first selects a few hard negative classes based on the class-to-sample distances and then performs a refined instance-level search only within the selected classes.
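
The two-stage search described above can be sketched roughly as below. This is only an illustration under assumptions: Euclidean distances, one representative vector per class, and the hypothetical names `mine_hard_negatives`, `class_reps`, `num_classes`, and `num_negatives` are not taken from the paper.

```python
import numpy as np

def mine_hard_negatives(anchor, anchor_label, class_reps, feats, labels,
                        num_classes=2, num_negatives=3):
    """Pick nearby negative classes first, then nearby instances inside them."""
    # Stage 1: rank classes by the distance from the anchor to each representative.
    class_dists = np.linalg.norm(class_reps - anchor, axis=1)
    class_dists[anchor_label] = np.inf  # never select the anchor's own class
    hard_classes = np.argsort(class_dists)[:num_classes]

    # Stage 2: refined instance-level search restricted to the selected classes.
    candidates = np.flatnonzero(np.isin(labels, hard_classes))
    inst_dists = np.linalg.norm(feats[candidates] - anchor, axis=1)
    hardest = candidates[np.argsort(inst_dists)[:num_negatives]]
    return hardest, hard_classes
```

Stage 1 costs one distance per class and stage 2 only touches instances of the few selected classes, which is where the linear scaling in the number of classes comes from in this sketch.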

A Theoretically Sound Upper Bound on the Triplet Loss for Improving the Efficiency of Deep Distance Metric Learning

Unsupervised Embedding Learning via Invariant and Spreading Instance Feature

Signal-to-Noise Ratio: A Robust Distance Metric for Deep Metric Learning
 We propose a robust SNR distance metric based on Signal-to-Noise Ratio (SNR) for measuring the similarity of image pairs for deep metric learning. Compared with the Euclidean distance metric, our SNR distance metric can further jointly reduce the intra-class distances and enlarge the inter-class distances for learned features.
 SNR in signal processing is used to measure the level of a desired signal relative to the level of noise, and a larger SNR value means a higher signal quality. For similarity measurement in deep metric learning, a pair of learned features x and y can be given as y = x + n, where n can be treated as noise. Then, the SNR is the ratio of the feature variance to the noise variance.
 To show the generality of our SNR-based metric, we also extend our approach to hashing retrieval learning.
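
The SNR definition above (y = x + n, SNR = feature variance over noise variance) translates directly into a few lines. The sketch below is illustrative only; in particular, using the reciprocal of the SNR as the distance and the function names `snr` and `snr_distance` are assumptions of this example.

```python
import numpy as np

def snr(anchor, other):
    """Signal-to-noise ratio of a feature pair, treating other - anchor as noise."""
    noise = other - anchor
    return np.var(anchor) / np.var(noise)

def snr_distance(anchor, other):
    """Assumed distance: reciprocal of the SNR, so a cleaner pair is closer."""
    return np.var(other - anchor) / np.var(anchor)
```

Note that this quantity is not symmetric: swapping the anchor and the other feature changes the denominator. It is also undefined when the noise (for `snr`) or the anchor (for `snr_distance`) has zero variance.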

Deep Asymmetric Metric Learning via Rich Relationship Mining
 DAMLRRM relaxes the constraint on positive pairs to extend the generalization capability. We build a training pool of positive pairs by constructing a minimum connected tree for each category instead of considering all positive pairs within a mini-batch. As a result, there will exist a direct or indirect path between any positive pair, which ensures that the relevance is bridged between them. The inspiration comes from ranking on manifold [58], which spreads the relevance to nearby neighbors one by one.
 The idea is novel, but the results on SOP are not good: only 69.7 with GoogLeNet.
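
As a rough illustration of the tree-based positive-pair pool, the sketch below builds a minimum spanning tree over the samples of one category with Prim's algorithm. Reading "minimum connected tree" as a Euclidean minimum spanning tree is an assumption of this example, as is the function name `spanning_tree_pairs`.

```python
import numpy as np

def spanning_tree_pairs(feats):
    """Return n-1 index pairs forming a minimum spanning tree (Prim's algorithm)."""
    n = len(feats)
    dists = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)

    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best_dist = dists[0].copy()         # cheapest known link from the tree to each node
    best_from = np.zeros(n, dtype=int)  # tree node that provides that link
    pairs = []

    for _ in range(n - 1):
        best_dist[in_tree] = np.inf     # nodes already in the tree are not candidates
        j = int(best_dist.argmin())
        pairs.append((int(best_from[j]), j))
        in_tree[j] = True

        # Nodes outside the tree that are now closer via j get their link updated.
        closer = (dists[j] < best_dist) & ~in_tree
        best_dist[closer] = dists[j][closer]
        best_from[closer] = j
    return pairs
```

For n samples of a category this yields n - 1 positive pairs instead of n(n - 1)/2, while any two samples stay connected through a path of neighbors.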

Hybrid-Attention Based Decoupled Metric Learning for Zero-Shot Image Retrieval
 Very complex: object attention, spatial attention, random walk graph, etc.

Deep Metric Learning Beyond Binary Supervision
 Existing methods rely on binary supervision, which only indicates whether a pair of images is of the same class or not.
 This work uses continuous labels instead.
 Learn the degree of similarity rather than just the order.
 A triplet mining strategy adapted to metric learning with continuous labels.
 Image retrieval tasks with continuous labels in terms of human poses, room layouts and image captions.
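
One way to express "learn the degree of similarity rather than just the order" is to match the ratio of embedding distances to the ratio of label distances within a triplet. The sketch below is a log-ratio style penalty written under assumptions: the squared Euclidean distance, the `eps` stabilizer, and the name `log_ratio_loss` are illustrative, and the exact form used in the paper may differ.

```python
import numpy as np

def log_ratio_loss(f_a, f_i, f_j, y_a, y_i, y_j, eps=1e-8):
    """Penalize mismatch between embedding-distance and label-distance ratios."""
    def sq_dist(u, v):
        diff = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
        return float(np.sum(diff ** 2)) + eps  # eps avoids log(0) and division by zero

    emb_log_ratio = np.log(sq_dist(f_a, f_i) / sq_dist(f_a, f_j))
    label_log_ratio = np.log(sq_dist(y_a, y_i) / sq_dist(y_a, y_j))
    return (emb_log_ratio - label_log_ratio) ** 2
```

A triplet whose embedding distances merely preserve the order of the label distances still incurs a loss here unless the ratios also agree, which is the difference from a rank-only (triplet-style) constraint.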
Hardness-Aware Deep Metric Learning: data augmentation
Ensemble Deep Manifold Similarity Learning Using Hard Proxies: random walk algorithm, ensemble models.
Re-Ranking via Metric Fusion for Object Retrieval and Person Re-Identification
 Deep Embedding Learning With Discriminative Sampling Policy
 Point Cloud Oversegmentation With Graph-Structured Deep Metric Learning
 Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval
 A Compact Embedding for Facial Expression Similarity
 RepMet: Representative-Based Metric Learning for Classification and Few-Shot Object Detection
 Eliminating Exposure Bias and Metric Mismatch in Multiple Object Tracking
Robustness

Learning to Learn from Noisy Labeled Data
 This work achieves promising results with meta-learning. Our result on Clothing1M is comparable with theirs. However, their meta-learning formulation seems extremely complex in practice.
 https://www.reddit.com/r/MachineLearning/comments/bws5iv/r_cvpr_2019_noisetolerant_training_work_learning/
 Too many hyperparameters are shown in their Algorithm 1 and in the implementation details of Section 4.2.
 The strategies of iterative training together with iterative data filtering/cleaning, reusing the last round's best model as the mentor, etc., make it difficult to handle in practice.
 https://github.com/LiJunnan1992/MLNT/issues/1

Probabilistic End-to-End Noise Correction for Learning with Noisy Labels

Questions on “Probabilistic End-to-End Noise Correction for Learning with Noisy Labels, CVPR 2019”. Discussion and sharing are appreciated.
 Question 1: There is a softmax transformation between two label vectors and a gradient flow path between them. As I understand it, this path is not necessary. The target is to learn the true labels y^d, which can be initialised directly from the observed labels. The true label distribution should therefore be the end of the graph, and it does not make sense to backpropagate further to another version of the label vector.
 Question 2: If the answer to Question 1 is yes, then learning the true labels to minimise the loss should be exactly the same as in ‘Joint Optimisation Framework for Learning with Noisy Labels’, i.e., alternating optimisation. If we set the true labels to the network’s predictions, the loss becomes zero naturally. Therefore, gradient backpropagation is unnecessary for estimating the true labels.
 Question 3: The compatibility loss penalises true labels that are distant from the observed labels. Why does it still work in the experiments when the noise rate is high? Is it meaningful to penalise distant true labels when the noise rate is very high?
 Question 4: The model is trained in 3 stages:
 1) backbone learning without noise handling (cross-entropy loss only);
 2) PENCIL learning with 3 losses jointly (one classification loss + two regularisation terms);
 3) fine-tuning with the classification loss only (regularisation terms are removed).
 Is anybody interested in seeing the results after each training stage? They would show exactly how much improvement comes from each step.
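
The observation in Question 2 can be checked numerically. The sketch below assumes the classification loss is a KL divergence between the network's prediction and the label distribution; that choice, the direction of the KL term, and the helper names `softmax` and `kl_divergence` are assumptions of this example, not a statement of the paper's exact loss.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.asarray(logits, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

# Hypothetical prediction for one sample and its (possibly noisy) observed label.
prediction = softmax([2.0, 0.5, -1.0])
observed_label = np.array([0.0, 1.0, 0.0])

loss_with_observed = kl_divergence(prediction, observed_label)
loss_with_prediction_as_label = kl_divergence(prediction, prediction)
```

Under this assumption, the loss vanishes as soon as the label distribution equals the prediction, so the minimiser over the labels is available in closed form without any gradient step.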
