:+1: means being highly related to my personal research interest.

Deep Metric Learning

  • Divide and Conquer the Embedding Space for Metric Learning :+1:
    • Each learner will learn a separate distance metric using only a subspace of the original embedding space and a part of the data.

    • Natural hard negatives mining: Finally, the splitting and sampling connect to hard negative mining, which is verified by them. (I appreciate this ablation study in Table 6 )
    • Divide means: 1) Splitting the training data into K Clusters; 2) Splitting the embedding into K Slices.
  • Deep Metric Learning to Rank :+1:
    • Our main contribution is a novel solution to optimizing Average Precision under the Euclidean metric, based on the probabilistic interpretation of AP as the area under precision-recall curve, as well as distance quantization.
    • We also propose a category-based minibatch sampling strategy and a large-batch training heuristic.
    • On three few-shot image retrieval datasets, FastAP consistently outperforms competing methods, which often involve complex optimization heuristics or costly model ensembles.
  • Multi-Similarity Loss With General Pair Weighting for Deep Metric Learning :+1:
    • Objective of the proposed multi-similarity loss, which aims to collect informative pairs, and weight these pairs through their own and relative similarities.
  • Ranked List Loss for Deep Metric Learning :+1:

  • Stochastic Class-Based Hard Example Mining for Deep Metric Learning :+1:

    • Scale linearly to the number of classes.
    • The methods proposed by Movshovitz-Attias et al. [14] and Wen et al. [34] are related to ours in a sense that class representatives are jointly trained with the feature extractor. However, their goal is to formulate new losses using the class representatives whereas we use them for hard negative mining.
    • Given an anchor instance, our algorithm first selects a few hard negative classes based on the class-to-sample distances and then performs a refined search in an instance-level only from the selected classes.
  • A Theoretically Sound Upper Bound on the Triplet Loss for Improving the Efficiency of Deep Distance Metric Learning :+1:

  • Unsupervised Embedding Learning via Invariant and Spreading Instance Feature :+1:

  • Signal-To-Noise Ratio: A Robust Distance Metric for Deep Metric Learning :+1:

    • We propose a robust SNR distance metric based on Signal-to-Noise Ratio (SNR) for measuring the similarity of image pairs for deep metric learning. Compared with Euclidean distance metric, our SNR distance metric can further jointly reduce the intra-class distances and enlarge the inter-class distances for learned features.
    • SNR in signal processing is used to measure the level of a desired signal to the level of noise, and a larger SNR value means a higher signal quality. For similarity measurement in deep metric learning, a pair of learned features x and y can be given as y = x + n, where n can be treated as a noise. Then, the SNR is the ratio of the feature variance and the noise variance.
    • To show the generality of our SNR-based metric, we also extend our approach to hashing retrieval learning.
  • Spectral Metric for Dataset Complexity Assessment :+1:

  • Deep Asymmetric Metric Learning via Rich Relationship Mining :+1:
    • DAMLRRM relaxes the constraint on positive pairs to extend the generalization capability. We build positive pairs training pool by constructing a minimum connected tree for each category instead of considering all positive pairs within a mini-batch. As a result, there will exist a direct or indirect path between any positive pair, which ensures the relevance being bridged to each other. The inspiration comes from ranking on manifold [58] that spreads the relevance to their nearby neighbors one by one.
    • Idea is novel. The results on SOP are not good, only 69.7 with GoogLeNet
  • Hybrid-Attention Based Decoupled Metric Learning for Zero-Shot Image Retrieval :-1:
    • Very complex: object attention, spatial attention, random walk graph, etc.
  • Deep Metric Learning Beyond Binary Supervision :-1:
    • Binary supervision indicating whether a pair of images are of the same class or not.
    • Using continuous labels
    • Learn the degree of similarity rather than just the order.
    • A triplet mining strategy adapted to metric learning with continuous labels.
    • Image retrieval tasks with continuous labels in terms of human poses, room layouts and image captions.
  • Hardness-aware deep metric learning :-1: : data augmentation

  • Ensemble Deep Manifold Similarity Learning using Hard Proxies :-1: random walk algorithm, ensemble models.

  • Re-Ranking via Metric Fusion for Object Retrieval and Person Re-Identification :-1:

  • Deep Embedding Learning With Discriminative Sampling Policy :-1:
  • Point Cloud Oversegmentation With Graph-Structured Deep Metric Learning :-1:
  • Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval :-1:
  • A Compact Embedding for Facial Expression Similarity :-1:
  • RepMet: Representative-Based Metric Learning for Classification and Few-Shot Object Detection :-1:
  • Eliminating Exposure Bias and Metric Mismatch in Multiple Object Tracking :-1:


  • Learning to Learn from Noisy Labeled Data :+1: This work achieves promising results with meta-learning. Our result on Clothing 1M is comparable with theirs. However, their modelling via meta-learning seems extremely complex in practice.
    • Too many hyper-parameters shown in their Algorithm 1 and implementation section 4.2.
    • The strategies of iterative training together with iterative data filtering/cleaning, reusing last-round best model as mentor, etc., make it difficult to handle in practice.
  • Probabilistic End-to-end Noise Correction for Learning with Noisy Labels
    • Questions on “Probabilistic End-to-end Noise Correction for Learning with Noisy Labels, CVPR 2019”. Discussion and sharing are appreciated.
      • Question 1: There is a softmax transformation between two label vectors and a gradient flow path between them. However, according to my understanding, this path is not necessary. The target is to learn true labels y^d, which can be initialised by observed labels directly. Therefore, the true label distributions should be the end of the graph, it does not make sense to back-propagate to another label vector version.
      • Question 2: If the answer of Question1 is yes, then learning the true labels for minimising the loss should be exactly the same as ‘Joint Optimisation Framework for Learning with Noisy Labels’, i.e., Alternative Optimisation. The fact is that if we set the true labels as the network’s predictions, the loss becomes zero naturally. Therefore, gradient back-propagation is unnecessary for estimating the true labels.
      • Question 3: The compatibility loss penalises distant true labels versus observed labels. I have no idea why it works when noise rate is high in the experiments? Is it meaningful to penalise distant true labels when noise rate is very high?
      • Question 4: The model is trained by 3 stages:
        • 1) Backbone learning without noise handling (only cross entropy loss);
        • 2) pencil learning with 3 losses jointly (one classification loss + two regularisation terms);
        • 3) fine-tuning with only classification loss (regularisation terms are removed).
      • Is anybody interested in seeing the result of each stage training? By which we can know exactly how much improvement comes from each step.
# #

© 2019-2020. All rights reserved.

Welcome to Xinshao Wang's Personal Website