Paper Summary on Distance Metric, Representation Learning
means being highly related to my personal research interest.
 arXiv 2020 - On the Fairness of Deep Metric Learning
 ICCV 2019, CVPR 2020 Deep Metric Learning
 CVPR 2019 Deep Metric Learning
 Few-shot Learning
 Large Output Spaces
 Poincaré, Hyperbolic, Curvilinear
 Wasserstein
 Semi-supervised or Unsupervised Learning
 NeurIPS 2019 - Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers
arXiv 2020

Revisiting Training Strategies and Generalization Performance in Deep Metric Learning–Karsten Roth et al.
 Deep Metric Learning (DML) is arguably one of the most influential lines of research for learning visual similarities with many proposed approaches every year. Although the field benefits from the rapid progress, the divergence in training protocols, architectures, and parameter choices makes an unbiased comparison difficult. To provide a consistent reference point, we revisit the most widely used DML objective functions and conduct a study of the crucial parameter choices as well as the commonly neglected mini-batch sampling process. Based on our analysis, we uncover a correlation between the embedding space compression and the generalization performance of DML models. Exploiting these insights, we propose a simple, yet effective, training regularization to reliably boost the performance of ranking-based DML models on various standard benchmark datasets.
 We propose a simple technique to regularize the embedding space compression which we find to boost generalization performance of ranking-based DML approaches.

Unbiased Evaluation of Deep Metric Learning Algorithms–István Fehérvári et al., 2019
 We perform an unbiased comparison of the most popular DML baseline methods under same conditions and more importantly, not obfuscating any hyperparameter tuning or adjustment needed to favor a particular method. We find that under equal conditions several older methods perform significantly better than previously believed.
 Regarding Ranked List Loss, this work states: “On the SOP dataset, we never managed to make this algorithm converge.”
 This is not the case. I thank the authors for their interest in our work, which is a great motivation for me and my collaborators, and I appreciate their report on the difficulty of applying our method.
 Please see Ranked List Loss for its improved results, and the GitHub page for reproducible results.
 A Metric Learning Reality Check–Kevin Musgrave, Serge Belongie, Ser-Nam Lim
ICCV 2019, CVPR 2020 Deep Metric Learning

MIC: Mining Interclass Characteristics for Improved Metric Learning–Karsten Roth, Biagio Brattoli, Björn Ommer
 The common approach to metric learning is to enforce a representation that is invariant under all factors but the ones of interest. (Very Common Practice)
 In contrast, we propose to explicitly learn the latent characteristics that are shared by and go across object classes. We can then directly explain away structured visual variability, rather than assuming it to be unknown random noise. (Being Contrastive is Interesting! => Regularisation Technique?)
 We propose a novel surrogate task to learn visual characteristics shared across classes with a separate encoder. This encoder is trained jointly with the encoder for class information by reducing their mutual information.
 ResNet50 + PyTorch
 The method seems complex to me: The number of clusters is set before training to a fixed, problem-specific value: 30 for CUB200-2011 [37], 200 for CARS196 [19], 50 for Stanford Online Products [28], 150 for In-Shop Clothes [43] and 50 for PKU VehicleID [21]. We update the cluster labels every other epoch.
 For all experiments, we use the original images without bounding boxes.

Cross-Batch Memory for Embedding Learning–Xun Wang, Haozhi Zhang, Weilin Huang, Matthew R. Scott
 We propose a cross-batch memory (XBM) mechanism that memorizes the embeddings of past iterations, allowing the model to collect sufficient hard negative pairs across multiple mini-batches, even over the whole dataset.
 GoogLeNet V1, V2 and ResNet50
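The memory idea above can be sketched as a fixed-size first-in-first-out queue of past embeddings. This is only a minimal illustration, not the authors' implementation: the class name `CrossBatchMemory`, the cosine-similarity threshold, and the pure-Python vectors are all assumptions made for the example.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class CrossBatchMemory:
    """Hypothetical FIFO memory of (embedding, label) pairs from past mini-batches."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently drops the oldest entry once it is full
        self.queue = deque(maxlen=capacity)

    def enqueue(self, embeddings, labels):
        """Store the embeddings of the current mini-batch."""
        for emb, label in zip(embeddings, labels):
            self.queue.append((list(emb), label))

    def hard_negatives(self, anchor, anchor_label, threshold):
        """Memorized embeddings of other classes that are still similar to the anchor."""
        return [
            (emb, label)
            for emb, label in self.queue
            if label != anchor_label and cosine(anchor, emb) > threshold
        ]
```

With such a queue, the negatives available to an anchor are no longer limited to the current mini-batch; the capacity controls how far back in training the memory reaches.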

Multiple Centers or Adaptive Number of Centers => Softmax Loss
Analogous to ProxyNCA or ProxyTriplet
Considering that images in CUB-2011 and Cars196 are similar to those in ImageNet, we freeze BN on these two data sets and keep BN training on the remaining one. Embeddings of examples and centers have the unit length in the experiments.
Backbone: GoogLeNet V2 (Inception with BN)
During training, only random horizontal mirroring and random crop are used as the data augmentation. A single center crop is taken for test.
CUB-2011: We note that different works report results with different embedding dimensions, while the size of embeddings has a significant impact on the performance. For fair comparison, we report the results for the dimension of 64, which is adopted by many existing methods, and the results with 512-dimensional embeddings, which report the state-of-the-art results on most data sets.
Prior Work: ProxyNCA

Circle Loss: A Unified Perspective of Pair Similarity Optimization
 Motivation: aiming to maximize the within-class similarity \(s_p\) and minimize the between-class similarity \(s_n\). We find a majority of loss functions, including the triplet loss and the softmax plus cross-entropy loss, embed \(s_n\) and \(s_p\) into similarity pairs and seek to reduce \((s_n − s_p)\). Such an optimization manner is inflexible, because the penalty strength on every single similarity score is restricted to be equal.
 Our intuition is that if a similarity score deviates far from the optimum, it should be emphasized.
 We simply re-weight each similarity to highlight the less-optimized similarity scores. It results in a Circle loss, which is named due to its circular decision boundary.
 Circle loss offers a more flexible optimization approach towards a more definite convergence target, compared with the loss functions optimizing \((s_n − s_p)\).
 (1) a unified loss function; (2) flexible optimization; (3) definite convergence status.
 Evaluation:
 Tasks:
 Face recognition
 Person re-identification (Market-1501, MSMT17)
 Fine-grained image retrieval (CUB1002011, CARS196, SOP11318)
Net architecture 1: ResNet50 (global) + MGN (local features) for person re-ID. (“Our implementation concatenates all the part features into a single feature vector for simplicity.”)
 Net architecture 2: GoogLeNet (BN-Inception) for CUB, CARS, SOP, 512-D embeddings;
 Tasks:
 The performance is not better than Ranked List Loss on SOP.
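To make the re-weighting idea in the Circle loss notes above concrete, here is a minimal sketch as I understand the formulation from the paper: each similarity gets its own weight, \(\alpha_p = [O_p − s_p]_+\) and \(\alpha_n = [s_n − O_n]_+\), with \(O_p = 1 + m\), \(O_n = −m\), \(\Delta_p = 1 − m\), \(\Delta_n = m\). The default values of `m` and `gamma` below are illustrative assumptions, and the function expects non-empty lists of similarities.

```python
import math


def _logsumexp(xs):
    """Numerically stable log(sum(exp(x)))."""
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))


def _softplus(x):
    """log(1 + exp(x)) computed without overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def circle_loss(sp, sn, m=0.25, gamma=64.0):
    """Circle-loss sketch for one anchor.

    sp: within-class similarities, sn: between-class similarities.
    m (margin) and gamma (scale) are assumed defaults for illustration.
    """
    o_p, o_n = 1.0 + m, -m        # optima for s_p and s_n
    delta_p, delta_n = 1.0 - m, m  # margins
    # A similarity far from its optimum gets a larger weight alpha.
    logit_p = [-gamma * max(o_p - s, 0.0) * (s - delta_p) for s in sp]
    logit_n = [gamma * max(s - o_n, 0.0) * (s - delta_n) for s in sn]
    # log(1 + sum_n exp(logit_n) * sum_p exp(logit_p))
    return _softplus(_logsumexp(logit_n) + _logsumexp(logit_p))
```

A well-separated anchor (high \(s_p\), low \(s_n\)) yields a loss close to zero, while a poorly separated one is penalized heavily because both its weights and its margin violations are large.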

Sampling Wisely: Deep Image Embedding by Top-k Precision Optimization
 This work is partially inspired by our work: Ranked List Loss, CVPR 2019
 In contrast, in this paper, we propose a novel deep image embedding algorithm with end-to-end optimization to top-k precision, the evaluation metric that is closely related to user experience.
 Specifically, our loss function is constructed with Wisely Sampled “misplaced” images along the top-k nearest neighbor decision boundary, so that the gradient descent update directly promotes the concerned metric, top-k precision.
 Our theoretical analysis on the upper bounding and consistency properties of the proposed loss supports that minimizing our proposed loss is equivalent to maximizing top-k precision.
 Evaluation:
 Datasets: CUB2002011, CARS196, SOP
 PyTorch + Adam
 Net architecture: DenseNet-201, GoogLeNet V2 (Inception with BN)
 Fine-tuning
 Embedding size: 64, 512?
 Input size: warp (256x256) => crop (227x227)
 Testing: only center crop
 The performance is not better than Ranked List Loss
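For reference, the metric that this paper targets can be computed as below. This sketch only evaluates top-k precision for a single query; it is not the paper's loss. The function names and the use of squared Euclidean distance are assumptions for the example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def topk_precision(query, query_label, gallery, gallery_labels, k):
    """Fraction of the k nearest gallery items that share the query's label."""
    ranked = sorted(range(len(gallery)), key=lambda i: sq_dist(query, gallery[i]))
    hits = sum(1 for i in ranked[:k] if gallery_labels[i] == query_label)
    return hits / k
```

A "misplaced" image in the sense of the notes above would be a different-class item ranked inside the top k, or a same-class item ranked just outside it.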
CVPR 2019 Deep Metric Learning

Divide and Conquer the Embedding Space for Metric Learning
 ResNet50
Each learner will learn a separate distance metric using only a subspace of the original embedding space and a part of the data.
 Natural hard negative mining: the splitting and sampling connect to hard negative mining, which the authors verify. (I appreciate this ablation study in Table 6.)
 Divide means: (1) Splitting the training data into K Clusters; (2) Splitting the embedding into K Slices.
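The two kinds of splitting can be sketched as follows. This is a rough illustration under stated assumptions: cluster assignments are taken as given (how they are computed is not covered here), the embedding dimension is assumed divisible by K, and the function names are hypothetical.

```python
def split_embedding(embedding, k):
    """Split one embedding vector into k contiguous slices of equal size."""
    if len(embedding) % k != 0:
        raise ValueError("embedding dimension must be divisible by k")
    size = len(embedding) // k
    return [embedding[j * size:(j + 1) * size] for j in range(k)]


def learner_batches(embeddings, cluster_ids, k):
    """For learner j, collect slice j of every sample assigned to cluster j.

    cluster_ids[i] in {0, ..., k-1} is assumed to be precomputed.
    """
    batches = {j: [] for j in range(k)}
    for emb, cluster in zip(embeddings, cluster_ids):
        batches[cluster].append(split_embedding(emb, k)[cluster])
    return batches
```

Each learner therefore sees only its own subspace and its own part of the data; the full embedding is recovered by concatenating the K slices again.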

Deep Metric Learning to Rank=FastAP
 ResNet18 & ResNet50
 Our main contribution is a novel solution to optimizing Average Precision under the Euclidean metric, based on the probabilistic interpretation of AP as the area under precision-recall curve, as well as distance quantization.
 We also propose a category-based mini-batch sampling strategy and a large-batch training heuristic.
 On three few-shot image retrieval datasets, FastAP consistently outperforms competing methods, which often involve complex optimization heuristics or costly model ensembles.

Multi-Similarity Loss With General Pair Weighting for Deep Metric Learning
 Objective of the proposed multi-similarity loss, which aims to collect informative pairs, and weight these pairs through their own and relative similarities.
 GoogLeNet V2 (Inception BN)

Ranked List Loss for Deep Metric Learning
 GoogLeNet V2 (Inception BN)

Stochastic Class-Based Hard Example Mining for Deep Metric Learning
 Inception V1
 Scales linearly with the number of classes.
 The methods proposed by Movshovitz-Attias et al. [14] and Wen et al. [34] are related to ours in a sense that class representatives are jointly trained with the feature extractor. However, their goal is to formulate new losses using the class representatives whereas we use them for hard negative mining.
 Given an anchor instance, our algorithm first selects a few hard negative classes based on the class-to-sample distances and then performs a refined search at the instance level only within the selected classes.
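The two-stage search described above might look roughly like this. It is a deterministic simplification (the paper's selection is stochastic), and all names and parameters here (`class_reps`, `n_classes`, `n_neg`) are assumptions for the example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mine_hard_negatives(anchor, anchor_label, class_reps, samples, labels,
                        n_classes=2, n_neg=1):
    """Return indices of hard negative samples for one anchor.

    class_reps: dict mapping class label -> representative vector.
    """
    # Stage 1: pick the other classes whose representatives are closest.
    candidates = sorted(
        (c for c in class_reps if c != anchor_label),
        key=lambda c: sq_dist(anchor, class_reps[c]),
    )[:n_classes]
    # Stage 2: refined instance-level search restricted to those classes.
    pool = [i for i, y in enumerate(labels) if y in candidates]
    pool.sort(key=lambda i: sq_dist(anchor, samples[i]))
    return pool[:n_neg]
```

Stage 1 costs one distance per class rather than one per sample, which is where the linear scaling in the number of classes comes from.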
A Theoretically Sound Upper Bound on the Triplet Loss for Improving the Efficiency of Deep Distance Metric Learning
Unsupervised Embedding Learning via Invariant and Spreading Instance Feature

Signal-To-Noise Ratio: A Robust Distance Metric for Deep Metric Learning
 We propose a robust SNR distance metric based on Signal-to-Noise Ratio (SNR) for measuring the similarity of image pairs for deep metric learning. Compared with Euclidean distance metric, our SNR distance metric can further jointly reduce the intra-class distances and enlarge the inter-class distances for learned features.
 SNR in signal processing is used to measure the level of a desired signal to the level of noise, and a larger SNR value means a higher signal quality. For similarity measurement in deep metric learning, a pair of learned features x and y can be given as y = x + n, where n can be treated as a noise. Then, the SNR is the ratio of the feature variance and the noise variance.
 To show the generality of our SNR-based metric, we also extend our approach to hashing retrieval learning.
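A small sketch of the resulting distance, assuming (as I read the paper) that the SNR distance is the reciprocal of the SNR, i.e. the noise variance divided by the anchor feature variance. The function names are mine, and the anchor is assumed to have non-zero variance.

```python
def variance(v):
    """Population variance of the entries of a vector."""
    mean = sum(v) / len(v)
    return sum((x - mean) ** 2 for x in v) / len(v)


def snr_distance(anchor, other):
    """Noise variance over anchor variance, with noise n = other - anchor.

    Assumes variance(anchor) > 0. Note the distance is not symmetric.
    """
    noise = [y - x for x, y in zip(anchor, other)]
    return variance(noise) / variance(anchor)
```

Unlike the Euclidean distance, swapping the two arguments generally changes the value, because the normalization uses the variance of whichever feature is treated as the anchor.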
 Deep Asymmetric Metric Learning via Rich Relationship Mining
 DAMLRRM relaxes the constraint on positive pairs to extend the generalization capability. We build positive pairs training pool by constructing a minimum connected tree for each category instead of considering all positive pairs within a mini-batch. As a result, there will exist a direct or indirect path between any positive pair, which ensures the relevance being bridged to each other. The inspiration comes from ranking on manifold [58] that spreads the relevance to their nearby neighbors one by one.
 The idea is novel. The results on SOP are not good, only 69.7 with GoogLeNet.
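One way to picture the positive-pair pool is a minimum spanning tree over the samples of one category. The sketch below uses Prim's algorithm on squared Euclidean distances; reading "minimum connected tree" as a minimum spanning tree, and the function name, are my assumptions rather than details confirmed by the paper.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def positive_pairs_tree(embeddings):
    """Edges (i, j) of a minimum spanning tree over same-class embeddings."""
    n = len(embeddings)
    if n < 2:
        return []
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge that connects a tree node to a node outside the tree.
        i, j = min(
            ((a, b) for a in in_tree for b in range(n) if b not in in_tree),
            key=lambda e: sq_dist(embeddings[e[0]], embeddings[e[1]]),
        )
        edges.append((i, j))
        in_tree.add(j)
    return edges
```

For n samples this gives n − 1 positive pairs instead of n(n − 1)/2, while any two samples of the category remain connected through a path of nearby neighbors.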

Hybrid-Attention Based Decoupled Metric Learning for Zero-Shot Image Retrieval
 Very complex: object attention, spatial attention, random walk graph, etc.

Deep Metric Learning Beyond Binary Supervision
 Binary supervision indicating whether a pair of images are of the same class or not.
 Using continuous labels
 Learn the degree of similarity rather than just the order.
 A triplet mining strategy adapted to metric learning with continuous labels.
 Image retrieval tasks with continuous labels in terms of human poses, room layouts and image captions.
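Learning the degree of similarity, not just the order, can be illustrated with a log-ratio style objective: the ratio of embedding distances is pushed towards the ratio of label distances. The form below is my recollection of the paper's loss and should be treated as an assumption; the function names and the small `eps` added to avoid division by zero are also mine.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def log_ratio_loss(f_a, f_i, f_j, y_a, y_i, y_j, eps=1e-6):
    """Squared gap between log distance ratios in embedding and label space.

    f_*: embeddings of anchor a and neighbors i, j.
    y_*: continuous labels (e.g. pose vectors) of the same three samples.
    """
    emb_ratio = math.log((sq_dist(f_a, f_i) + eps) / (sq_dist(f_a, f_j) + eps))
    lab_ratio = math.log((sq_dist(y_a, y_i) + eps) / (sq_dist(y_a, y_j) + eps))
    return (emb_ratio - lab_ratio) ** 2
```

The loss is zero when embedding distances are proportional to label distances, so a triplet that is merely ordered correctly can still be penalized.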
Hardness-Aware Deep Metric Learning: data augmentation
Ensemble Deep Manifold Similarity Learning using Hard Proxies: random walk algorithm, ensemble models.
Re-Ranking via Metric Fusion for Object Retrieval and Person Re-Identification
 Deep Embedding Learning With Discriminative Sampling Policy
 Point Cloud Oversegmentation With Graph-Structured Deep Metric Learning
 Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval
 A Compact Embedding for Facial Expression Similarity
 RepMet: Representative-Based Metric Learning for Classification and Few-Shot Object Detection
 Eliminating Exposure Bias and Metric Mismatch in Multiple Object Tracking
Few-shot Learning
 ICLR 2018 - Meta-Learning for Semi-Supervised Few-Shot Classification
 NeurIPS 2019 - Unsupervised Meta-Learning for Few-Shot Image Classification
 NeurIPS 2019 - Learning to Self-Train for Semi-Supervised Few-Shot Classification
 NeurIPS 2019 - Adaptive Cross-Modal Few-shot Learning
 NeurIPS 2019 - Cross Attention Network for Few-shot Classification
 NeurIPS 2019 - Incremental Few-Shot Learning with Attention Attractor Networks
 ICML 2019 - LGM-Net: Learning to Generate Matching Networks for Few-Shot Learning
Large Output Spaces
 NeurIPS 2019 - Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces
 AISTATS 2019 - Stochastic Negative Mining for Learning with Large Output Spaces
Poincaré, Hyperbolic, Curvilinear
 NeurIPS 2019 - Multi-relational Poincaré Graph Embeddings
 NeurIPS 2019 - Numerically Accurate Hyperbolic Embeddings Using Tiling-Based Models
 NeurIPS 2019 - Curvilinear Distance Metric Learning
Wasserstein
 NeurIPS 2019 - Generalized Sliced Wasserstein Distances
 NeurIPS 2019 - Tree-Sliced Variants of Wasserstein Distances
 NeurIPS 2019 - Sliced Gromov-Wasserstein
 NeurIPS 2019 - Wasserstein Dependency Measure for Representation Learning
Semi-supervised or Unsupervised Learning
 CVPR 2019 - Label Propagation for Deep Semi-supervised Learning
 NeurIPS 2017 - Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results
 ICLR 2019 - Unsupervised Learning via Meta-Learning
NeurIPS 2019 - Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers
NOTE: In deep neural nets, lower level embedding layers account for a large portion of the total number of parameters. Tikhonov regularization, graph-based regularization, and hard parameter sharing are approaches that introduce explicit biases into training in a hope to reduce statistical complexity. Alternatively, we propose stochastically shared embeddings (SSE), a data-driven approach to regularizing embedding layers, which stochastically transitions between embeddings during stochastic gradient descent (SGD). Because SSE integrates seamlessly with existing SGD algorithms, it can be used with only minor modifications when training large scale neural networks. We develop two versions of SSE: SSE-Graph using knowledge graphs of embeddings; SSE-SE using no prior information. We provide theoretical guarantees for our method and show its empirical effectiveness on 6 distinct tasks, from simple neural networks with one hidden layer in recommender systems, to the transformer and BERT in natural languages. We find that when used along with widely-used regularization methods such as weight decay and dropout, our proposed SSE can further reduce overfitting, which often leads to more favorable generalization results.
We conducted experiments for a total of 6 tasks from simple neural networks with one hidden layer in recommender systems, to the transformer and BERT in natural languages.
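A minimal sketch of the variant that uses no prior information, assuming it works by replacing each embedding index with a uniformly random one with a small probability before the lookup. The function name, the parameter `p`, and the explicit `rng` argument are assumptions for the example, not the authors' API.

```python
import random


def sse_se_indices(indices, num_embeddings, p, rng):
    """Stochastically swap embedding indices before the embedding lookup.

    With probability p an index is replaced by one drawn uniformly from
    range(num_embeddings); otherwise it is kept unchanged.
    rng is a random.Random instance, passed in for reproducibility.
    """
    swapped = []
    for idx in indices:
        if rng.random() < p:
            swapped.append(rng.randrange(num_embeddings))
        else:
            swapped.append(idx)
    return swapped
```

The swap would be applied only during training, for example `sse_se_indices(batch_ids, vocab_size, 0.01, random.Random(0))`, so the rest of the SGD loop stays unchanged.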