Machine Learning, Zero-Shot Learning, Computer Vision, Pattern Recognition.
Zero-shot learning (ZSL) aims to recognize unseen objects (test classes) given some other seen objects (training classes) by sharing attribute information between different objects. Attributes are manually annotated for objects and treated equally in recent ZSL tasks. However, inferior attributes with poor predictability or poor discriminability may have negative impacts on the ZSL system performance. This letter first derives a generalization error bound for ZSL tasks. Our theoretical analysis verifies that selecting a subset of key attributes can improve the generalization performance of the original ZSL model, which uses all the attributes. Unfortunately, previous attribute selection methods are conducted on the seen data, and their selected attributes generalize poorly to the unseen data, which is unavailable during the training stage of ZSL tasks. Inspired by learning from pseudo-relevance feedback, this letter introduces out-of-the-box data, i.e., pseudo-data generated by an attribute-guided generative model, to mimic the unseen data. We then present an iterative attribute selection (IAS) strategy that iteratively selects key attributes based on the out-of-the-box data. Since the distribution of the generated out-of-the-box data is similar to that of the test data, the key attributes selected by IAS generalize effectively to the test data. Extensive experiments demonstrate that IAS significantly improves existing attribute-based ZSL methods and achieves state-of-the-art performance.
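The selection idea above can be illustrated with a toy sketch. This is not the paper's IAS algorithm, just a minimal greedy forward selection on synthetic "pseudo-data": all names, the noise model, and the choice of two deliberately unreliable attributes are hypothetical assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_attrs, n_per_class = 5, 8, 20

# Hypothetical binary class-attribute signatures
signatures = rng.integers(0, 2, (n_classes, n_attrs)).astype(float)

# Pseudo-data: per-sample attribute predictions = signature + small noise;
# attributes 3 and 6 are replaced by pure noise to mimic inferior attributes
labels = np.repeat(np.arange(n_classes), n_per_class)
preds = signatures[labels] + rng.normal(0, 0.2, (labels.size, n_attrs))
preds[:, [3, 6]] = rng.normal(0.5, 1.0, (labels.size, 2))

def accuracy(subset):
    """Nearest-signature classification accuracy using only `subset`."""
    d = ((preds[:, None, subset] - signatures[None, :, subset]) ** 2).sum(-1)
    return (d.argmin(1) == labels).mean()

# Greedy forward selection: iteratively add the attribute that most
# improves accuracy on the pseudo (out-of-the-box) data, stop at no gain
selected, remaining, best = [], list(range(n_attrs)), 0.0
while remaining:
    acc, a = max((accuracy(selected + [a]), a) for a in remaining)
    if acc <= best:
        break
    best = acc
    selected.append(a)
    remaining.remove(a)

print(sorted(selected), round(best, 3))
```

Because the selection criterion is evaluated on pseudo-data that mimics the unseen classes, the retained subset tends to exclude attributes that are unreliable at test time rather than merely unhelpful on the seen classes.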
Zero-shot learning (ZSL) aims to recognize unseen objects using disjoint seen objects by sharing attributes. The generalization performance of ZSL is governed by the attributes, which transfer semantic information from seen classes to unseen classes. To take full advantage of the knowledge transferred by attributes, in this paper we introduce the notion of complementary attributes (CAs), a supplement to the original attributes, to enhance the semantic representation ability. Theoretical analyses demonstrate that CAs can improve the PAC-style generalization bound of the original ZSL model. Since the proposed CAs focus on enhancing the semantic representation, they can be easily applied to any existing attribute-based ZSL method, including label-embedding-strategy-based ZSL (LEZSL) and probability-prediction-strategy-based ZSL (PPZSL). PPZSL makes the strong assumption that all attributes are independent of each other, which is arguably unrealistic in practice. To solve this problem, a novel rank aggregation (RA) framework is proposed to circumvent the assumption. Extensive experiments on five ZSL benchmark datasets and the large-scale ImageNet dataset demonstrate that the proposed CA and RA can significantly and robustly improve existing ZSL methods and achieve state-of-the-art performance.
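To make the independence issue concrete, the toy sketch below contrasts a DAP-style probability product (which multiplies per-attribute probabilities, implicitly assuming independence) with a simple rank-aggregation alternative. This is an illustration under assumed data, not the paper's RA framework; the Borda-count aggregation and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_classes, n_attrs = 5, 8

# Hypothetical binary class-attribute signatures
signatures = rng.integers(0, 2, (n_classes, n_attrs)).astype(float)

# Per-attribute probability predictions for one test image whose
# (simulated) true class is 2
true_class = 2
p = np.clip(signatures[true_class] + rng.normal(0, 0.15, n_attrs), 0, 1)

# DAP-style score: multiply the attribute probabilities, which
# implicitly assumes the attributes are independent
dap_scores = np.prod(np.where(signatures == 1, p, 1 - p), axis=1)

# Rank-aggregation alternative: each attribute ranks the classes by how
# well its prediction matches the class signature; the per-attribute
# ranks are then summed (Borda count), avoiding the product of
# probabilities and thus the independence assumption
match = -np.abs(signatures - p)          # higher = better match, per attribute
ranks = match.argsort(0).argsort(0)      # per-attribute rank of each class
ra_scores = ranks.sum(1)                 # aggregate ranks across attributes

print(int(dap_scores.argmax()), int(ra_scores.argmax()))
```

The aggregation only consumes each attribute's ordering of the classes, so a badly calibrated probability cannot dominate the product the way it can in the DAP-style score.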
© 2019 International Joint Conferences on Artificial Intelligence. All rights reserved. As a kind of semantic representation of visual object descriptions, attributes are widely used in various computer vision tasks. In most existing attribute-based research, class-specific attributes (CSA), which are class-level annotations, are usually adopted due to their low annotation cost: each class is annotated rather than each individual image. However, class-specific attributes are usually noisy because of annotation errors and the diversity of individual images. Therefore, it is desirable to obtain image-specific attributes (ISA), which are image-level annotations, from the original class-specific attributes. In this paper, we propose to learn image-specific attributes by graph-based attribute propagation. Considering the intrinsic property of hyperbolic geometry that distances expand exponentially, a hyperbolic neighborhood graph (HNG) is constructed to characterize the relationship between samples. Based on the HNG, we define neighborhood consistency for each sample to identify inconsistent samples. Subsequently, inconsistent samples are refined based on their neighbors in the HNG. Extensive experiments on five benchmark datasets demonstrate the significant superiority of the learned image-specific attributes over the original class-specific attributes in the zero-shot object classification task.
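The graph construction and refinement pipeline can be sketched as follows. This is a minimal toy version under stated assumptions, not the paper's method: the Poincaré-ball distance is the standard hyperbolic distance formula, while the 2-D embeddings, the k-NN graph, the 0.5 consistency threshold, and the majority-vote refinement are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def poincare_dist(u, v):
    """Distance in the Poincare ball model of hyperbolic geometry."""
    uu, vv = np.sum(u * u), np.sum(v * v)
    num = np.sum((u - v) ** 2)
    return np.arccosh(1 + 2 * num / ((1 - uu) * (1 - vv)))

# Toy 2-D embeddings kept strictly inside the unit ball, each with one
# binary attribute label; a few labels are flipped to simulate the noise
# introduced by class-level annotation
n, k = 30, 5
points = rng.normal(0, 0.2, (n, 2))
points /= np.maximum(1.0, np.linalg.norm(points, axis=1, keepdims=True) * 1.1)
attr = (points[:, 0] > 0).astype(int)   # "clean" image-specific attribute
noisy = attr.copy()
noisy[:3] = 1 - noisy[:3]               # simulated class-level annotation noise

# Hyperbolic neighborhood graph: k nearest neighbors under hyperbolic distance
D = np.array([[poincare_dist(points[i], points[j]) for j in range(n)]
              for i in range(n)])
np.fill_diagonal(D, np.inf)
nbrs = D.argsort(1)[:, :k]

# Neighborhood consistency: fraction of neighbors sharing the sample's label;
# inconsistent samples are refined by the majority vote of their neighbors
consistency = (noisy[nbrs] == noisy[:, None]).mean(1)
inconsistent = consistency < 0.5
refined = noisy.copy()
refined[inconsistent] = (noisy[nbrs[inconsistent]].mean(1) > 0.5).astype(int)

print(int(inconsistent.sum()))
```

Because hyperbolic distance grows rapidly near the boundary of the ball, neighborhoods built this way can separate samples that look close under a Euclidean metric, which is the motivation for building the graph in hyperbolic rather than Euclidean space.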