
Zero, One and Few Shot Learning

Zero-Shot Learning


Zero-shot learning is a problem setup in machine learning where, at test time, a learner observes samples from classes that were not observed during training and must predict which class they belong to. The general idea is to transfer knowledge from the training instances to the classification of test instances, which makes zero-shot learning a subfield of transfer learning. It has applications in image classification, natural language processing, and other areas, and it is especially attractive in domains where labeled data is scarce or expensive, such as medical imaging, natural language understanding, and speech recognition.

This is useful for scenarios where obtaining labeled data for every possible class is impractical or impossible, such as classifying all animal species or natural languages. One of the challenges of zero-shot learning is representing unseen classes in a way the model can relate to the classes it has seen. A common approach is to use some form of auxiliary information, such as textual descriptions, attributes, or semantic embeddings, that captures the salient features of each class.

For example, suppose a model trained to recognize horses has never seen a zebra. If we provide the model with a textual description of what a zebra looks like (e.g., "a zebra is an animal that looks like a striped horse"), the model can use its learned knowledge about horses and language to infer that a new image of a zebra belongs to the class "zebra."

One of the recent advances in zero-shot learning is Contrastive Language-Image Pretraining (CLIP), proposed by OpenAI. CLIP learns to classify images without explicit labels by using natural language captions as supervision. It was trained on 400 million image-text pairs scraped from the internet and learns to associate words and phrases with visual patterns.

CLIP can generalize to novel tasks and domains when given natural language queries. For example, it can classify images as "a photo of flowers", "a painting of mountains", or "a diagram of human anatomy" without ever seeing these categories as labels during training.
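To make this concrete, here is a minimal sketch of zero-shot image classification with CLIP using the Hugging Face transformers library. The checkpoint name, image file, and candidate prompts are illustrative assumptions, not part of any particular published experiment:

# Zero-shot classification sketch with CLIP (checkpoint, image, and prompts are illustrative).
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("animal.jpg")  # hypothetical input image
prompts = ["a photo of a horse", "a photo of a zebra", "a photo of a dog"]

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; softmax turns them into
# probabilities over the candidate prompts, which never appear as training labels.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(prompts, probs[0].tolist())))

Because the candidate classes live entirely in the text prompts, new classes can be added at inference time simply by writing new descriptions.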

One-Shot Learning and Siamese Neural Network

One-shot learning is a classification setting, used mainly in computer vision, in which a model must tell classes apart after seeing, ideally, a single labeled example of each class. Rather than learning each class directly, the model learns to assess the similarity and differences between two images. It is used in facial recognition, for example face verification and face identification, where individuals must be recognized accurately despite varying looks, lighting, accessories, and haircuts.

A Siamese network is a class of neural networks built from two (or more) identical subnetworks that share the same weights. Each subnetwork receives one input of a pair and computes its features, and a similarity function compares the two outputs to measure how alike the inputs are. Siamese neural networks (SNNs) are usually built on convolutional neural networks (CNNs) and are primarily used in computer vision tasks. They are trained to evaluate the distance between the features of two input images.
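As a rough illustration, the PyTorch sketch below builds one shared encoder and applies it to both inputs, returning the distance between their embeddings as the dissimilarity score. The layer sizes and the assumption of 28x28 grayscale inputs are arbitrary choices for the example:

# Minimal Siamese embedding network sketch in PyTorch (architecture is illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseNet(nn.Module):
    def __init__(self, embedding_dim=128):
        super().__init__()
        # A single shared CNN encoder: both inputs pass through the same weights.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, embedding_dim),  # assumes 28x28 grayscale inputs
        )

    def embed(self, x):
        return F.normalize(self.encoder(x), dim=-1)

    def forward(self, x1, x2):
        # The Euclidean distance between the two embeddings measures how different the inputs are.
        return F.pairwise_distance(self.embed(x1), self.embed(x2))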

Training a Siamese neural network for one-shot learning involves a verification step and a generalization step. In the verification step, we use the triplet loss function. Triplet loss compares a reference input (called the anchor) to a matching input (called the positive) and a non-matching input (called the negative): the distance between the anchor and the positive is minimized, while the distance between the anchor and the negative is maximized. Training works best when the triplets are hard, i.e., when the negative looks relatively similar to the anchor and positive, because this forces the model to learn the complex cases.
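In symbols, the triplet loss is max(d(anchor, positive) - d(anchor, negative) + margin, 0). Below is a hedged sketch of one training step, reusing the SiameseNet above and PyTorch's built-in nn.TripletMarginLoss; the batch shapes, margin, and learning rate are illustrative:

# One training step with triplet loss (batch shapes, margin, and learning rate are illustrative).
import torch
import torch.nn as nn

model = SiameseNet()
criterion = nn.TripletMarginLoss(margin=1.0)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Hypothetical batch of 16 grayscale 28x28 images for each role in the triplet.
anchor = torch.randn(16, 1, 28, 28)
positive = torch.randn(16, 1, 28, 28)   # same class as the anchor
negative = torch.randn(16, 1, 28, 28)   # different class from the anchor

loss = criterion(model.embed(anchor), model.embed(positive), model.embed(negative))
optimizer.zero_grad()
loss.backward()
optimizer.step()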

In the generalization stage, the model is trained to estimate the probability that an input pair belongs to the same class. At this step, it is important to provide the model with pairs that are difficult to distinguish; these hard examples push the model to learn more discriminative features and make training more effective.
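At inference time, a one-shot classifier can then compare a query image against the single reference image available for each class and pick the most similar one. The sketch below turns the embedding distance into a rough same-class score using exp(-distance); both this scoring choice and the dictionary of reference images are assumptions for illustration:

# One-shot inference sketch: compare a query against one reference image per class.
import torch

@torch.no_grad()
def classify_one_shot(model, query, references):
    # references: dict mapping class name -> a single reference image tensor of shape (1, 28, 28).
    scores = {}
    for name, ref in references.items():
        distance = model(query.unsqueeze(0), ref.unsqueeze(0))
        scores[name] = torch.exp(-distance).item()  # smaller distance -> score closer to 1
    return max(scores, key=scores.get), scores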

Few-Shot Learning

Few-shot learning is a branch of machine learning that aims to train models with very little data. Unlike conventional machine learning methods that require large amounts of labeled data to perform well, few-shot learning methods can learn from only a handful of examples per class.

Few-shot learning methods are inspired by how humans learn new concepts from a few examples. For instance, if you see two images of armadillos and two of pangolins for the first time, you can quickly tell them apart by noticing their distinctive features, such as ears and scales. Few-shot learning methods try to mimic this ability by using meta-learning techniques.

Meta-learning means "learning to learn." It involves training a model on a large set of related tasks (meta-training) so that it can quickly adapt to new tasks (meta-testing) from only a few examples. One popular meta-learning method is Model-Agnostic Meta-Learning (MAML). MAML trains a model on many classification tasks with different classes (such as animals, plants, vehicles, etc.) to learn a general representation useful for any classification task. Then, when given a new classification task with unseen classes (such as armadillos and pangolins), MAML can fine-tune the model with only a few examples per class and achieve reasonable accuracy.
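The sketch below shows the core MAML loop in a very reduced form: one inner-loop gradient step adapts the parameters on each task's support set, and the outer loop updates the shared initialization using the loss of the adapted parameters on the query set. It assumes a recent PyTorch version (for torch.func.functional_call), and the task format, inner learning rate, and single adaptation step are simplifying assumptions rather than the paper's exact setup:

# Simplified MAML sketch: one inner adaptation step per task, then one meta-update.
import torch
import torch.nn.functional as F

def maml_step(model, tasks, meta_optimizer, inner_lr=0.01):
    # tasks: iterable of (support_x, support_y, query_x, query_y) tensors.
    meta_loss = 0.0
    for support_x, support_y, query_x, query_y in tasks:
        # Inner loop: adapt a functional copy of the parameters on the task's support set.
        params = dict(model.named_parameters())
        support_loss = F.cross_entropy(
            torch.func.functional_call(model, params, (support_x,)), support_y)
        grads = torch.autograd.grad(support_loss, params.values(), create_graph=True)
        adapted = {name: p - inner_lr * g
                   for (name, p), g in zip(params.items(), grads)}
        # Outer loop: evaluate the adapted parameters on the task's query set.
        meta_loss = meta_loss + F.cross_entropy(
            torch.func.functional_call(model, adapted, (query_x,)), query_y)
    meta_optimizer.zero_grad()
    meta_loss.backward()
    meta_optimizer.step()
    return meta_loss.item()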

Few-shot learning can enable face recognition systems to identify new faces from only one or a few images per person. It can enable image segmentation systems to segment new objects from only a few pixel-level annotations. It can allow natural language understanding systems to perform new tasks such as sentiment analysis, question answering, or text summarization from only a few examples or instructions per task. It can help medical diagnosis systems recognize rare diseases or anomalies from only a few cases per condition, and it can support drug discovery systems in predicting molecules with desired properties from only a few labeled examples.

Few-shot learning is an exciting and challenging research area that aims to make machine learning more accessible and adaptable. By using meta-learning techniques and leveraging prior knowledge from related tasks, few-shot learning methods can overcome the data scarcity problem and enable models to learn from a few examples, much as humans do.
