Zero, One and Few Shot Learning

Zero-Shot Learning


Zero-shot learning is a problem setup in machine learning where, at test time, a learner observes samples from classes that were not observed during training and must predict the class they belong to. The general idea is to transfer the knowledge gained from the training instances to the classification of test instances, so zero-shot learning is often treated as a subfield of transfer learning. It has applications in image classification and natural language processing, and it is especially promising in domains where labeled data is scarce or expensive, such as medical imaging, natural language understanding, and speech recognition.

Zero-shot learning is useful when obtaining labeled data for every possible class is impractical or impossible, such as when classifying all animal species or natural languages. One of its challenges is representing unseen classes so that the model can relate them to the classes it has seen. A common approach is to use some form of auxiliary information, such as textual descriptions, attributes, or semantic embeddings, that captures the salient features of each class.

For example, suppose a model trained to recognize horses has never seen a zebra. If we provide the model with a textual description of what a zebra looks like (e.g., "a zebra is an animal that looks like a striped horse"), then the model can use its learned knowledge about horses and language to infer that a new image of a striped, horse-like animal belongs to the class "zebra."
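The zebra example can be sketched with attribute vectors as the auxiliary information. This is a minimal illustration, not a trained model: the attribute names, the class vectors in `CLASS_ATTRIBUTES`, and the `predicted` vector (standing in for the output of an attribute predictor trained only on seen classes) are all made up for the example.

```python
import numpy as np

# Hypothetical attribute order: [horse_like_shape, has_stripes, has_wings]
CLASS_ATTRIBUTES = {
    "horse": np.array([1.0, 0.0, 0.0]),  # seen during training
    "zebra": np.array([1.0, 1.0, 0.0]),  # unseen: known only from its description
    "eagle": np.array([0.0, 0.0, 1.0]),  # unseen
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def zero_shot_classify(predicted_attributes, class_attributes=CLASS_ATTRIBUTES):
    """Pick the class whose attribute description best matches the prediction."""
    scores = {name: cosine(predicted_attributes, vec)
              for name, vec in class_attributes.items()}
    return max(scores, key=scores.get), scores

# Assumed output of an attribute predictor on a zebra photo (illustrative values)
predicted = np.array([0.9, 0.8, 0.05])
label, scores = zero_shot_classify(predicted)
print(label, scores)
```

The classifier never needs a labeled zebra image: it only needs the zebra's description in the same attribute space that the model learned from seen classes.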

One of the recent advances in zero-shot learning is Contrastive Language-Image Pretraining (CLIP), proposed by OpenAI. CLIP learns to classify images without explicit class labels by using natural language captions as supervision. It was trained on 400 million image-text pairs collected from the internet and learns to associate words and phrases with visual patterns.

Given natural language queries, CLIP can generalize to novel tasks and domains. For example, it can match images against prompts such as "a photo of flowers", "a painting of mountains", or "a diagram of human anatomy" without having been trained on these categories as explicit labels.
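The scoring step behind this kind of prompt-based classification can be sketched as follows. This is not CLIP's actual code or API: the 4-dimensional embeddings are invented stand-ins for the outputs of an image encoder and a text encoder, and the `scale` value is an assumed temperature. Only the idea is shown: normalize both embeddings, take cosine similarities, and apply a softmax over the prompts.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def zero_shot_scores(image_embedding, text_embeddings, scale=100.0):
    """Probability of each prompt given one image, from cosine similarity."""
    img = image_embedding / np.linalg.norm(image_embedding)
    txt = text_embeddings / np.linalg.norm(text_embeddings, axis=1, keepdims=True)
    return softmax(scale * (txt @ img))

prompts = [
    "a photo of flowers",
    "a painting of mountains",
    "a diagram of human anatomy",
]
# Made-up embeddings; a real system would compute these with trained encoders.
text_embeddings = np.array([
    [0.9, 0.1, 0.0, 0.1],
    [0.1, 0.9, 0.1, 0.0],
    [0.0, 0.1, 0.9, 0.2],
])
image_embedding = np.array([0.8, 0.2, 0.1, 0.1])

probs = zero_shot_scores(image_embedding, text_embeddings)
best = prompts[int(np.argmax(probs))]
print(best)
```

Adding a new category only requires writing a new prompt and embedding it; no retraining is involved in this step.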

One-Shot Learning and Siamese Neural Network

One-shot learning is a machine learning approach to object classification in which the model learns to assess the similarities and differences between two images. It is mainly used in computer vision. The aim is for the model to judge similarity from, ideally, a single example of each class. It is used in facial recognition, for example in face verification and face identification, where individuals must be recognized accurately despite changes in appearance, lighting, accessories, and haircuts.

A Siamese network is a class of neural networks that contains two or more identical subnetworks sharing the same weights. The network receives a pair of inputs, and each subnetwork computes the features of one input. A similarity function then compares the two outputs to measure how similar the inputs are. Siamese neural networks (SNNs) are often built from convolutional neural networks (CNNs) and are primarily used in computer vision tasks. They are trained to evaluate the distance between the features of two input images.
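A minimal sketch of this structure, assuming a single linear layer with a tanh activation as the branch (a real Siamese network would typically use a trained CNN). The weight matrix `W`, the `embed` function, and the input sizes are hypothetical; the point is that both inputs pass through the same function with the same weights before their outputs are compared.

```python
import numpy as np

rng = np.random.default_rng(0)
# Shared weights (random here, learned in practice): 16-d input -> 8-d embedding
W = rng.normal(size=(8, 16))

def embed(x, weights=W):
    """One branch of the network. Both branches use this same function and weights."""
    return np.tanh(weights @ x)

def siamese_distance(x1, x2):
    """Euclidean distance between the embeddings of the two inputs."""
    return float(np.linalg.norm(embed(x1) - embed(x2)))

a = rng.normal(size=16)
near_a = a + 0.01 * rng.normal(size=16)  # slightly perturbed copy of a
far = -a                                 # a clearly different input

print(siamese_distance(a, a))       # identical inputs give distance 0.0
print(siamese_distance(a, near_a))  # small
print(siamese_distance(a, far))     # larger
```

Because the weights are shared, the distance is symmetric in its two inputs, and identical inputs always map to identical embeddings.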

Training a Siamese neural network for one-shot learning involves two stages: verification and generalization. In the verification stage, we use the triplet loss function. Triplet loss compares a reference input (called the anchor) to a matching input (called the positive) and a non-matching input (called the negative). Training reduces the distance between the anchor and the positive while increasing the distance between the anchor and the negative, typically until the negative is farther away than the positive by at least a margin. For better results, the anchor, positive, and negative images should look relatively similar, so that the model learns from hard cases.
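The triplet loss described above can be written in a few lines. This sketch assumes squared Euclidean distances and a margin of 0.2; both are common choices rather than anything specified in this post, and the 2-D embeddings are invented for illustration.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(d(anchor, positive) - d(anchor, negative) + margin, 0), squared distances."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return float(max(d_pos - d_neg + margin, 0.0))

anchor = np.array([0.0, 0.0])
positive = np.array([0.1, 0.0])
easy_negative = np.array([2.0, 0.0])  # already far from the anchor
hard_negative = np.array([0.3, 0.0])  # close to the anchor, looks similar

easy = triplet_loss(anchor, positive, easy_negative)
hard = triplet_loss(anchor, positive, hard_negative)
print(easy, hard)
```

The easy triplet produces zero loss, so it contributes no gradient, while the hard triplet produces a positive loss. This is why triplets whose images look relatively similar are the ones the model actually learns from.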

In the generalization stage, the model is trained to estimate the probability that an input pair belongs to the same class. At this stage, it is essential to provide the model with images that are difficult to distinguish. Harder pairs make each training step more informative, which can speed up training.
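One simple way to turn an embedding distance into a same-class probability, and then into a one-shot classifier, is sketched below. The sigmoid form and the `scale` and `bias` values are assumptions for illustration (in a trained system they would be learned), and the labels and 2-D embeddings are hypothetical.

```python
import numpy as np

def same_class_probability(emb_a, emb_b, scale=4.0, bias=2.0):
    """Sigmoid of (bias - scale * distance): near 1 for close pairs, near 0 for distant ones."""
    distance = np.linalg.norm(emb_a - emb_b)
    return float(1.0 / (1.0 + np.exp(scale * distance - bias)))

def one_shot_classify(query_emb, support):
    """Compare the query with the single stored example of each class."""
    probs = {label: same_class_probability(query_emb, emb)
             for label, emb in support.items()}
    return max(probs, key=probs.get), probs

# One embedding per person, as produced by a (hypothetical) trained Siamese branch
support = {
    "alice": np.array([1.0, 0.0]),
    "bob": np.array([0.0, 1.0]),
}
query = np.array([0.9, 0.2])

label, probs = one_shot_classify(query, support)
print(label, probs)
```

Enrolling a new person only requires adding one embedding to `support`; the network itself is not retrained.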

Few-Shot Learning

Few-shot learning is a branch of machine learning that aims to train models with very little data. Unlike conventional machine learning methods that require large amounts of labeled data to perform well, few-shot learning methods can learn from only a handful of examples per class.

Few-shot learning methods are inspired by how humans learn new concepts from a few examples. For instance, if you see two images of armadillos and two of pangolins for the first time, you can quickly tell them apart by noticing their distinctive features, such as ears and scales. Few-shot learning methods try to mimic this ability by using meta-learning techniques.

Meta-learning means "learning to learn." It involves training a model on a large set of related tasks (meta-training) so that it can quickly adapt to new tasks (meta-testing) with only a few examples. One popular meta-learning method is Model-Agnostic Meta-Learning (MAML). MAML trains a model on many classification tasks with different classes (such as animals, plants, and vehicles) to learn an initial set of parameters that can be adapted to a new task in a few gradient steps. Then, when given a new classification task with unseen classes (such as armadillos and pangolins), the model can be fine-tuned from that initialization with only a few examples per class and achieve reasonable accuracy.
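The two-level structure of MAML can be sketched on a toy problem. To keep the gradients exact and readable, this example uses 1-D linear regression (y = slope * x, with a different slope per task) instead of image classification, and a single scalar parameter `w`. The task sampler, learning rates, and function names are all assumptions made for illustration.

```python
import numpy as np

def loss_and_grad(w, x, y):
    """Mean squared error of the model y_hat = w * x, and its gradient w.r.t. w."""
    err = w * x - y
    return float(np.mean(err ** 2)), float(np.mean(2.0 * err * x))

def maml_meta_gradient(w, task, inner_lr):
    """Gradient of the post-adaptation query loss with respect to the initial w."""
    (xs, ys), (xq, yq) = task
    _, g_support = loss_and_grad(w, xs, ys)
    w_adapted = w - inner_lr * g_support            # inner loop: one gradient step
    q_loss, g_query = loss_and_grad(w_adapted, xq, yq)
    # Chain rule through the inner step (the second-order term of MAML):
    # d(w_adapted)/dw = 1 - inner_lr * d^2(L_support)/dw^2
    d_adapted_d_w = 1.0 - inner_lr * float(np.mean(2.0 * xs ** 2))
    return g_query * d_adapted_d_w, q_loss

def sample_task(rng, k_shot=5):
    """A task is a random slope with k support points and k query points."""
    slope = rng.uniform(1.0, 3.0)
    xs = rng.uniform(-1.0, 1.0, size=k_shot)
    xq = rng.uniform(-1.0, 1.0, size=k_shot)
    return (xs, slope * xs), (xq, slope * xq)

rng = np.random.default_rng(0)
w, inner_lr, outer_lr = 0.0, 0.1, 0.05
for step in range(500):                             # outer loop: meta-training
    tasks = [sample_task(rng) for _ in range(4)]
    grads = [maml_meta_gradient(w, t, inner_lr)[0] for t in tasks]
    w -= outer_lr * float(np.mean(grads))

print(w)  # the learned initialization, expected to settle near the middle of the slope range
```

The inner loop adapts to one task from its few support examples; the outer loop moves the initialization so that this adaptation works well on average across tasks. At meta-test time, only the inner step is run on the new task's examples.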

Few-shot learning can enable face recognition systems to identify new faces from only one or a few images per person, and image segmentation systems to segment new objects from only a few pixel-level annotations. It can allow natural language understanding systems to perform new tasks such as sentiment analysis, question answering, or text summarization with only a few examples or instructions per task. It can enable medical diagnosis systems to recognize rare diseases or anomalies from only a few cases per condition. It can also help drug discovery systems predict the properties of new molecules from only a few measured samples.

Few-shot learning is an exciting and challenging research area that aims to make machine learning more accessible and adaptable. By using meta-learning techniques and leveraging prior knowledge from related tasks, few-shot learning methods can mitigate the data scarcity problem and enable models to learn from a few examples, much as humans do.
