Meta AI Introduces OMNIVORE, a Model That Can Operate Across Modalities

Computer vision models work with various modalities, such as images, videos and depth data. These models often outperform humans on the tasks they are designed for, but one of their shortcomings is a lack of flexibility. To address this, the Meta AI research team developed a computer vision model called OMNIVORE. As the name suggests, it is a single vision model that can operate across modalities. According to Meta AI, OMNIVORE performs better than conventional modality-specific models of the same size. The model has two major benefits:

It can perform cross-modal generalisation: the model applies what it has learned from one modality to another.
It is cost-effective and saves research time compared with working on separate modality-specific models.

OMNIVORE is also easy to train. When trained on standard, off-the-shelf datasets, it performed as well as or better than the corresponding single-modality models.

How OMNIVORE Works

OMNIVORE is based on the transformer architecture. Although the approach is compatible with any transformer model, the researchers built it on the Swin Transformer, given its strong performance on image and video tasks. The model's operation can be understood as a series of stages.

Images, videos and single-view 3D inputs are converted into embeddings
To do this, images are split into patches, videos into spatio-temporal tubes, and single-view 3D images into RGB patches and depth patches
A linear layer projects the patches into embeddings
The same linear layer is used for RGB patches, whereas a separate one is used for depth patches
The resulting embeddings are fed into the transformer model

The model converts all visual modalities into a common format through embedding. It then uses a series of spatio-temporal attention operations to build a unified representation of the different modalities. The research team reported being surprised that, even though the model was not explicitly trained on cross-modal correspondence, OMNIVORE's representations generalise well across visual modalities. These capabilities emerge without cross-modal supervision, as a result of sharing parameters across modalities.
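
The patch-and-embed stages described above can be illustrated with a small sketch. This is not Meta AI's implementation: the patch size, embedding width, random weights and helper names (`patchify`, `make_linear`, `project`, `embed_rgbd`) are all hypothetical, and adding the RGB and depth embeddings per patch is an assumption about how the two projections are combined. Video tubes are left out for brevity; they would follow the same pattern with an extra time dimension.

```python
import random

PATCH = 2      # hypothetical patch size (pixels per side)
EMBED_DIM = 4  # hypothetical embedding width


def patchify(image, patch=PATCH):
    """Split an H x W x C image (nested lists) into flattened patches."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            flat = []
            for row in image[top:top + patch]:
                for pixel in row[left:left + patch]:
                    flat.extend(pixel)
            patches.append(flat)
    return patches


def make_linear(in_dim, out_dim, rng):
    """Random (out_dim x in_dim) weights standing in for a learned linear layer."""
    return [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]


def project(patches, weights):
    """Apply the linear layer to every flattened patch."""
    return [[sum(w * x for w, x in zip(row, p)) for row in weights] for p in patches]


def embed_rgbd(rgb, depth, w_rgb, w_depth):
    """Embed RGB and depth patches with separate layers, then add them per patch
    (the addition is an assumption made for this sketch)."""
    rgb_tokens = project(patchify(rgb), w_rgb)
    depth_tokens = project(patchify(depth), w_depth)
    return [[a + b for a, b in zip(r, d)] for r, d in zip(rgb_tokens, depth_tokens)]


rng = random.Random(0)
w_rgb = make_linear(PATCH * PATCH * 3, EMBED_DIM, rng)    # shared by all RGB patches
w_depth = make_linear(PATCH * PATCH * 1, EMBED_DIM, rng)  # separate layer for depth

rgb = [[[rng.random() for _ in range(3)] for _ in range(4)] for _ in range(4)]
depth = [[[rng.random()] for _ in range(4)] for _ in range(4)]

image_tokens = project(patchify(rgb), w_rgb)
rgbd_tokens = embed_rgbd(rgb, depth, w_rgb, w_depth)

# Both inputs end up as token lists of the same width.
print(len(image_tokens), len(image_tokens[0]))  # 4 4
print(len(rgbd_tokens), len(rgbd_tokens[0]))    # 4 4
```

Because a plain image and an RGB-D input both come out as lists of `EMBED_DIM`-wide tokens, a single transformer can consume either without modality-specific changes.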

Experiments with OMNIVORE

A series of experiments was conducted on OMNIVORE, comparing it with modality-specific models. The researchers considered three model sizes: T, S and B. The pre-trained model was fine-tuned on all seven downstream tasks, whereas the image-specific models were pre-trained on ImageNet-1K (IN1K). The video-specific and single-view 3D-specific models were initialised by inflating the image-specific model, and were fine-tuned on Kinetics-400 (K400) and SUN RGB-D respectively. According to Meta AI, OMNIVORE achieves 86.0% accuracy on ImageNet, 84.1% on the Kinetics dataset for action recognition and 67.1% on SUN RGB-D for single-view 3D scene classification. The results also confirmed that OMNIVORE performed as well as or better than the modality-specific models it was compared with. Among all the models, the Swin-B variant achieved state-of-the-art (SOTA) results on all the tasks.
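
The article does not describe the inflation procedure. One widely used way to turn image weights into video weights is to repeat the 2D weights along the time axis and divide by the number of frames, so that a static clip produces the same response as the original image. The sketch below shows that idea in plain Python. Whether the baselines here use exactly this scheme is an assumption, and the function names and values are hypothetical.

```python
import math


def inflate(weights_2d, num_frames):
    """Repeat 2D patch weights over num_frames frames, scaled by 1/num_frames."""
    return [[w / num_frames for w in weights_2d] for _ in range(num_frames)]


def respond_2d(weights_2d, patch):
    """Dot product of a flattened image patch with the 2D weights."""
    return sum(w * x for w, x in zip(weights_2d, patch))


def respond_3d(weights_3d, tube):
    """Sum of per-frame dot products over a spatio-temporal tube."""
    return sum(respond_2d(w, frame) for w, frame in zip(weights_3d, tube))


weights_2d = [0.5, -1.0, 2.0, 0.25]  # hypothetical weights for one output unit
patch = [1.0, 2.0, 3.0, 4.0]         # one flattened image patch
tube = [patch] * 4                   # a "static video": the same patch in 4 frames

image_response = respond_2d(weights_2d, patch)
video_response = respond_3d(inflate(weights_2d, 4), tube)

# The inflated weights reproduce the image response on a static clip.
print(math.isclose(image_response, video_response))  # True
```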

When OMNIVORE was compared with modality-specific models that had the same architecture and number of parameters, the outcome was the same. For this comparison, OMNIVORE was trained from scratch on the IN1K, K400 and SUN datasets, while VideoSwin and DepthSwin were fine-tuned from the ImageSwin model. The researchers then compared the model with SOTA models on image, video and 3D data classification tasks. Here too, OMNIVORE outperformed the other models.

The researchers also found that, even though the model was not trained on depth maps for ImageNet-1K, OMNIVORE could retrieve depth maps that were semantically similar, correct matches. They are confident that the model can help overcome several limitations in the field of AI and computer vision.

Source: indiaai.gov.in
