After ChatGPT, Microsoft working on AI model that takes images as cues


New Delhi, 03-March-2023. Photo: IANS

As the war over artificial intelligence (AI) chatbots heats up, Microsoft has unveiled Kosmos-1, a new AI model that can respond to visual cues or images in addition to text prompts or messages.

The multimodal large language model (MLLM) can help with an array of new tasks, including image captioning, visual question answering and more.

"A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence," said the team.

"We also show that MLLMs can benefit from cross-modal transfer, i.e., transfer knowledge from language to multimodal, and from multimodal to language. In addition, we introduce a dataset of Raven IQ test, which diagnoses the nonverbal reasoning capability of MLLMs," the team added.