Image Recognition
Both Google and Microsoft have now announced research into applying similar AI models to robots, with impressive results. AdvertisementAccording to the researchers, this is the largest Visual Language Model (VLM) reported to date, with 562 billion parameters. According to the paper, the AI model when controlling robots even displays "emergent capabilities like multimodal chain of thought reasoning, and the ability to reason over multiple images, despite being trained on only single-image prompts." Google is not the only company testing out a new multimodal AI and how to incorporate large language models in robots. PaLM-E makes great progress on the important robotics problem of task planning.