Alibaba Cloud makes headway with large vision language models

Thu Aug 31 2023 14:32:00 GMT+0000
Image Recognition

Alibaba Cloud, the digital technology and intelligence backbone of Alibaba Group, has launched two open-source large vision language models (LVLMs): Qwen-VL and its conversationally fine-tuned counterpart, Qwen-VL-Chat. For commercial use, companies with more than 100 million monthly active users can request a licence from Alibaba Cloud. Across various benchmarks, Qwen-VL performed strongly on several visual language tasks, including zero-shot captioning, general visual question answering, text-oriented visual question answering and object detection. According to Alibaba Cloud's own benchmark test, Qwen-VL-Chat has also achieved leading results in both Chinese and English for text-image dialogue and for alignment with humans. Earlier this month, Alibaba Cloud open-sourced its 7-billion-parameter LLMs, Qwen-7B and Qwen-7B-Chat, as part of its ongoing contribution to the open-source community.