Meta’s Open-Source AI Training Dataset Features 26,000 Videos

Wed Mar 15 2023 14:31:00 GMT+0000
Image Recognition

Meta has announced a new open-source dataset for AI training that the company hopes will reduce the kind of demographic bias documented by researchers at the National Institute of Standards and Technology (NIST) and elsewhere. The “Casual Conversations v2” dataset comprises over 26,000 video monologues from individuals in seven countries: Brazil, India, Indonesia, Mexico, Vietnam, the Philippines, and the United States. In the videos, participants describe some of their own demographic attributes, such as race, gender, and age, which can help AI systems to properly tag and interpret demographic data. The dataset’s inclusion of vocal samples could also help to alleviate the related but much less discussed issue of demographic bias in voice recognition. The Casual Conversations v2 dataset is available through the Meta AI website.