Monojit Choudhury, formerly of Microsoft Research and Project Turing, has joined MBZUAI as a professor of natural language processing. Choudhury's work at Microsoft involved developing NLP applications and responsible AI, including manually programming LLMs to prevent toxic or biased content. He was impressed by GPT-4's capabilities and believes academia is the best place for deep research in NLP. Why it matters: Choudhury's experience at Microsoft, including his work on responsible AI and LLMs, could contribute to MBZUAI's NLP research and the development of more inclusive LLMs.
MBZUAI has been actively involved in developing AI and generative models, contributing to models like Llama 2, Jais, Vicuna, and LaMini. Professor Preslav Nakov notes Llama 2's improvements in size and carbon footprint over Llama 1. MBZUAI aims to tackle challenges like information accuracy, economic costs, and the scarcity of Arabic online content. Why it matters: MBZUAI's work helps address the limitations of current LLMs, particularly for Arabic, and promotes sustainable AI development in the region.
MBZUAI researcher Karima Kadaoui is using AI to support disadvantaged communities and underserved languages, with a focus on democratizing NLP for Arabic dialects. Her master's thesis focused on impaired speech recognition, converting the disfluent speech of individuals with speech disabilities into clear speech. She emphasizes the importance of diversity and inclusion in AI to avoid bias and ensure systems reflect the distribution of their users. Why it matters: This highlights MBZUAI's commitment to gender equity in STEM and the development of AI solutions tailored to the nuances of the Arabic language.
MBZUAI is conducting research to improve cross-cultural understanding using AI, including studying the limitations of LLMs in recognizing cultural references. Its researchers developed "Culturally Yours," a tool that helps users comprehend cultural references in text, and the "All Languages Matter Benchmark" (ALM Bench) to evaluate multimodal LLMs across 100 languages. MBZUAI has also developed LLMs tailored to low-resource languages, including Jais (Arabic), Nanda (Hindi), and Sherkala (Kazakh). Why it matters: These initiatives promote inclusivity and ensure AI systems are culturally aware and can serve diverse populations effectively, particularly in the Middle East's multicultural context.
MBZUAI researchers have released ALM Bench, a new benchmark dataset for evaluating multimodal LLMs on culturally grounded visual question-answering tasks across 100 languages. The dataset includes over 22,000 question-answer pairs across 19 categories, with a focus on low-resource languages and cultural nuances, including three Arabic dialects. Tests of 16 open- and closed-source multimodal LLMs on the benchmark revealed a significant need for greater cultural and linguistic inclusivity. Why it matters: The benchmark aims to improve the inclusivity of multimodal AI systems by addressing the underrepresentation of low-resource languages and cultural contexts.