MBZUAI researchers introduce BiMediX, a bilingual (English and Arabic) mixture-of-experts LLM for medical applications. The model is trained on BiMed1.3M, a new bilingual dataset of 1.3 million instructions, and outperforms existing models such as Med42 and Jais-30B on medical benchmarks. Code and models are available on GitHub.
This paper explores multilingual satire detection in English and Arabic using zero-shot and chain-of-thought (CoT) prompting. It compares how well Jais-chat (13B) and LLaMA-2-chat (7B) distinguish satire from truthful news. Results show that CoT prompting significantly improves Jais-chat's performance, which reaches an F1-score of 80% in English. Why it matters: This demonstrates the potential of Arabic LLMs like Jais to handle nuanced language tasks such as satire detection, which is critical for combating misinformation in the region.
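As a rough illustration of the two prompting styles and the reported metric, the sketch below builds a zero-shot prompt and a CoT prompt and computes an F1-score for the satire class. The prompt wording and the helper names `build_prompt` and `f1_score` are assumptions made for illustration; they are not the prompts or evaluation code used in the paper.

```python
def build_prompt(article: str, chain_of_thought: bool) -> str:
    """Return a zero-shot or chain-of-thought prompt (hypothetical wording)."""
    task = (
        "Decide whether the following news article is satire or truthful news.\n"
        f"Article: {article}\n"
    )
    if chain_of_thought:
        # CoT variant: ask for reasoning before the final label.
        return task + (
            "Explain your reasoning step by step, "
            "then give a final answer: 'satire' or 'truthful'."
        )
    # Zero-shot variant: ask for the label only.
    return task + "Answer with one word: 'satire' or 'truthful'."


def f1_score(gold: list[str], predicted: list[str], positive: str = "satire") -> float:
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, not data from the paper.
gold = ["satire", "satire", "truthful", "truthful"]
predicted = ["satire", "truthful", "satire", "truthful"]
toy_f1 = f1_score(gold, predicted)
```

In this framing, the only difference between the two conditions is the instruction appended to the task description, so any change in F1 can be attributed to the prompting style rather than to the model or the data.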
A talk will present two projects related to the use of NLP for estimating a client’s depression severity and well-being. The first project examines emotional coherence between the subjective experience of emotions and emotion expression in therapy using transformer-based emotion recognition models. The second project proposes a semantic pipeline to study depression severity in individuals based on their social media posts by exploring different aggregation methods to answer one of four Beck Depression Inventory (BDI) options per symptom. Why it matters: This research explores how NLP techniques can be applied to mental health assessment, potentially offering new tools for diagnosis and treatment monitoring.
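One way to picture the aggregation step in the second project is sketched below. It assumes each social media post has already been given a relevance score for each of the four BDI options of a symptom, then aggregates those scores across posts and picks the highest-scoring option. The function `pick_option`, the score layout, and the two aggregation methods are hypothetical; the summary does not say which aggregation methods the project actually explores.

```python
from statistics import mean


def pick_option(scores_per_post: list[list[float]], method: str = "mean") -> int:
    """Choose a BDI option (0..3) for one symptom.

    scores_per_post[i][k] is an assumed relevance score of post i
    for option k. Scores are aggregated across posts per option.
    """
    aggregate = {"mean": mean, "max": max}[method]
    n_options = 4
    totals = [
        aggregate([post[k] for post in scores_per_post])
        for k in range(n_options)
    ]
    # Return the index of the option with the highest aggregated score.
    return max(range(n_options), key=lambda k: totals[k])


# Toy scores for three posts, not data from the project.
toy_scores = [
    [0.1, 0.2, 0.9, 0.0],
    [0.3, 0.5, 0.0, 0.0],
    [0.3, 0.5, 0.0, 0.0],
]
by_mean = pick_option(toy_scores, "mean")
by_max = pick_option(toy_scores, "max")
```

With these toy scores the two methods disagree: averaging favors the option that is moderately supported by most posts, while taking the maximum favors the option strongly supported by a single post. That sensitivity is the reason the choice of aggregation method matters.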
The Hala technical report introduces a family of Arabic-centric instruction and translation models developed using a translate-and-tune pipeline. A strong Arabic-English teacher model is compressed to FP8 and used to create bilingual supervision data. The LFM2-1.2B model is fine-tuned on this data and used to translate English instruction sets into Arabic, creating a million-scale corpus. Why it matters: The release of models, data, evaluation tools, and recipes will accelerate research and development in Arabic NLP, providing valuable resources for the community.
The paper introduces FanarGuard, a bilingual moderation filter for Arabic and English language models that considers both safety and cultural alignment. A dataset of 468K prompt-response pairs was created and scored by LLM judges on harmlessness and cultural awareness to train the filter. The first benchmark targeting Arabic cultural contexts was developed to evaluate cultural alignment. Why it matters: FanarGuard advances context-sensitive AI safeguards by integrating cultural awareness into content moderation, addressing a critical gap in current alignment techniques.
Researchers at MBZUAI have developed a new automatic method to examine cross-lingual abilities in multilingual language models, testing 10 models across 16 languages. They combined beam search with language-model-based simulation, generated 6,000 bilingual question pairs, and found that performance dropped significantly relative to English, even in high-resource languages such as Chinese. The method introduces perturbations to test the models' ability to transfer knowledge rather than rely on memorization. Why it matters: This research highlights critical gaps in cross-lingual AI, providing a framework for developing more equitable and effective multilingual models, especially for Arabic and other under-represented languages.
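For readers unfamiliar with beam search, the following is a minimal generic sketch of the technique. The summary does not specify what the researchers expand or how candidates are scored, so `expand`, `score`, and the toy functions below are placeholders chosen only to make the example runnable; they do not reproduce the MBZUAI method.

```python
from typing import Callable


def beam_search(
    start: str,
    expand: Callable[[str], list[str]],
    score: Callable[[str], float],
    beam_width: int = 2,
    steps: int = 2,
) -> list[str]:
    """Keep the `beam_width` best candidates at each of `steps` expansions."""
    beam = [start]
    for _ in range(steps):
        # Generate every successor of every item currently in the beam.
        candidates = [cand for item in beam for cand in expand(item)]
        if not candidates:
            break
        # Sort highest score first and keep only the top `beam_width`.
        beam = sorted(candidates, key=score, reverse=True)[:beam_width]
    return beam


# Toy stand-ins: "perturb" a string by appending a token, score by length.
def toy_expand(text: str) -> list[str]:
    return [text + " a", text + " bb", text + " ccc"]


def toy_score(text: str) -> float:
    return float(len(text))


result = beam_search("q", toy_expand, toy_score, beam_width=2, steps=2)
```

In a setting like the one described, `expand` would plausibly produce perturbed variants of a question and `score` would come from a language-model-based simulation, but that mapping is an assumption rather than something the summary states.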
MBZUAI researchers have expanded LLM safety research to Chinese, presenting their work at the 62nd Annual Meeting of the Association for Computational Linguistics in Bangkok. They developed an open-source Chinese dataset of 3,000 prompts translated and localized from the English "Do-Not-Answer" dataset. The dataset adds a "region-specific sensitivity" category to address safety risks unique to Chinese speakers, and it is also used to evaluate whether models are over-sensitive, that is, whether they identify innocuous questions as harmful. Why it matters: This research addresses a critical gap in LLM safety evaluation, ensuring that language models are both safe and effective for diverse linguistic and cultural contexts, particularly in regions with unique sensitivities.