AlcLaM: Arabic Dialectal Language Model

arXiv · July 18, 2024 · Significant research

Summary

The paper introduces AlcLaM, an Arabic dialectal language model built from a corpus of 3.4M sentences collected from social media. The authors use this corpus to expand the vocabulary and retrain a BERT-based model, using only 13 GB of dialectal text. Despite this smaller training set, AlcLaM outperforms models such as CAMeL, MARBERT, and ArBERT on a variety of Arabic NLP tasks. Why it matters: By focusing on dialectal Arabic, which is often underrepresented in existing models, AlcLaM shows that a more data-efficient model can still deliver higher accuracy on Arabic NLP tasks.

Keywords

Arabic NLP · dialectal Arabic · language model · BERT · social media

Related

ALLaM: Large Language Models for Arabic and English

arXiv · Jul 22

The paper introduces ALLaM, a series of large language models for Arabic and English designed to support Arabic Language Technologies. The models use a decoder-only architecture and are trained with language alignment and knowledge transfer in mind. ALLaM achieves state-of-the-art results on Arabic benchmarks such as MMLU Arabic and Arabic Exams. Why it matters: This work advances Arabic NLP by providing high-performing LLMs and demonstrating effective techniques for cross-lingual transfer learning and alignment with human preferences.
