Hala Technical Report: Building Arabic-Centric Instruction & Translation Models at Scale

arXiv · September 17, 2025 · Significant research

Summary

The Hala technical report introduces a family of Arabic-centric instruction and translation models built with a translate-and-tune pipeline. A strong Arabic-English teacher model is first compressed to FP8 precision and then used to generate bilingual supervision data. The lightweight LFM2-1.2B model is fine-tuned on this data and then used to translate English instruction sets into Arabic, producing a million-scale Arabic instruction corpus. Why it matters: By releasing its models, data, evaluation tools, and training recipes, the project gives the Arabic NLP community reusable resources that can speed up further research and development.
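
To make the FP8 compression step more concrete, here is a minimal sketch of simulated FP8 weight quantization. It assumes the OCP E4M3 format (1 sign bit, 4 exponent bits, 3 mantissa bits, largest finite value 448) and simple per-tensor absmax scaling. The report's actual compression recipe is not described in this summary, and the helper names round_to_e4m3, quantize_per_tensor, and dequantize are hypothetical.

```python
import math

# Assumption: OCP FP8 E4M3 ("E4M3FN") layout, not necessarily Hala's exact recipe.
E4M3_MAX = 448.0                 # largest finite value: 1.75 * 2**8
E4M3_MIN_NORMAL = 2.0 ** -6      # smallest normal magnitude
E4M3_SUBNORMAL_STEP = 2.0 ** -9  # spacing of subnormal values


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3-representable value, saturating at +/-448."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)  # saturate instead of overflowing
    if mag < E4M3_MIN_NORMAL:
        step = E4M3_SUBNORMAL_STEP
    else:
        _, exp = math.frexp(mag)      # mag = m * 2**exp with 0.5 <= m < 1
        step = 2.0 ** (exp - 1 - 3)   # 3 mantissa bits -> 8 steps per binade
    q = round(mag / step) * step      # round() uses round-half-to-even
    return sign * min(q, E4M3_MAX)


def quantize_per_tensor(weights):
    """Scale a weight list so its absmax maps to 448, then round to E4M3."""
    amax = max(abs(w) for w in weights)
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(w / scale) for w in weights], scale


def dequantize(q, scale):
    """Map E4M3 codes back to approximate full-precision weights."""
    return [v * scale for v in q]


if __name__ == "__main__":
    q, s = quantize_per_tensor([0.5, -1.0, 0.3, 0.01])
    print(q)
    print(dequantize(q, s))
```

Storing one byte per weight instead of two (BF16) halves weight memory. The cost is a relative rounding error of at most 2**-4 for normal values, which is the kind of trade-off that makes a compressed teacher cheaper to run when generating large amounts of supervision data.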

Keywords

Arabic NLP · instruction tuning · translation models · bilingual supervision · Hala


Related

ALLaM: Large Language Models for Arabic and English

arXiv · Jul 22

The paper introduces ALLaM, a series of decoder-only large language models for Arabic and English designed to support Arabic Language Technologies. The models are trained with language alignment and knowledge transfer in mind, and ALLaM achieves state-of-the-art results on Arabic benchmarks such as MMLU Arabic and Arabic Exams. Why it matters: This work advances Arabic NLP by providing high-performing LLMs and demonstrating effective techniques for cross-lingual transfer learning and alignment with human preferences.
