Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models
arXiv ·
🇦🇪 MBZUAI / Inception · UAE · Arabic LLM
Bilingual Arabic-English LLM developed jointly by MBZUAI, Inception, and Core42. Trained on a large Arabic-English corpus with a custom tokenizer optimized for Arabic morphology.
Jais-30B achieved state-of-the-art results on Arabic NLP benchmarks at its release in 2023. The model uses a custom BPE tokenizer that handles Arabic's morphological complexity better than general-purpose multilingual tokenizers. It covers Modern Standard Arabic (MSA), and the instruction-tuned Jais-chat variant handles conversational instruction-following tasks.
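One common way to compare how well tokenizers handle a language is fertility, the average number of tokens produced per word; lower values mean words are split into fewer pieces. The sketch below is illustrative only: `fertility`, `char_tokenizer`, and `word_tokenizer` are hypothetical names, and the two toy tokenizers are stand-ins, not the Jais tokenizer or any real multilingual tokenizer.

```python
from typing import Callable, List


def fertility(tokenize: Callable[[str], List[str]], text: str) -> float:
    """Return the average number of tokens per whitespace-separated word."""
    words = text.split()
    if not words:
        raise ValueError("text contains no words")
    return len(tokenize(text)) / len(words)


# Toy stand-ins (assumptions for illustration, not real tokenizers):
# one splits every word into single characters, the other keeps words whole.
def char_tokenizer(text: str) -> List[str]:
    return [ch for ch in text if not ch.isspace()]


def word_tokenizer(text: str) -> List[str]:
    return text.split()


if __name__ == "__main__":
    sample = "ذهب الطالب إلى المدرسة"  # "The student went to school"
    print("character-level fertility:", fertility(char_tokenizer, sample))
    print("word-level fertility:", fertility(word_tokenizer, sample))
```

To compare real tokenizers, pass each one's tokenize function in place of the toy ones and run both over the same Arabic sample text.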
MBZUAI · · Infrastructure Partnership