GCC AI Research

AraBERT: Transformer-based Model for Arabic Language Understanding

arXiv · Significant research

Summary

Researchers at the American University of Beirut (AUB) have released AraBERT, a BERT model pre-trained specifically for Arabic language understanding. The model was trained on a large Arabic corpus and compared against multilingual BERT (mBERT) and other state-of-the-art approaches. AraBERT achieved state-of-the-art performance on several of the tested Arabic NLP tasks, including sentiment analysis, named entity recognition, and question answering. Why it matters: The release gives the Arabic NLP community a high-performing, open-source language model, facilitating further research and development.
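AraBERT is pre-trained with BERT's masked language modeling objective: some input tokens are hidden and the model learns to predict them from context. A minimal sketch of the standard BERT masking rule (about 15% of positions selected; of those, 80% replaced by [MASK], 10% by a random vocabulary token, 10% left unchanged) follows. It is an illustration, not AraBERT's code: the function name `mask_for_mlm` is hypothetical, and whitespace-split words stand in for AraBERT's subword tokenizer.

```python
import random
from typing import List, Optional, Sequence, Tuple

MASK_TOKEN = "[MASK]"


def mask_for_mlm(
    tokens: Sequence[str],
    vocab: Sequence[str],
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """BERT-style masking for masked language modeling (illustrative sketch).

    Each position is selected with probability `mask_prob`. A selected token is
    replaced by [MASK] 80% of the time, by a random vocabulary token 10% of the
    time, and left unchanged 10% of the time. Labels hold the original token at
    selected positions and None elsewhere (no prediction loss there).
    """
    rng = rng if rng is not None else random.Random()
    corrupted = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok  # the model is trained to recover this token
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK_TOKEN
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)
        # otherwise: keep the original token in place
    return corrupted, labels


# Usage: whitespace "tokens" stand in for a real subword tokenizer (assumption).
sentence = "ذهب الطالب إلى المكتبة لقراءة كتاب".split()
masked, targets = mask_for_mlm(sentence, vocab=sentence, rng=random.Random(0))
print(masked)
print(targets)
```

Leaving some selected tokens unchanged or randomly replaced keeps the model from relying on the [MASK] symbol, which never appears when the model is later fine-tuned on downstream tasks.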

Keywords

AraBERT · Arabic NLP · BERT · AUB · Language Model

Related

AraELECTRA: Pre-Training Text Discriminators for Arabic Language Understanding

arXiv

The paper introduces AraELECTRA, a new Arabic language representation model. AraELECTRA is pre-trained with the replaced token detection objective on large Arabic text corpora and evaluated on multiple Arabic NLP tasks, including reading comprehension, sentiment analysis, and named-entity recognition. Why it matters: Given the same pre-training data, AraELECTRA outperforms current state-of-the-art Arabic language representation models, even with a smaller model size, advancing Arabic NLP.
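Replaced token detection, the objective behind ELECTRA and hence AraELECTRA, trains a discriminator to decide for every position whether a token is original or was substituted by a small generator network. A minimal sketch of how the discriminator's training targets could be built, assuming the generator is any callable that proposes a token for a masked position (in ELECTRA it is a small masked language model; the function names and the lookup-table generator here are hypothetical):

```python
from typing import Callable, List, Sequence

MASK_TOKEN = "[MASK]"

# Hypothetical generator interface: (masked sequence, position) -> sampled token.
# In ELECTRA this role is played by a small masked language model.
GeneratorFn = Callable[[Sequence[str], int], str]


def corrupt_with_generator(
    tokens: Sequence[str], positions: Sequence[int], generator: GeneratorFn
) -> List[str]:
    """Mask `positions`, then fill each one with a token proposed by the generator."""
    masked = list(tokens)
    for i in positions:
        masked[i] = MASK_TOKEN
    corrupted = list(tokens)
    for i in positions:
        corrupted[i] = generator(masked, i)  # generator only sees the masked input
    return corrupted


def rtd_labels(original: Sequence[str], corrupted: Sequence[str]) -> List[int]:
    """Discriminator targets: 1 = replaced, 0 = original.

    If the generator happens to reproduce the original token, that position
    counts as original (label 0), as described in the ELECTRA paper.
    """
    if len(original) != len(corrupted):
        raise ValueError("sequences must be aligned token-for-token")
    return [int(o != c) for o, c in zip(original, corrupted)]


# Usage with a toy lookup-table "generator" (hypothetical, for illustration only).
toy_samples = {1: "cook", 4: "meal"}
original = ["the", "chef", "cooked", "the", "meal"]
corrupted = corrupt_with_generator(original, [1, 4], lambda seq, i: toy_samples[i])
print(corrupted)                        # ['the', 'cook', 'cooked', 'the', 'meal']
print(rtd_labels(original, corrupted))  # [0, 1, 0, 0, 0]
```

Because the discriminator is scored on every position rather than only on the masked subset, each training example yields more learning signal than in masked language modeling, which is the efficiency argument for ELECTRA-style pre-training.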

An Empirical Study of Pre-trained Transformers for Arabic Information Extraction

arXiv

This paper introduces GigaBERT, a customized bilingual BERT model pre-trained for Arabic NLP and English-to-Arabic zero-shot transfer learning. The study evaluates GigaBERT's performance on four information extraction tasks: named entity recognition, part-of-speech tagging, argument role labeling, and relation extraction. Results show that GigaBERT outperforms mBERT, XLM-RoBERTa, and AraBERT in both supervised and zero-shot transfer settings. Why it matters: GigaBERT advances Arabic NLP by providing a high-performing, publicly available model tailored for the complexities of the Arabic language and cross-lingual applications.

AraNet: A Deep Learning Toolkit for Arabic Social Media

arXiv

Researchers introduce AraNet, a deep learning toolkit for Arabic social media processing. The toolkit uses BERT models trained on social media datasets to predict age, dialect, gender, emotion, irony, and sentiment. AraNet achieves state-of-the-art or competitive performance on these tasks without feature engineering. Why it matters: The public release of AraNet accelerates Arabic NLP research by providing a comprehensive, deep learning-based tool for various social media analysis tasks.

Pre-trained Transformer-Based Approach for Arabic Question Answering : A Comparative Study

arXiv

This paper presents a comparative study of pre-trained transformer models for Arabic question answering (QA). The study evaluates the performance of AraBERTv2-base, AraBERTv0.2-large, and AraELECTRA models on four reading comprehension datasets: Arabic-SQuAD, ARCD, AQAD, and TyDiQA-GoldP. The researchers fine-tuned these models and analyzed the results to understand the performance disparities. Why it matters: This research contributes to the advancement of Arabic NLP by evaluating and comparing state-of-the-art models on important QA tasks, addressing the scarcity of resources in this domain.
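Fine-tuning a BERT-style model for extractive reading comprehension (as on Arabic-SQuAD or ARCD) typically adds a head that scores each passage token as a possible answer start and answer end; the predicted answer is the highest-scoring valid span. A minimal sketch of that decoding step follows, assuming the standard start/end-logit formulation; it is not taken from the study, and `best_answer_span` and `max_answer_len` are hypothetical names.

```python
from typing import Sequence, Tuple


def best_answer_span(
    start_logits: Sequence[float],
    end_logits: Sequence[float],
    max_answer_len: int = 30,
) -> Tuple[int, int, float]:
    """Pick the answer span maximizing start_logits[s] + end_logits[e].

    Only spans with s <= e and at most `max_answer_len` tokens are considered.
    Returns (start_index, end_index, score) with inclusive token indices.
    """
    if len(start_logits) != len(end_logits) or not start_logits:
        raise ValueError("need two non-empty logit sequences of equal length")
    if max_answer_len < 1:
        raise ValueError("max_answer_len must be at least 1")
    n = len(start_logits)
    best = (0, 0, float("-inf"))
    for s in range(n):
        # The end position may not precede the start or exceed the length cap.
        for e in range(s, min(n, s + max_answer_len)):
            score = start_logits[s] + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best


# Usage: toy logits over a 4-token passage.
print(best_answer_span([0.1, 2.0, 0.3, 0.0], [0.0, 0.5, 3.0, 0.2]))  # (1, 2, 5.0)
```

The exhaustive double loop keeps the illustration simple; practical pipelines usually shortlist the top-k start and end candidates before pairing them.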