GCC AI Research

Pre-trained Transformer-Based Approach for Arabic Question Answering: A Comparative Study

arXiv · Notable

Summary

This paper presents a comparative study of pre-trained transformer models for Arabic question answering (QA). It evaluates AraBERTv2-base, AraBERTv0.2-large, and AraELECTRA on four reading comprehension datasets: Arabic-SQuAD, ARCD, AQAD, and TyDiQA-GoldP. The researchers fine-tuned each model on these datasets and analyzed the results to explain why performance differed across models and datasets. Why it matters: By evaluating and comparing state-of-the-art models on core QA benchmarks, this research advances Arabic NLP and helps address the scarcity of resources in this domain.


Related

An Empirical Study of Pre-trained Transformers for Arabic Information Extraction

arXiv

This paper introduces GigaBERT, a customized bilingual BERT model pre-trained for Arabic NLP and English-to-Arabic zero-shot transfer learning. The study evaluates GigaBERT on four information extraction tasks: named entity recognition, part-of-speech tagging, argument role labeling, and relation extraction. GigaBERT outperforms mBERT, XLM-RoBERTa, and AraBERT in both supervised and zero-shot transfer settings. Why it matters: GigaBERT advances Arabic NLP by providing a high-performing, publicly available model tailored to Arabic and to cross-lingual applications.

The Inception Team at NSURL-2019 Task 8: Semantic Question Similarity in Arabic

arXiv

The Inception Team presented a system for Semantic Question Similarity in Arabic as part of NSURL 2019 Task 8. The team explored several methods for determining whether two Arabic questions are semantically similar. Their best submission, an ensemble built on a pre-trained multilingual BERT model, achieved a 95.924% F1-score and ranked first among nine participating teams. Why it matters: This result demonstrates strong performance on a key Arabic NLP task and advances the state of the art in semantic understanding for the language.

On the importance of Data Scale in Pretraining Arabic Language Models

arXiv

This paper studies the impact of data scale on Arabic pretrained language models (PLMs). The researchers retrained BERT-base and T5-base models on large Arabic corpora and achieved state-of-the-art results on the ALUE and ORCA benchmarks. Their analysis indicates that the volume of pretraining data is the most important factor in performance. Why it matters: This work offers practical guidance for building effective Arabic language models, underscoring the value of large, high-quality pretraining datasets for advancing Arabic NLP.

AraBERT: Transformer-based Model for Arabic Language Understanding

arXiv

Researchers at the American University of Beirut (AUB) have released AraBERT, a BERT model pre-trained specifically for Arabic language understanding. The model was trained on a large Arabic corpus and compared against multilingual BERT and other state-of-the-art methods. AraBERT achieved state-of-the-art performance on several of the Arabic NLP tasks tested, including sentiment analysis, named entity recognition, and question answering. Why it matters: This release gives the Arabic NLP community a high-performing, open-source language model to build on in further research and development.