AraELECTRA: Pre-Training Text Discriminators for Arabic Language Understanding
arXiv
The paper introduces AraELECTRA, a new Arabic language representation model pre-trained on large Arabic text corpora with the replaced token detection objective, in which a discriminator learns to tell whether each input token is original or was substituted. The model is evaluated on several Arabic NLP tasks, including reading comprehension, sentiment analysis, and named-entity recognition. Why it matters: given the same pre-training data, AraELECTRA outperforms current state-of-the-art Arabic language representation models, even with a smaller model size, advancing Arabic NLP.
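To make the replaced token detection objective concrete, here is a minimal sketch of the discriminator side as described for ELECTRA. It is not AraELECTRA's training code. It assumes a generator has already produced a corrupted sequence, omits the generator's own masked-language-modeling loss, and passes in the discriminator's per-token "replaced" probabilities as plain numbers. The function names `rtd_labels` and `rtd_loss` are hypothetical.

```python
import math
from typing import List, Sequence


def rtd_labels(original: Sequence[str], corrupted: Sequence[str]) -> List[int]:
    """Hypothetical helper: label each position 1 if replaced, else 0.

    As in ELECTRA, a position where the generator happened to sample the
    original token is labeled as original (0).
    """
    if len(original) != len(corrupted):
        raise ValueError("sequences must have equal length")
    return [int(o != c) for o, c in zip(original, corrupted)]


def rtd_loss(probs_replaced: Sequence[float], labels: Sequence[int]) -> float:
    """Hypothetical helper: mean binary cross-entropy over ALL positions.

    probs_replaced[i] stands in for the discriminator's predicted
    probability that token i was replaced (assumed input, not a model call).
    """
    if len(probs_replaced) != len(labels) or not labels:
        raise ValueError("need equal-length, non-empty inputs")
    eps = 1e-12  # clamp to avoid log(0)
    total = 0.0
    for p, y in zip(probs_replaced, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


if __name__ == "__main__":
    original = ["ذهب", "الولد", "إلى", "المدرسة"]
    corrupted = ["ذهب", "البنت", "إلى", "المدرسة"]  # one token swapped
    labels = rtd_labels(original, corrupted)
    print(labels)
    print(rtd_loss([0.1, 0.8, 0.2, 0.1], labels))
```

The loss is averaged over every token rather than only the masked ones, which is the main difference from a masked-language-modeling objective and the reason ELECTRA-style pre-training gets a training signal from each position.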