GCC AI Research

AraNet: A Deep Learning Toolkit for Arabic Social Media

arXiv · Significant research

Summary

Researchers introduce AraNet, a deep learning toolkit for Arabic social media processing. The toolkit uses BERT models trained on social media datasets to predict age, dialect, gender, emotion, irony, and sentiment. AraNet achieves state-of-the-art or competitive performance on these tasks without feature engineering. Why it matters: The public release of AraNet accelerates Arabic NLP research by providing a comprehensive, deep learning-based tool for various social media analysis tasks.

Related

AraBERT: Transformer-based Model for Arabic Language Understanding

arXiv

Researchers at the American University of Beirut (AUB) have released AraBERT, a BERT model pre-trained specifically for Arabic language understanding. The model was trained on a large Arabic corpus and compared against multilingual BERT and other state-of-the-art methods. AraBERT achieved state-of-the-art performance on several Arabic NLP tasks, including sentiment analysis, named entity recognition, and question answering. Why it matters: This release provides the Arabic NLP community with a high-performing, open-source language model, facilitating further research and development.

AraPoemBERT: A Pretrained Language Model for Arabic Poetry Analysis

arXiv

The paper introduces AraPoemBERT, an Arabic language model pretrained exclusively on 2.09 million verses of Arabic poetry. AraPoemBERT was evaluated against five other Arabic language models on tasks including poet's gender classification (99.34% accuracy) and poetry sub-meter classification (97.79% accuracy). The model achieved state-of-the-art results in these and other downstream tasks, and is publicly available on Hugging Face. Why it matters: This specialized model advances Arabic NLP by providing a new state-of-the-art tool tailored for the nuances of classical Arabic poetry.

AraELECTRA: Pre-Training Text Discriminators for Arabic Language Understanding

arXiv

The paper introduces AraELECTRA, a new Arabic language representation model. AraELECTRA is pre-trained using the replaced token detection objective on large Arabic text corpora. The model is evaluated on multiple Arabic NLP tasks, including reading comprehension, sentiment analysis, and named-entity recognition. Why it matters: AraELECTRA outperforms current state-of-the-art Arabic language representation models, given the same pretraining data and even with a smaller model size, advancing Arabic NLP.
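
For readers unfamiliar with the replaced token detection objective mentioned above, here is a minimal sketch of how its training labels can be built. It is not AraELECTRA's code: the function name `corrupt_tokens`, the parameters `replace_prob` and `seed`, and the toy vocabulary are all hypothetical, and random sampling stands in for the small generator network that ELECTRA-style models use to propose replacements. The discriminator is then trained to predict, for every position, whether the token was replaced.

```python
import random


def corrupt_tokens(tokens, vocab, replace_prob=0.15, seed=0):
    """Build a corrupted sequence and per-token labels (illustrative only).

    A label of 1 means the token differs from the original (replaced);
    0 means it is unchanged. Random sampling from `vocab` is an assumed
    stand-in for a learned generator model.
    """
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < replace_prob:
            new_tok = rng.choice(vocab)
            corrupted.append(new_tok)
            # If the sampled token happens to equal the original,
            # the position still counts as "original".
            labels.append(1 if new_tok != tok else 0)
        else:
            corrupted.append(tok)
            labels.append(0)
    return corrupted, labels


if __name__ == "__main__":
    # Transliterated placeholder tokens, not real training data.
    sentence = ["hatha", "kitab", "jamil", "jiddan"]
    toy_vocab = ["qalam", "bayt", "kitab", "kabir"]
    corrupted, labels = corrupt_tokens(sentence, toy_vocab, replace_prob=0.5)
    print(corrupted)
    print(labels)
```

Because every position receives a label, the discriminator gets a training signal from the whole sequence rather than only from a masked subset of tokens.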

Overview of the Arabic Sentiment Analysis 2021 Competition at KAUST

arXiv

KAUST organized an Arabic Sentiment Analysis Challenge where participants developed ML models to classify tweets as positive, negative, or neutral. The competition used the ASAD dataset with 55K tweets for training, 20K for validation, and 20K for final evaluation. The full dataset of 100K labeled tweets has been released for public use.