Skip to content
GCC AI Research

Dhati+: Fine-tuned Large Language Models for Arabic Subjectivity Evaluation

arXiv · · Significant research

Summary

This paper introduces AraDhati+, a new comprehensive dataset for Arabic subjectivity analysis created by combining existing datasets like ASTD, LABR, HARD, and SANAD. The researchers fine-tuned Arabic language models including XLM-RoBERTa, AraBERT, and ArabianGPT on AraDhati+ for subjectivity classification. An ensemble decision approach achieved 97.79% accuracy. Why it matters: The work addresses the under-resourced nature of Arabic NLP by providing a new dataset and demonstrating strong results in subjectivity classification, advancing sentiment analysis capabilities for the Arabic language.

Get the weekly digest

Top AI stories from the GCC region, every week.

Related

Large Language Models and Arabic Content: A Review

arXiv ·

This study reviews the use of large language models (LLMs) for Arabic language processing, focusing on pre-trained models and their applications. It highlights the challenges in Arabic NLP due to the language's complexity and the relative scarcity of resources. The review also discusses how techniques like fine-tuning and prompt engineering enhance model performance on Arabic benchmarks. Why it matters: This overview helps consolidate research directions and benchmarks in Arabic NLP, guiding future development of LLMs tailored for the Arabic language and its diverse dialects.