GCC AI Research

Results for "ADAB"

ADAB: Arabic Dataset for Automated Politeness Benchmarking -- A Large-Scale Resource for Computational Sociopragmatics

arXiv

The paper introduces ADAB (Arabic Dataset for Automated Politeness Benchmarking), a new annotated Arabic dataset for politeness detection collected from online platforms. The dataset covers Modern Standard Arabic and multiple dialects (Gulf, Egyptian, Levantine, and Maghrebi). It contains 10,000 samples across 16 politeness categories and achieves substantial inter-annotator agreement (kappa = 0.703). Why it matters: This dataset addresses the scarcity of Arabic-language resources for politeness detection, which are crucial for culturally aware NLP systems.

ASAD: A Twitter-based Benchmark Arabic Sentiment Analysis Dataset

arXiv

Researchers introduce ASAD, a new large-scale, high-quality Arabic Sentiment Analysis Dataset based on 95K tweets with positive, negative, and neutral labels. The dataset is launched with a competition sponsored by KAUST offering a total of 17,000 USD in prizes. Baseline models are implemented and their results reported as a reference for competition participants.

Fann or Flop: A Multigenre, Multiera Benchmark for Arabic Poetry Understanding in LLMs

arXiv

MBZUAI researchers release 'Fann or Flop', a new benchmark for evaluating Arabic poetry understanding in LLMs. The benchmark covers 12 historical eras and 14 poetic genres, assessing semantic understanding, metaphor interpretation, and cultural context. Evaluation of state-of-the-art LLMs reveals challenges in poetic understanding despite strong performance on standard Arabic benchmarks.

ADEO delegation visits MBZUAI

MBZUAI

A delegation from the Abu Dhabi Executive Office (ADEO) Education Affairs Department visited MBZUAI on December 15, 2021. Ian Mathews, VP of Corporate Services, presented MBZUAI's progress and 2022 initiatives. Discussions covered the importance of collaboration and ways to enhance recruitment with ADEO's support. Why it matters: This visit highlights the ongoing relationship between MBZUAI and key Abu Dhabi government entities, signaling continued support for the university's AI initiatives.

Adversarial Training: Improvements and Applications

MBZUAI

This article discusses adversarial training (AT) as a method to improve the robustness of machine learning models against adversarial attacks. AT simulates adversarial attacks during training so that the model classifies data correctly and no data points fall near decision boundaries. Dr. Jingfeng Zhang from RIKEN-AIP will present on improvements to AT and its application in evaluating and enhancing the reliability of ML methods. Why it matters: As ML models become more prevalent in real-world applications in the GCC region, ensuring their robustness against adversarial attacks is crucial for maintaining their reliability and security.

ArabJobs: A Multinational Corpus of Arabic Job Ads

arXiv

The ArabJobs dataset is a new corpus of over 8,500 Arabic job advertisements collected from Egypt, Jordan, Saudi Arabia, and the UAE. The dataset contains over 550,000 words and captures linguistic, regional, and socio-economic variation in the Arab labor market. It is available on GitHub and can be used for fairness-aware Arabic NLP and labor market research.

NADI 2022: The Third Nuanced Arabic Dialect Identification Shared Task

arXiv

The third Nuanced Arabic Dialect Identification Shared Task (NADI 2022) focused on advancing Arabic NLP through country-level dialect identification and sentiment analysis. A total of 21 teams participated, with the winning team achieving an F1 score of 27.06 on dialect identification and 75.16 on sentiment analysis. The task highlights the challenges in Arabic dialect processing and motivates further research. Why it matters: Standardized evaluations like NADI are crucial for benchmarking progress and fostering innovation in Arabic NLP, especially for dialectal variations.