This paper presents a benchmark study of contrastive learning (CL) methods applied to Arabic social meaning tasks such as sentiment analysis and dialect identification. The study compares state-of-the-art supervised CL techniques against vanilla fine-tuning across a range of tasks. Results indicate that CL methods outperform vanilla fine-tuning in most cases and are data-efficient. Why it matters: This work highlights the potential of contrastive learning for improving performance in Arabic NLP, especially in low-resource scenarios.
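For readers unfamiliar with supervised contrastive learning, the idea behind the methods benchmarked in the Arabic social meaning study can be sketched as follows. This is a minimal NumPy illustration of a supervised contrastive loss in the general style of Khosla et al. (2020), not the paper's own implementation; the summary does not say which CL variants were tested, and the function name, the temperature value, and the toy embeddings are all assumptions made for illustration.

```python
import numpy as np


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Illustrative supervised contrastive loss (hypothetical helper).

    Samples sharing a label are treated as positives for each other.
    Requires at least two samples; anchors with no positive are skipped.
    """
    z = np.asarray(embeddings, dtype=float)
    y = np.asarray(labels)

    # L2-normalise so that dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    n = len(y)
    not_self = ~np.eye(n, dtype=bool)

    # Log-softmax of each anchor's similarities over all *other* samples.
    sim_masked = np.where(not_self, sim, -np.inf)
    row_max = sim_masked.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(
        np.exp(sim_masked - row_max).sum(axis=1, keepdims=True)
    )
    log_prob = sim - log_denom

    # Positives: same label as the anchor, excluding the anchor itself.
    positives = (y[:, None] == y[None, :]) & not_self
    pos_count = positives.sum(axis=1)
    has_pos = pos_count > 0

    # Average log-probability of the positives for each valid anchor.
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    mean_log_prob_pos = pos_log_prob[has_pos] / pos_count[has_pos]
    return float(-mean_log_prob_pos.mean())


# Toy 2-D "sentence embeddings": two tight clusters.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matching = supervised_contrastive_loss(toy_embeddings, [0, 0, 1, 1])
loss_mismatched = supervised_contrastive_loss(toy_embeddings, [0, 1, 0, 1])
```

The loss is lower when labels agree with the cluster structure of the embeddings, which is the signal a CL objective adds to an encoder during fine-tuning. In vanilla fine-tuning, by contrast, only a classification loss on the labels is optimised.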
MBZUAI researchers introduce SocialMaze, a new benchmark for evaluating social reasoning capabilities in large language models (LLMs). SocialMaze includes six diverse tasks across social reasoning games, daily-life interactions, and digital community platforms, emphasizing deep reasoning, dynamic interaction, and information uncertainty. Experiments show that LLMs vary in how well they handle dynamic interactions and degrade under uncertainty, but improve when fine-tuned on curated reasoning examples.
The paper introduces ADAB (Arabic Politeness Dataset), a new annotated Arabic dataset for politeness detection collected from online platforms. The dataset covers Modern Standard Arabic and multiple dialects (Gulf, Egyptian, Levantine, and Maghrebi). It contains 10,000 samples across 16 politeness categories and achieves substantial inter-annotator agreement (kappa = 0.703). Why it matters: This dataset addresses the shortage of Arabic-language resources for politeness detection, an under-explored area that is crucial for culturally aware NLP systems.
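To make the agreement figure reported for ADAB more concrete, the sketch below computes Cohen's kappa for two annotators. The summary does not say which kappa variant the authors used (a multi-annotator statistic such as Fleiss' kappa is equally plausible), so the choice of Cohen's kappa here is an assumption. The function name and the toy labels are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels, for illustration only.
annotator_1 = ["polite", "polite", "impolite", "impolite"]
annotator_2 = ["polite", "impolite", "impolite", "impolite"]
kappa = cohen_kappa(annotator_1, annotator_2)  # 0.5 for this toy example
```

Kappa corrects raw agreement for the agreement expected by chance. In the toy example the annotators agree on 3 of 4 items (0.75), chance agreement is 0.5, and kappa is therefore (0.75 - 0.5) / (1 - 0.5) = 0.5. Values between 0.61 and 0.80 are conventionally described as substantial agreement, which matches the wording used for ADAB.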
KAUST encouraged attendees of the 2015 Winter Enrichment Program (WEP) to share their experiences on social media using the hashtag #wep2015. The university provided tips for participants to effectively use platforms like Facebook, Twitter, and Instagram during the event. KAUST emphasized responsible sharing and respect for the university's multicultural community when posting. Why it matters: This initiative aimed to amplify the reach of WEP's activities and engage a broader audience in KAUST's community and knowledge-sharing efforts.
A new dataset called ArabCulture is introduced to address the lack of culturally relevant commonsense reasoning resources in Arabic AI. The dataset covers 13 countries across the Gulf, Levant, North Africa, and the Nile Valley, spanning 12 daily life domains with 54 fine-grained subtopics. It was built from scratch by native speakers writing and validating culturally relevant questions. Why it matters: The dataset highlights the need for more culturally aware models and benchmarks tailored to the Arabic-speaking world, moving beyond machine-translated resources.
The paper introduces MIRAGE, a framework for evaluating LLMs' ability to simulate human behaviors in murder mystery games. MIRAGE uses four evaluation methods (TII, CIC, ICI, and SCI) to assess LLMs' role-playing proficiency. Experiments show that even GPT-4 struggles with the complexities of the MIRAGE framework.
This paper introduces a new task: detecting propaganda techniques in code-switched text. The authors created and released a corpus of 1,030 English-Roman Urdu code-switched texts annotated with 20 propaganda techniques. Experiments show the importance of directly modeling multilinguality and using the right fine-tuning strategy for this task.