GCC AI Research

The Qiyas Benchmark: Measuring ChatGPT Mathematical and Language Understanding in Arabic

arXiv · Significant research

Summary

Researchers introduce two new benchmarks, derived from the Qiyas exam, to evaluate mathematical reasoning and language understanding in Arabic. On these benchmarks, ChatGPT-3.5-turbo and ChatGPT-4 achieved 49% and 64% accuracy, respectively. The benchmarks aim to address the shortage of resources for evaluating Arabic language models.

Related

GPTAraEval: A Comprehensive Evaluation of ChatGPT on Arabic NLP

arXiv

This paper presents a comprehensive evaluation of ChatGPT's performance across 44 Arabic NLP tasks using over 60 datasets. The study compares ChatGPT's capabilities in Modern Standard Arabic (MSA) and Dialectal Arabic (DA) against smaller, fine-tuned models. Results show that ChatGPT is outperformed by these fine-tuned models and handles Arabic dialects less well than MSA. Why it matters: the work highlights the need for further research and development of Arabic-specific NLP models to overcome the limitations of general-purpose models like ChatGPT.