The Qiyas Benchmark: Measuring ChatGPT Mathematical and Language Understanding in Arabic
arXiv
Researchers introduce two new benchmarks, derived from the Qiyas exam, to evaluate mathematical reasoning and language understanding in Arabic. They tested ChatGPT-3.5-turbo and ChatGPT-4, which achieved 49% and 64% accuracy, respectively. The benchmarks aim to address the scarcity of resources for evaluating language models in Arabic.