LAraBench: Benchmarking Arabic AI with Large Language Models

arXiv · May 24, 2023 · Significant research

Summary

LAraBench introduces a benchmark for Arabic natural language and speech processing. It evaluates large language models (GPT-3.5-turbo, GPT-4, BLOOMZ, and Jais-13b-chat) and speech models (Whisper and USM). The benchmark covers 33 tasks across 61 datasets and tests the models with zero-shot and few-shot learning. In zero-shot settings, state-of-the-art (SOTA) task-specific models generally outperform the LLMs. Larger LLMs using few-shot learning narrow the gap. Why it matters: The benchmark gives a systematic way to assess and improve LLM performance on Arabic tasks, and it shows where specialized models still have the edge.

Keywords

LAraBench · Arabic NLP · LLM · GPT-3.5 · Jais-13b

Related

AraReasoner: Evaluating Reasoning-Based LLMs for Arabic NLP

arXiv · Jun 10

This paper benchmarks reasoning-focused LLMs on fifteen Arabic NLP tasks, with a focus on DeepSeek models. The study compares zero-shot, few-shot, and fine-tuning strategies. It reports three key findings. First, three in-context examples raise F1 scores by more than 13 points on classification tasks. Second, in the zero-shot setting, DeepSeek outperforms GPT-4-mini by 12 F1 points on complex inference tasks. Third, LoRA fine-tuning adds up to 8 more points in F1 and BLEU. Why it matters: This systematic evaluation shows how reasoning-focused LLMs perform on Arabic NLP and which strategies, such as in-context examples and LoRA fine-tuning, most improve their results. These findings can guide the development of more capable Arabic language models.