Fann or Flop: A Multigenre, Multiera Benchmark for Arabic Poetry Understanding in LLMs
arXiv · Significant research
Summary
MBZUAI researchers have released 'Fann or Flop', a new benchmark for evaluating how well LLMs understand Arabic poetry. The benchmark covers 12 historical eras and 14 poetic genres, and assesses semantic understanding, metaphor interpretation, and cultural context. State-of-the-art LLMs that perform strongly on standard Arabic benchmarks still struggle with poetic understanding when evaluated on it.
Keywords
Arabic poetry · LLM · benchmark · cultural context · semantic understanding