Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts

arXiv · February 20, 2025 · Significant research

Summary

Researchers introduce TimeTravel, a benchmark dataset for evaluating large multimodal models (LMMs) on historical and cultural artifacts. The benchmark comprises 10,250 expert-verified samples spanning 266 cultures and 10 historical regions, and is designed to assess how well models classify and interpret manuscripts, artworks, inscriptions, and archaeological discoveries. The goal is to establish AI as a reliable partner in preserving cultural heritage and assisting researchers.

Keywords

multimodal models · historical artifacts · cultural heritage · benchmark · dataset


SaudiCulture: A Benchmark for Evaluating Large Language Models Cultural Competence within Saudi Arabia

arXiv · Mar 21

The paper introduces SaudiCulture, a new benchmark for evaluating the cultural competence of LLMs within Saudi Arabia, covering five major geographical regions and diverse cultural domains. The benchmark includes questions of varying complexity and distinguishes between common and specialized regional knowledge. Evaluations of five LLMs (GPT-4, Llama 3.3, FANAR, Jais, and AceGPT) revealed performance declines on region-specific questions, highlighting the need to incorporate such knowledge in LLM training.
