Researchers from the National Center for AI in Saudi Arabia investigated how sensitive Large Language Model (LLM) leaderboards are to minor benchmark perturbations. They found that small changes, such as reordering the answer choices, can shift model rankings by up to 8 positions. The study recommends hybrid scoring, warns against over-reliance on simple benchmark evaluations, and provides code for further research.
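To make the choice-order perturbation concrete, the sketch below shuffles the answer choices of a multiple-choice item while tracking the correct answer, then measures how far any model moves between two leaderboards. It is a minimal illustration, not the study's released code: the function names, model names, and scores are all hypothetical.

```python
import random

def shuffle_choices(choices, gold, rng):
    """Reorder answer choices and return them with the new index of the gold answer."""
    order = list(range(len(choices)))
    rng.shuffle(order)
    shuffled = [choices[i] for i in order]
    return shuffled, order.index(gold)

def rank_models(scores):
    """Map each model to its 1-based leaderboard position (highest score first)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {model: position + 1 for position, model in enumerate(ordered)}

def max_rank_shift(original, perturbed):
    """Largest number of positions any single model moves between two leaderboards."""
    before = rank_models(original)
    after = rank_models(perturbed)
    return max(abs(before[m] - after[m]) for m in original)

# Hypothetical accuracies before and after shuffling choice order (made-up numbers).
original_scores = {"model_a": 0.71, "model_b": 0.69, "model_c": 0.64, "model_d": 0.60}
perturbed_scores = {"model_a": 0.66, "model_b": 0.70, "model_c": 0.67, "model_d": 0.58}

shift = max_rank_shift(original_scores, perturbed_scores)
print(shift)  # 2: model_a drops from 1st to 3rd
```

Passing a seeded `random.Random` instance to `shuffle_choices` keeps the perturbation reproducible, so a reported rank shift can be attributed to the reordering rather than to run-to-run noise.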
The Open Arabic LLM Leaderboard (OALL) has been launched to benchmark Arabic language models, addressing the shortage of evaluation resources for non-English NLP. It incorporates datasets such as AlGhafa and ACVA, along with translated versions of MMLU and EXAMS from the AceGPT suite. The leaderboard is built on Hugging Face's LightEval framework and scores tasks with normalized log-likelihood accuracy. Why it matters: This initiative supports Arabic NLP research and development by making it easier to evaluate and improve Arabic LLMs, which serve over 380 million Arabic speakers.
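As a rough illustration of normalized log-likelihood accuracy, the sketch below divides each answer choice's log-likelihood by its character length before picking the highest-scoring choice. Normalizing by character length is one common convention and is an assumption here; this is not LightEval's actual API, and the data layout and numbers are hypothetical.

```python
def normalized_loglik_accuracy(examples):
    """Fraction of examples where the length-normalized best choice is the gold one.

    Each example is a dict with:
      "choices":  list of answer strings
      "logprobs": model log-likelihood of each choice (same order)
      "gold":     index of the correct choice
    """
    correct = 0
    for ex in examples:
        # Assumption: normalize by character length so long answers are not penalized.
        scores = [lp / len(choice) for choice, lp in zip(ex["choices"], ex["logprobs"])]
        predicted = max(range(len(scores)), key=scores.__getitem__)
        correct += int(predicted == ex["gold"])
    return correct / len(examples)

# Hypothetical log-likelihoods (made-up numbers for illustration).
examples = [
    # Raw scores favor "yes" (-3.0 > -6.5); normalized scores favor the longer gold answer.
    {"choices": ["yes", "certainly not"], "logprobs": [-3.0, -6.5], "gold": 1},
    {"choices": ["cat", "dog"], "logprobs": [-1.0, -2.0], "gold": 0},
    {"choices": ["a", "b"], "logprobs": [-2.0, -1.0], "gold": 0},
]

accuracy = normalized_loglik_accuracy(examples)
print(round(accuracy, 3))  # 0.667
```

The first example shows why normalization matters: without it, the shorter choice would win simply because it has fewer characters to assign probability to.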
KAUST held an Innovation & Economic Development Open House event on October 4 and 5. The event showcased industry partners in the KAUST Innovation Cluster, including Dow Chemical, SABIC, Saudi Aramco, and startups like FalconViz and NOMADD. Student groups like the Entrepreneurship Business & Innovation Group (eBIG) also participated, highlighting efforts to foster innovation within the KAUST community. Why it matters: This event demonstrates KAUST's ongoing commitment to fostering entrepreneurship and translating research into real-world applications, aligning with Saudi Arabia's broader economic diversification goals.