Researchers introduce AraNet, a deep learning toolkit for Arabic social media processing. The toolkit uses BERT models trained on social media datasets to predict age, dialect, gender, emotion, irony, and sentiment. AraNet achieves state-of-the-art or competitive performance on these tasks without feature engineering. Why it matters: The public release of AraNet accelerates Arabic NLP research by providing a comprehensive, deep learning-based tool for various social media analysis tasks.
The National Center for Vegetation Cover Development and Combating Desertification (NCVC) and KAUST have launched the SAUDINet initiative. The initiative aims to advance terrestrial ecology research in Saudi Arabia, focusing on restoring degraded lands, enhancing carbon sequestration, and preserving biodiversity. NCVC’s workforce will receive specialized training in biodiversity monitoring and ecological sampling, with samples analyzed in KAUST’s labs. Why it matters: The partnership aims to establish Saudi Arabia as a global leader in the study of arid ecosystems and address the lack of data from hyper-arid lands in climate models.
Researchers at the American University of Beirut (AUB) have released AraBERT, a BERT model pre-trained specifically for Arabic language understanding. The model was trained on a large Arabic corpus and compared against multilingual BERT and other state-of-the-art methods. AraBERT achieved state-of-the-art performance on several Arabic NLP tasks, including sentiment analysis, named entity recognition, and question answering. Why it matters: This release provides the Arabic NLP community with a high-performing, open-source language model, facilitating further research and development.
The study introduces AraSpider, the first Arabic version of the Spider text-to-SQL dataset, to advance Arabic NLP. Four multilingual translation models were evaluated for translating the dataset, and two text-to-SQL models (ChatGPT 3.5 and SQLCoder) were evaluated on the result. Back translation significantly improved the performance of both ChatGPT 3.5 and SQLCoder on the AraSpider dataset. Why it matters: This work democratizes access to text-to-SQL resources for Arabic speakers and provides a methodology for translating datasets to other languages.
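A minimal sketch of a back-translation step in a text-to-SQL pipeline, assuming that "back translation" here means translating the Arabic question back into English before passing it to the text-to-SQL model. The function names, the toy lookup tables, and the example question are all hypothetical stand-ins and not the study's code or data; a real system would call actual translation and text-to-SQL models in their place.

```python
from typing import Callable


def sql_via_back_translation(
    arabic_question: str,
    translate_ar_to_en: Callable[[str], str],
    text_to_sql: Callable[[str], str],
) -> str:
    """Translate the Arabic question to English, then generate SQL from the English text."""
    english_question = translate_ar_to_en(arabic_question)
    return text_to_sql(english_question)


# Hypothetical toy stand-ins so the sketch runs end to end.
def toy_translate(text: str) -> str:
    lookup = {"كم عدد المغنين؟": "How many singers are there?"}
    return lookup[text]


def toy_text_to_sql(question: str) -> str:
    lookup = {"How many singers are there?": "SELECT count(*) FROM singer"}
    return lookup[question]


query = sql_via_back_translation("كم عدد المغنين؟", toy_translate, toy_text_to_sql)
print(query)
```

Passing the translator and the SQL generator in as callables keeps the pipeline independent of any particular model, so either stage can be swapped without touching the other.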
KAUST researchers Anthony Cioppa and Silvio Giancola have developed SoccerNet, an open platform for AI-driven sports analysis. SoccerNet provides a large reference set of soccer game recordings (500 games, 850 hours) on which researchers can develop AI systems that understand and analyze soccer games. Why it matters: This platform addresses the challenge of limited datasets in sports AI research, fostering innovation and standardized performance comparison.
The paper introduces AraGPT2, a suite of pre-trained transformer models for Arabic language generation, with the largest model (AraGPT2-mega) containing 1.46 billion parameters. Trained on a large Arabic corpus of internet text and news, AraGPT2-mega demonstrates strong performance in synthetic news generation and zero-shot question answering. To address the risk of misuse, the authors also released a discriminator model with 98% accuracy in detecting AI-generated text. Why it matters: This release of both the model and discriminator fills a critical gap in Arabic NLP and encourages further research and applications in the field.
The paper introduces AraTrust, a new benchmark for evaluating the trustworthiness of LLMs when prompted in Arabic. The benchmark contains 522 multiple-choice questions covering dimensions such as truthfulness, ethics, safety, and fairness. In experiments using AraTrust, GPT-4 performed best, while open-source models such as AceGPT 7B and Jais 13B scored lower. Why it matters: This benchmark addresses a critical gap in evaluating LLMs for Arabic, which is essential for ensuring the safe and ethical deployment of AI in the Arab world.
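A minimal sketch of how a multiple-choice benchmark of this kind could be scored per dimension, assuming each question carries a category label and a single correct option. The field names (`category`, `answer`) and the toy items are hypothetical and do not reflect AraTrust's actual data format or evaluation protocol.

```python
from collections import defaultdict


def accuracy_by_category(items, predictions):
    """Return the fraction of correctly answered questions for each category."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, predicted in zip(items, predictions):
        total[item["category"]] += 1
        if predicted == item["answer"]:
            correct[item["category"]] += 1
    return {category: correct[category] / total[category] for category in total}


# Hypothetical toy items; a real run would load the benchmark questions instead.
items = [
    {"category": "truthfulness", "answer": "A"},
    {"category": "truthfulness", "answer": "C"},
    {"category": "safety", "answer": "B"},
]
predictions = ["A", "B", "B"]  # one model answer per question

scores = accuracy_by_category(items, predictions)
print(scores)
```

Reporting accuracy per dimension rather than a single overall number makes it easier to see where a model is weak, for example strong on safety but weaker on truthfulness.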
The Autonomous Robotics Research Center (ARRC) is developing underwater communication systems, including a multimode modem prototype, and has filed three patents. One key technology is the Universal Underwater Software Defined Modem (UniSDM), which supports communication over sound, magnetic induction, light, and radio waves. ARRC also developed a network management framework for automatic network slicing (ANS) of communication resources. Why it matters: These advancements are crucial for improving underwater exploration, industrial maintenance, and marine monitoring in the region, enabling more efficient and reliable communication for underwater robots.