This study analyzes the evolution of data science vocabulary using 16,018 abstracts containing "data science" published over 13 years. It traces when new vocabulary was introduced and how it was integrated into the scientific literature, using techniques such as exploratory data analysis (EDA), latent semantic analysis (LSA), latent Dirichlet allocation (LDA), and n-grams. The research compares scientific publications overall with those specific to Saudi Arabia and identifies representative articles based on vocabulary usage. Why it matters: The work provides insights into the development of data science terminology and its specific adoption within the Saudi Arabian research landscape.
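As a rough illustration of how n-grams can surface newly introduced vocabulary, the sketch below records the earliest year in which each n-gram appears in a set of dated abstracts. It is not the study's actual pipeline: the function names (`ngrams`, `first_appearance`, `new_terms_by_year`), the whitespace tokenization, and the `(year, text)` input format are all assumptions made for this example.

```python
from collections import defaultdict


def ngrams(text, n):
    """Return the word n-grams of a text (naive lowercase whitespace tokenization)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def first_appearance(abstracts, n=2):
    """Map each n-gram to the earliest year it occurs.

    `abstracts` is an iterable of (year, text) pairs; this input format is
    a hypothetical choice for the sketch, not taken from the study.
    """
    first = {}
    for year, text in abstracts:
        for gram in set(ngrams(text, n)):
            if gram not in first or year < first[gram]:
                first[gram] = year
    return first


def new_terms_by_year(abstracts, n=2):
    """Group n-grams by the year in which they first appear."""
    by_year = defaultdict(set)
    for gram, year in first_appearance(abstracts, n).items():
        by_year[year].add(gram)
    return dict(by_year)
```

Calling `new_terms_by_year` on a list of `(year, abstract)` pairs gives, for each year, the set of n-grams not seen in any earlier year. A real analysis would likely add stop-word filtering and a minimum frequency threshold before treating an n-gram as new vocabulary.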
Researchers from Alexandria University introduce AlexU-Word, a new dataset for offline Arabic handwriting recognition. The dataset contains 25,114 samples of 109 unique Arabic words, covering all letter shapes, collected from 907 writers. It is designed for closed-vocabulary word recognition and to support systems based on segmented-letter recognition. Why it matters: This dataset can help advance Arabic handwriting recognition systems, addressing a need for high-quality Arabic datasets in NLP research.
Palm, a new culturally inclusive and linguistically diverse dataset for Arabic LLMs, covers 22 Arab countries and features instructions in both Modern Standard Arabic (MSA) and dialectal Arabic (DA) across 20 topics. The dataset was built through a year-long, community-driven project involving 44 researchers from across the Arab world. Evaluating frontier LLMs on the dataset reveals limitations in cultural and dialectal understanding, with some countries better represented than others.
Michael Hickner, an Associate Professor from Penn State University, visited KAUST as part of the CRDF-KAUST-OSR Visiting Scholar Fellowship Program. Hickner specializes in Materials Science and Engineering, Chemistry, and Chemical Engineering. The visit was documented with photos by Meres J. Weche. Why it matters: Such programs foster international collaboration and knowledge exchange in science and engineering between KAUST and other leading institutions.
Dr. Eric Fossum, professor at Dartmouth and inventor of CMOS active pixel image sensors, spoke at KAUST's 2017 Enrichment in the Spring Program. The lecture focused on how to be a successful scientist-entrepreneur. He received a gift from the KAUST Enrichment Programs team. Why it matters: This highlights KAUST's efforts to engage with leading international experts to foster innovation and entrepreneurship among its researchers and students.
A proposed recognition system aims to identify missing persons, deceased individuals, and lost objects during the Hajj and Umrah pilgrimages in Saudi Arabia. The system intends to use facial recognition and object identification to help manage crowds that are expected to reach an estimated 20 million pilgrims over the coming decade. It will be integrated into the CrowdSensing system for crowd estimation, management, and safety.
Ted Briscoe from the University of Cambridge discussed using machine learning and NLP to develop learning-oriented assessment (LOA) for non-native writers. The technology is used in Cambridge English courseware like Empower and Linguaskill, as well as Write and Improve. Briscoe is also the co-founder and CEO of iLexIR Ltd. Why it matters: Improving automated language assessment could significantly enhance online language learning platforms in the Arab world and beyond.