GCC AI Research

Results for "AlexU-Word"

AlexU-Word: A New Dataset for Isolated-Word Closed-Vocabulary Offline Arabic Handwriting Recognition

arXiv ·

Researchers from Alexandria University introduce AlexU-Word, a new dataset for offline Arabic handwriting recognition. The dataset contains 25,114 samples of 109 unique Arabic words, covering all letter shapes, collected from 907 writers. It is designed for closed-vocabulary word recognition and also supports systems based on segmented-letter recognition. Why it matters: This dataset can help advance Arabic handwriting recognition systems, addressing a need for high-quality Arabic datasets in NLP research.

Follow your passion

KAUST ·

Entrepreneur Alexandru Ionut Budisteanu spoke at KAUST's 2018 Winter Enrichment Program (WEP) about pursuing one's passion to achieve one's dreams. Budisteanu shared his journey of creating video games and building a self-driving car prototype. He emphasized the importance of finding a job or activity that one loves and working with passion. Why it matters: The talk showcases KAUST's efforts to host inspiring speakers and promote entrepreneurship among students.

AraBERT: Transformer-based Model for Arabic Language Understanding

arXiv ·

Researchers at the American University of Beirut (AUB) have released AraBERT, a BERT model pre-trained specifically for Arabic language understanding. The model was trained on a large Arabic corpus and compared against multilingual BERT and other state-of-the-art methods. AraBERT achieved state-of-the-art performance on several tested Arabic NLP tasks including sentiment analysis, named entity recognition, and question answering. Why it matters: This release provides the Arabic NLP community with a high-performing, open-source language model, facilitating further research and development.

AI Safety Research

MBZUAI ·

Adel Bibi, a KAUST alumnus and researcher at the University of Oxford, presented his research on AI safety, covering robustness, alignment, and fairness of LLMs. The research addresses challenges in AI systems, alignment issues, and fairness across languages in common tokenizers. Bibi's work includes a study of instruction prefix tuning and its theoretical limitations for alignment. Why it matters: This work highlights the importance of addressing safety concerns in LLMs, particularly regarding alignment and fairness in the Arabic language.

A Glass Bead Game of *-ology: Contemporary Computational Approaches to Linguistic Morphology, Typology and Social Psychology

MBZUAI ·

Ekaterina Vylomova from the University of Melbourne gave a talk on using NLP models to advance research in linguistic morphology, typology, and social psychology. The talk covered using models to study morphology, phonetic changes in words over time, and diachronic changes in language semantics. Vylomova presented the UniMorph project, a cross-lingual annotation schema and database with morphological paradigms for over 150 languages. Why it matters: This research demonstrates the potential of NLP to contribute to a deeper understanding of language evolution and structure, with applications in linguistic research and the study of social and cultural changes.

Machine learning and natural language processing in support of interactive automated tutoring for non-native

MBZUAI ·

Ted Briscoe from the University of Cambridge discussed using machine learning and NLP to develop learning-oriented assessment (LOA) for non-native writers. The technology is used in Cambridge English courseware like Empower and Linguaskill, as well as Write and Improve. Briscoe is also the co-founder and CEO of iLexIR Ltd. Why it matters: Improving automated language assessment could significantly enhance online language learning platforms in the Arab world and beyond.

Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks

arXiv ·

MBZUAI introduces Agent-X, a benchmark for evaluating multi-step reasoning in vision-centric agents across real-world, multimodal settings. Agent-X includes 828 tasks with diverse visual contexts and spans six environments, requiring tool use and stepwise decision-making. Experiments show that current large multimodal models (LMMs) struggle with multi-step vision tasks, achieving less than 50% success, which highlights room for improvement in LMM reasoning and tool use.