GCC AI Research

Results for "proper nouns"

Proper Noun Diacritization for Arabic Wikipedia: A Benchmark Dataset

arXiv ·

A new dataset for Arabic proper noun diacritization addresses the ambiguity caused by undiacritized proper nouns in Arabic Wikipedia. The dataset includes manually diacritized Arabic proper nouns of various origins along with their English Wikipedia glosses. GPT-4o was benchmarked on recovering the full diacritization given the undiacritized Arabic form and its English gloss, achieving 73% accuracy. Why it matters: The release of this dataset should facilitate further research on Arabic Wikipedia proper noun diacritization, improving the accessibility and accuracy of Arabic NLP resources.

Fact checking with ChatGPT

MBZUAI ·

A new paper from MBZUAI researchers explores using ChatGPT to combat the spread of fake news. The researchers, including Preslav Nakov and Liangming Pan, demonstrate that ChatGPT can be used to fact-check published information. Their paper, "Fact-Checking Complex Claims with Program-Guided Reasoning," was accepted at ACL 2023. Why it matters: This research highlights the potential of large language models to address the growing challenge of misinformation, with implications for maintaining information integrity in the digital age.

User-Centric Gender Rewriting

MBZUAI ·

NYU and NYU Abu Dhabi researchers are working on user-centric gender rewriting in NLP, especially for Arabic. They are building an Arabic Parallel Gender Corpus and developing models for gender rewriting tasks. The work aims to address representational harms caused by NLP systems that don't account for user preferences regarding grammatical gender. Why it matters: This research promotes fairness and inclusivity in Arabic NLP by enabling systems to generate gender-specific outputs based on user preferences, mitigating biases present in training data.

The Human Phenotype Project

MBZUAI ·

Professor Eran Segal presented The Human Phenotype Project, a longitudinal cohort study with over 10,000 participants. The project aims to identify molecular markers and develop prediction models for disease using deep profiling techniques including medical history, lifestyle, blood tests, and microbiome analysis. The study provides insights into drivers of obesity, diabetes, and heart disease, identifying novel markers at the microbiome, metabolite, and immune system level. Why it matters: Such large-scale phenotyping initiatives could inform personalized medicine approaches relevant to the Middle East's specific health challenges.

Science: The language of modern life

KAUST ·

Michael Hickner, an associate professor at Penn State University, visited KAUST as part of the CRDF-KAUST-OSR Visiting Scholar Fellowship Program. Hickner specializes in Materials Science and Engineering, Chemistry, and Chemical Engineering. The visit was documented with photos by Meres J. Weche. Why it matters: Such programs foster international collaboration and knowledge exchange in science and engineering between KAUST and other leading institutions.

LLMs tackle math word problems

MBZUAI ·

MBZUAI researchers presented a study at NAACL 2024 analyzing errors made by open-source LLMs when solving math word problems. The study, led by Ekaterina Kochmar and KV Aditya Srivatsa, investigates characteristics that make math word problems difficult for machines. Llama2-70B was used to test the ability of LLMs to solve these problems, revealing that LLMs can perform math operations correctly but still give the wrong answer. Why it matters: The research aims to improve AI's ability to understand and solve math word problems, potentially leading to better educational applications and teaching methods.

Culturally Yours: A new tool for understanding cultural references in text

MBZUAI ·

MBZUAI researchers have developed "Culturally Yours," a reading assistant that highlights and explains culturally specific items on webpages to help users understand unfamiliar terms. The tool addresses the "cold-start problem" by asking users for demographic information to personalize the identification of potentially unfamiliar cultural references. It was presented at the 31st International Conference on Computational Linguistics in Abu Dhabi. Why it matters: This tool can help bridge linguistic and cultural gaps, particularly for underrepresented languages and cultures, and aid businesses in reaching diverse audiences.

Mubeen AI: A Specialized Arabic Language Model for Heritage Preservation and User Intent Understanding

arXiv ·

MASARAT SA has developed Mubeen, a proprietary Arabic language model specializing in Arabic linguistics, Islamic studies, and cultural heritage. Mubeen was trained using native Arabic sources, including digitized historical manuscripts processed via a proprietary Arabic OCR engine. The model employs a Practical Closure Architecture to improve user intent understanding and provide decisive guidance. Why it matters: Mubeen addresses the utility gap in current Arabic LLMs by focusing on native Arabic data and cultural authenticity, which is critical for heritage preservation and alignment with Saudi Vision 2030.