GCC AI Research


Results for "POS tagging"

Challenging Language-Dependent Segmentation for Arabic: An Application to Machine Translation and Part-of-Speech Tagging

arXiv

This paper explores language-independent alternatives to morphological segmentation for Arabic NLP: data-driven sub-word units, characters as the unit of learning, and word embeddings learned with a character CNN. The study evaluates these methods on machine translation and POS tagging. Results show that they perform close to, or better than, state-of-the-art approaches. Why it matters: By offering simpler, more adaptable segmentation techniques, this research can help improve Arabic NLP applications across diverse domains and dialects.

Arabic Diacritics in the Wild: Exploiting Opportunities for Improved Diacritization

arXiv

The paper addresses the challenge of missing diacritics in Arabic NLP by exploiting naturally occurring diacritics in a new dataset spanning six genres. It maps partially diacritized words to their full diacritization and proposes extensions to the analyze-and-disambiguate approach. The extended diacritization algorithm achieves notable improvements, and the code and datasets are released as open source. Why it matters: This research provides valuable resources and methods for improving Arabic text processing, especially in contexts where diacritization is crucial for accurate interpretation.

An Accurate Arabic Root-Based Lemmatizer for Information Retrieval Purposes

arXiv

This paper introduces a new non-statistical Arabic lemmatization algorithm designed for information retrieval systems. The lemmatizer draws on Arabic linguistic knowledge resources to generate accurate lemma forms and relevant features. The algorithm achieves maximum accuracies of 94.8% and 89.15% on first-seen documents, outperforming the Stanford Arabic model's 76.7% on the same dataset. Why it matters: Accurate Arabic lemmatization is crucial for improving the performance of Arabic information retrieval systems, which can enhance access to Arabic-language content.

Combining Context-Free and Contextualized Representations for Arabic Sarcasm Detection and Sentiment Identification

arXiv

This paper presents team SPPU-AASM's hybrid model for Arabic sarcasm and sentiment detection in the WANLP 2021 ArSarcasm shared task. The model combines sentence representations from AraBERT with static word vectors trained on Arabic social media corpora. The system achieves an F1-sarcastic score of 0.62 and an F-PN score of 0.715, outperforming existing approaches. Why it matters: The research demonstrates that combining context-free and contextualized representations improves performance on nuanced Arabic NLP tasks such as sarcasm detection and sentiment analysis.

Machine learning and natural language processing in support of interactive automated tutoring for non-native

MBZUAI

Ted Briscoe from the University of Cambridge discussed using machine learning and NLP to develop learning-oriented assessment (LOA) for non-native writers. The technology is used in Cambridge English products such as Empower and Linguaskill, as well as in Write and Improve. Briscoe is also the co-founder and CEO of iLexIR Ltd. Why it matters: Improving automated language assessment could significantly enhance online language learning platforms in the Arab world and beyond.

A Tale of Two Scripts: Transliteration and Post-Correction for Judeo-Arabic

arXiv

The paper introduces a two-step approach for transliterating Judeo-Arabic text (written in Hebrew script) into Arabic script: character-level mapping, followed by post-correction to fix grammatical and orthographic errors. The authors also benchmark LLMs on the transliteration task and demonstrate that transliteration enables the use of Arabic NLP tools on Judeo-Arabic. Why it matters: This work makes Judeo-Arabic texts accessible to Arabic NLP tools, enabling processing and analysis that was previously impossible.

Words at work: New directions in natural language processing with Ted Briscoe

MBZUAI

MBZUAI's Professor Ted Briscoe is working on an educational technology initiative with IBM to support Arabic literacy in the Gulf by providing personalized feedback on student writing. He is also developing a question-answering system for Abu Dhabi Global Market to help companies understand local regulations. The Q&A system aims to assist smaller companies in establishing offices in Abu Dhabi by providing affordable access to regulatory information. Why it matters: These projects apply NLP to address practical needs in education and business, fostering Arabic literacy and easing regulatory compliance for SMEs in the UAE.