An Accurate Arabic Root-Based Lemmatizer for Information Retrieval Purposes

arXiv · March 15, 2012 · Notable

NLP Arabic AI Research Information Retrieval

Summary

This paper introduces a non-statistical Arabic lemmatization algorithm designed for information retrieval systems. The lemmatizer draws on Arabic language knowledge resources to produce accurate lemma forms together with their relevant features. It reaches a maximum accuracy of 94.8%. On previously unseen documents it reaches 89.15%, compared with 76.7% for the Stanford Arabic model on the same dataset.

Why it matters: Accurate lemmatization is crucial to the performance of Arabic information retrieval systems, and better retrieval improves access to Arabic-language content.

Keywords

Arabic NLP · lemmatization · information retrieval · POS tagging · algorithm

