GCC AI Research

An Accurate Arabic Root-Based Lemmatizer for Information Retrieval Purposes

arXiv · Notable

Summary

This paper introduces a new non-statistical Arabic lemmatization algorithm designed for information retrieval systems. The lemmatizer draws on Arabic language knowledge resources to generate accurate lemma forms and their relevant features. It reaches a maximum accuracy of 94.8%, and 89.15% on first-seen documents, compared with 76.7% for the Stanford Arabic model on the same dataset. Why it matters: Accurate lemmatization is crucial to the performance of Arabic information retrieval systems, and better retrieval improves access to Arabic-language content.
