GCC AI Research

Results for "historical corpus"

Studying the History of the Arabic Language: Language Technology and a Large-Scale Historical Corpus

arXiv

This paper introduces a large-scale historical corpus of written Arabic spanning 1,400 years. The corpus was cleaned and processed with Arabic NLP tools, including the identification of reused text. The authors apply a novel automatic periodization algorithm to the corpus to trace the history of the language, and the results confirm the division of written Arabic into Modern Standard and Classical Arabic. Why it matters: This resource enables further computational research into the evolution of Arabic and supports the development of NLP tools for historical texts.