This paper introduces a large-scale historical corpus of written Arabic spanning 1,400 years. The corpus was cleaned and processed with Arabic NLP tools, including a step that identifies reused text. Using a novel automatic periodization algorithm, the authors trace the history of the Arabic language and confirm the division into Classical and Modern Standard Arabic. Why it matters: This resource enables further computational research into the evolution of Arabic and the development of NLP tools for historical texts.
Researchers from MBZUAI have proposed a new taxonomy of eight temporal frames and studied their persuasive use in news discourse. They built a multilingual dataset of 458 expert-annotated English and German news articles, containing over 2,000 temporally framed sentences and approximately 3,000 annotations. Their experiments show that temporal framing is learnable at the sentence level, with supervised models significantly outperforming zero-shot classification approaches. Why it matters: The dataset and methodology help explain how time-related language shapes interpretation in news, which supports NLP for media analysis and could help counter disinformation.