KAUST researchers introduced MOLE, a framework leveraging LLMs for automated metadata extraction from scientific papers. The system processes documents in multiple formats and validates outputs, targeting datasets beyond Arabic. A new benchmark dataset has been released to evaluate progress in metadata extraction.
Researchers created Masader, the largest public catalog for Arabic NLP datasets, containing 200 datasets annotated with 25 attributes. They developed a metadata annotation strategy applicable to other languages. The paper highlights issues within current Arabic NLP datasets and suggests recommendations. Why it matters: This curated dataset directory helps lower the barrier to entry for Arabic NLP research and development.
This paper introduces a hybrid deep learning and machine learning pipeline for classifying construction and demolition waste. A dataset of 1,800 images from UAE construction sites was created, and deep features were extracted using a pre-trained Xception network. The combination of Xception features with machine learning classifiers achieved up to 99.5% accuracy, demonstrating state-of-the-art performance for debris identification.