The paper introduces UAE-3D, a multi-modal VAE for 3D molecule generation that compresses molecules into a unified latent space while maintaining near-zero reconstruction error. This approach simplifies latent diffusion modeling by eliminating the need to handle multi-modality and equivariance separately. Experiments on the GEOM-Drugs and QM9 datasets show that UAE-3D establishes new benchmarks in de novo and conditional 3D molecule generation, with significant improvements in efficiency and quality.
Researchers created Masader, the largest public catalog of Arabic NLP datasets, containing 200 datasets annotated with 25 attributes. They developed a metadata annotation strategy that can be applied to other languages. The paper highlights issues within current Arabic NLP datasets and offers recommendations. Why it matters: This curated dataset directory helps lower the barrier to entry for Arabic NLP research and development.
The Symposium on Data Mining and Applications (SDMA 2014) was organized by MEGDAM to foster collaboration among data mining and machine learning researchers in Saudi Arabia, the GCC countries, and the Middle East. The symposium covered areas such as statistics, computational intelligence, pattern recognition, databases, big data mining, and visualization. Submissions were accepted based on originality, significance, and quality of contribution.
This paper introduces a novel evaluation framework for Arabic language models, addressing gaps in linguistic accuracy and cultural alignment. The authors analyze existing datasets and present the Arabic Depth Mini Dataset (ADMD), a curated collection of 490 questions across ten domains. Evaluating GPT-4, Claude 3.5 Sonnet, Gemini Flash 1.5, CommandR 100B, and Qwen-Max using ADMD reveals performance variations, with Claude 3.5 Sonnet achieving the highest accuracy at 30%. Why it matters: The work emphasizes the importance of cultural competence in Arabic language model evaluation, providing practical insights for improvement.