GATech at AbjadMed: Bidirectional Encoders vs. Causal Decoders: Insights from 82-Class Arabic Medical Classification
arXiv · Notable
Summary
Researchers from Georgia Tech studied Arabic medical text classification over the 82 categories of the AbjadMed dataset. They compared fine-tuned AraBERTv2 encoders with hybrid pooling against multilingual encoders and causal decoders, including Llama 3.3 70B and Qwen 3B. The bidirectional encoders outperformed the causal decoders, capturing the semantic boundaries between fine-grained medical categories more effectively.
Why it matters: The results inform model selection for specialized Arabic NLP tasks, pointing to fine-tuned encoders as an effective choice for medical text categorization.
Keywords
Arabic NLP · medical text classification · AraBERT · causal decoders · AbjadMed