GCC AI Research

GATech at AbjadMed: Bidirectional Encoders vs. Causal Decoders: Insights from 82-Class Arabic Medical Classification

arXiv · Notable

Summary

Researchers from Georgia Tech studied Arabic medical text classification across the 82 categories of the AbjadMed dataset. They compared fine-tuned AraBERTv2 encoders with hybrid pooling against multilingual encoders and large causal decoders such as Llama 3.3 70B and Qwen 3B. The study found that bidirectional encoders outperformed causal decoders at capturing semantic boundaries in fine-grained medical text classification.

Why it matters: The research offers guidance on model selection for specialized Arabic NLP tasks, and in particular highlights the effectiveness of fine-tuned encoders for medical text categorization.
