GATech at AbjadMed: Bidirectional Encoders vs. Causal Decoders: Insights from 82-Class Arabic Medical Classification

arXiv · February 17, 2026 · Notable

Summary

Researchers from Georgia Tech studied Arabic medical text classification across the 82 categories of the AbjadMed dataset. They compared fine-tuned AraBERTv2 encoders using hybrid pooling against multilingual encoders and causal decoders such as Llama 3.3 70B and Qwen 3B. The bidirectional encoders outperformed the causal decoders at capturing the semantic boundaries that fine-grained medical text classification depends on.

Why it matters: The results inform model selection for specialized Arabic NLP tasks and point to fine-tuned encoders as an effective choice for medical text categorization.
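The summary does not say what "hybrid pooling" means in the paper. One common reading, assumed here purely for illustration, is to concatenate the first-token ([CLS]) vector with a masked mean over all real tokens before the classification head. The sketch below shows that idea on plain Python lists; the function name `hybrid_pool` and the toy inputs are hypothetical and are not taken from the paper.

```python
from typing import List

Vector = List[float]


def hybrid_pool(token_embeddings: List[Vector], attention_mask: List[int]) -> Vector:
    """Concatenate the first-token ([CLS]) vector with the masked mean of all tokens.

    Assumption: this is one plausible form of "hybrid pooling", not necessarily
    the variant used in the paper.
    """
    if not token_embeddings:
        raise ValueError("expected at least one token embedding")
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("mask length must match the number of tokens")

    cls_vector = token_embeddings[0]
    dim = len(cls_vector)

    # Masked mean: average only over real tokens (mask == 1), skipping padding.
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    mean_vector = [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

    # A classifier head would consume this feature vector of size 2 * dim.
    return cls_vector + mean_vector


# Toy example: three tokens of dimension 2; the last token is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
pooled = hybrid_pool(tokens, mask)
print(pooled)  # [1.0, 2.0, 2.0, 3.0]
```

In a fine-tuning setup of this kind, the pooled vector would typically feed a linear layer with one output per class (82 here). That wiring is likewise an assumption, not a detail given in the summary.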

Keywords

Arabic NLP · medical text classification · AraBERT · causal decoders · AbjadMed
