This paper introduces a convolutional transformer model for classifying tomato maturity, along with a new UAE-sourced dataset, KUTomaData, for training segmentation and classification models. The model combines CNNs and transformers and was evaluated on KUTomaData and two public datasets. It achieved state-of-the-art performance, outperforming existing methods by significant margins in mAP on all three datasets.
Researchers introduce TomFormer, a transformer-based model for accurate and early detection of tomato leaf diseases, with the goal of deployment on the Hello Stretch robot for real-time diagnosis. TomFormer combines a visual transformer and CNN, achieving state-of-the-art results on KUTomaDATA, PlantDoc, and PlantVillage datasets. KUTomaDATA was collected from a greenhouse in Abu Dhabi, UAE.
MBZUAI researchers have developed an AI program using vision transformers that can learn a person's handwriting style and generate text in that style. The US Patent and Trademark Office recently granted a patent for this technology, which could aid individuals with writing impairments. The system overcomes limitations of previous GAN-based approaches by capturing long-range dependencies in handwriting. Why it matters: This patented AI tool enhances personalized text generation and has potential applications in assistive technology and in improving handwriting recognition models.
Giovanni Puccetti from ISTI-CNR presented research on linguistic probing of language models like BERT and RoBERTa. The research investigates the ability of these models to encode linguistic properties, linking this ability to outlier parameters. Preliminary work on fine-tuning LLMs in Italian and on detecting synthetically generated news was also presented. Why it matters: Understanding the inner workings and linguistic capabilities of LLMs is crucial for improving their reliability and adapting them to diverse languages like Arabic.
This paper introduces GigaBERT, a customized bilingual BERT model pre-trained for Arabic NLP and English-to-Arabic zero-shot transfer learning. The study evaluates GigaBERT's performance on four information extraction tasks: named entity recognition, part-of-speech tagging, argument role labeling, and relation extraction. Results show that GigaBERT outperforms mBERT, XLM-RoBERTa, and AraBERT in both supervised and zero-shot transfer settings. Why it matters: GigaBERT advances Arabic NLP by providing a high-performing, publicly available model tailored for the complexities of the Arabic language and cross-lingual applications.
The researchers introduce KAU-CSSL, the first continuous Saudi Sign Language (SSL) dataset focusing on complete sentences. They propose a transformer-based model using ResNet-18 for spatial feature extraction and a Transformer Encoder with Bidirectional LSTM for temporal dependencies. The model achieved 99.02% accuracy in signer-dependent mode and 77.71% in signer-independent mode, advancing communication tools for the SSL community.
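The gap between the two accuracy figures comes from how the data is split. The following is a minimal sketch of the two evaluation modes, not the authors' protocol: the sample layout, the function names, and the hold-out rule are all assumptions made for illustration.

```python
from collections import defaultdict

def signer_dependent_split(samples, test_every=5):
    """Hypothetical split: hold out every `test_every`-th clip of each signer,
    so every signer appears in both the training and the test set."""
    train, test = [], []
    seen = defaultdict(int)
    for signer, clip in samples:
        seen[signer] += 1
        if seen[signer] % test_every == 0:
            test.append((signer, clip))
        else:
            train.append((signer, clip))
    return train, test

def signer_independent_split(samples, held_out_signers):
    """Hypothetical split: all clips of the held-out signers go to the test set,
    so the model is evaluated on people it has never seen."""
    held = set(held_out_signers)
    train = [s for s in samples if s[0] not in held]
    test = [s for s in samples if s[0] in held]
    return train, test

# Toy data (made up): 4 signers, 5 clips each.
samples = [(f"signer{i}", f"clip{i}_{j}") for i in range(4) for j in range(5)]
dep_train, dep_test = signer_dependent_split(samples)
ind_train, ind_test = signer_independent_split(samples, ["signer3"])
```

In the signer-independent split the test signers are disjoint from the training signers, which is why that mode is the harder and more realistic measure of generalization.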
The paper introduces AraModernBERT, an adaptation of the ModernBERT encoder architecture for Arabic, focusing on transtokenized embedding initialization and long-context modeling up to 8,192 tokens. Transtokenization is shown to be crucial for Arabic language modeling, significantly enhancing masked language modeling performance. The model demonstrates stable and effective long-context modeling, improving intrinsic language modeling performance at extended sequence lengths. Why it matters: This research provides practical insights for adapting modern encoder architectures to Arabic and other languages using Arabic-derived scripts, advancing Arabic NLP.
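As a rough illustration of embedding initialization by transtokenization, the sketch below builds each target-language token embedding as a weighted average of source-token embeddings. It assumes a precomputed token alignment table with weights; the function name, the toy vectors, and the alignment values are hypothetical, and the paper's actual procedure may differ.

```python
def transtokenize(source_emb, alignment):
    """Initialize target-token embeddings from aligned source tokens.

    source_emb: dict mapping source token -> embedding (list of floats)
    alignment:  dict mapping target token -> {source token: alignment weight}
    Both structures are assumptions made for this sketch.
    """
    dim = len(next(iter(source_emb.values())))
    target_emb = {}
    for tgt, weights in alignment.items():
        total = sum(weights.values())
        vec = [0.0] * dim
        for src, w in weights.items():
            # Add this source embedding, scaled by its normalized weight.
            for k, x in enumerate(source_emb[src]):
                vec[k] += (w / total) * x
        target_emb[tgt] = vec
    return target_emb

# Toy example with made-up 2-dimensional embeddings and weights.
source_emb = {"book": [1.0, 0.0], "write": [0.0, 1.0]}
alignment = {"كتاب": {"book": 3.0, "write": 1.0}}
target_emb = transtokenize(source_emb, alignment)
```

Starting from such an initialization, rather than from random vectors, gives the new vocabulary a semantically meaningful starting point before masked language modeling continues on Arabic text.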