N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition
This paper benchmarks the performance of OpenAI's Whisper model on diverse Arabic speech recognition tasks, using publicly available data and novel dialect evaluation sets. The study explores zero-shot, few-shot, and full finetuning scenarios. Results indicate that while Whisper outperforms XLS-R models in zero-shot settings on standard datasets, its performance drops significantly when applied to unseen Arabic dialects.