Detecting deepfakes in the presence of code-switching

MBZUAI · Significant research

Summary

MBZUAI researchers, in collaboration with Monash University, have introduced ArEnAV, a new dataset for deepfake detection featuring Arabic-English code-switching. The dataset comprises 765 hours of manipulated YouTube videos, incorporating intra-utterance code-switching and dialect variations. Experiments showed that code-switching significantly reduces the performance of existing deepfake detectors. Why it matters: This work addresses a critical gap in AI's ability to handle linguistic diversity, particularly in regions where code-switching is prevalent, enhancing the reliability of deepfake detection in real-world scenarios.

Keywords

deepfake detection · code-switching · Arabic-English · dataset · MBZUAI

Read original article →

Get the weekly digest

Top AI stories from the GCC region, every week.

Detecting Propaganda Techniques in Code-Switched Social Media Text

arXiv · May 23

This paper introduces a new task: detecting propaganda techniques in code-switched text. The authors created and released a corpus of 1,030 English-Roman Urdu code-switched texts annotated with 20 propaganda techniques. Experiments show the importance of directly modeling multilinguality and using the right fine-tuning strategy for this task.

A Survey of Code-switched Arabic NLP: Progress, Challenges, and Future Directions

arXiv · Jan 23

This paper surveys the landscape of code-switched Arabic natural language processing, covering the mixture of Modern Standard Arabic, dialects, and foreign languages. It examines current efforts, challenges, and research gaps in the field. The survey also provides recommendations for future research directions in code-switched Arabic NLP. Why it matters: Understanding code-switching is crucial for developing effective language technologies that can handle the diverse linguistic landscape of the Arab world.

GenAI Content Detection Task 1: English and Multilingual Machine-Generated Text Detection: AI vs. Human

arXiv · Jan 19

The GenAI Content Detection Task 1 is a shared task on detecting machine-generated text, featuring monolingual (English) and multilingual subtasks. The task, part of the GenAI workshop at COLING 2025, attracted 36 teams for the English subtask and 26 for the multilingual one. The organizers provide a detailed overview of the data, results, system rankings, and analysis of the submitted systems.

Challenges and Solutions in Developing Code-switched Arabic-English NLP Systems

MBZUAI · Invalid Date

Injy Hamed from NYU Abu Dhabi's CAMeL Lab presented work on Egyptian Arabic-English code-switching for ASR and MT. She discussed the ArzEn-ST speech translation corpus and compared end-to-end and hybrid systems for ASR. For MT, she presented data augmentation and word segmentation techniques to handle data scarcity, also addressing ASR evaluation challenges in code-switching. Why it matters: Research into code-switching is crucial for building NLP systems capable of processing real-world language use in the Arab world.