GCC AI Research

Detecting deepfakes in the presence of code-switching

MBZUAI · Significant research

Summary

MBZUAI researchers, in collaboration with Monash University, have introduced ArEnAV, a new dataset for deepfake detection featuring Arabic-English code-switching. The dataset comprises 765 hours of manipulated YouTube videos, incorporating intra-utterance code-switching and dialect variations. Experiments showed that code-switching significantly reduces the performance of existing deepfake detectors. Why it matters: This work addresses a critical gap in AI's ability to handle linguistic diversity, particularly in regions where code-switching is prevalent, enhancing the reliability of deepfake detection in real-world scenarios.

Related

Detecting Propaganda Techniques in Code-Switched Social Media Text

arXiv ·

This paper introduces a new task: detecting propaganda techniques in code-switched text. The authors created and released a corpus of 1,030 English-Roman Urdu code-switched texts annotated with 20 propaganda techniques. Experiments show the importance of directly modeling multilinguality and using the right fine-tuning strategy for this task.

A Survey of Code-switched Arabic NLP: Progress, Challenges, and Future Directions

arXiv ·

This paper surveys the landscape of code-switched Arabic natural language processing, covering the mixture of Modern Standard Arabic, dialects, and foreign languages. It examines current efforts, challenges, and research gaps in the field. The survey also provides recommendations for future research directions in code-switched Arabic NLP. Why it matters: Understanding code-switching is crucial for developing effective language technologies that can handle the diverse linguistic landscape of the Arab world.

Human-Centric Approaches for Multimodal Deepfakes Analysis

MBZUAI ·

A talk explores multimodal, user-behavior-inspired approaches to detecting deepfakes, drawing on user studies of multicultural deepfakes and on the ACM Multimedia 2024 benchmark. The research builds on insights into how different audiences perceive manipulated media. Abhinav Dhall of Flinders University will present the findings and future directions in deepfake analysis at MBZUAI. Why it matters: Addressing deepfakes is crucial for maintaining trust in digital content, especially as AI-driven manipulation tools become more sophisticated and accessible.

FAID: Fine-Grained AI-Generated Text Detection Using Multi-Task Auxiliary and Multi-Level Contrastive Learning

arXiv ·

MBZUAI researchers introduce FAID, a fine-grained AI-generated text detection framework that classifies text as human-written, LLM-generated, or collaboratively written. FAID uses multi-level contrastive learning and multi-task auxiliary classification to capture both authorship and model-specific characteristics, which also lets it identify the underlying LLM family. The framework outperforms existing baselines, especially when generalizing to unseen domains and new LLMs, and comes with a multilingual, multi-domain dataset called FAIDSet.