The paper introduces FanarGuard, a bilingual moderation filter for Arabic and English language models that considers both safety and cultural alignment. A dataset of 468K prompt-response pairs was created and scored by LLM judges on harmlessness and cultural awareness to train the filter. The first benchmark targeting Arabic cultural contexts was developed to evaluate cultural alignment. Why it matters: FanarGuard advances context-sensitive AI safeguards by integrating cultural awareness into content moderation, addressing a critical gap in current alignment techniques.
Thamar Solorio from the University of Houston presented preliminary work at MBZUAI on multimodal representation learning for detecting objectionable content in videos. The research investigates two multimodal pretraining mechanisms, finding contrastive learning more effective than unimodal representation prediction. The study also assesses how useful common multimodal corpora are for this task. Why it matters: This research contributes to the development of AI techniques for content moderation, an important issue for online platforms in the Middle East and globally.
Zeerak Talat, an independent scholar, gave a talk at MBZUAI on automated content moderation and the impacts of machine learning on society. Talat's research considers how machine learning interacts with and impacts societies through content moderation technologies, drawing on NLP, privacy-preserving machine learning, science and technology studies, decolonial studies, and media studies. The talk highlighted research areas that offer productive directions at the intersection of machine learning and society. Why it matters: The talk contributes to the discussion of ethical AI development and deployment in the region, particularly regarding content moderation and its societal impacts.
MBZUAI Professor Preslav Nakov discusses Meta's shift to crowdsourced fact-checking via Community Notes, replacing third-party fact-checkers. Community Notes, originating from Twitter's Birdwatch, allows users to add context to potentially misleading posts, visible after community consensus. Research indicates this approach can reduce misinformation and lead to post retractions. Why it matters: The adoption of crowdsourcing for content moderation by major platforms like Meta could significantly impact online information quality for billions of users.
A new content improvement system has been developed to address issues of randomness and incorrectness in text generated by deep learning models like GPT-3. The system uses text mining to identify correct sentences and employs syntactic/semantic generalization to substitute problematic elements. The system can substantially improve the factual correctness and meaningfulness of raw content. Why it matters: Improving the quality of automatically generated content is crucial for ensuring reliability and trustworthiness across various AI applications.
A new methodology emulates the criteria used by professional fact-checkers to assess the factuality and bias of news outlets using LLMs. The approach prompts the models with questions derived from fact-checking criteria, then aggregates their responses into predictions. Experiments demonstrate improvements over baselines, with error analysis by media popularity and region; the dataset and code are released at https://github.com/mbzuai-nlp/llm-media-profiling.
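The elicit-and-aggregate step can be sketched as follows. The criteria list, the `query_llm` stub, and the canned scores are hypothetical placeholders for illustration, not the paper's actual prompts, model, or aggregation rule:

```python
# Sketch of eliciting per-criterion LLM judgments about a news outlet
# and aggregating them into a factuality prediction. All names, criteria,
# and scores below are illustrative assumptions, not the released code.
from statistics import mean

CRITERIA = [  # hypothetical examples of fact-checker-style criteria
    "Does the outlet clearly separate news from opinion?",
    "Does the outlet issue corrections for errors?",
    "Does the outlet disclose its ownership and funding?",
]

def query_llm(outlet: str, criterion: str) -> float:
    """Placeholder for an LLM call: in practice this would prompt a model
    to rate the outlet on the criterion (0 = fails, 1 = meets)."""
    canned = {  # stubbed responses so the sketch runs deterministically
        ("ExampleNews", CRITERIA[0]): 1.0,
        ("ExampleNews", CRITERIA[1]): 1.0,
        ("ExampleNews", CRITERIA[2]): 0.0,
    }
    return canned[(outlet, criterion)]

def profile_outlet(outlet: str, threshold: float = 0.5):
    """Aggregate per-criterion scores (here, by mean) into a label."""
    scores = [query_llm(outlet, c) for c in CRITERIA]
    agg = mean(scores)
    label = "high-factuality" if agg >= threshold else "low-factuality"
    return label, agg

label, score = profile_outlet("ExampleNews")
print(label, round(score, 2))
```

Averaging is just one possible aggregation; majority voting or a learned combiner over the per-criterion responses would fit the same skeleton.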
This paper introduces two shared tasks for abusive and threatening language detection in Urdu, a low-resource language with over 170 million speakers. The tasks involve binary classification of Urdu tweets as Abusive/Non-Abusive and Threatening/Non-Threatening, respectively. Datasets of 2,400/6,000 training tweets and 1,100/3,950 testing tweets were created and manually annotated, and logistic regression and BERT-based baselines were provided. Twenty-one teams participated; the best systems achieved F1-scores of 0.880 on the abusive-language task and 0.545 on the threatening-language task, with m-BERT performing best.
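For context on how the reported scores rank systems, the F1-score combines precision and recall for the positive class. A minimal computation for a binary task like Abusive/Non-Abusive, using made-up labels rather than the shared-task data, looks like this:

```python
# Computing F1 for binary classification, the metric used to rank
# shared-task systems. The gold labels and predictions below are
# illustrative toy data, not tweets from the Urdu datasets.
def f1_score(gold, pred, positive="Abusive"):
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:  # no true positives means precision or recall is zero
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

gold = ["Abusive", "Abusive", "Non-Abusive", "Non-Abusive", "Abusive"]
pred = ["Abusive", "Non-Abusive", "Non-Abusive", "Abusive", "Abusive"]
# tp=2, fp=1, fn=1 -> precision = recall = 2/3, F1 ≈ 0.667
print(round(f1_score(gold, pred), 3))
```

Because F1 ignores true negatives, it rewards systems that actually find the rare abusive or threatening tweets rather than predicting the majority class.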