MBZUAI student Zain Muhammad Mujahid is researching methods to detect media bias using NLP and LLMs. His approach profiles media outlets by prompting LLMs such as ChatGPT to predict bias from 16 identifiers. The research aims to develop a tool that instantly provides a bias profile for a given media URL. Why it matters: This research has the potential to combat misinformation and enhance media literacy in the region by providing tools to identify biased reporting, and it is expanding to Arabic and other languages.
A new methodology assesses news outlet factuality and bias with LLMs by emulating the criteria professional fact-checkers use. The approach prompts an LLM with questions based on fact-checking criteria, then aggregates its responses into factuality and bias predictions. Experiments demonstrate improvements over baselines, with error analysis by media popularity and region; the dataset and code are released at https://github.com/mbzuai-nlp/llm-media-profiling.
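The elicit-and-aggregate loop can be sketched as below. This is a minimal illustration, not the paper's method: the criterion questions, prompt wording, majority-vote aggregation, and the stub LLM are all assumptions (the actual prompts live in the released repository).

```python
from collections import Counter

# Hypothetical fact-checking criteria (illustrative only; the paper's
# actual prompt set is defined in its released code).
CRITERIA = [
    "Does the outlet cite primary sources?",
    "Does the outlet publish corrections for errors?",
    "Does the outlet separate news from opinion?",
]

def profile_outlet(outlet_url, ask_llm):
    """Elicit one LLM answer per criterion, then aggregate by majority vote."""
    answers = [
        ask_llm(f"For {outlet_url}: {q} Answer 'high' or 'low'.")
        for q in CRITERIA
    ]
    label, _ = Counter(answers).most_common(1)[0]
    return label

# Stub standing in for a real LLM API call, for demonstration only.
def stub(prompt):
    return "high" if ("sources" in prompt or "corrections" in prompt) else "low"

print(profile_outlet("https://example-news.com", stub))  # -> 'high'
```

A real system would replace `stub` with a call to an LLM API and likely aggregate calibrated scores rather than raw string votes.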
A study by MBZUAI's Preslav Nakov and Cornell co-authors examines how to develop systems that detect fake news in a landscape where text is generated by humans and machines. The research, presented at the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics, analyzes fake news detectors' ability to identify human- and machine-written content. The study highlights biases in current detectors, which tend to classify machine-written news as fake and human-written news as true. Why it matters: Addressing these biases is crucial as machine-generated content becomes more prevalent in both real and fake news, requiring more nuanced detection methods.
Muhammad Arslan Manzoor became MBZUAI's first NLP Ph.D. graduate, focusing his research on media bias under Professor Preslav Nakov. His thesis, 'MGM,' explored using audience overlap graphs to predict the factuality and bias of news media, an approach that differs from traditional textual analysis. Manzoor's work aims to improve the efficiency of media profiling in real time by leveraging relationships captured in media graphs. Why it matters: This research offers innovative methods for identifying bias in news, which is crucial for promoting informed social discourse and combating disinformation in the region.
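The audience-overlap idea can be pictured with a toy graph: outlets are nodes, edge weights are shared-audience fractions, and an unlabeled outlet inherits the overlap-weighted majority label of its labeled neighbors. The outlet names, weights, and the simple weighted vote below are illustrative assumptions, not the thesis's actual model.

```python
# Hypothetical audience-overlap graph: edge weight = fraction of shared audience.
OVERLAP = {
    ("siteA", "siteB"): 0.6,
    ("siteA", "siteC"): 0.2,
    ("siteB", "siteC"): 0.1,
}
LABELS = {"siteB": "high", "siteC": "low"}  # known factuality labels

def predict(outlet):
    """Predict factuality as the overlap-weighted vote of labeled neighbors."""
    votes = {}
    for (u, v), w in OVERLAP.items():
        if outlet in (u, v):
            other = v if u == outlet else u
            if other in LABELS:
                votes[LABELS[other]] = votes.get(LABELS[other], 0.0) + w
    return max(votes, key=votes.get) if votes else None

print(predict("siteA"))  # 'high': the 0.6 overlap with siteB outweighs 0.2 with siteC
```

The appeal of this graph view is that a new outlet can be profiled from its neighborhood alone, without analyzing any of its text.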
Researchers from MBZUAI, University of Washington, and other institutions presented studies at EMNLP 2024 exploring how LLMs represent cultures. A survey analyzed dozens of recent studies on LLMs and culture and proposed a new framework for future research. The survey found that there is no widely accepted definition of 'culture' in NLP, making it challenging to interpret how models represent culture through language. Why it matters: This highlights a key gap in the field and emphasizes the need for a more rigorous and consistent understanding of culture in AI, especially as LLMs become more globally integrated.
MBZUAI researchers found that only 5.7% of music in existing datasets used to train generative music systems comes from non-Western genres. They found that 94% of the music was Western, while Africa, the Middle East, and South Asia accounted for only 0.3%, 0.4%, and 0.9%, respectively. The team also tested whether parameter-efficient fine-tuning with adapters could improve generative music systems on underrepresented styles, presenting their findings at NAACL. Why it matters: This research highlights the critical need for more diverse datasets in AI music generation to better serve global musical traditions and audiences.
MBZUAI researchers introduce FAID, a fine-grained AI-generated text detection framework capable of classifying text as human-written, LLM-generated, or collaboratively written. FAID utilizes multi-level contrastive learning and multi-task auxiliary classification to capture authorship and model-specific characteristics, and can identify the underlying LLM family. The framework outperforms existing baselines, especially in generalizing to unseen domains and new LLMs, and includes a multilingual, multi-domain dataset called FAIDSet.
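One way to picture FAID's multi-level supervision is as pair weights over a two-level label hierarchy (coarse authorship class, fine LLM family): pairs agreeing at both levels are strong positives, pairs agreeing only on the coarse class are weak positives, and all others are negatives. The label names and weight values below are illustrative assumptions, not FAID's actual taxonomy or loss.

```python
def pair_weight(a, b):
    """Contrastive supervision weight for a pair of two-level labels.

    Each label is (coarse_authorship, fine_llm_family), e.g. ('llm', 'gpt').
    Agreement at both levels -> strong positive (1.0); agreement only at the
    coarse level -> weak positive (0.5); otherwise a negative pair (0.0).
    Weights are illustrative placeholders.
    """
    coarse_a, fine_a = a
    coarse_b, fine_b = b
    if coarse_a == coarse_b and fine_a == fine_b:
        return 1.0
    if coarse_a == coarse_b:
        return 0.5
    return 0.0

print(pair_weight(("llm", "gpt"), ("llm", "gpt")))    # 1.0: same family
print(pair_weight(("llm", "gpt"), ("llm", "llama")))  # 0.5: both LLM-generated
print(pair_weight(("llm", "gpt"), ("human", None)))   # 0.0: different authorship
```

In a full system these weights would scale the similarity terms of a contrastive loss over text embeddings, letting one encoder serve both the three-way authorship task and the LLM-family task.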
MBZUAI Professor Preslav Nakov has developed FRAPPE, an interactive website that analyzes news articles to identify persuasion techniques. FRAPPE helps users understand framing, persuasion, and propaganda at an aggregate level, across different news outlets and countries. Presented at EACL, FRAPPE detects 23 specific techniques, grouped into six broader buckets such as 'attack on reputation' and 'manipulative wording'. Why it matters: The tool addresses the increasing difficulty in discerning factual information from disinformation, providing a means to identify biases in news media from different countries.
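The technique-to-bucket rollup can be sketched with a simple mapping. Only 'attack on reputation' and 'manipulative wording' are bucket names from the article; the fine-grained technique names below are placeholders standing in for FRAPPE's full 23-technique taxonomy.

```python
from collections import Counter

# Illustrative mapping from fine-grained techniques to broader buckets.
# Technique names are hypothetical; bucket names come from the article.
BUCKETS = {
    "name_calling": "attack on reputation",
    "guilt_by_association": "attack on reputation",
    "loaded_language": "manipulative wording",
    "exaggeration": "manipulative wording",
}

def aggregate(detected_techniques):
    """Roll per-article technique detections up to bucket-level counts."""
    return Counter(BUCKETS[t] for t in detected_techniques if t in BUCKETS)

counts = aggregate(["loaded_language", "name_calling", "loaded_language"])
print(counts)  # 'manipulative wording': 2, 'attack on reputation': 1
```

Aggregating bucket counts across many articles from one outlet or country is what enables the kind of outlet-level comparison the site presents.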