GCC AI Research

LlamaLens: Specialized Multilingual LLM for Analyzing News and Social Media Content

arXiv · Significant research

Summary

Researchers have introduced LlamaLens, a specialized multilingual LLM for analyzing news and social media content. The model addresses both domain specificity and multilinguality, covering news and social media in Arabic, English, and Hindi. LlamaLens was evaluated on 18 tasks represented by 52 datasets and outperformed the previous state of the art on 23 test sets. Why it matters: This work provides a resource for multilingual NLP research on news and social media content in Arabic, English, and Hindi.

Related

Profiling News Media for Factuality and Bias Using LLMs and the Fact-Checking Methodology of Human Experts

arXiv

A new methodology assesses the factuality and bias of news outlets with LLMs by emulating the criteria that professional fact-checkers use. The approach builds prompts from those criteria, elicits LLM responses, and aggregates them into predictions. Experiments show improvements over baselines, and an error analysis examines the effect of media popularity and region. The dataset and code are released at https://github.com/mbzuai-nlp/llm-media-profiling.

AlcLaM: Arabic Dialectal Language Model

arXiv

The paper introduces AlcLaM, an Arabic dialectal language model trained on 3.4M sentences from social media. The authors expand the vocabulary and retrain a BERT-based model, using only 13 GB of dialectal text. Despite its smaller training set, AlcLaM outperforms models such as CAMeL, MARBERT, and ArBERT on various Arabic NLP tasks. Why it matters: AlcLaM offers a more efficient and accurate approach to Arabic NLP by focusing on dialectal Arabic, which is often underrepresented in existing models.

AraNet: A Deep Learning Toolkit for Arabic Social Media

arXiv

Researchers introduce AraNet, a deep learning toolkit for Arabic social media processing. The toolkit uses BERT models trained on social media datasets to predict age, dialect, gender, emotion, irony, and sentiment. AraNet achieves state-of-the-art or competitive performance on these tasks without feature engineering. Why it matters: The public release of AraNet accelerates Arabic NLP research by providing a comprehensive, deep learning-based tool for various social media analysis tasks.

ALLaM: Large Language Models for Arabic and English

arXiv

The paper introduces ALLaM, a series of large language models for Arabic and English, designed to support Arabic Language Technologies. The models are trained with language alignment and knowledge transfer in mind, using a decoder-only architecture. ALLaM achieves state-of-the-art results on Arabic benchmarks like MMLU Arabic and Arabic Exams. Why it matters: This work advances Arabic NLP by providing high-performing LLMs and demonstrating effective techniques for cross-lingual transfer learning and alignment with human preferences.