GCC AI Research

Evaluating Web Search Engines Results for Personalization and User Tracking

arXiv · Notable

Summary

This paper presents six experiments that evaluate personalization and user tracking in web search engine results. The experiments compare search results across VPN location (including the UAE versus other locations), logged-in status, network type, search engine, browser, and trained Google accounts. The study measures total hits, the first hit, and the correlation between hits to identify patterns of personalization. Why it matters: The findings shed light on the extent of filter bubble effects and potential biases in search results for users in the UAE and globally.
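To make the three measures concrete, here is a minimal Python sketch of how two result lists for the same query might be compared. It is an illustration under assumptions, not the paper's method: the summary does not say how the authors compute the correlation between hits, so Spearman rank correlation over the URLs both lists share is used as a stand-in, and the function names (`dedupe`, `first_hit_matches`, `jaccard_overlap`, `rank_correlation`) and the sample URLs are hypothetical.

```python
from typing import Optional, Sequence


def dedupe(urls: Sequence[str]) -> list[str]:
    """Drop repeated URLs while keeping first-seen order."""
    return list(dict.fromkeys(urls))


def first_hit_matches(a: Sequence[str], b: Sequence[str]) -> bool:
    """True when both lists are non-empty and share the same top result."""
    return bool(a) and bool(b) and a[0] == b[0]


def jaccard_overlap(a: Sequence[str], b: Sequence[str]) -> float:
    """Share of distinct URLs that appear in both lists (1.0 if both are empty)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def rank_correlation(a: Sequence[str], b: Sequence[str]) -> Optional[float]:
    """Spearman rank correlation over URLs present in both lists.

    Assumption: this is a stand-in for the paper's "correlation between hits".
    Returns None when fewer than two URLs are shared.
    """
    a, b = dedupe(a), dedupe(b)
    in_b = set(b)
    shared_a = [u for u in a if u in in_b]   # shared URLs in a's order
    n = len(shared_a)
    if n < 2:
        return None
    rank_a = {u: i for i, u in enumerate(shared_a)}
    shared_b = [u for u in b if u in rank_a]  # shared URLs in b's order
    rank_b = {u: i for i, u in enumerate(shared_b)}
    d_squared = sum((rank_a[u] - rank_b[u]) ** 2 for u in shared_a)
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical result lists for one query under two conditions
# (for example, logged in versus logged out).
logged_in = ["u1", "u2", "u3", "u4"]
logged_out = ["u1", "u3", "u2", "u5"]

print(first_hit_matches(logged_in, logged_out))  # True
print(jaccard_overlap(logged_in, logged_out))    # 0.6
print(rank_correlation(logged_in, logged_out))   # 0.5
```

Total hits, as reported by the search engine, would be compared directly as two integers per query. A low overlap or rank correlation between two conditions would hint at personalization, though repeated runs under identical conditions would be needed to separate it from ordinary result churn.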

Related

The Saudi Privacy Policy Dataset

arXiv

A new dataset called the Saudi Privacy Policy Dataset is introduced, which contains Arabic privacy policies from various sectors in Saudi Arabia. The dataset is annotated based on the 10 principles of the Personal Data Protection Law (PDPL) and includes 1,000 websites, 4,638 lines of text, and 775,370 tokens. The dataset aims to facilitate research and development in privacy policy analysis, NLP, and machine learning applications related to data protection.

SectEval: Evaluating the Latent Sectarian Preferences of Large Language Models

arXiv

The paper introduces SectEval, a new benchmark to evaluate sectarian biases in LLMs concerning Sunni and Shia Islam, available in English and Hindi. Results show significant inconsistencies in LLM responses based on language, with some models favoring Shia responses in English but Sunni in Hindi. Location-based experiments further reveal that advanced models adapt their responses based on the user's claimed country, while smaller models exhibit a consistent Sunni-leaning bias.

Is AI Catching Up to Human Expression? Exploring Emotion, Personality, Authorship, and Linguistic Style in English and Arabic with Six Large Language Models

arXiv

This study investigates the ability of six large language models, including Jais, Mistral, and GPT-4o, to mimic human emotional expression in English and personality markers in Arabic. The researchers evaluated whether machine classifiers could distinguish between human-authored and AI-generated texts and assessed the emotional/personality traits exhibited by the LLMs. Results indicate that AI-generated texts are distinguishable from human-authored ones, with classification performance impacted by paraphrasing, and that LLMs encode affective signals differently than humans. Why it matters: The findings have implications for authorship attribution, affective computing, and the responsible deployment of AI, especially in under-resourced languages like Arabic.

Evaluating Models and their Explanations

MBZUAI

This article discusses the increasing concerns about the interpretability of large deep learning models. It highlights a talk by Danish Pruthi, an Assistant Professor at the Indian Institute of Science (IISc), Bangalore, who presented a framework to quantify the value of explanations and the need for holistic model evaluation. Pruthi's talk touched on geographically representative artifacts from text-to-image models and how well conversational LLMs challenge false assumptions. Why it matters: Addressing interpretability and evaluation is crucial for building trustworthy and reliable AI systems, particularly in sensitive applications within the Middle East and globally.