Xi Chen from NYU Stern gave a talk at MBZUAI on digital privacy in personalized pricing using differential privacy. The talk also covered research in Web3 and decentralized finance, including delta hedging liquidity positions on Uniswap V3. Chen highlighted open problems in decentralized finance during the presentation. Why it matters: The talk suggests MBZUAI's interest in exploring the intersection of AI, privacy, and blockchain technologies, reflecting growing trends in data protection and decentralized systems.
MBZUAI Assistant Professor Samuel Horváth is researching federated learning to address the tension between data privacy and the predictive power of machine learning models. Federated learning trains models on decentralized data, keeping sensitive information on devices. Horváth's research focuses on designing algorithms that can efficiently train on distributed data while respecting user privacy. Why it matters: This work is crucial for advancing AI in sensitive domains like healthcare, where privacy regulations limit centralized data collection.
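To make the idea of training on decentralized data concrete, here is a minimal sketch of plain federated averaging on a toy one-parameter regression. It is not Horváth's algorithm: the function names (`local_update`, `federated_round`), the learning rate, and the client datasets are all hypothetical choices for illustration. The point is only that clients share a model weight with the server, never their raw data.

```python
# Minimal federated averaging sketch (illustrative only; names and data are assumptions).
# Each client fits y = w * x on its own private data and shares only the weight w.

def local_update(w, data, lr=0.01, epochs=5):
    """Run a few gradient-descent steps on one client's private data."""
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_round(w_global, clients):
    """Average the clients' locally trained weights, weighted by sample count."""
    total = sum(len(data) for data in clients)
    return sum(local_update(w_global, data) * len(data) for data in clients) / total

# Hypothetical client datasets, all generated from y = 3x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5), (1.5, 4.5), (3.0, 9.0)],
]

w = 0.0
for _ in range(200):
    w = federated_round(w, clients)

print(f"learned weight: {w:.4f}")
```

In this toy setting the global weight approaches the true slope of 3 even though the server never sees any client's samples. Real systems add concerns the sketch ignores, such as communication cost, client dropout, and non-identical data distributions.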
This paper introduces DaringFed, a novel dynamic Bayesian persuasion pricing mechanism for online federated learning (OFL) that addresses the challenge of two-sided incomplete information (TII) regarding resources. It formulates the interaction between the server and clients as a dynamic signaling and pricing allocation problem within a Bayesian persuasion game and demonstrates the existence of a unique Bayesian persuasion Nash equilibrium. Evaluations on real and synthetic datasets show that DaringFed optimizes accuracy and convergence speed and improves the server's utility.
The Saudi Privacy Policy Dataset is a new collection of Arabic privacy policies from various sectors in Saudi Arabia. It is annotated according to the 10 principles of the Personal Data Protection Law (PDPL) and covers 1,000 websites, 4,638 lines of text, and 775,370 tokens. The dataset aims to facilitate research and development in privacy policy analysis, NLP, and machine learning applications related to data protection.
Sai Praneeth Karimireddy from UC Berkeley presented a talk on building planetary-scale collaborative intelligence, highlighting the challenges of using distributed data in machine learning due to data silos and ethical-legal restrictions. He proposed collaborative systems like federated learning as a solution to bring together distributed data while respecting privacy. The talk addressed the need for efficiency, reliability, and management of divergent goals in these systems, suggesting the use of tools from optimization, statistics, and economics. Why it matters: Collaborative AI systems can unlock valuable distributed data in the region, especially in sensitive sectors like healthcare, while ensuring privacy and addressing ethical concerns.
MBZUAI researchers have developed a new method called "Byzantine antidote" (Bant) to defend federated learning systems against Byzantine attacks, in which malicious nodes intentionally disrupt the training process. Bant uses trust scores and a trial function to dynamically filter out corrupted updates, even when most nodes are compromised. The system can also identify poorly labeled data while still training models effectively, so it addresses both unintentional mistakes and deliberate sabotage. Why it matters: This research enhances the reliability and security of federated learning in sensitive sectors like healthcare and finance, enabling safer collaborative AI development.
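The general idea of scoring updates against a trial function can be sketched as follows. This is not Bant's actual algorithm, whose details are not given here. It assumes a simple rule: the server holds a small trusted trial set, scores each client update by how much it lowers the loss on that set, and gives zero weight to updates that do not help. All names (`trial_loss`, `trust_scores`, `aggregate`) and the data are hypothetical.

```python
# Illustrative trust-weighted aggregation (an assumption-laden sketch, not Bant itself).
# The model is a single weight w for y = w * x; an update is a proposed change to w.

def trial_loss(w, trial_data):
    """Mean squared error of y = w * x on a small trusted trial set."""
    return sum((w * x - y) ** 2 for x, y in trial_data) / len(trial_data)

def trust_scores(w, updates, trial_data):
    """Score each update by its trial-loss reduction; harmful updates get 0."""
    base = trial_loss(w, trial_data)
    gains = [max(0.0, base - trial_loss(w + u, trial_data)) for u in updates]
    total = sum(gains)
    # Normalize so the scores sum to 1; if no update helps, trust none of them.
    return [g / total for g in gains] if total > 0 else [0.0] * len(updates)

def aggregate(w, updates, trial_data):
    """Apply the trust-weighted combination of client updates."""
    scores = trust_scores(w, updates, trial_data)
    return w + sum(s * u for s, u in zip(scores, updates))

trial_data = [(1.0, 3.0), (2.0, 6.0)]  # hypothetical trusted samples from y = 3x
w = 0.0
updates = [0.5, -4.0, -5.0, -6.0]      # one honest update, three malicious ones

scores = trust_scores(w, updates, trial_data)
w_new = aggregate(w, updates, trial_data)
print(scores, w_new)
```

Because the weights come from the trial loss rather than from a majority vote, the single honest update still determines the result here even though three of the four nodes are malicious. A plain average of the same updates would move the weight in the wrong direction.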
Researchers are exploring methods for evaluating the outcome of actions from off-policy observations in which the context is noisy or anonymized. They employ proxy causal learning, using two noisy views of the context to recover the average causal effect of an action without explicitly modeling the hidden context. The implementation uses learned neural net representations for both action and context, and it outperforms an autoencoder-based alternative. Why it matters: This research addresses a key challenge in applying AI in real-world scenarios where data privacy or bandwidth limitations necessitate working with noisy or anonymized data.