GCC AI Research


Theory

5 articles

What Really Counts: Theoretical and Empirical Aspects of Counting Behaviour in Simple RNNs

MBZUAI · · NLP Research

Nadine El Naggar from City, University of London presented research on how RNNs learn counting behaviour, formalized as Dyck-1 acceptance. Empirically, RNN models struggle to learn exact counting and fail on longer sequences, even when their weights are correctly initialized. Theoretically, Counter Indicator Conditions (CICs) were proposed and proven necessary and sufficient for exact counting in single-cell RNNs, but experiments show these CICs are either not found or are unlearned during training. Why it matters: This work highlights the difficulty RNNs have in learning systematic tasks, suggesting that gradient-descent-based optimization may not reach exact counting behaviour with standard setups.
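The Dyck-1 acceptance task itself reduces to maintaining a single counter over a bracket string, which is a minimal sketch of the task (the function name and the ±1 encoding are illustrative, not the talk's RNN construction or its CICs). A single-cell RNN that counts exactly must realize this same ±1 update in its hidden state:

```python
def accepts_dyck1(s: str) -> bool:
    """Accept s over '(' and ')' iff it is a balanced Dyck-1 string."""
    count = 0
    for ch in s:
        count += 1 if ch == "(" else -1  # '(' increments, ')' decrements
        if count < 0:                    # a ')' with no matching '(': reject early
            return False
    return count == 0                    # accept only if every '(' was closed
```

For example, `accepts_dyck1("(())")` is true while `accepts_dyck1("())(")` is false; the empirical finding above is that trained RNNs approximate this counter only on sequence lengths seen in training.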

Understanding modern machine learning models through the lens of high-dimensional statistics

MBZUAI · · Research LLM

This talk explores modern machine learning through high-dimensional statistics, using random matrix theory to analyze learning models. The speaker, Denny Wu from the University of Toronto and the Vector Institute, presents two examples: hyperparameter selection in overparameterized models and gradient-based representation learning in neural networks. The analysis reveals insights such as the possibility of a negative optimal ridge penalty and the advantages of feature learning over random features. Why it matters: This research provides a deeper theoretical understanding of deep learning phenomena, with potential implications for optimizing training and improving model performance in the region.
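The "negative optimal ridge penalty" observation can be made concrete with the closed-form ridge estimator (a sketch, not the speaker's experiments; `ridge` and the scaling are illustrative). The formula stays well defined for λ < 0 as long as the regularized Gram matrix remains positive definite:

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge estimator: beta = (X'X/n + lam*I)^{-1} X'y/n.
    lam may be negative provided X.T @ X / n + lam * np.eye(d) stays
    positive definite -- the regime in which a negative penalty can be
    optimal in overparameterized settings."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
```

With `lam = 0` this is ordinary least squares; a small negative `lam` inflates the estimator rather than shrinking it, which is the counterintuitive regime the high-dimensional analysis characterizes.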

SGD from the Lens of Markov process: An Algorithmic Stability Perspective

MBZUAI · · Research NLP

A Marie Curie Fellow from Inria and UIUC presented research on stochastic gradient descent (SGD) through the lens of Markov processes, exploring the relationships between heavy-tailed distributions, generalization error, and algorithmic stability. The research challenges existing theories about the monotonic relationship between heavy tails and generalization error. It introduces a unified approach for proving Wasserstein stability bounds in stochastic optimization, applicable to convex and non-convex losses. Why it matters: The work provides novel insights into the theoretical underpinnings of stochastic optimization, relevant to researchers at MBZUAI and other institutions in the region working on machine learning algorithms.
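The Markov-process view of SGD can be sketched in a few lines: each iterate depends only on the current iterate and a freshly drawn sample, so the sequence of iterates forms a Markov chain (an illustrative sketch with hypothetical names, not the talk's construction or its stability bounds):

```python
import random

def sgd_chain(grad, theta0, lr, data, steps, rng):
    """Run SGD, highlighting its Markov structure: theta_{k+1} is a
    function of theta_k and an independently sampled data point only,
    so the iterates (theta_k) form a Markov chain."""
    theta = theta0
    for _ in range(steps):
        xi = rng.choice(data)            # i.i.d. sampling noise at each step
        theta = theta - lr * grad(theta, xi)
    return theta
```

For the squared loss (theta - xi)^2 / 2 the gradient is `theta - xi`, and the chain contracts toward the data mean; the stability analysis above asks how sensitive the law of this chain is to perturbing one data point, measured in Wasserstein distance.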

From Learning, to Meta-Learning, to Lego-Learning — theory, systems, and engineering

MBZUAI · · Research LLM

MBZUAI President Eric Xing delivered a talk at Carnegie Mellon University on May 13, 2022, titled “From Learning, to Meta-Learning, to Lego-Learning — theory, systems, and engineering.” Xing discussed the development of a standard model for learning, inspired by the standard model in physics, which aims to unify various machine learning paradigms. Before joining MBZUAI, Xing was a professor at CMU and founder of Petuum Inc., an AI development platform company. Why it matters: This talk highlights MBZUAI's leadership in advancing theoretical frameworks for machine learning and its commitment to unifying different AI approaches.

Problems in network archaeology: root finding and broadcasting

MBZUAI · · Research Theory

This article discusses a talk by Gábor Lugosi on "network archaeology," specifically the problems of root finding and broadcasting in large networks. The talk addresses discovering the past of dynamically growing networks when only a present-day snapshot is observed. Lugosi's research interests include machine learning theory, nonparametric statistics, and random structures. Why it matters: Understanding the evolution and origins of networks is crucial for various applications, including analyzing social networks, biological systems, and the spread of information.
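Root-finding estimators in this literature typically rank vertices of the observed snapshot by a centrality measure and return the most central ones as a confidence set. As a minimal sketch of that idea (not Lugosi's exact estimator), the tree centroid picks the vertex whose removal leaves the smallest largest component:

```python
from collections import deque

def centroid(adj):
    """Return the tree centroid: the vertex whose removal minimizes the
    size of the largest remaining component. A simple centrality-based
    root estimator in the spirit of root finding (illustrative sketch).
    adj maps each vertex to its list of neighbours in a tree."""
    best, best_score = None, len(adj) + 1
    for v in adj:
        seen, worst = {v}, 0
        for u in adj[v]:
            if u in seen:
                continue
            # BFS to measure the component hanging off v through u
            queue, size = deque([u]), 0
            seen.add(u)
            while queue:
                w = queue.popleft()
                size += 1
                for x in adj[w]:
                    if x not in seen:
                        seen.add(x)
                        queue.append(x)
            worst = max(worst, size)
        if worst < best_score:
            best, best_score = v, worst
    return best
```

On a path 0–1–2–3–4 the centroid is the middle vertex 2; the intuition is that in many growth models the true root stays near the "centre" of the tree no matter how large the network becomes.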