GCC AI Research


Results for "hypersphere"

Understanding modern machine learning models through the lens of high-dimensional statistics

MBZUAI

This talk explores modern machine learning through high-dimensional statistics, using random matrix theory to analyze learning models. The speaker, Denny Wu from the University of Toronto and the Vector Institute, presents two examples: hyperparameter selection in overparameterized models and gradient-based representation learning in neural networks. The analysis reveals insights such as the possibility of a negative optimal ridge penalty and the advantages of feature learning over random features. Why it matters: This research provides a deeper theoretical understanding of deep learning phenomena, with potential implications for optimizing training and improving model performance in the region.
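As an illustration of the negative-ridge claim, the following is a minimal sketch (not the speaker's actual analysis) that sweeps the ridge penalty over a grid including negative values, using the kernel form of ridge regression so that mildly negative penalties remain well defined in the overparameterized regime. The data model, dimensions, and noise level are illustrative assumptions; whether the grid minimizer actually lands at a negative value depends on the signal-to-noise ratio and data geometry.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 500                      # overparameterized: many more features than samples
w_star = np.zeros(d)
w_star[:5] = 2.0                    # low-dimensional signal among many null features (assumption)
X = rng.normal(size=(n, d))
y = X @ w_star + 0.5 * rng.normal(size=n)
X_test = rng.normal(size=(2000, d))
y_test = X_test @ w_star

def ridge_fit(X, y, lam):
    # Kernel ("dual") form of ridge regression; it stays well defined for
    # negative lam as long as X X^T + lam * I remains positive definite.
    K = X @ X.T
    return X.T @ np.linalg.solve(K + lam * np.eye(len(y)), y)

lams = np.linspace(-50.0, 200.0, 101)
errs = [np.mean((X_test @ ridge_fit(X, y, lam) - y_test) ** 2) for lam in lams]
print("test-error-minimizing penalty:", lams[int(np.argmin(errs))])
```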

Deep Surface Meshes

MBZUAI

Pascal Fua from EPFL presented an approach to implementing convolutional neural nets that output complex 3D surface meshes. The method overcomes limitations that arise when converting implicit shape representations into explicit surface meshes. Applications include single-view reconstruction, physically driven shape optimization, and biomedical image segmentation. Why it matters: This research advances geometric deep learning by enabling end-to-end trainable models for 3D surface mesh generation, with potential impact on various applications in computer vision and biomedical imaging in the region.
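For context on the conversion the method targets, below is a minimal sketch of the conventional, non-differentiable implicit-to-explicit step: extracting a triangle mesh from a signed distance field with marching cubes via scikit-image. The sphere SDF and grid resolution are illustrative assumptions, not part of the presented method.

```python
import numpy as np
from skimage import measure

# Implicit representation: signed distance field of a unit sphere sampled
# on a regular 64^3 grid covering [-1.5, 1.5]^3 (illustrative assumption).
grid = np.linspace(-1.5, 1.5, 64)
xs, ys, zs = np.meshgrid(grid, grid, grid, indexing="ij")
sdf = np.sqrt(xs**2 + ys**2 + zs**2) - 1.0

# Explicit representation: the zero level set extracted as a triangle mesh.
# This classical step is not differentiable, which is the limitation that
# end-to-end trainable surface-mesh approaches aim to remove.
verts, faces, normals, _ = measure.marching_cubes(
    sdf, level=0.0, spacing=(grid[1] - grid[0],) * 3
)
print(f"{len(verts)} vertices, {len(faces)} triangles")
```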

The Cylindrical Representation Hypothesis for Language Model Steering

arXiv

Researchers from MBZUAI have proposed the Cylindrical Representation Hypothesis (CRH) to explain the instability and unpredictability observed in large language model steering. CRH relaxes the orthogonality assumption of the existing Linear Representation Hypothesis, positing a cylindrical structure where a central axis captures concept differences and a surrounding normal plane controls steering sensitivity. The hypothesis suggests that the intrinsic uncertainty in identifying specific sensitive sectors within this normal plane accounts for why steering outcomes frequently fluctuate even with well-aligned directions. Why it matters: This research offers a more robust theoretical framework for understanding and potentially improving the control and reliability of large language models.
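For readers unfamiliar with steering itself, the following is a minimal sketch of activation steering (adding a scaled concept direction to a hidden state); the hidden size, activation, and direction are all hypothetical. CRH concerns the geometry of such directions, and the comment notes where its proposed instability mechanism would enter.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 768                        # hypothetical hidden size

# Hypothetical quantities: a residual-stream activation h and a unit-norm
# concept direction v (often estimated as a difference of mean activations
# between prompts that do and do not express the concept).
h = rng.normal(size=d_model)
v = rng.normal(size=d_model)
v /= np.linalg.norm(v)

def steer(h, v, alpha):
    # Standard activation steering: shift the hidden state along v.
    # Under CRH, the component of h lying in the normal plane around the
    # central concept axis would modulate how strongly this shift changes
    # behavior, which is the proposed source of steering instability.
    return h + alpha * v

h_steered = steer(h, v, alpha=4.0)
print("shift magnitude:", np.linalg.norm(h_steered - h))
```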

Generative models, manifolds and symmetries: From QFT to molecules

MBZUAI

A DeepMind researcher presented work on incorporating symmetries into machine learning models, with applications to lattice QCD and molecular dynamics. The work includes permutation- and translation-invariant normalizing flows for free-energy estimation in molecular dynamics. They also presented U(N) and SU(N) gauge-equivariant normalizing flows for pure-gauge simulations and their extensions to incorporate fermions in lattice QCD. Why it matters: Applying symmetry principles to generative models could improve AI's ability to model complex physical systems relevant to materials science and other fields in the region.
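As a small illustration of the symmetry property such flows are built around, the sketch below checks permutation equivariance on a toy transform: relabeling particles before or after the transform gives the same result. The transform is a hypothetical stand-in; actual equivariant normalizing flows use invertible coupling layers.

```python
import numpy as np

rng = np.random.default_rng(0)
n_particles, dim = 8, 3
x = rng.normal(size=(n_particles, dim))   # hypothetical particle coordinates

def toy_transform(x):
    # Permutation-equivariant toy map: each particle is shifted by a function
    # of the set mean, so relabeling particles commutes with the transform.
    return x + np.tanh(x.mean(axis=0, keepdims=True) - x)

perm = rng.permutation(n_particles)
lhs = toy_transform(x)[perm]    # transform, then permute
rhs = toy_transform(x[perm])    # permute, then transform
print("permutation-equivariant:", np.allclose(lhs, rhs))
```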