MBZUAI Professor Fakhri Karray and co-authors from the University of Waterloo have published "Elements of Dimensionality Reduction and Manifold Learning," a textbook on methods for extracting useful components from large datasets. The book addresses the "curse of dimensionality," where growth in the number of dimensions (features) of a dataset complicates its use in machine learning. Karray developed the material from a popular course he taught at Waterloo. Why it matters: The textbook provides a unified resource for students and researchers in machine learning and AI, addressing a foundational challenge in processing high-dimensional data, relevant to diverse applications in the region.
Xiaolin Huang from Shanghai Jiao Tong University presented a talk at MBZUAI on training deep neural networks in tiny subspaces. The talk covered the low-dimension hypothesis in neural networks and methods to find subspaces for efficient training. It suggests that training in smaller subspaces can improve training efficiency, generalization, and robustness. Why it matters: Investigating efficient training methods is crucial for resource-constrained environments and can enable broader access to advanced AI.
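The idea of training in a subspace can be illustrated with a minimal sketch. It assumes a toy least-squares model and a random orthonormal subspace; the summary does not say how the talk's method selects its subspace, so the random basis `P`, the helper names, and all sizes here are hypothetical choices for illustration only. The parameters are written as theta = theta0 + P z, and only the k coordinates in z are updated.

```python
import numpy as np

def loss(A, b, theta):
    """Half mean-squared error of a linear model (toy stand-in for a network loss)."""
    return 0.5 * np.mean((A @ theta - b) ** 2)

def train_in_subspace(A, b, theta0, P, lr=0.1, steps=200):
    """Gradient descent on subspace coordinates z, with theta = theta0 + P @ z."""
    z = np.zeros(P.shape[1])
    n = A.shape[0]
    for _ in range(steps):
        theta = theta0 + P @ z
        grad_theta = A.T @ (A @ theta - b) / n  # gradient in the full d-dim space
        z -= lr * (P.T @ grad_theta)            # chain rule: gradient w.r.t. z
    return theta0 + P @ z

rng = np.random.default_rng(0)
n, d, k = 100, 50, 5                            # k << d trainable coordinates
A = rng.normal(size=(n, d))
b = rng.normal(size=n)
theta0 = np.zeros(d)
# Assumption: a random subspace; orthonormal basis via reduced QR.
P, _ = np.linalg.qr(rng.normal(size=(d, k)))
theta = train_in_subspace(A, b, theta0, P)
print(loss(A, b, theta0), loss(A, b, theta))
```

Only k numbers are optimized, yet the full parameter vector moves; the update never leaves the affine subspace theta0 + span(P).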
This article discusses approximating a high-dimensional distribution using Gaussian variational inference by minimizing the Kullback-Leibler divergence. It builds upon previous research and approximates the minimizer using a Gaussian distribution with specific mean and variance. The study details the approximation accuracy and the range of applicability in terms of the efficient dimension, which is relevant for analyzing sampling schemes in optimization. Why it matters: This theoretical research can inform the development of more efficient and accurate AI algorithms, particularly in areas dealing with high-dimensional data such as machine learning and data analysis.
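As a one-dimensional illustration of Gaussian variational inference, the sketch below fits q = N(m, s^2) to a target by minimizing KL(q || p) over a grid. Everything specific is an assumption made for the example: the toy target p(x) proportional to exp(-x^4 / 4), the grid ranges, and the function names are hypothetical, and the article's own setting is high-dimensional and analytical rather than numerical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Gauss-Hermite nodes/weights for the weight exp(-x^2 / 2); weights sum to sqrt(2*pi).
NODES, WEIGHTS = hermegauss(20)

def log_target(x):
    """Unnormalized log-density of a hypothetical target, p(x) proportional to exp(-x^4 / 4)."""
    return -x**4 / 4

def kl_up_to_constant(m, s):
    """KL(q || p) for q = N(m, s^2), dropping the unknown log-normalizer of p."""
    x = m + s * NODES                                   # nodes mapped to q
    expected_log_p = np.sum(WEIGHTS * log_target(x)) / np.sqrt(2 * np.pi)
    entropy = 0.5 * np.log(2 * np.pi * np.e * s**2)     # entropy of N(m, s^2)
    return -entropy - expected_log_p

# Brute-force search over candidate means and standard deviations.
means = np.linspace(-1.0, 1.0, 41)
stds = np.linspace(0.2, 2.0, 181)
grid = np.array([[kl_up_to_constant(m, s) for s in stds] for m in means])
i, j = np.unravel_index(np.argmin(grid), grid.shape)
best_mean, best_std = means[i], stds[j]
print(best_mean, best_std)
```

For this particular target the minimizer can be checked by hand: the objective reduces to -log(s) + (m^4 + 6 m^2 s^2 + 3 s^4) / 4 plus a constant, which is minimized at m = 0 and s = 3^(-1/4), roughly 0.76.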
A talk introduces a computational framework for learning a compact, structured representation of real-world datasets that is both discriminative and generative. It proposes to learn a closed-loop transcription between the distribution of a high-dimensional multi-class dataset and an arrangement of multiple independent subspaces, known as a linear discriminative representation (LDR). The optimality of the closed-loop transcription can be characterized in closed form by an information-theoretic measure known as the rate reduction. Why it matters: The framework unifies the concepts and benefits of auto-encoding and GANs and generalizes them to the setting of learning a representation of multi-class visual data that is both discriminative and generative.
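A minimal sketch of the rate reduction measure follows, using the form commonly given in the rate-reduction literature: the coding rate of all features minus the size-weighted coding rates of each class. The summary does not state which variant the talk uses, so the formula's details, the distortion parameter `eps`, and the toy data are assumptions for illustration.

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """R(Z) = 1/2 logdet(I + d / (n eps^2) Z Z^T) for features Z of shape (d, n)."""
    d, n = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (n * eps**2)) * (Z @ Z.T))
    return 0.5 * logdet

def rate_reduction(Z, labels, eps=0.5):
    """Coding rate of the whole set minus the weighted coding rates of the classes."""
    n = Z.shape[1]
    within = 0.0
    for c in np.unique(labels):
        Zc = Z[:, labels == c]
        within += (Zc.shape[1] / n) * coding_rate(Zc, eps)
    return coding_rate(Z, eps) - within

# Toy features: two classes lying in orthogonal 2-dim subspaces of R^4.
rng = np.random.default_rng(0)
Z = np.zeros((4, 20))
Z[:2, :10] = rng.normal(size=(2, 10))
Z[2:, 10:] = rng.normal(size=(2, 10))
Z /= np.linalg.norm(Z, axis=0)                  # unit-norm feature columns

labels_separated = np.repeat([0, 1], 10)        # labels match the subspaces
labels_mixed = np.tile([0, 1], 10)              # each class straddles both subspaces
print(rate_reduction(Z, labels_separated), rate_reduction(Z, labels_mixed))
```

The measure is larger when each class occupies its own subspace than when the classes are mixed across subspaces, which is the property that makes it usable as a training objective for subspace-structured representations.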
This talk explores modern machine learning through high-dimensional statistics, using random matrix theory to analyze learning models. The speaker, Denny Wu from the University of Toronto and the Vector Institute, presents two examples: hyperparameter selection in overparameterized models and gradient-based representation learning in neural networks. The analysis reveals insights such as the possibility of a negative optimal ridge penalty and the advantages of feature learning over random features. Why it matters: This research provides a deeper theoretical understanding of deep learning phenomena, with potential implications for optimizing training and improving model performance in the region.
The article discusses the importance of sample correlations in computer graphics, vision, and machine learning, highlighting how tailored randomness can improve the efficiency of existing models. It covers various correlations studied in computer graphics and tools to characterize them, including the use of neural networks for developing different correlations. Gurprit Singh from the Max Planck Institute for Informatics will be presenting on the topic. Why it matters: Optimizing sampling techniques via understanding and applying correlations can lead to significant advancements and efficiency gains across multiple AI fields.
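To make "tailored randomness" concrete, the sketch below compares independent uniform samples with jittered (stratified) samples when estimating a one-dimensional integral. Jittered sampling is one standard correlated pattern from computer graphics; the summary does not list which correlations the talk covers, so the choice of pattern, the integrand, and the function names are assumptions for illustration.

```python
import numpy as np

def estimate(f, samples):
    """Monte Carlo estimate of the integral of f over [0, 1]."""
    return np.mean(f(samples))

def independent_samples(rng, n):
    """n uncorrelated uniform samples in [0, 1)."""
    return rng.random(n)

def jittered_samples(rng, n):
    """One uniform sample per stratum [i/n, (i+1)/n): correlated, evenly spread."""
    return (np.arange(n) + rng.random(n)) / n

def f(x):
    return x**2                                  # exact integral over [0, 1] is 1/3

rng = np.random.default_rng(0)
n, trials = 64, 500
err_independent = np.std([estimate(f, independent_samples(rng, n)) for _ in range(trials)])
err_jittered = np.std([estimate(f, jittered_samples(rng, n)) for _ in range(trials)])
print(err_independent, err_jittered)
```

Both estimators are unbiased and use the same number of samples; the jittered one spreads its samples so they cannot clump, which lowers the spread of the estimate for smooth integrands like this one.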
Dr. Xinwei Sun from Microsoft Research Asia presented research on trustworthy AI, focusing on statistical learning with theoretical guarantees. The work covers methods for sparse recovery with false-discovery rate analysis and causal inference tools for robustness and explainability. Consistency and identifiability were addressed theoretically, with applications shown in medical imaging analysis. Why it matters: The research contributes to addressing key limitations of current AI models regarding explainability, reproducibility, robustness, and fairness, which are crucial for real-world applications in sensitive fields like healthcare.
This paper introduces a self-supervised learning method for point cloud analysis using an upsampling autoencoder (UAE). The model uses subsampling and an encoder-decoder architecture to reconstruct the original point cloud, learning both semantic and geometric information. Experiments show the UAE outperforms existing methods in shape classification, part segmentation, and point cloud upsampling tasks.