GCC AI Research

Results for "data distribution"

The role of data-driven models in quantifying uncertainty

KAUST

KAUST Professor Raul Tempone, an expert in uncertainty quantification (UQ), has been appointed an Alexander von Humboldt Professor at RWTH Aachen University in Germany. The professorship will enable him to further his research on the mathematics of uncertainty quantification with new collaborators. Tempone believes the KAUST Strategic Research Initiative for Uncertainty Quantification (SRI-UQ) contributed to this award. Why it matters: This appointment enhances KAUST's visibility and facilitates cross-fertilization between European and KAUST research groups, benefiting both institutions and attracting talent.

A Unified Deep Model of Learning from both Data and Queries for Cardinality Estimation

arXiv

This paper introduces a unified deep autoregressive model (UAE) for cardinality estimation that learns joint data distributions from both data and query workloads. It uses differentiable progressive sampling with the Gumbel-Softmax trick to incorporate supervised query information into the deep autoregressive model. Experiments show UAE achieves better accuracy and efficiency compared to state-of-the-art methods.
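
The Gumbel-Softmax trick mentioned above can be illustrated with a minimal sketch. This is not the UAE implementation; the function name `gumbel_softmax_sample` and its parameters are hypothetical, and the example only shows how adding Gumbel noise to logits and applying a temperature-scaled softmax yields a smooth, relaxed stand-in for a categorical sample.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature=1.0, rng=random):
    """Return one relaxed categorical sample (a probability vector).

    Hypothetical helper for illustration only, not code from the paper.
    """
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))  # Gumbel(0, 1) draw
        noisy.append((logit + gumbel_noise) / temperature)

    # Numerically stable softmax over the perturbed, scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Assumed toy distribution over three attribute values.
    toy_logits = [math.log(0.7), math.log(0.2), math.log(0.1)]
    print(gumbel_softmax_sample(toy_logits, temperature=0.5))
```

Lower temperatures push the output toward a one-hot vector, while higher temperatures make it smoother. In a deep learning framework the same computation would be written with differentiable tensor operations so that gradients can flow through the sampling step.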

Gaussian Variational Inference in high dimension

MBZUAI

This article discusses approximating a high-dimensional distribution using Gaussian variational inference by minimizing Kullback-Leibler divergence. It builds upon previous research and approximates the minimizer using a Gaussian distribution with specific mean and variance. The study details approximation accuracy and applicability using efficient dimension, relevant for analyzing sampling schemes in optimization. Why it matters: This theoretical research can inform the development of more efficient and accurate AI algorithms, particularly in areas dealing with high-dimensional data such as machine learning and data analysis.

TII-SSRC-23 Dataset: Typological Exploration of Diverse Traffic Patterns for Intrusion Detection

arXiv

Researchers introduce TII-SSRC-23, a new network intrusion detection dataset designed to improve the diversity and representation of modern network traffic for machine learning models. The dataset includes a range of traffic types and subtypes to address the limitations of existing datasets. Feature importance analysis and baseline experiments for supervised and unsupervised intrusion detection are also provided.

CTRL: Closed-Loop Data Transcription via Rate Reduction

MBZUAI

A talk introduces a computational framework for learning a compact, structured representation of real-world datasets that is both discriminative and generative. It proposes to learn a closed-loop transcription between the distribution of a high-dimensional multi-class dataset and an arrangement of multiple independent subspaces, known as a linear discriminative representation (LDR). The optimality of the closed-loop transcription can be characterized in closed form by an information-theoretic measure known as the rate reduction. Why it matters: The framework unifies the concepts and benefits of auto-encoding and GANs and generalizes them to the setting of learning a representation of multi-class visual data that is both discriminative and generative.
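
For reference, the rate reduction is commonly written in the following form. The summary does not give the formula; the notation here is an assumption: $Z \in \mathbb{R}^{d \times n}$ holds $n$ learned features of dimension $d$, $\Pi_j$ is a diagonal $n \times n$ membership matrix for class $j$ of $k$, and $\epsilon$ is a prescribed distortion level.

```latex
\Delta R(Z, \Pi, \epsilon)
  = \frac{1}{2}\log\det\!\left(I + \frac{d}{n\epsilon^{2}}\, Z Z^{\top}\right)
  - \sum_{j=1}^{k} \frac{\operatorname{tr}(\Pi_j)}{2n}
    \log\det\!\left(I + \frac{d}{\operatorname{tr}(\Pi_j)\,\epsilon^{2}}\, Z \Pi_j Z^{\top}\right)
```

The first term measures the coding rate of all features taken together and the second the average rate of the individual classes, so a large difference indicates features that spread out globally while staying compact within each class.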

KAUST and the Big Data age

KAUST

KAUST held a research workshop on Optimization and Big Data, gathering researchers to discuss challenges and opportunities in the field. Speakers presented novel optimization algorithms and distributed systems for handling large datasets. The workshop featured 20 speakers from KAUST, global universities, and Microsoft Research. Why it matters: The event highlights KAUST's role as a regional hub for advancing research and development in big data and optimization, crucial for AI and various computational fields.

Understanding the COVID wave

KAUST

KAUST professor David Ketcheson uses mathematical modeling to understand COVID-19 transmission. He applies differential equations to explain the progression of SARS-CoV-2, utilizing the SIR model to predict the spread. Ketcheson's analysis suggests that the reproduction number for COVID-19 could be as high as 5, emphasizing the need for social distancing. Why it matters: This highlights the role of mathematical modeling and data analysis in understanding and predicting the spread of infectious diseases, particularly in the context of pandemic response.
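
The SIR model referred to above can be sketched in a few lines. This is not Ketcheson's code: the function name `simulate_sir`, the forward-Euler integrator, the recovery rate of 0.1 per day, and the initial infected fraction are all assumptions chosen for illustration. Only the reproduction number of 5 comes from the summary.

```python
def simulate_sir(r0, gamma, initial_infected, days, dt=0.1):
    """Integrate the SIR equations with forward Euler.

    S, I, R are population fractions that sum to 1:
        dS/dt = -beta * S * I
        dI/dt =  beta * S * I - gamma * I
        dR/dt =  gamma * I
    with beta = r0 * gamma. Hypothetical helper for illustration only.
    """
    beta = r0 * gamma
    s, i, r = 1.0 - initial_infected, initial_infected, 0.0
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((k * dt, s, i, r))
    return history


if __name__ == "__main__":
    # gamma = 0.1 per day (a 10-day infectious period) is an assumed value.
    for r0 in (2.5, 5.0):
        run = simulate_sir(r0, gamma=0.1, initial_infected=1e-4, days=200)
        peak_time, _, peak_infected, _ = max(run, key=lambda row: row[2])
        print(f"R0={r0}: peak infected fraction {peak_infected:.3f} "
              f"on day {peak_time:.0f}")
```

Comparing the two runs shows why a higher reproduction number strengthens the case for distancing: the infected fraction peaks both earlier and higher, and measures that lower the contact rate act by reducing the effective value of `beta`.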