This paper introduces neural Bayes estimators for censored peaks-over-threshold models, improving computational efficiency in spatial extremal dependence modeling. The method uses data augmentation to encode censoring information in the neural network input, offering an alternative to traditional likelihood-based approaches. The estimators were applied to assess extreme particulate matter concentrations over Saudi Arabia, demonstrating their efficacy in high-dimensional models. Why it matters: The research offers a computationally efficient alternative for environmental modeling and risk assessment in the region.
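The data-augmentation idea above can be illustrated with a minimal sketch. It assumes left-censoring at a fixed threshold, a constant fill value for censored entries, and a flat input vector; the function names `augment_censored` and `to_network_input` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def augment_censored(
    z: List[float], threshold: float, fill: float = 0.0
) -> Tuple[List[float], List[int]]:
    """Mask values at or below the threshold and record where masking happened.

    Assumption: an observation is treated as censored when it does not
    exceed the threshold, and the fill value is an arbitrary constant.
    """
    indicator = [1 if value <= threshold else 0 for value in z]
    masked = [fill if flag else value for value, flag in zip(z, indicator)]
    return masked, indicator


def to_network_input(
    z: List[float], threshold: float, fill: float = 0.0
) -> List[float]:
    """Concatenate the masked values and the indicators into one input vector."""
    masked, indicator = augment_censored(z, threshold, fill)
    return masked + [float(flag) for flag in indicator]


if __name__ == "__main__":
    # Four hypothetical site values with a threshold of 1.0.
    print(to_network_input([0.2, 1.5, 0.9, 3.0], threshold=1.0))
```

The indicator channel lets a network tell a censored entry apart from an observed value that happens to equal the fill constant.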
This article discusses methods for handling label noise in deep learning, including extracting confident examples and modeling label noise. Tongliang Liu from the University of Sydney presented these approaches. The talk aimed to provide participants with a basic understanding of learning with noisy labels. Why it matters: As AI models are increasingly trained on large, noisy datasets, techniques for robust learning become crucial for reliable real-world performance.
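As a rough illustration of what extracting confident examples can look like, the sketch below uses the small-loss heuristic: samples with the lowest per-sample training loss are treated as more likely to carry clean labels. The summary does not say which selection rule was presented, so this rule, the function name `select_confident`, and the `keep_ratio` parameter are all assumptions.

```python
import math
from typing import List


def select_confident(losses: List[float], keep_ratio: float) -> List[int]:
    """Return indices of the lowest-loss samples, ordered by increasing loss.

    Assumption: a low per-sample loss is used as a proxy for a clean label.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = math.ceil(keep_ratio * len(losses))
    ranked = sorted(range(len(losses)), key=lambda i: losses[i])
    return ranked[:n_keep]


if __name__ == "__main__":
    # Hypothetical per-sample losses from one training epoch.
    print(select_confident([0.9, 0.1, 0.5, 2.0], keep_ratio=0.5))
```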
A new framework for constructing confidence sets for causal orderings within structural equation models (SEMs) is presented. It leverages a residual bootstrap procedure to test the goodness-of-fit of causal orderings, quantifying uncertainty in causal discovery. The method is computationally efficient and suitable for medium-sized problems while maintaining theoretical guarantees as the number of variables increases. Why it matters: This offers a new dimension of uncertainty quantification that enhances the robustness and reliability of causal inference in complex systems.
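For readers unfamiliar with the residual bootstrap, the following sketch shows the generic mechanism on a simple linear regression: fit the model, resample the fitted residuals with replacement, rebuild synthetic responses, and refit. It is not the paper's goodness-of-fit test for causal orderings; the helper names and the percentile interval are illustrative assumptions.

```python
import random
from statistics import mean
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residual_bootstrap_slopes(
    x: List[float], y: List[float], n_boot: int = 500, seed: int = 0
) -> List[float]:
    """Refit the slope on responses rebuilt from resampled residuals."""
    rng = random.Random(seed)
    intercept, slope = fit_line(x, y)
    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    slopes = []
    for _ in range(n_boot):
        # Draw residuals with replacement and add them back to the fit.
        resampled = rng.choices(residuals, k=len(residuals))
        y_star = [fi + ei for fi, ei in zip(fitted, resampled)]
        slopes.append(fit_line(x, y_star)[1])
    return slopes


def percentile_interval(
    values: List[float], level: float = 0.95
) -> Tuple[float, float]:
    """Crude percentile interval from the bootstrap replicates."""
    ordered = sorted(values)
    lo = int((1 - level) / 2 * (len(ordered) - 1))
    hi = int((1 + level) / 2 * (len(ordered) - 1))
    return ordered[lo], ordered[hi]
```

In a testing context, the same resampling loop would be used to approximate the null distribution of a fit statistic rather than an interval for a slope.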
MBZUAI researchers collaborated with Carnegie Mellon University and the Broad Institute of MIT and Harvard to develop a new statistical method for analyzing data used for gene regulatory network inference. The method addresses the challenge of distinguishing true zero expression values from dropouts in single-cell RNA sequencing data. This research will be presented at the Twelfth International Conference on Learning Representations (ICLR 2024). Why it matters: Improving gene regulatory network inference can lead to better understanding of disease mechanisms and inform the development of new medicines.
The paper examines the performance of pre-trained Arabic language models on Arabic text intentionally stripped of diacritical dots to evade content classification. It proposes methods to support these "undotted" texts without retraining the models. The proposed methods achieve nearly perfect performance on one downstream task. Why it matters: The research highlights a vulnerability in Arabic NLP and offers solutions to maintain performance in the face of adversarial text manipulation.
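To make the "undotted" perturbation concrete, the sketch below maps common dotted Arabic letters to dotless counterparts using Unicode code points. The mapping is a simplified assumption for illustration (for example, it ignores positional forms) and is not the paper's exact mapping; the names `UNDOT_MAP` and `undot` are hypothetical.

```python
# Illustrative mapping from dotted letters to dotless look-alikes.
UNDOT_MAP = {
    "\u0628": "\u066E",  # beh -> dotless beh
    "\u062A": "\u066E",  # teh -> dotless beh
    "\u062B": "\u066E",  # theh -> dotless beh
    "\u0646": "\u06BA",  # noon -> noon ghunna (dotless noon)
    "\u062C": "\u062D",  # jeem -> hah
    "\u062E": "\u062D",  # khah -> hah
    "\u0630": "\u062F",  # thal -> dal
    "\u0632": "\u0631",  # zain -> reh
    "\u0634": "\u0633",  # sheen -> seen
    "\u0636": "\u0635",  # dad -> sad
    "\u0638": "\u0637",  # zah -> tah
    "\u063A": "\u0639",  # ghain -> ain
    "\u0641": "\u06A1",  # feh -> dotless feh
    "\u0642": "\u066F",  # qaf -> dotless qaf
    "\u064A": "\u0649",  # yeh -> alef maksura
    "\u0629": "\u0647",  # teh marbuta -> heh
}

_TABLE = str.maketrans(UNDOT_MAP)


def undot(text: str) -> str:
    """Replace dotted letters with dotless forms; other characters pass through."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    # beh + yeh + teh becomes dotless beh + alef maksura + dotless beh.
    print(undot("\u0628\u064A\u062A"))
```

Because several dotted letters collapse onto the same dotless shape, the transformation is lossy, which is why a tokenizer trained only on dotted text can struggle with it.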
Researchers are exploring methods for evaluating the outcome of actions using off-policy observations where the context is noisy or anonymized. They employ proxy causal learning, using two noisy views of the context to recover the average causal effect of an action without explicitly modeling the hidden context. The implementation uses learned neural net representations for both action and context, and outperforms an autoencoder-based alternative. Why it matters: This research addresses a key challenge in applying AI in real-world scenarios where data privacy or bandwidth limitations necessitate working with noisy or anonymized data.
This paper introduces a unified deep autoregressive model (UAE) for cardinality estimation that learns joint data distributions from both data and query workloads. It uses differentiable progressive sampling with the Gumbel-Softmax trick to incorporate supervised query information into the deep autoregressive model. Experiments show UAE achieves better accuracy and efficiency than state-of-the-art methods.
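The Gumbel-Softmax trick mentioned above replaces a hard categorical sample with a smooth, temperature-controlled approximation. The sketch below shows only the sampling arithmetic in plain Python, so it carries no gradients; in an autodiff framework the same formula is what makes the sampling step differentiable. The function name and default temperature are assumptions, and this is not the UAE implementation.

```python
import math
import random
from typing import List, Optional


def gumbel_softmax(
    logits: List[float],
    temperature: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Draw one relaxed categorical sample from unnormalized log-probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    perturbed = []
    for logit in logits:
        # Clamp away from 0 so both logarithms stay finite.
        u = max(rng.random(), 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        perturbed.append((logit + gumbel_noise) / temperature)
    # Numerically stable softmax over the perturbed, scaled logits.
    peak = max(perturbed)
    exps = [math.exp(value - peak) for value in perturbed]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    sample = gumbel_softmax([1.0, 2.0, 0.5], temperature=0.5, rng=random.Random(0))
    print(sample)
```

Lower temperatures push the output toward a one-hot vector, while higher temperatures make it closer to uniform.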