The Qatar Computing Research Institute (QCRI) has introduced PDNS-Net, a large heterogeneous graph dataset for malicious domain classification, containing 447K nodes and 897K edges. It is significantly larger than existing heterogeneous graph datasets like IMDB and DBLP. Preliminary evaluations using graph neural networks indicate that further research is needed to improve model performance on large heterogeneous graphs. Why it matters: This dataset will enable researchers to develop and benchmark graph learning algorithms on a scale relevant to real-world cybersecurity applications, particularly for identifying and mitigating malicious online activity.
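Heterogeneous graphs like PDNS-Net store multiple node and edge types rather than a single adjacency list. A minimal sketch of such a store follows; the type names used here ("domain", "ip", "resolves") are illustrative assumptions in the spirit of a passive-DNS graph, not PDNS-Net's actual schema.

```python
from collections import defaultdict

class HeteroGraph:
    """Toy heterogeneous graph: nodes and edges are keyed by type."""
    def __init__(self):
        self.nodes = defaultdict(set)    # node_type -> set of node ids
        self.edges = defaultdict(list)   # (src_type, relation, dst_type) -> [(u, v)]

    def add_node(self, ntype, nid):
        self.nodes[ntype].add(nid)

    def add_edge(self, src_t, rel, dst_t, u, v):
        # Register endpoints under their types, then store the typed edge.
        self.add_node(src_t, u)
        self.add_node(dst_t, v)
        self.edges[(src_t, rel, dst_t)].append((u, v))

g = HeteroGraph()
g.add_edge("domain", "resolves", "ip", "bad.example", "203.0.113.7")
g.add_edge("domain", "resolves", "ip", "good.example", "198.51.100.2")
print(len(g.nodes["domain"]), len(g.edges[("domain", "resolves", "ip")]))  # 2 2
```

Keying edges by a (source type, relation, destination type) triple is the same convention libraries such as PyTorch Geometric and DGL use for heterogeneous graphs.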
A new mini-batch strategy using aggregated relational data is proposed to fit the mixed membership stochastic blockmodel (MMSB) to large networks. The method uses nodal information and stochastic gradients of bipartite graphs for scalable inference. The approach was applied to a citation network with over two million nodes and 25 million edges, capturing explainable structure. Why it matters: This research enables more efficient community detection in massive networks, which is crucial for analyzing complex relationships across many domains.
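The general idea behind such methods is to avoid touching the whole network at each step: sample a mini-batch of node pairs and take a stochastic gradient step on latent membership parameters. The toy sketch below does this for a simple latent-factor link model P(u~v) = sigmoid(theta_u . theta_v); it illustrates mini-batch stochastic inference on a graph only, and is not the paper's aggregated-relational-data MMSB estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic network with two planted communities of 50 nodes each.
n, k = 100, 4
edges = np.array([(u, v) for u in range(n) for v in range(u + 1, n)
                  if rng.random() < (0.3 if (u < 50) == (v < 50) else 0.02)])

theta = 0.1 * rng.standard_normal((n, k))   # latent node memberships

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr, batch = 0.05, 64
for _ in range(2000):
    pos = edges[rng.integers(len(edges), size=batch)]   # observed edges
    neg = rng.integers(n, size=(batch, 2))              # random (mostly absent) pairs
    for pairs, y in ((pos, 1.0), (neg, 0.0)):
        u, v = pairs[:, 0], pairs[:, 1]
        p = sigmoid(np.sum(theta[u] * theta[v], axis=1))
        g = lr * (y - p)[:, None]           # gradient of the Bernoulli log-likelihood
        theta[u] += g * theta[v]
        theta[v] += g * theta[u]

# Pairs within a community should now score higher than pairs across.
within = sigmoid(theta[:50] @ theta[:50].T).mean()
across = sigmoid(theta[:50] @ theta[50:].T).mean()
print(within > across)
```

Each step costs O(batch x k) regardless of network size, which is what makes this style of inference feasible on million-node graphs.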
Emilio Porcu from Khalifa University presented a talk on temporally evolving generalized networks, where graphs evolve over time with changing topologies. The talk addressed challenges in building semi-metrics and isometric embeddings for these networks. The research combines kernel specifications with network-based metrics and is illustrated on a traffic accident dataset. Why it matters: This work advances the application of kernel methods to dynamic graph structures, relevant for modeling evolving relationships in various domains.
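One simple way to combine a network semi-metric with time is a separable space-time kernel, e.g. K((i,s),(j,t)) = exp(-d(i,j)) * exp(-|s-t|), with d a shortest-path distance. Whether such a construction yields a valid (positive semi-definite) kernel is exactly the kind of question this line of research addresses; the sketch below, an illustrative assumption rather than the talk's construction, only evaluates the formula.

```python
import numpy as np

INF = float("inf")

def shortest_paths(adj):
    """Floyd-Warshall all-pairs shortest paths on a small dense graph."""
    n = len(adj)
    d = [[0 if i == j else (1 if adj[i][j] else INF) for j in range(n)]
         for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # path graph 0 - 1 - 2
d = shortest_paths(adj)

def K(i, s, j, t):
    # Separable space-time kernel: network part times temporal part.
    return np.exp(-d[i][j]) * np.exp(-abs(s - t))

print(round(K(0, 0.0, 2, 1.0), 4))  # exp(-2) * exp(-1) = exp(-3) ~= 0.0498
```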
Natasa Przulj at the Barcelona Supercomputing Center is developing an AI framework that fuses multi-omic data to improve precision medicine. The framework uses graph-regularized non-negative matrix tri-factorization (NMTF) and network science algorithms for patient stratification, biomarker prediction, and drug repurposing. It is applied to diseases like cancer, Covid-19, and Parkinson's. Why it matters: This research can enable more personalized and effective treatments by leveraging complex biological data to understand disease mechanisms and tailor therapies.
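Non-negative matrix tri-factorization decomposes a data matrix as X = G1 S G2^T with all factors non-negative, so the row and column factors can be read as soft cluster assignments (e.g. patients and genes). The sketch below runs the standard multiplicative updates for plain NMTF; the graph-regularization terms that tie the factors to biological networks in the actual framework are omitted here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Random non-negative data matrix standing in for an omics measurement.
n, m, k1, k2 = 30, 20, 4, 3
X = rng.random((n, m))

# Non-negative initial factors (small offset avoids zero-locking).
G1 = rng.random((n, k1)) + 1e-3
S  = rng.random((k1, k2)) + 1e-3
G2 = rng.random((m, k2)) + 1e-3

def err():
    return np.linalg.norm(X - G1 @ S @ G2.T)

e0, eps = err(), 1e-9
for _ in range(200):
    # Multiplicative updates for min ||X - G1 S G2^T||_F, s.t. factors >= 0.
    G1 *= (X @ G2 @ S.T) / (G1 @ S @ G2.T @ G2 @ S.T + eps)
    S  *= (G1.T @ X @ G2) / (G1.T @ G1 @ S @ G2.T @ G2 + eps)
    G2 *= (X.T @ G1 @ S) / (G2 @ S.T @ G1.T @ G1 @ S + eps)

print(err() < e0)  # reconstruction error decreased
```

Because updates are element-wise multiplications by non-negative ratios, the factors stay non-negative throughout, which is what keeps them interpretable as cluster memberships.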
KAUST hosted the New Challenges in Heterogeneous Catalysis research conference from January 29-31. The conference brought together catalysis researchers from KAUST and abroad to inspire future research and discuss challenges in heterogeneous catalysis. Discussions focused on new chemistry, catalytic materials, understanding catalytic processes, and activation of small molecules like methane and carbon dioxide. Why it matters: Catalysis research is crucial for KAUST's research thrusts in food, water, energy, and environment, contributing to sustainable development and green chemistry in the region.
Kimon Fountoulakis from the University of Waterloo presented a talk on machine learning on graphs, covering node classification and algorithmic reasoning. The talk discussed the limitations and strengths of graph neural networks (GNNs). It also covered novel optimal architectures for node classification and the ability of looped GNNs to execute classical algorithms. Why it matters: Understanding GNN capabilities is crucial for advancing AI applications in areas like recommendation systems and drug discovery that rely on relational data.
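The basic building block behind GNN node classification is a graph-convolution step, H' = ReLU(D^-1/2 (A+I) D^-1/2 H W): each node averages its neighbors' features (including its own, via self-loops) and applies a learned linear map. The sketch below computes one such step with random weights; it is a minimal illustration of message passing, not the specific architectures discussed in the talk.

```python
import numpy as np

# Path graph on 4 nodes with one-hot node features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = np.eye(4)

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 2))          # random (untrained) weight matrix

A_hat = A + np.eye(4)                    # add self-loops
d = A_hat.sum(axis=1)
A_norm = A_hat / np.sqrt(np.outer(d, d)) # symmetric degree normalization

H1 = np.maximum(0.0, A_norm @ H @ W)     # aggregate neighbors, transform, ReLU
print(H1.shape)  # (4, 2)
```

Stacking such layers lets information propagate over longer paths, and looping a layer repeatedly, as in the looped GNNs mentioned above, mirrors the iterative structure of classical graph algorithms.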
KAUST Associate Professor Xiangliang Zhang leads the Machine Intelligence and Knowledge Engineering (MINE) group, focusing on machine learning and data mining algorithms for AI applications. The MINE group researches complex graph data to profile nodes, predict links, detect communities, and understand the connections among them. Zhang's team also works on graph alignment and recommender systems. Why it matters: This research contributes to advancing machine learning techniques at a leading GCC institution, potentially impacting various AI applications in the region.
The paper introduces Duet, a hybrid neural relation understanding method for cardinality estimation. Duet addresses limitations of existing learned methods, such as high costs and scalability issues, by incorporating predicate information into an autoregressive model. Experiments demonstrate Duet's efficiency, accuracy, and scalability, even outperforming GPU-based methods on CPU. Why it matters: Accurate cardinality estimation is central to database query optimization, and a learned estimator that runs efficiently on CPU could improve query planning without specialized hardware.
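Autoregressive cardinality estimators rely on the chain rule: the selectivity of a conjunctive predicate factorizes as P(c1=a, c2=b) = P(c1=a) * P(c2=b | c1=a), so the estimated cardinality is the row count times that product. In the sketch below, empirical counts stand in for the learned conditional model; it illustrates the factorization only, not Duet's actual method.

```python
from collections import Counter

# Tiny table with two columns: (region, channel).
rows = [("us", "web"), ("us", "web"), ("us", "app"), ("eu", "web")]
N = len(rows)

c1_counts = Counter(r[0] for r in rows)      # marginal counts for column 1
pair_counts = Counter(rows)                  # joint counts for (c1, c2)

def estimate(c1, c2):
    """card(c1 = a AND c2 = b) = N * P(c1=a) * P(c2=b | c1=a)."""
    if c1_counts[c1] == 0:
        return 0.0
    p1 = c1_counts[c1] / N
    p2_given_1 = pair_counts[(c1, c2)] / c1_counts[c1]
    return N * p1 * p2_given_1

print(estimate("us", "web"))  # 2.0 (two matching rows)
```

A learned model replaces the counters with neural conditionals over each column given its predecessors, which is what keeps storage sublinear in the data while still answering multi-column predicates.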