PDNS-Net: A Large Heterogeneous Graph Benchmark Dataset of Network Resolutions for Graph Learning

arXiv · March 15, 2022 · Notable

Research Dataset Qatar Cybersecurity Graph Learning

Summary

The Qatar Computing Research Institute (QCRI) has introduced PDNS-Net, a large heterogeneous graph dataset for malicious domain classification, containing 447K nodes and 897K edges. It is significantly larger than existing heterogeneous graph datasets like IMDB and DBLP. Preliminary evaluations using graph neural networks indicate that further research is needed to improve model performance on large heterogeneous graphs. Why it matters: This dataset will enable researchers to develop and benchmark graph learning algorithms on a scale relevant to real-world cybersecurity applications, particularly for identifying and mitigating malicious online activity.

Keywords

heterogeneous graph · dataset · graph learning · QCRI · malicious domain

Read original article →

Get the weekly digest

Top AI stories from the GCC region, every week.

TII-SSRC-23 Dataset: Typological Exploration of Diverse Traffic Patterns for Intrusion Detection

arXiv · Sep 14

Researchers introduce TII-SSRC-23, a new network intrusion detection dataset designed to improve the diversity and representation of modern network traffic for machine learning models. The dataset includes a range of traffic types and subtypes to address the limitations of existing datasets. Feature importance analysis and baseline experiments for supervised and unsupervised intrusion detection are also provided.

Duet: efficient and scalable hybriD neUral rElation undersTanding

arXiv · Jul 25

The paper introduces Duet, a hybrid neural relation understanding method for cardinality estimation. Duet addresses limitations of existing learned methods, such as high costs and scalability issues, by incorporating predicate information into an autoregressive model. Experiments demonstrate Duet's efficiency, accuracy, and scalability, even outperforming GPU-based methods on CPU.

The Saudi Privacy Policy Dataset

arXiv · Apr 5

A new dataset called the Saudi Privacy Policy Dataset is introduced, which contains Arabic privacy policies from various sectors in Saudi Arabia. The dataset is annotated based on the 10 principles of the Personal Data Protection Law (PDPL) and includes 1,000 websites, 4,638 lines of text, and 775,370 tokens. The dataset aims to facilitate research and development in privacy policy analysis, NLP, and machine learning applications related to data protection.