The Vulas vulnerability dataset is available for free download

Antonino Sabetta
Last updated on 19 Feb 2019

I am very happy to announce that my team at SAP Security Research has just released a dump of the vulnerability knowledge base that we have curated over the past four years while developing and operating our vulnerability-assessment-tool (internally known as Vulas).

A description of the dataset and a few interesting statistics are provided in our paper A Manually-Curated Dataset of Fixes to Vulnerabilities of Open-Source Software.

The data was obtained both from the National Vulnerability Database (NVD) and from project-specific Web resources that we monitor on a continuous basis. From that data, we extracted a dataset that maps 624 publicly disclosed vulnerabilities, affecting 205 distinct open-source Java projects used in SAP products or internal tools, onto the 1282 commits that fix them. Of the 624 vulnerabilities, 29 have no CVE identifier at all, and 46 have a CVE identifier assigned by a numbering authority but are not yet available in the NVD. The dataset is released under an open-source license, together with supporting scripts that allow researchers to automatically retrieve the actual content of the commits from the corresponding repositories and to augment the attributes available for each instance.
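To make the shape of such a mapping concrete, here is a minimal Python sketch of how vulnerability-to-commit records could be grouped and counted. The row layout (vulnerability identifier, repository URL, commit identifier), the sample values, and the function names are assumptions made for illustration; they are not taken from the released files or scripts.

```python
from collections import defaultdict

# Hypothetical sample rows: (vulnerability id, repository URL, fix commit id).
# All identifiers and URLs here are invented for illustration only.
SAMPLE_ROWS = [
    ("VULN-0001", "https://example.org/repo-a.git", "aaa111"),
    ("VULN-0001", "https://example.org/repo-a.git", "bbb222"),
    ("VULN-0002", "https://example.org/repo-b.git", "ccc333"),
]


def group_fix_commits(rows):
    """Map each vulnerability id to the list of (repo, commit) pairs that fix it."""
    fixes = defaultdict(list)
    for vuln_id, repo_url, commit_id in rows:
        fixes[vuln_id].append((repo_url, commit_id))
    return dict(fixes)


def summarize(fixes):
    """Return (number of vulnerabilities, number of repositories, number of commits)."""
    n_vulns = len(fixes)
    n_commits = sum(len(commits) for commits in fixes.values())
    n_repos = len({repo for commits in fixes.values() for repo, _ in commits})
    return n_vulns, n_repos, n_commits


if __name__ == "__main__":
    fixes = group_fix_commits(SAMPLE_ROWS)
    print(summarize(fixes))
```

Applied to the figures quoted above, the same kind of counting gives 624 - 29 - 46 = 549 vulnerabilities that both have a CVE identifier and are available in the NVD, and an average of roughly 1282 / 624 ≈ 2.05 fix commits per vulnerability.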

DOWNLOAD

The dataset and the related scripts can be downloaded freely from GitHub. If you use the dataset in your own work, please cite our paper as follows:

@MISC{ponta2019dataset,
  author={Serena E. Ponta and Henrik Plate and Antonino Sabetta and Michele Bezzi and C{\'e}dric Dangremont},
  url={},
  title={A Manually-Curated Dataset of Fixes to Vulnerabilities of Open-Source Software},
  year={2019},
  month={February},
}

Principal Research Scientist

I am curious about software engineering, machine learning, software security, and open-source software.