NYU Open Research : Scholarly materials produced by members of the NYU community.
Published 2024 | Version v1
Dataset Open

OpenABC-D: A Large-Scale Dataset For Machine Learning Guided Integrated Circuit Synthesis

Description

OpenABC-D is a large-scale labeled dataset generated by synthesizing open-source hardware IPs with the state-of-the-art logic synthesis tool yosys-abc. We collected 29 open-source hardware IP designs from various sources (MIT-CEP, IWLS, OpenROAD, OpenPiton, etc.) and synthesized them with 1500 random synthesis flows, which we call synthesis recipes.

Each synthesis flow has a predefined length L (L = 20 in our case). We preserved all and-inverter graphs (AIGs), namely the starting, intermediate, and final AIGs, with labels such as the number of nodes, the longest path, and the sequence of atomic synthesis transformations (rewrite, refactor, balance, etc.), along with graph statistics and the area and delay of the final AIG.
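To make the notion of a synthesis recipe concrete, the following is a minimal sketch of how fixed-length random recipes could be sampled. It is not the authors' generation script: the transformation list is limited to the three names mentioned above, and the uniform sampling, the seed, and the helper names `random_recipe` and `generate_recipes` are illustrative assumptions.

```python
import random

# Assumption: only the three transformations named in the description are
# used here; the description's "etc." implies the real set is larger.
TRANSFORMS = ["rewrite", "refactor", "balance"]
RECIPE_LENGTH = 20   # L = 20, as stated in the description
NUM_RECIPES = 1500   # number of random recipes stated in the description


def random_recipe(rng, length=RECIPE_LENGTH):
    """Return one recipe: a list of `length` transformation names."""
    return [rng.choice(TRANSFORMS) for _ in range(length)]


def generate_recipes(num_recipes=NUM_RECIPES, seed=0):
    """Return `num_recipes` independently sampled recipes (hypothetical helper)."""
    rng = random.Random(seed)
    return [random_recipe(rng) for _ in range(num_recipes)]


recipes = generate_recipes()

# Joining the steps with "; " gives a command sequence in the style that
# ABC accepts on its command line.
script = "; ".join(recipes[0])
```

Each element of `recipes` is one candidate flow; applying its steps one at a time to a design would produce the intermediate and final AIGs described above.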

We converted the AIGs into PyTorch data format, which machine learning engineers can use directly, reducing the effort of costly labeled data generation and pre-processing. OpenABC-D can be used for a variety of learning tasks in logic synthesis, such as:

  1. Predicting the quality of result (QoR) of a *synthesis recipe* on a hardware IP.
  2. Predicting area and delay after technology mapping.
  3. Learning functional and structural features of AIGs using self-supervised labels (useful for tasks such as RL-based logic synthesis).

Our dataset can easily be used with graph-based machine learning frameworks such as PyTorch Geometric.

Technical info

The dataset size is 1.4TB, so it has been divided into a multipart zip archive of 14 parts: 13 parts of 107.4 GB each and a final part of 84.2 GB. Downloading and unzipping require a minimum of 3TB of disk space. To unzip, you must first concatenate the parts into a single file or use a zip repair option.
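The concatenation step can be sketched in Python as below. This is an illustration under assumptions rather than an official procedure: the part order (numbered `.zNN` parts first, the `.zip` part last) follows the usual convention for split zip archives, and the helper names `ordered_parts` and `concatenate_parts` are hypothetical. The demonstration runs on tiny stand-in files instead of the real parts.

```python
import shutil
import tempfile
from pathlib import Path


def ordered_parts(directory, stem="OPENABC_DATASET", numbered=13):
    """List part paths in order: stem.z01 ... stem.zNN, then stem.zip last."""
    directory = Path(directory)
    parts = [directory / f"{stem}.z{i:02d}" for i in range(1, numbered + 1)]
    parts.append(directory / f"{stem}.zip")
    return parts


def concatenate_parts(parts, out_path, chunk_size=64 * 1024 * 1024):
    """Append every part to out_path, streaming in fixed-size chunks."""
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out, chunk_size)
    return Path(out_path)


# Demonstration on small stand-in files (not real zip data).
with tempfile.TemporaryDirectory() as tmp:
    demo_parts = ordered_parts(tmp, stem="DEMO", numbered=2)
    for index, part in enumerate(demo_parts):
        part.write_bytes(f"part{index}".encode())
    combined = concatenate_parts(demo_parts, Path(tmp) / "combined.bin")
    combined_bytes = combined.read_bytes()
```

For the real dataset, `concatenate_parts(ordered_parts("/path/to/downloads"), "combined.zip")` would write a single file as large as all parts together, which is worth keeping in mind when budgeting disk space. The combined file may still need the zip repair option mentioned above before it extracts cleanly.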

Other

Files over 50 GB cannot be downloaded via the web browser interface. Users can download large files via the UltraViolet API or via the Globus folder for this dataset (requires Globus login).

Files

Files (1.5 TB)

Name                        Size      Checksum
CODE_README.md              13.6 kB   md5:97b539d254e062fea1029a53c3005363
DATA_README.md              7.2 kB    md5:669f6da5b2081847e0ce6ea082ca64c7
DatagenerationPipeline.png  361.1 kB  md5:93267b3a0db0dc299413ae89fcb00cdf
OPENABC_DATASET.z01         107.4 GB  md5:01dfb65be1fccabfef415d816ad321c7
OPENABC_DATASET.z02         107.4 GB  md5:423a6771d79ef72ef552c375ec5c121d
OPENABC_DATASET.z03         107.4 GB  md5:2621efd1f4037077b924dd3ceb9c80a8
OPENABC_DATASET.z04         107.4 GB  md5:343a2a9ece855f08324b0cf0f44fc6aa
OPENABC_DATASET.z05         107.4 GB  md5:485cdb8d77a8a304fc3a2ee6dec1389d
OPENABC_DATASET.z06         107.4 GB  md5:a29cdeb09bd7e534e096fa8767340af1
OPENABC_DATASET.z07         107.4 GB  md5:a72fb7f7a425b3de015b289545d73e85
OPENABC_DATASET.z08         107.4 GB  md5:050f00c0e458a8ee1828bf193527b4ac
OPENABC_DATASET.z09         107.4 GB  md5:b7900029e7d5e9051661247439dbe5c8
OPENABC_DATASET.z10         107.4 GB  md5:d91771c9bb290267228a6a789b4f9ff3
OPENABC_DATASET.z11         107.4 GB  md5:8d67e160e67e0d7a983d6e32506299a7
OPENABC_DATASET.z12         107.4 GB  md5:99b97618cbf08660da19a10da971aa15
OPENABC_DATASET.z13         107.4 GB  md5:166755783524f91c3caddf706fe3ef48
OPENABC_DATASET.zip         84.2 GB   md5:81062078d3db135171943b627d84a731
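
Because each part is large, it may be worth checking a download against the md5 value listed for it before concatenating. The sketch below computes a file's MD5 digest in chunks so that a 107 GB part never has to fit in memory. The helper names `md5_of_file` and `matches_listing` are hypothetical, and the demonstration uses a small temporary file rather than a real part.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listing entry of the form 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Demonstration on a small stand-in file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"hello")
    sample_digest = md5_of_file(sample)
    sample_ok = matches_listing(sample, "md5:" + sample_digest)
    sample_bad = matches_listing(sample, "md5:" + "0" * 32)
```

For a real part, the call would look like `matches_listing("OPENABC_DATASET.z01", "md5:01dfb65be1fccabfef415d816ad321c7")`, with the listing entry copied from the file table.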

Additional details

Identifiers

Related works

Is described by
Preprint: arXiv:2110.11292 (arXiv)
Is required by
Software: https://github.com/NYU-MLDA/OpenABC (URL)
Is supplemented by
Dataset: https://zenodo.org/records/6399454 (URL)