SynDelay Dataset

Liming Xu, Yunbo Long

Synthetic Delivery Delay Prediction

Artificial intelligence (AI) is transforming supply chain management. However, progress in predictive tasks such as delivery delay prediction remains constrained by the scarcity of high-quality, openly available datasets. Existing datasets are often proprietary, small, or inconsistently maintained, which hinders reproducibility and benchmarking.

We present SynDelay, a synthetic dataset specifically designed for delivery delay prediction. Generated using an advanced generative model trained on real-world data, SynDelay preserves realistic delivery patterns while ensuring privacy protection. Although not entirely free of noise or inconsistencies, SynDelay provides a challenging and practical testbed for advancing predictive modelling in supply chain AI.

To support adoption, we also provide baseline results and evaluation metrics as initial benchmarks. These serve as reference points for transparent model comparison, not as state-of-the-art claims.


Download

👉 The dataset and supporting code are freely available on GitHub: SynDelay Dataset and Code on GitHub


How to Cite

If you use SynDelay in your research, please cite the following paper:

Liming Xu, Yunbo Long, Alexandra Brintrup. SynDelay: A Synthetic Dataset for Delivery Delay Prediction, 2025.

@article{xu2025syndelay,
  title={SynDelay: A Synthetic Dataset for Delivery Delay Prediction},
  author={Xu, Liming and Long, Yunbo and Brintrup, Alexandra},
  year={2025}
}

⚠️ Disclaimer:
The datasets are provided "as-is" for informational and research purposes only. We make no guarantees regarding accuracy, completeness, or suitability for any specific application. Users are solely responsible for how they use the data, including compliance with relevant laws, regulations, and ethical guidelines. By accessing or downloading any dataset, you agree to use it at your own risk.