Nicholas Institute for Environmental Policy Solutions
March 2020

The Synthinel-1 Dataset: A Collection of High Resolution Synthetic Overhead Imagery for Building Segmentation

Authors
Type
Pages
The Synthinel-1 Dataset: A Collection of High Resolution Synthetic Overhead Imagery for Building Segmentation cover

Recently deep learning - namely convolutional neural networks (CNNs) - have yielded impressive performance for the task of building segmentation on large overhead (e.g., satellite) imagery benchmarks. However, these benchmark datasets only capture a small fraction of the variability present in real-world overhead imagery, limiting the ability to properly train, or evaluate, models for real-world application. Unfortunately, developing a dataset that captures even a small fraction of real-world variability is typically infeasible due to the cost of imagery, and manual pixel-wise labeling of the imagery. In this work we develop an approach to rapidly and cheaply generate large and diverse synthetic overhead imagery for training segmentation CNNs. Using this approach, we generate and publicly-release a collection of synthetic overhead imagery, termed Synthinel-1, with full pixel-wise building labels. We use several benchmark datasets to demonstrate that Synthinel-1 is consistently beneficial when used to augment real-world training imagery, especially when CNNs are tested on novel geographic locations or conditions.