SpaceNet 8: A Closer Look at the Winning Approaches

Dec 12, 2022

Editor’s note: SpaceNet is an initiative dedicated to accelerating open-source applied artificial intelligence research for geospatial applications, specifically computer vision for foundational mapping (i.e., building footprint and road network detection). SpaceNet is run by co-founder Maxar and our partners Amazon Web Services (AWS), IEEE-GRSS, Oak Ridge National Laboratory and Topcoder.

Introduction

In this post, we explore how each of the top-scoring competitors approached the SpaceNet 8 challenge. The competition asked participants to develop a multiclass feature extraction and characterization solution for flood detection, leveraging data and algorithms from previous challenges as well as a new dataset and baseline.

Performance Summary

First, as a refresher, let’s revisit scores and inference times for the top five submissions.

*Submissions were tested on an AWS EC2 p3.8xlarge instance; inference time does not influence a submission’s final score.

Baseline

The following algorithmic baseline was provided to competitors as a starting point for the challenge. The baseline was developed by Oak Ridge National Laboratory and the broader SpaceNet team.

Data Assembly and Pre-Processing
The baseline solution relied only on the provided SpaceNet 8 training data, converting GeoJSON labels for roads, buildings and flooding into image segmentation masks.
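To make the label-conversion step concrete, here is a minimal pure-Python sketch that burns polygon labels into a segmentation mask. It assumes the GeoJSON coordinates have already been converted to pixel coordinates. In practice, a library such as rasterio (`rasterio.features.rasterize`, together with the image's geotransform) would handle both steps. The helper names are hypothetical, and the baseline's actual code may differ.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd rule test for point (x, y) against a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray cast to the right of (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygons, height, width, value=1):
    """Burn polygons (already in pixel coordinates) into a height x width mask."""
    mask = [[0] * width for _ in range(height)]
    for polygon in polygons:
        for row in range(height):
            for col in range(width):
                # Sample each pixel at its centre.
                if point_in_polygon(col + 0.5, row + 0.5, polygon):
                    mask[row][col] = value
    return mask


# Usage: a rectangular building footprint on a 5 x 6 tile.
building_mask = rasterize([[(1, 1), (4, 1), (4, 3), (1, 3)]], height=5, width=6)
```

Testing pixel centres makes each pixel's membership unambiguous. This brute-force loop is only suitable for illustration, not for full-size tiles.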

Building & Road Detection
The baseline solution leverages a U-Net style segmentation model with a ResNet34 encoder. The architecture includes two convolutional output layers, one with two output channels (building and background) for building detection and one with eight channels (0–7 for road speed and background) for road detection. The loss function consists of binary cross-entropy (BCE) loss for building segmentation and focal and soft-Dice loss for road segmentation.
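The loss terms named above can be sketched in numpy as follows. This simplified version treats each output as a per-pixel binary probability map, weights the three terms equally and uses the common focal-loss default `gamma=2`. All of these are assumptions, since the post does not give the baseline's channel handling or weights.

```python
import numpy as np

EPS = 1e-7


def bce_loss(probs, target):
    """Binary cross-entropy averaged over pixels."""
    probs = np.clip(probs, EPS, 1 - EPS)
    return float(-np.mean(target * np.log(probs) + (1 - target) * np.log(1 - probs)))


def focal_loss(probs, target, gamma=2.0):
    """Focal loss: down-weights pixels that are already classified confidently."""
    probs = np.clip(probs, EPS, 1 - EPS)
    # p_t is the predicted probability of each pixel's true class.
    p_t = np.where(target == 1, probs, 1 - probs)
    return float(-np.mean((1 - p_t) ** gamma * np.log(p_t)))


def soft_dice_loss(probs, target):
    """1 - soft Dice coefficient computed on probabilities (no thresholding)."""
    intersection = np.sum(probs * target)
    denom = np.sum(probs) + np.sum(target)
    return float(1 - (2 * intersection + EPS) / (denom + EPS))


def total_loss(building_probs, building_target, road_probs, road_target):
    # BCE for buildings; focal + soft Dice for roads (equal weights assumed).
    return (bce_loss(building_probs, building_target)
            + focal_loss(road_probs, road_target)
            + soft_dice_loss(road_probs, road_target))
```

Pairing focal loss (pixel-level, emphasizes hard pixels) with Dice loss (region-level, robust to the small foreground fraction of thin roads) is a common choice for sparse targets like road networks.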

Flood Detection
The flood detection network consists of a Siamese U-Net segmentation model, also with a ResNet34 encoder. The model’s two branches have shared weights and their output features are concatenated before two final convolutional layers produce a final flood prediction mask.
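The weight sharing and feature concatenation can be illustrated with a toy numpy network. Here a single 1x1 convolution plus ReLU stands in for the ResNet34 U-Net encoder, and the random weights are hypothetical. The point is only that both branches call the same `encode` with the same parameters before the two final layers run on the concatenated features.

```python
import numpy as np


def conv1x1(x, weights):
    """1x1 convolution: x has shape (C_in, H, W), weights has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weights, x)


def relu(x):
    return np.maximum(x, 0.0)


class ToySiameseFloodNet:
    """Toy stand-in for a Siamese segmentation network (not the baseline's model)."""

    def __init__(self, in_channels=3, feat_channels=8, seed=0):
        rng = np.random.default_rng(seed)
        # A single set of encoder weights, shared by the pre- and post-event branches.
        self.encoder_w = rng.normal(size=(feat_channels, in_channels))
        # Two final layers applied to the concatenated branch features.
        self.head1_w = rng.normal(size=(feat_channels, 2 * feat_channels))
        self.head2_w = rng.normal(size=(1, feat_channels))

    def encode(self, image):
        return relu(conv1x1(image, self.encoder_w))

    def __call__(self, pre_image, post_image):
        # Same weights for both branches; features are concatenated along channels.
        features = np.concatenate([self.encode(pre_image), self.encode(post_image)], axis=0)
        hidden = relu(conv1x1(features, self.head1_w))
        logits = conv1x1(hidden, self.head2_w)[0]
        return 1.0 / (1.0 + np.exp(-logits))  # per-pixel flood probability
```

Sharing the encoder means the pre- and post-event images are mapped into the same feature space, so the head can focus on differences between them.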

Post-processing and Improvements
Segmentation masks produced by these models are simplified and refined into road and building vectors. Afterwards, small gaps and disconnected portions are removed from the road network, and buildings below a certain size are also eliminated. Flood or non-flood assignments for road and building features are decided by the majority vote of all pixels from the flood prediction mask that intersect the feature vector.
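The majority-vote rule and the small-building filter can be sketched in a few lines of Python. This sketch makes two assumptions. First, each feature is represented by the list of `(row, col)` pixels it intersects. Second, ties count as non-flooded. The post does not state the baseline's tie-breaking rule or its minimum building size, so `min_pixels=20` is a hypothetical cutoff.

```python
def assign_flood_label(feature_pixels, flood_mask):
    """Majority vote: True if more than half of the feature's pixels are flooded.

    feature_pixels: list of (row, col) pixels the feature vector intersects.
    flood_mask: 2D list (or array) of 0/1 flood predictions.
    Ties count as non-flooded (an assumption of this sketch).
    """
    if not feature_pixels:
        return False
    flooded = sum(1 for r, c in feature_pixels if flood_mask[r][c])
    return flooded > len(feature_pixels) / 2


def drop_small_buildings(buildings, min_pixels=20):
    """Remove buildings covering fewer than min_pixels pixels (hypothetical cutoff)."""
    return [b for b in buildings if len(b) >= min_pixels]
```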

The SpaceNet 8 baseline algorithm included networks for identifying foundational features and flood attributes, plus post-processing to convert raster outputs into vectors.

Takeaways

Most competitors supplemented the SpaceNet 8 training data with external reference datasets or data from previous SpaceNet challenges to improve dataset diversity and model adaptability.

Just as in real-world scenarios, competitors noticed that some post-event images did not correspond perfectly to their pre-event counterparts due to cloud cover, alignment issues or missing regions, and they had to adjust their data pre-processing accordingly. These were known issues with the dataset, and they are very common when working with pre- and post-event imagery in real-world applications.

All winning competitors used ensembles of neural networks, and three of the winners used variants of U-Net, the architecture leveraged by the baseline solution.

Recognizing that floods typically cover wide areas at a time rather than isolated road and building instances, most competitors used rule-based post-processing techniques on their flood detection outputs to reduce false positives.

First Place: Ohhan777 (KARI-AI)

The winning solution came from a group of five competitors from Korea.

Data Assembly and Pre-Processing
This solution relied primarily on the original SpaceNet 8 dataset, applying extensive augmentations to the provided images. These augmentations included adjustments to image color, clarity, hue and saturation, brightness and contrast.
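Brightness and contrast adjustments of this kind can be sketched in numpy as shown below. The jitter ranges are hypothetical because the post does not give KARI-AI's augmentation parameters. Hue and saturation jitter would follow the same pattern in HSV space. Photometric changes apply to the image only and leave label masks untouched.

```python
import numpy as np


def jitter_brightness_contrast(image, rng, max_brightness=0.2, max_contrast=0.2):
    """Randomly perturb the contrast and brightness of a float image in [0, 1].

    The ranges are hypothetical; masks are not modified by photometric changes.
    """
    contrast = 1.0 + rng.uniform(-max_contrast, max_contrast)
    brightness = rng.uniform(-max_brightness, max_brightness)
    mean = image.mean()
    # Stretch or compress values around the mean, then shift them.
    out = (image - mean) * contrast + mean + brightness
    return np.clip(out, 0.0, 1.0)


# Usage: rng = np.random.default_rng(0); augmented = jitter_brightness_contrast(img, rng)
```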

Building & Road Detection
KARI-AI created individual models for building and road segmentation by combining HRNet, a convolutional neural network commonly used for semantic segmentation, with object-contextual representations (OCR). Like the baseline algorithm, they defined two output channels for building detection and eight channels for road detection. The group then integrated these models into a single model with a shared HRNet backbone, reducing training time with little effect on accuracy. The group used BCE loss for building segmentation and region mutual information (RMI) loss for road speed segmentation.

Flood Detection
The team implemented a flood detection model using a Siamese HRNet + OCR architecture trained with RMI loss, and combined it with the road and building segmentation model into a single network.

Post-processing and Improvements
Most notably, the team introduced a conservative flood threshold value to avoid false positive flood identification. Furthermore, they treated isolated flooded buildings and roads within a mostly non-flooded image as false detections.
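The two rules can be sketched as a single function over per-feature flood scores for one image (for example, each feature's mean flood probability). The threshold `0.7` and the minimum flooded fraction `0.05` are hypothetical values chosen for illustration. The post does not report the team's actual settings.

```python
def filter_flood_predictions(flood_scores, threshold=0.7, min_flooded_fraction=0.05):
    """Apply a conservative threshold, then drop isolated flood calls.

    flood_scores: per-feature flood scores for the roads and buildings in one image.
    threshold and min_flooded_fraction are hypothetical values.
    """
    flooded = [score >= threshold for score in flood_scores]
    if flooded and sum(flooded) / len(flooded) < min_flooded_fraction:
        # A few flooded features in a mostly dry image are treated as false positives.
        return [False] * len(flooded)
    return flooded
```

Raising the threshold trades recall for precision on individual features. The image-level check then removes scattered detections that are unlikely for a phenomenon as spatially extensive as flooding.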

Test Public Louisiana AOI roads reference data (top) versus the proposed solution from Ohhan777 (bottom). Normal roads are labeled in green, and flooded roads are labeled in blue. The road vectors were extracted from the pre-flood event imagery, and flood labels were determined from the post-flood imagery.

Second Place: number13

The second-place solution was produced by an individual who participates in research and development in machine learning and remote sensing.

Data Assembly and Pre-Processing
Recognizing the relatively small size of the SpaceNet 8 dataset, this competitor leveraged additional data from SpaceNet challenges 2, 3 and 5 while also heavily augmenting the SpaceNet 8 data. Where multiple post-event images were provided for a region (e.g., due to cloud cover), this competitor fused them using median averaging.
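Median fusion of co-registered post-event images can be sketched with numpy. A plain per-pixel median already suppresses a cloud that appears in only one of three images. The optional `valid_masks` argument (for example, cloud-free masks) is an assumption added for illustration and is not described in the post.

```python
import warnings

import numpy as np


def fuse_post_event_images(images, valid_masks=None):
    """Fuse co-registered post-event images with a per-pixel median.

    images: list of arrays with identical shape (H, W) or (H, W, C).
    valid_masks: optional list of boolean (H, W) arrays marking usable pixels;
        this masking step is an assumption of the sketch.
    """
    stack = np.stack([np.asarray(img, dtype=float) for img in images])
    if valid_masks is not None:
        masks = np.stack(valid_masks).astype(bool)
        if stack.ndim == 4:  # broadcast (N, H, W) masks over the channel axis
            masks = masks[..., None]
        stack = np.where(masks, stack, np.nan)
    with warnings.catch_warnings():
        # Pixels invalid in every image trigger an all-NaN slice warning.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        fused = np.nanmedian(stack, axis=0)
    return np.nan_to_num(fused, nan=0.0)  # fill pixels with no valid observation
```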

Building & Road Detection
number13 trained four U-Net models on pre-event imagery to produce road and building masks, initializing them with weights from training on previous SpaceNet datasets. Because the previous competitions provided different numbers of annotated road and building images, the competitor ensured that the same number of road and building samples appeared in each training batch. Training used a combination of focal and Dice loss. In addition to training augmentations, the competitor also applied test-time augmentation to the imagery during inference.
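A balanced batch sampler of this kind can be sketched with Python's `random` module. In this sketch, sampling with replacement effectively oversamples the smaller pool. That is one way to equalize the two tasks, but the post does not say how number13 implemented it, and the function name is hypothetical.

```python
import random


def balanced_batches(road_samples, building_samples, batch_size, num_batches, seed=0):
    """Yield batches containing equal numbers of road and building samples.

    Sampling is with replacement, so the smaller pool is effectively oversampled.
    """
    if batch_size % 2:
        raise ValueError("batch_size must be even to split evenly between tasks")
    rng = random.Random(seed)
    half = batch_size // 2
    for _ in range(num_batches):
        batch = rng.choices(road_samples, k=half) + rng.choices(building_samples, k=half)
        rng.shuffle(batch)  # mix tasks within the batch
        yield batch
```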

Flood Detection
For flood segmentation, the competitor used a Siamese U-Net initialized with pretrained ResNet50 weights. During training, the competitor added an auxiliary image-level classification loss: an image was labeled as flooded if it contained any flooded segment and as non-flooded otherwise. The segmentation loss for flooded and non-flooded instances combined focal and Dice loss, and the classification loss was BCEWithLogits loss.
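The auxiliary target and its loss can be sketched in numpy. The image label is 1 if any pixel in the flood mask is flooded. The loss below is a numerically stable binary cross-entropy on a raw logit, mathematically equivalent to what PyTorch's `BCEWithLogitsLoss` computes. The weight `cls_weight=0.5` used to combine it with the segmentation loss is hypothetical.

```python
import numpy as np


def image_flood_label(flood_mask):
    """Image-level target: 1.0 if any pixel is flooded, otherwise 0.0."""
    return float(np.any(np.asarray(flood_mask) > 0))


def bce_with_logits(logit, target):
    """Stable BCE on logits: max(x, 0) - x * y + log(1 + exp(-|x|))."""
    x = np.asarray(logit, dtype=float)
    y = np.asarray(target, dtype=float)
    return float(np.mean(np.maximum(x, 0) - x * y + np.log1p(np.exp(-np.abs(x)))))


def total_flood_loss(seg_loss, cls_logit, flood_mask, cls_weight=0.5):
    # cls_weight is a hypothetical balance between segmentation and classification.
    return seg_loss + cls_weight * bce_with_logits(cls_logit, image_flood_label(flood_mask))
```

The stable form avoids computing `exp(x)` for large positive logits, which is why frameworks prefer taking logits over probabilities.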

Post-processing and Improvements
To improve the alignment and connectivity of roads near image borders, this competitor’s algorithm padded each image before inference and cropped the output back to the original size afterward. Also, given the wide-area impact typical of flooding, the algorithm labeled all roads and buildings in an image as flooded if the proportion of roads and buildings identified as flooded exceeded a certain threshold.
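The image-level flood propagation rule can be sketched as follows. The input is assumed to be one boolean per road or building in the image. The post does not give number13's threshold, so `fraction_threshold=0.5` is hypothetical.

```python
def propagate_flood_labels(flooded, fraction_threshold=0.5):
    """If enough features in an image are flooded, mark every feature as flooded.

    flooded: list of booleans, one per road or building in the image.
    fraction_threshold is a hypothetical value.
    """
    if flooded and sum(flooded) / len(flooded) >= fraction_threshold:
        return [True] * len(flooded)
    return list(flooded)
```

This is the mirror image of the first-place team's rule: rather than suppressing isolated detections, it fills in missed detections when flooding is already widespread in an image.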

Third Place: SIAnalytics

Third place was awarded to a team comprising two research scientists and two machine learning engineers at SI Analytics, along with a graduate student at the Korea Advanced Institute of Science & Technology.

Data Assembly and Pre-Processing
Like other competitors, this group used data from SpaceNet challenges 2 and 3 as well as other external datasets. They increased dataset diversity by applying various image transformations, including random fog and shadow adjustments and photometric distortions, as well as different crops, flips, resizes and rotations.
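Geometric transforms must be applied identically to an image and its label mask, and for Siamese flood models also to the pre- and post-event pair. A minimal numpy sketch using random 90-degree rotations and flips is shown below. Fog, shadow, crop and resize transforms are omitted, and the function name is hypothetical.

```python
import numpy as np


def random_flip_rotate(image, mask, rng):
    """Apply the same random 90-degree rotation and flips to an image and its mask."""
    k = int(rng.integers(4))
    image = np.rot90(image, k, axes=(0, 1))
    mask = np.rot90(mask, k, axes=(0, 1))
    if rng.random() < 0.5:  # horizontal flip
        image, mask = image[:, ::-1], mask[:, ::-1]
    if rng.random() < 0.5:  # vertical flip
        image, mask = image[::-1], mask[::-1]
    return np.ascontiguousarray(image), np.ascontiguousarray(mask)
```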

Building & Road Detection
For building detection, the group chose a Swin Transformer backbone pretrained on ImageNet-22k and a UPerNet decoder with two output channels. This model was trained in two stages, first on external data and then on the SpaceNet 8 dataset. They used a combination of cross-entropy, Dice and Lovasz loss for the first stage of training and focal, Dice and Lovasz loss for the second. A similar model pretrained on ImageNet-22k, with a SegFormer decoder, was used for road segmentation, also trained in two stages with focal, Dice and Lovasz loss.

Flood Detection
Flood detection relied on a Siamese encoder network with a Swin Transformer backbone pretrained on ImageNet-22k and a UPerNet decoder. This model was trained in a single stage with a combination of focal, Dice and Lovasz loss.

Post-processing and Improvements
This algorithm mainly followed baseline post-processing. However, hyperparameters for flood predictions were adjusted such that the network favored positive flood prediction. Additionally, the solution standardized inputs to the same resolution and used reflect padding to better extract roads and buildings near image boundaries.
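Reflect padding at inference time can be sketched with `np.pad(..., mode="reflect")`. The image is mirrored outward so the model sees plausible context at the borders, and the prediction is then cropped back to the original extent. The margin `pad=32` and the `model` callable are hypothetical placeholders.

```python
import numpy as np


def predict_with_reflect_padding(image, model, pad=32):
    """Reflect-pad an (H, W, C) image, run the model, and crop back to (H, W).

    model is any callable mapping an (H', W', C) array to an (H', W') prediction.
    pad=32 is a hypothetical margin.
    """
    if pad == 0:
        return model(image)
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    prediction = model(padded)
    return prediction[pad:-pad, pad:-pad]
```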

Fourth Place: ZABURO

The fourth-place competitor, working individually, is a Software Engineer at Preferred Networks, Inc.

Data Assembly and Pre-Processing
This competitor used data from previous SpaceNet competitions 2, 3 and 5 to increase variation, resampling SpaceNet2 images to account for differences in resolution. He also used the xView2 xBD dataset for flood segmentation training.
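Resampling to a common resolution amounts to rescaling by the ratio of ground sample distances (GSD). The post does not say which resampling method or target GSD was used. The sketch below is a simple nearest-neighbour-style resampler in numpy, with the GSD values passed in as parameters. A production pipeline would more likely use bilinear or cubic resampling through GDAL or rasterio.

```python
import numpy as np


def resample_nearest(image, src_gsd, dst_gsd):
    """Resample an (H, W) or (H, W, C) image from src_gsd to dst_gsd (metres per pixel)."""
    scale = src_gsd / dst_gsd  # > 1 upsamples, < 1 downsamples
    h, w = image.shape[:2]
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    # Map each output pixel to the source pixel with index floor(i / scale).
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return image[rows[:, None], cols]
```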

Building & Road Detection
For building segmentation, this competitor used a U-Net model with an EfficientNetV2-S encoder and two output channels: building body and border. This model was trained with a combination of BCE and Dice loss. For road detection, a similar U-Net with an EfficientNetV2-M encoder and 9 output channels (0–7 for road speed, road skeleton and road junction) was trained with focal and Dice loss. This road detection approach was closely based on the winning solution from SpaceNet 5. Intersection detection was added as an auxiliary task to improve the accuracy of road network predictions.
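The post does not describe how junction targets were generated. One common heuristic for deriving them from a rasterized road skeleton is to flag skeleton pixels that have three or more skeleton neighbours. The pure-Python sketch below (the function name is hypothetical) uses 8-connectivity, so it marks a small cluster of pixels around each intersection rather than a single point. That is usually acceptable for a training target.

```python
def find_junctions(skeleton):
    """Return skeleton pixels with three or more 8-connected skeleton neighbours.

    skeleton: 2D list of 0/1 values. Near an intersection this simple rule flags
    a small cluster of pixels rather than a single point.
    """
    h, w = len(skeleton), len(skeleton[0])
    junctions = set()
    for r in range(h):
        for c in range(w):
            if not skeleton[r][c]:
                continue
            neighbours = sum(
                skeleton[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, h))
                for cc in range(max(c - 1, 0), min(c + 2, w))
                if (rr, cc) != (r, c)
            )
            if neighbours >= 3:
                junctions.add((r, c))
    return junctions
```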

Flood Detection
This competitor used a Siamese U-Net model with a ResNet50 encoder. This model was trained using a loss function involving BCE and Dice loss.

Post-processing and Improvements
For the most part, this competitor used the baseline post-processing to clean the output of his model. For road segmentation, the algorithm discarded short, isolated roads and joined nearby road segment endpoints into intersections.
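Both clean-up rules can be sketched on road polylines given as lists of `(x, y)` tuples. The first function snaps endpoints that lie within a tolerance of another road's endpoint onto a shared point, forming an intersection. The second drops short roads whose endpoints touch no other road. The tolerance `tol=3.0` and the minimum length `10.0` (in pixels) are hypothetical values, and the function names are illustrative.

```python
import math
from collections import Counter


def polyline_length(line):
    return sum(math.dist(a, b) for a, b in zip(line, line[1:]))


def snap_endpoints(lines, tol=3.0):
    """Snap each endpoint onto an earlier endpoint of a different line within tol."""
    anchors = []  # (point, index of the line it came from)
    snapped = []
    for i, line in enumerate(lines):
        new_line = list(line)
        for idx in (0, len(line) - 1):
            p = line[idx]
            match = next(
                (a for a, owner in anchors if owner != i and math.dist(p, a) <= tol),
                None,
            )
            if match is None:
                anchors.append((p, i))
            else:
                new_line[idx] = match  # shared point now forms an intersection
        snapped.append(new_line)
    return snapped


def drop_short_isolated(lines, min_length=10.0):
    """Remove lines shorter than min_length whose endpoints touch no other line."""
    counts = Counter(p for line in lines for p in (line[0], line[-1]))
    kept = []
    for line in lines:
        isolated = counts[line[0]] == 1 and counts[line[-1]] == 1
        if isolated and polyline_length(line) < min_length:
            continue
        kept.append(line)
    return kept
```

Snapping runs before the isolation check so that roads which were nearly connected are not discarded. This sketch only joins endpoint to endpoint. An endpoint that stops just short of the middle of another road (a T-junction) would need a point-to-segment distance test instead.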

Fifth Place: motokimura

The fifth-place solution was produced by a computer vision research engineer.

Data Assembly and Pre-Processing
motokimura excluded from training and validation certain tiles that he found to contain inconsistencies. He also discarded some misaligned or cloudy post-event images when clearer duplicate images of the same area existed. Furthermore, he generated additional training samples by joining adjacent tiles together.
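Joining adjacent tiles can be sketched with `np.concatenate`, stitching a grid of equally sized tiles into one mosaic. Label masks are stitched the same way. The post does not say whether training samples were then cropped from the mosaic, so the random crop below, which yields windows that straddle former tile borders, is an assumption of this sketch.

```python
import numpy as np


def join_tiles(grid):
    """Stitch a 2D grid (list of rows) of equally sized tiles into one mosaic."""
    return np.concatenate([np.concatenate(row, axis=1) for row in grid], axis=0)


def random_crop(image, mask, size, rng):
    """Crop the same random size x size window from an image and its mask."""
    h, w = mask.shape[:2]
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    return (image[top:top + size, left:left + size],
            mask[top:top + size, left:left + size])
```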

Building & Road Detection
For building and road segmentation, this competitor used two U-Net models with EfficientNet-B5 and -B6 encoders. These included five output channels: building body, building border, building contact, road skeleton and road junction. The competitor also included two pretrained models, the winning road segmentation model from SpaceNet5 (U-Net with an SE-ResNeXt-50 encoder) and a building segmentation model from the xView2 Data Challenge (U-Net with a DenseNet-161 encoder). These models were fine-tuned on SpaceNet 8 building and road labels, and all models used the same combination of BCE and Dice loss for training. The outputs from these models were averaged to arrive at a final prediction.
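Averaging an ensemble's outputs is a one-liner with `np.average`, which also accepts optional per-model weights. The sketch below assumes each model produces a probability map of the same shape, for example `(channels, H, W)`. The post describes plain averaging, so the weights are an optional extension.

```python
import numpy as np


def ensemble_average(prob_maps, weights=None):
    """Average per-pixel probability maps from several models (optionally weighted)."""
    return np.average(np.stack(prob_maps), axis=0, weights=weights)
```

The averaged map would then be thresholded (for example `ensemble_average(maps) >= 0.5`, a hypothetical threshold) before the raster-to-vector post-processing.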

Flood Detection
Flood detection was achieved using two Siamese U-Net models with EfficientNet-B5 and -B6 encoders. An additional flooded building segmentation model (U-Net with a DenseNet-161 encoder) from the xView2 Data Challenge was ensembled with these models. Each model was trained using the same combination of BCE and Dice loss. Again, averaging was performed to combine the outputs from these models.

Post-processing and Improvements
Following the baseline post-processing, this competitor adjusted thresholds for model outputs and performed dilation on the flooded road mask to account for misalignments between the road skeleton and flooded road mask.
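Dilating the flooded road mask before intersecting it with the road skeleton can be sketched in numpy with a 3x3 square structuring element. `scipy.ndimage.binary_dilation` offers a general version of the same operation. The number of iterations is a hypothetical choice that sets how much misalignment is tolerated.

```python
import numpy as np


def binary_dilate(mask, iterations=1):
    """Dilate a 2D boolean mask with a 3x3 square structuring element."""
    out = np.asarray(mask, dtype=bool)
    h, w = out.shape
    for _ in range(iterations):
        padded = np.pad(out, 1, mode="constant", constant_values=False)
        grown = np.zeros_like(out)
        # OR together the mask shifted to each of the 9 positions of the 3x3 window.
        for dr in range(3):
            for dc in range(3):
                grown |= padded[dr:dr + h, dc:dc + w]
        out = grown
    return out


def flooded_skeleton(road_skeleton, flood_mask, iterations=2):
    """Mark skeleton pixels as flooded if they fall inside the dilated flood mask."""
    return np.asarray(road_skeleton, dtype=bool) & binary_dilate(flood_mask, iterations)
```

Each iteration grows the flood mask by one pixel in every direction, so a skeleton offset from the flood prediction by up to `iterations` pixels is still matched.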

Example region of pre-event imagery from SpaceNet 8 Test Public Louisiana AOI. Imagery © 2020 Maxar Technologies.

Example SN8 Test Public Louisiana AOI post-event imagery reference building, roads and flood labels. Imagery © 2021 Maxar Technologies.

Example SN8 Test Public Louisiana AOI post-event imagery reference building and roads (green) derived from pre-event imagery; and flood labels (blue) derived from this post-event imagery. Imagery © 2021 Maxar Technologies.

Example SN8 Test Public Louisiana AOI post-event imagery building, roads and flood proposals from motokimura, the fifth-place winner. Imagery © 2021 Maxar Technologies.

Conclusion

Congratulations to the SpaceNet 8 winners and thank you to all the SpaceNet participants and contributors. We are looking forward to SpaceNet 9 in 2023!