Super-resolution

Introduction

In this section, we perform the super-resolution of Sentinel-2 images to 1.5 m spacing using a simple architecture. The model is trained to minimize the l1 loss between a reference high-resolution image (our Spot-7 image acquired over Paris) and the upscaled low-resolution image (a Sentinel-2 image).

Dataset

To create our dataset, we use the STAC API of Microsoft Planetary Computer to retrieve a Sentinel-2 image that overlaps the Spot-7 image (the same one that we used in the semantic segmentation, Dataset section).

Question

Create sr_dataset.py to implement the creation of the dataset. The script must accomplish the following operations end-to-end, without writing any intermediate file:
1. Use Microsoft Planetary Computer to retrieve a cloud-free Sentinel-2 image acquired over the same area as the Spot-7 image, between 2022-06-15 and 2022-08-15.
2. Use the Superimpose OTB application to resample the four Sentinel-2 bands onto the grid of the Spot-7 multispectral (XS) image. Don't forget to re-order the bands so that they match the Spot-7 image: (4, 3, 2, 8).
3. Pansharpen the Spot-7 image with the BundleToPerfectSensor OTB application.
4. Use PatchesSelection and PatchesExtraction to create the patch images for the train, valid and test datasets.

flowchart LR

api[Planetary STAC API] -- urls --> c[ConcatenateImages]
c -- "10m spacing" --> s[Superimpose]
im([Spot-7 XS image]) --> pxs[BundleToPerfectSensor]
imp([Spot-7 Pan image]) --> pxs
im -- ref --> s
s -- "6m spacing" --> PatchesSelection -- "patches centers" --> PatchesExtraction --> p([patches])
pxs -- "pansharpened image" --> PatchesExtraction
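
The first three steps of the pipeline above could be sketched as follows. This is only a starting point, not the reference solution: the bounding box, the file paths, the cloud-cover threshold and the helper names (asset_keys, search_params, find_band_urls, build_pipeline) are assumptions made for illustration, and the third-party packages (pystac_client, planetary_computer, pyotb) are imported lazily inside the functions that need them.

```python
# Sketch only: bounding box, paths and cloud-cover threshold are assumptions.
S2_BANDS = (4, 3, 2, 8)  # Sentinel-2 bands, in the order given in the exercise


def asset_keys(bands=S2_BANDS):
    """Map band numbers to Planetary Computer asset keys, e.g. 4 -> 'B04'."""
    return [f"B{band:02d}" for band in bands]


def search_params(bbox, max_cloud_cover=1.0):
    """Build the STAC search arguments for the acquisition window of the exercise.

    bbox is (min_lon, min_lat, max_lon, max_lat) in WGS84 (hypothetical input).
    """
    return {
        "collections": ["sentinel-2-l2a"],
        "bbox": list(bbox),
        "datetime": "2022-06-15/2022-08-15",
        "query": {"eo:cloud_cover": {"lt": max_cloud_cover}},
    }


def find_band_urls(bbox):
    """Return GDAL-readable URLs for the least cloudy matching Sentinel-2 item."""
    import planetary_computer  # third-party, imported lazily
    import pystac_client

    catalog = pystac_client.Client.open(
        "https://planetarycomputer.microsoft.com/api/stac/v1",
        modifier=planetary_computer.sign_inplace,  # signs asset URLs
    )
    items = list(catalog.search(**search_params(bbox)).items())
    if not items:
        raise RuntimeError("No Sentinel-2 item matches the search")
    best = min(items, key=lambda item: item.properties["eo:cloud_cover"])
    # The /vsicurl/ prefix lets GDAL (hence OTB) stream the remote files
    return [f"/vsicurl/{best.assets[key].href}" for key in asset_keys()]


def build_pipeline(band_urls, xs_path, pan_path):
    """Chain the OTB applications in memory, without intermediate files."""
    import pyotb  # third-party, imported lazily

    s2_stack = pyotb.ConcatenateImages(il=band_urls)
    # Resample the Sentinel-2 stack onto the grid of the Spot-7 XS image
    s2_resampled = pyotb.Superimpose(inr=xs_path, inm=s2_stack)
    pansharpened = pyotb.BundleToPerfectSensor(inp=pan_path, inxs=xs_path)
    return s2_resampled, pansharpened
```

The two objects returned by build_pipeline can then be fed to PatchesSelection and PatchesExtraction, which are left out of this sketch.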

Network

Our network consists of a convolution with stride 1, followed by a number of successive residual blocks, then two upscaling layers (which can be implemented as two transposed convolutions with stride 2), with skip connections as described in the flowchart below. We use large kernels (e.g. 7x7) in the first and last convolutional layers.

Model:

flowchart LR

lr([LR input]) -- 32x32 --> c1[Conv 7x7 + activation] 

subgraph "Model"
c1 --> r[Residual blocks] 
r -- 32x32--> add["+"]
c1 --> add --> tr1[Transposed Conv 1]
tr1 -- 64x64 --> tr2[Transposed Conv 2]
tr2 -- 128x128 --> c2[Conv 7x7 + activation]
end

c2 -- 128x128 --> hr([Synth HR output])

subgraph "Residual block"
i([input]) --> c["Conv1 + activation"] --> Conv2 
Conv2 --> p(("+")) --> activation --> o([output])
i --> p
end
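
As an illustration, the architecture in the flowcharts above could be written with the plain Keras functional API as below. This is a minimal sketch, not the SRModel class of the exercise: the depth, the number of residual blocks, the 3x3 kernels, the ReLU activations and the function names are assumptions. TensorFlow is imported lazily so that the size helper can be used on its own.

```python
def output_size(lr_size, n_upscaling=2, stride=2):
    """Spatial size after n_upscaling transposed convolutions with 'same' padding."""
    return lr_size * stride ** n_upscaling


def build_sr_network(n_channels=4, depth=64, n_res_blocks=4):
    """Build the network of the flowchart (hyper-parameters are assumptions)."""
    import tensorflow as tf  # third-party, imported lazily

    layers = tf.keras.layers

    def residual_block(x):
        # Conv1 + activation -> Conv2 -> add the block input -> activation
        y = layers.Conv2D(depth, 3, padding="same", activation="relu")(x)
        y = layers.Conv2D(depth, 3, padding="same")(y)
        return layers.Activation("relu")(layers.Add()([x, y]))

    lr = layers.Input(shape=(None, None, n_channels), name="lr")
    c1 = layers.Conv2D(depth, 7, padding="same", activation="relu")(lr)

    x = c1
    for _ in range(n_res_blocks):
        x = residual_block(x)
    x = layers.Add()([c1, x])  # skip connection around the residual blocks

    # Two stride-2 transposed convolutions: 32x32 -> 64x64 -> 128x128
    for _ in range(2):
        x = layers.Conv2DTranspose(
            depth, 3, strides=2, padding="same", activation="relu"
        )(x)

    hr = layers.Conv2D(
        n_channels, 7, padding="same", activation="relu", name="hr"
    )(x)
    return tf.keras.Model(inputs=lr, outputs=hr)
```

With the default arguments, output_size(32) gives 128, which matches the 32x32 to 128x128 path of the flowchart.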

Question

Implement the model in a class SRModel deriving from otbtf.ModelBase.
In the model's normalize_inputs() method, normalize the input image by applying a scaling of 1e-4.
Normalize the target image in the dataset preprocessing function, using the same scaling value.
You can create an extra output, unused during training, that de-normalizes the output (i.e. applies a scaling of 1e4). This output can be used to directly generate a 16-bit output image.
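
The round trip between raw pixel values and normalized values can be illustrated in plain Python. This only shows the arithmetic; in the model it would be done with tensor operations. The rounding and the clipping to the unsigned 16-bit range are assumptions added here, and the function names are hypothetical.

```python
SCALE = 1e-4  # scaling value given in the exercise


def normalize(values):
    """Scale raw pixel values (e.g. 0..10000) to roughly 0..1."""
    return [value * SCALE for value in values]


def denormalize(values):
    """Undo the scaling, then round and clip to the uint16 range (assumption)."""
    return [min(max(round(value / SCALE), 0), 65535) for value in values]
```

For instance, denormalize(normalize([0, 2500, 10000])) gives back the original integer values.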

Training setup

We now train the network to minimize the l1 (or l2) distance between the Spot-7 image and the upscaled Sentinel-2 image.

flowchart LR

lr([LR input]) -- 32x32 --> n[Model]
n -- 128x128 --> hr([Synth HR output])
hrr([Real HR output]) --> l[l1 loss]
hr --> l --> Optimize

Question

Use callbacks to monitor the losses, and to stop the training when the loss stops improving.
Use image summaries to monitor the super-resolution results in real time during the training.
At the end of the training, compute the l1 and l2 losses of the upscaled image over the test dataset, using the saved model.
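
One possible way to set this up with Keras callbacks is sketched below. It assumes a plain Keras model with a single tensor output already normalized to roughly [0, 1]; the names make_callbacks, ImageSummary, sample_lr and the patience value are hypothetical. The l1 and l2 losses are taken here as the mean absolute and mean squared errors, and are written in plain Python only to make the definitions explicit.

```python
def l1_loss(y_true, y_pred):
    """Mean absolute error between two flat sequences of equal length."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def l2_loss(y_true, y_pred):
    """Mean squared error between two flat sequences of equal length."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def make_callbacks(sample_lr, log_dir="logs", patience=10):
    """Early stopping, loss monitoring and image summaries (assumed setup)."""
    import tensorflow as tf  # third-party, imported lazily

    class ImageSummary(tf.keras.callbacks.Callback):
        """Write the prediction for a fixed LR batch after each epoch."""

        def __init__(self):
            super().__init__()
            self.writer = tf.summary.create_file_writer(f"{log_dir}/images")

        def on_epoch_end(self, epoch, logs=None):
            pred = self.model(sample_lr, training=False)
            # Keep the first three channels and clip to the displayable range
            rgb = tf.clip_by_value(pred[..., :3], 0.0, 1.0)
            with self.writer.as_default():
                tf.summary.image("synth_hr", rgb, step=epoch, max_outputs=4)

    return [
        # Stop when the validation loss has not improved for `patience` epochs
        tf.keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=patience, restore_best_weights=True
        ),
        # Scalar summaries of the losses, viewable in TensorBoard
        tf.keras.callbacks.TensorBoard(log_dir=log_dir),
        ImageSummary(),
    ]
```

The returned list can be passed to model.fit(..., callbacks=make_callbacks(sample_lr)), and the summaries inspected with TensorBoard while the training runs.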

Inference

Perform the inference using the super-resolution network.

Question

Use TensorflowModelServe to perform the inference with a reasonable cropping margin (you can use the analysis performed in the semantic segmentation, Inference section to determine a good value).
Open the upsampled images in QGIS, and compare them with the original Spot-7 and Sentinel-2 images.
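
A sketch of the inference call through pyotb is given below. The parameter keys are those of the TensorflowModelServe application as we understand them, but the placeholder name ("lr"), the output name and its "_crop" suffix, and the default receptive field and margin are assumptions to check against the saved model and the OTBTF documentation. The margin is expressed in output (high-resolution) pixels.

```python
def expression_field(receptive_field, scale=4, margin=64):
    """Size of the output block kept after cropping `margin` pixels per side."""
    efield = receptive_field * scale - 2 * margin
    if efield <= 0:
        raise ValueError("Cropping margin too large for this receptive field")
    return efield


def serve_params(lr_image, model_dir, rfield=128, scale=4, margin=64):
    """Build the TensorflowModelServe parameters (names partly assumed)."""
    efield = expression_field(rfield, scale, margin)
    return {
        "source1.il": [lr_image],
        "source1.rfieldx": rfield,
        "source1.rfieldy": rfield,
        "source1.placeholder": "lr",  # hypothetical input name
        "model.dir": model_dir,
        "model.fullyconv": "on",
        "output.names": [f"hr_crop{margin}"],  # hypothetical output name
        "output.efieldx": efield,
        "output.efieldy": efield,
        "output.spcscale": 1 / scale,  # output spacing = input spacing / scale
    }


def run_inference(lr_image, model_dir, out_path, **kwargs):
    """Run the application and write the super-resolved image."""
    import pyotb  # third-party, imported lazily

    app = pyotb.TensorflowModelServe(serve_params(lr_image, model_dir, **kwargs))
    app.write(out_path)
```

With a receptive field of 128 pixels, a scale of 4 and a margin of 64 pixels, the expression field is 128 * 4 - 2 * 64 = 384 pixels.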