Skip to content

Dealing with multiple sources

Deep nets can be used efficiently to deal with multiple heterogeneous sources, for instance in implementing a number of branches tailored for each data source, that eventually merge together at some point to fuse the information. For example, it is easy to build a deep neural networks that can deal simultaneously with time series and one very high resolution image: while one branch composed of a recurrent neural network processes the time series (For instance Sentinel-1 or Sentinel-2 images), another branch composed of a CNN processes the high resolution image (for instance Spot 6/7 images).

This section will present how can be used with any number of images sources.

Set the number of sources

The number of sources in applications can be controlled using the environment variable OTB_TF_NSOURCES. If this environment variable is undefined, the default value for the number of sources is 1.

In the following, we will use 2 sources from the Sentinel-2 image:

  1. Channels at 20m spacing (Spectral bands 5, 6, 7, 8a, 11 and 12), stacked in a single raster
  2. Channels at 10m spacing (Spectral bands 4, 3, 2 and 8), stacked in a single raster

The architecture inputs two images at different resolution to derive a single output at the desired resolution.

Command line interface

We can change the number of sources from the terminal:

export OTB_TF_NSOURCES=2

Then, we can check that the number of sources has been increased to 2 in OTBTF applications.

otbcli_PatchesExtraction -help

or

otbcli_TensorflowModelServe -help

As you can notice, the applications have now 2 sources, namely source1 and source2, each one with its parameters.

Python

In python we can either use os.environ["OTB_TF_NSOURCES"]="2" then use OTB applications through pyotb or otbApplication, or we can directly use an otbtf-specific feature of pyotb enabling to change the number of source at runtime with the n_sources argument:

app = pyotb.TensorflowModelServe(
  n_sources=2,
  ...
)

An example with Sentinel-2 10m and 20m bands

Dataset

We will use the PatchesExtraction application to create two patches images, one for the image of bands at 20m spacing, and another one for the image of bands at 10m spacing. We will sample patches of size 8x8for the image at 20m spacing, and patches of size 16x16for the image at 10m spacing.

part_2_sample_extraction_2_sources.py
import pyotb

# Input images
img_10m = "/data/s2_tokyo_10m.tif"
img_20m = "/data/s2_tokyo_20m.tif"

# Input samples locations
pos_a = "/data/pos_a.geojson"
pos_b = "/data/pos_b.geojson"

# Output patches files prefixes
prefix_a = "/data/a_"
prefix_b = "/data/b_"

for vec, prefix in zip([pos_a, pos_b], [prefix_a, prefix_b]):
    pyotb.PatchesExtraction(
        n_sources=2,  # Tells the OTB application to use two sources
        source1_il=img_10m,
        source1_patchsizex=16,
        source1_patchsizey=16,
        source1_out=prefix + "img_10m_2src.tif",
        source1_nodata=0,
        source2_il=img_20m,
        source2_patchsizex=8,
        source2_patchsizey=8,
        source2_out=prefix + "img_20m_2src.tif",
        source2_nodata=0,
        vec=vec,
        field="class",
        outlabels=prefix + "labels_2src.tif",
    )

Note

We have used a different file name for the label patches and the 10m patches, because any patch containing no-data in the 10m or in the 20m image is discarded. So the number of retained samples could be different from the mono-source case.

Question

  • Run the sample extraction process over the two sources,
  • Check the generated patches in QGIS

Model

We propose a CNN with two inputs with different resolution: the 10m spacing bands of Sentinel-2 (Spectral band 2, 3, 4 and 8) and the 20m spacing bands of Sentinel-2 (Spectral band 5, 6, 7, 8a, 11 and 12). This model processes the two inputs jointly and outputs a land cover map at the lowest resolution, i.e. 20m spacing. The 20m spacing image patches have a size of 8x8 and the 10m spacing image a size of 16x16. The model outputs a single class value for these input patches.

flowchart TD

x1((input1)) --> n1[normalization] --> c11[conv 3x3 + ReLU]
x2((input2)) --> n2[normalization] --> c12[conv 3x3 + ReLU]

c11 --> c21[conv 3x3 + ReLU]
c21 --> c31[conv 3x3 + ReLU]
c31 --> c41[conv 2x2 + ReLU]

c12 --> c22[conv 3x3 + ReLU]
c22 --> c32[conv 3x3 + ReLU]
c32 --> c42[conv 3x3 + ReLU]
c42 --> c52[conv 3x3 + ReLU]
c52 --> c62[conv 3x3 + ReLU]

c62 --> p[Pooling 2x2]
p --> c72[conv 2x2 + ReLU]

c41 --> Concatenate
c72 --> Concatenate
Concatenate --> c8[conv 1x1 + softmax]
c8 -- 1x1x6 --> argmax -- 1x1x1 --> out((labels))
tt((terrain truth)) --> loss
c8 --> loss[Cross entropy]
loss --> Optimizer

Our DualSrcFCNNModel model normalizes both 10m and 20m images using the method introduced in the previous section. We implement the model with two independent branches that process separately the 10m and the 20m Sentinel-2 images.

class DualSrcFCNNModel(otbtf.ModelBase):
    """ " This is a subclass of `otbtf.ModelBase` to implement a CNN."""

    def normalize_inputs(self, inputs):
        """This function nomalizes both inputs, scaling values by 1e-4."""
        return {
            key: keras.ops.cast(inputs[key], "float32") * 1e-4
            for key in [inp1_key, inp2_key]
        }

    def get_outputs(self, normalized_inputs):
        """This function implements the model"""

        # The 10m branch
        inp1 = normalized_inputs[inp1_key]  # 16x16x4
        net1 = conv(inp1, 16, 5, "conv11")  # 12x12x16
        net1 = conv(net1, 16, 3, "conv12")  # 10x10x16
        net1 = conv(net1, 16, 3, "conv13")  # 8x8x16
        net1 = conv(net1, 32, 3, "conv14")  # 6x6x32
        net1 = conv(net1, 32, 3, "conv15")  # 4x4x32
        net1 = pool(net1)  # 2x2x32
        net1 = conv(net1, 32, 2, "feats1")  # 1x1x32

        # The 20m branch
        inp2 = normalized_inputs[inp2_key]  # 8x8x6
        net2 = conv(inp2, 32, 3, "conv21")  # 6x6x32
        net2 = conv(net2, 32, 3, "conv22")  # 4x4x32
        net2 = conv(net2, 32, 3, "conv23")  # 2x2x32
        net2 = conv(net2, 32, 2, "feats2")  # 1x1x32

        # Merge features
        net = keras.layers.Concatenate(name="feats", axis=-1)([net1, net2])

        # Classifier
        estim = conv(net, class_nb, 1, "softmax_layer", activation="softmax")
        argmax_op = otbtf.layers.Argmax()

        return {
            tgt_key: estim,
            "estimated_labels": argmax_op(estim),  # additional output: class id
            "features": net,  # additional output: features
        }

An important aspect is the consistency of the spatial size and dimension of the different features arrays of the model, especially when we want to use it in a fully-convolutional fashion. Let summarize the size and the physical spacing of the different features involved in our model. In the following, Size refers to the size of the tensor in the physical dimensions (i.e. x and y dimensions), and Spacing refers to the physical spacing of the tensor also in the physical dimension.

The 10m branch

Name Size (pixels) Spacing (m)
input 10m 16x16 10
conv1 10m 12x12 10
conv2 10m 10x10 10
conv3 10m 8x8 10
conv4 10m 6x6 10
conv5 10m 4x4 10
pooling 2x2 10m 2x2 20

Across the successive convolutions, features sizes disminish, but the physical spacing is preserved. Then, the final pooling layer is applied, changing the physical size of the feature map, and also the physical spacing: it downsamples the features to 20 meters spacing.

The 20m branch

Name Size (pixels) Spacing (m)
input 20m 8x8 20
conv1 20m 6x6 20
conv2 20m 4x4 20
conv3 20m 2x2 20

The 20m spacing branch processes the 20m input with three convolutions with unitary stride, disminishing the size of the output features to 2x2 but preserving the same physical spacing.

Note

This downsampling in the 10m branch is crucial so that the model can be applied in fully-convulutional mode, because now we can concatenate the features from the 10m branch with the other features from the 20m branch, since they are homogeneous in term of size and in term of physical spacing.

Question

  • Copy part_2_inference_fcn_v2.py to part_2_train_fcn_2_sources.py and edit this new file,
  • Modify the datasets so that samples include 20m spacing bands,
  • Implement the new DualSrcFCNNModel model using the provided snippet,
  • Run the training and save the model to a the /data/models/model2_dualsrc directory,
  • Compare the metrics on the validation dataset with the metrics of the single-source model.
Solution Python code:
part_2_train_fcn_2_sources.py
import argparse
import otbtf
import keras
from mymetrics import FScore


class_nb = 6  # number of classes
inp1_key = "input_10m"  # model input for 10m bands
inp2_key = "input_20m"  # model input for 20m bands
tgt_key = "estimated"  # model target


def dataset_preprocessing_fn(sample):
    return {
        inp1_key: sample["img_10m"],
        inp2_key: sample["img_20m"],
        tgt_key: otbtf.ops.one_hot(labels=sample["labels"], nb_classes=class_nb),
    }


def create_dataset(img_10m, img_20m, labels, batch_size=8):
    otbtf_dataset = otbtf.DatasetFromPatchesImages(
        filenames_dict={"img_10m": img_10m, "img_20m": img_20m, "labels": labels}
    )
    return otbtf_dataset.get_tf_dataset(
        batch_size=batch_size,
        preprocessing_fn=dataset_preprocessing_fn,
        targets_keys=[tgt_key],
    )


# Training dataset
ds_train = create_dataset(
    ["/data/a_img_10m_2src.tif"],
    ["/data/a_img_20m_2src.tif"],
    ["/data/a_labels_2src.tif"],
)
ds_train = ds_train.shuffle(buffer_size=100)

# Validation dataset
ds_valid = create_dataset(
    ["/data/b_img_10m_2src.tif"],
    ["/data/b_img_20m_2src.tif"],
    ["/data/b_labels_2src.tif"],
)


def conv(inp, depth, kernel_size, name, activation="relu"):
    conv_op = keras.layers.Conv2D(
        filters=depth,
        kernel_size=kernel_size,
        strides=1,
        activation=activation,
        padding="valid",
        name=name,
    )
    return conv_op(inp)


pool = keras.layers.MaxPool2D(pool_size=(2, 2))


class DualSrcFCNNModel(otbtf.ModelBase):
    """ " This is a subclass of `otbtf.ModelBase` to implement a CNN."""

    def normalize_inputs(self, inputs):
        """This function nomalizes both inputs, scaling values by 1e-4."""
        return {
            key: keras.ops.cast(inputs[key], "float32") * 1e-4
            for key in [inp1_key, inp2_key]
        }

    def get_outputs(self, normalized_inputs):
        """This function implements the model"""

        # The 10m branch
        inp1 = normalized_inputs[inp1_key]  # 16x16x4
        net1 = conv(inp1, 16, 5, "conv11")  # 12x12x16
        net1 = conv(net1, 16, 3, "conv12")  # 10x10x16
        net1 = conv(net1, 16, 3, "conv13")  # 8x8x16
        net1 = conv(net1, 32, 3, "conv14")  # 6x6x32
        net1 = conv(net1, 32, 3, "conv15")  # 4x4x32
        net1 = pool(net1)  # 2x2x32
        net1 = conv(net1, 32, 2, "feats1")  # 1x1x32

        # The 20m branch
        inp2 = normalized_inputs[inp2_key]  # 8x8x6
        net2 = conv(inp2, 32, 3, "conv21")  # 6x6x32
        net2 = conv(net2, 32, 3, "conv22")  # 4x4x32
        net2 = conv(net2, 32, 3, "conv23")  # 2x2x32
        net2 = conv(net2, 32, 2, "feats2")  # 1x1x32

        # Merge features
        net = keras.layers.Concatenate(name="feats", axis=-1)([net1, net2])

        # Classifier
        estim = conv(net, class_nb, 1, "softmax_layer", activation="softmax")
        argmax_op = otbtf.layers.Argmax()

        return {
            tgt_key: estim,
            "estimated_labels": argmax_op(estim),  # additional output: class id
            "features": net,  # additional output: features
        }


parser = argparse.ArgumentParser(description="Train a CNN model")
parser.add_argument("--model", required=True, help="model path (.keras file)")
parser.add_argument("--log_dir", required=True, help="logs directory")
parser.add_argument("--batch_size", type=int, default=4)
parser.add_argument("--learning_rate", type=float, default=0.0002)
parser.add_argument("--epochs", type=int, default=100)
params = parser.parse_args()

model = DualSrcFCNNModel(dataset_element_spec=ds_train.element_spec)

# Precision and recall for each class
metrics = [
    cls(class_id=class_id)
    for class_id in range(class_nb)
    for cls in [keras.metrics.Precision, keras.metrics.Recall]
]

# F1-Score for each class
metrics += [
    FScore(class_id=class_id, name=f"fscore_cls{class_id}")
    for class_id in range(class_nb)
]

model.compile(
    loss={tgt_key: keras.losses.CategoricalCrossentropy()},
    optimizer=keras.optimizers.Adam(params.learning_rate),
    metrics={tgt_key: metrics},  # compute the metrics for `tgt_key`
)
model.summary()

save_callback = keras.callbacks.ModelCheckpoint(
    params.model,  # new directory name
    save_best_only=True,  # save only the best models
    monitor="val_loss",  # metric or loss to monitor
    mode="min",  # when a new min is reached
    verbose=2,  # log something when saving
)
tb_callback = keras.callbacks.TensorBoard(log_dir=params.log_dir)
model.fit(
    ds_train,
    epochs=params.epochs,
    validation_data=ds_valid,
    callbacks=[save_callback, tb_callback],
)
Command:
python part_2_train_fcn_2_sources.py \
  --model /data/models/model2_dualsrc.keras \
  --log_dir /data/logs/model2_dualsrc

Inference

In this section, we will generate some maps with this new model!

First convert the .keras model into a TensorFlow SavedModel:

python part_2_savedmodel.py --model /data/models/model2_dualsrc.keras --savedmodel /data/models/model2_dualsrc_savedmodel

Patch based mode

First, we will run the model in patch-based mode. Generate the output classification map at the resolution of the 20m spacing image. To do this, just feed the 20m spacing image to the source1_il, which is the reference source for the output physical spacing.

part_2_inference_2_sources_pb.py
import pyotb
import argparse

parser = argparse.ArgumentParser(description="Apply the model")
parser.add_argument("--savedmodel", required=True, help="savedmodel directory")
params = parser.parse_args()

# Generate the classification map, patch-based mode, same
# resolution as the 20m spacing image
infer = pyotb.TensorflowModelServe(
    n_sources=2,  # Tells the OTB application to use two sources
    source1_il="/data/s2_tokyo_20m.tif",
    source1_rfieldx=8,
    source1_rfieldy=8,
    source1_placeholder="input_20m",
    source2_il="/data/s2_tokyo_10m.tif",
    source2_rfieldx=16,
    source2_rfieldy=16,
    source2_placeholder="input_10m",
    model_dir=params.savedmodel,
    output_names="estimated_labels",
)

infer.write(
    "/data/map_fcn_2_sources_pb.tif",
    pixel_type="uint8",
    ext_fname="box=2000:2000:500:500",
)

Question

  • Run the inference to generate the map
  • Switch source1 and source2 such as the first source is the 10m spacing image: the output image will be created with the same spacing.
  • Import the images in QGIS and analyse their spatial resolutions.
Solution
part_2_inference_2_sources_pb_sf.py
import pyotb
import argparse

parser = argparse.ArgumentParser(description="Apply the model")
parser.add_argument("--savedmodel", required=True, help="savedmodel directory")
params = parser.parse_args()

# Generate the classification map, patch-based mode, same
# resolution as the 20m spacing image
infer = pyotb.TensorflowModelServe(
    n_sources=2,  # Tells the OTB application to use two sources
    source1_il="/data/s2_tokyo_10m.tif",
    source1_rfieldx=16,
    source1_rfieldy=16,
    source1_placeholder="input_10m",
    source2_il="/data/s2_tokyo_20m.tif",
    source2_rfieldx=8,
    source2_rfieldy=8,
    source2_placeholder="input_20m",
    model_dir=params.savedmodel,
    output_names="estimated_labels",
    output_spcscale=2,
)

infer.write(
    "/data/map_fcn_2_sources_pb_spc2.tif",
    pixel_type="uint8",
    ext_fname="box=2000:2000:500:500",
)

Fully convolutional mode

Generate the output classification map at the same resolution as the 20m spacing image, using the model_fullyconv parameter to True to enable the fully convolutional mode of the processing.

part_2_inference_2_sources_fcn.py
import pyotb
import argparse

parser = argparse.ArgumentParser(description="Apply the model")
parser.add_argument("--savedmodel", required=True, help="savedmodel directory")
params = parser.parse_args()

# Generate the classification map, fully-convolutional mode, same
# resolution as the 20m spacing image
infer = pyotb.TensorflowModelServe(
    n_sources=2,  # Tells the OTB application to use two sources
    source1_il="/data/s2_tokyo_20m.tif",
    source1_rfieldx=8,
    source1_rfieldy=8,
    source1_placeholder="input_20m",
    source2_il="/data/s2_tokyo_10m.tif",
    source2_rfieldx=16,
    source2_rfieldy=16,
    source2_placeholder="input_10m",
    model_dir=params.savedmodel,
    model_fullyconv=True,
    output_names="estimated_labels",
)

infer.write(
    "/data/map_fcn_2_sources_fcn.tif",
    pixel_type="uint8",
    ext_fname="box=2000:2000:500:500",
)

Question

  • Run the inference to generate the map
  • Switch source1 and source2 such as the first source is the 10m spacing image, you have to provide an output scale factor (parameter spcscale) to the TensorflowModelServe application, to make consistent the physical spacing of the output. In this example, the spcscale would be 2.
Solution
part_2_inference_2_sources_fcn_sf.py
import pyotb
import argparse

parser = argparse.ArgumentParser(description="Apply the model")
parser.add_argument("--savedmodel", required=True, help="savedmodel directory")
params = parser.parse_args()

# Generate the classification map, fully-convolutional mode, same
# resolution as the 20m spacing image
infer = pyotb.TensorflowModelServe(
    n_sources=2,  # Tells the OTB application to use two sources
    source1_il="/data/s2_tokyo_10m.tif",
    source1_rfieldx=16,
    source1_rfieldy=16,
    source1_placeholder="input_10m",
    source2_il="/data/s2_tokyo_20m.tif",
    source2_rfieldx=8,
    source2_rfieldy=8,
    source2_placeholder="input_20m",
    model_dir=params.savedmodel,
    model_fullyconv=True,
    output_names="estimated_labels",
    output_spcscale=2,
)

infer.write(
    "/data/map_fcn_2_sources_fcn_spc2.tif",
    pixel_type="uint8",
    ext_fname="box=2000:2000:500:500",
)