Pixel-wise FCNN

A fully convolutional network runs faster than a patch-based one, fast enough to process the full image in a reasonable time. However, pooling decimates the resolution of the output (the output image spacing is 4 times greater than the input pixel spacing). We can slightly modify our original model to create a pixel-wise fully convolutional network that preserves the input image spacing.

We remove all pooling operators and add convolutional layers with unit strides. Our goal is to keep the same receptive field and expression field as the previously used architecture, so that we can reuse the image patches that we generated previously.

The following figure summarizes our new architecture.

flowchart TD

i((input)) --> normalization -- 16x16x4 --> c1[conv 5x5 + ReLU]
c1 -- 12x12x16 --> c2[conv 3x3 + ReLU]
c2 -- 10x10x16 --> c3[conv 3x3 + ReLU]
c3 -- 8x8x16   --> c4[conv 3x3 + ReLU]
c4 -- 6x6x32   --> c5[conv 3x3 + ReLU]
c5 -- 4x4x32   --> c6[conv 3x3 + ReLU]
c6 -- 2x2x32   --> c7[conv 2x2 + ReLU]
c7 -- 1x1x32   --> c8[conv 1x1 + Softmax]
c8 -- 1x1x6    --> argmax -- 1x1x1 --> p((labels))
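
The tensor shapes in the figure can be checked with a little arithmetic: a convolution with "valid" padding and a stride of 1 shrinks each spatial dimension by `kernel_size - 1`. The following is a minimal sketch in plain Python (the `LAYERS` list and the helper names are illustrative, not part of otbtf or Keras) that traces the shapes for a 16x16 input patch and derives the receptive field of the whole stack.

```python
# Kernel size and output depth of the eight convolutions in the figure.
# This list is an illustrative transcription of the flowchart.
LAYERS = [(5, 16), (3, 16), (3, 16), (3, 32), (3, 32), (3, 32), (2, 32), (1, 6)]


def trace_shapes(size, layers=LAYERS):
    """Return (height, width, depth) after each 'valid', stride-1 convolution."""
    shapes = []
    for kernel, depth in layers:
        size = size - (kernel - 1)  # each layer removes kernel - 1 pixels
        shapes.append((size, size, depth))
    return shapes


def receptive_field(layers=LAYERS):
    """Input extent seen by one output pixel when all strides equal 1."""
    return 1 + sum(kernel - 1 for kernel, _ in layers)


for name, shape in zip(["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"], trace_shapes(16)):
    print(name, shape)
print("receptive field:", receptive_field())
```

The last shape is 1x1x6 and the receptive field is 16 pixels, which is why the 16x16 patches generated earlier can be reused for training.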

Note that every convolution uses a stride of 1, which preserves the physical spacing in the spatial dimensions. Now let's implement our fully convolutional model:

    def get_outputs(self, normalized_inputs):
        """This function implements the model"""
        inp = normalized_inputs[inp_key]
        net = conv(inp, 16, 5, "conv1")  # 12x12x16
        net = conv(net, 16, 3, "conv2")  # 10x10x16
        net = conv(net, 16, 3, "conv3")  # 8x8x16
        net = conv(net, 32, 3, "conv4")  # 6x6x32
        net = conv(net, 32, 3, "conv5")  # 4x4x32
        net = conv(net, 32, 3, "conv6")  # 2x2x32
        net = conv(net, 32, 2, "feats")  # 1x1x32

        # Classifier
        estim = conv(net, class_nb, 1, "softmax_layer", activation="softmax")
        argmax_op = otbtf.layers.Argmax()

        return {
            tgt_key: estim,
            "estimated_labels": argmax_op(estim),  # additional output: class id
            "features": net,  # additional output: features
        }

As the model uses neither strided convolutions nor pooling, it has an output scale factor of 1 and can produce the classification map at the same physical spacing as the input image.
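
To make the scale factor concrete, here is a small sketch in plain Python. It assumes that the output scale factor of a stack of layers is the product of their strides, and that a stride-1, "valid" network with a receptive field of 16 yields `input_size - 15` output pixels per dimension. The helper names and the `pooled_strides` list are hypothetical and only illustrate the arithmetic; the list is not the exact layout of the previous model.

```python
from math import prod


def output_scale_factor(strides):
    """Ratio between output and input pixel spacing (product of all strides)."""
    return prod(strides)


def output_size(input_size, receptive_field=16):
    """Output pixels per dimension for a stride-1 network with 'valid' padding."""
    return input_size - receptive_field + 1


fcn_strides = [1] * 8  # the eight convolutions of the pixel-wise model
pooled_strides = [1, 2, 1, 2]  # hypothetical model with two 2x2 poolings

print(output_scale_factor(fcn_strides))
print(output_scale_factor(pooled_strides))
print(output_size(16), output_size(1000))
```

With unit strides everywhere the factor stays at 1, whereas two stride-2 operators already give a factor of 4. A 16x16 patch maps to a single output pixel, and a larger input simply yields more output pixels at the same spacing.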

Question

  • Copy part_2_train.py to part_2_train_fcn.py and implement the model in this new file.
Solution
import argparse
import otbtf
import keras
import os
from mymetrics import FScore


class_nb = 6  # number of classes
inp_key = "input"  # model input
tgt_key = "estimated"  # model target


def dataset_preprocessing_fn(sample):
    return {
        inp_key: sample["img"],
        tgt_key: otbtf.ops.one_hot(labels=sample["labels"], nb_classes=class_nb),
    }


def create_dataset(img, labels, batch_size=8):
    otbtf_dataset = otbtf.DatasetFromPatchesImages(
        filenames_dict={"img": img, "labels": labels}
    )
    return otbtf_dataset.get_tf_dataset(
        batch_size=batch_size,
        preprocessing_fn=dataset_preprocessing_fn,
        targets_keys=[tgt_key],
    )


# Training dataset
ds_train = create_dataset(["/data/a_img_10m.tif"], ["/data/a_labels.tif"])
ds_train = ds_train.shuffle(buffer_size=100)

# Validation dataset
ds_valid = create_dataset(["/data/b_img_10m.tif"], ["/data/b_labels.tif"])


def conv(inp, depth, kernel_size, name, activation="relu"):
    conv_op = keras.layers.Conv2D(
        filters=depth,
        kernel_size=kernel_size,
        strides=1,
        activation=activation,
        padding="valid",
        name=name,
    )
    return conv_op(inp)


class FCNNModel(otbtf.ModelBase):
    """This is a subclass of `otbtf.ModelBase` to implement a CNN"""

    def normalize_inputs(self, inputs):
        """This function normalizes the input, scaling values by 1e-4"""
        return {inp_key: keras.ops.cast(inputs[inp_key], "float32") * 1e-4}

    def get_outputs(self, normalized_inputs):
        """This function implements the model"""
        inp = normalized_inputs[inp_key]
        net = conv(inp, 16, 5, "conv1")  # 12x12x16
        net = conv(net, 16, 3, "conv2")  # 10x10x16
        net = conv(net, 16, 3, "conv3")  # 8x8x16
        net = conv(net, 32, 3, "conv4")  # 6x6x32
        net = conv(net, 32, 3, "conv5")  # 4x4x32
        net = conv(net, 32, 3, "conv6")  # 2x2x32
        net = conv(net, 32, 2, "feats")  # 1x1x32

        # Classifier
        estim = conv(net, class_nb, 1, "softmax_layer", activation="softmax")
        argmax_op = otbtf.layers.Argmax()

        return {
            tgt_key: estim,
            "estimated_labels": argmax_op(estim),  # additional output: class id
            "features": net,  # additional output: features
        }


parser = argparse.ArgumentParser(description="Train a CNN model")
parser.add_argument("--model", required=True, help="model path (.keras file)")
parser.add_argument("--log_dir", required=True, help="logs directory")
parser.add_argument("--batch_size", type=int, default=4)
parser.add_argument("--learning_rate", type=float, default=0.0002)
parser.add_argument("--epochs", type=int, default=100)
params = parser.parse_args()

# Logs directory
log_dir = os.path.join(params.log_dir, "fcn")

model = FCNNModel(dataset_element_spec=ds_train.element_spec)

metrics = [
    cls(class_id=class_id)
    for class_id in range(class_nb)
    for cls in [keras.metrics.Precision, keras.metrics.Recall]
]
metrics += [
    FScore(class_id=class_id, name=f"fscore_cls{class_id}")
    for class_id in range(class_nb)
]

model.compile(
    loss={tgt_key: keras.losses.CategoricalCrossentropy()},
    optimizer=keras.optimizers.Adam(params.learning_rate),
    metrics={tgt_key: metrics},  # compute the metrics for `tgt_key`
)
model.summary()

save_callback = keras.callbacks.ModelCheckpoint(
    params.model,  # model file path
    save_best_only=True,  # save only the best models
    monitor="val_loss",  # metric or loss to monitor
    mode="min",  # when a new min is reached
    verbose=2,  # log something when saving
)
tb_callback = keras.callbacks.TensorBoard(log_dir=log_dir)
model.fit(
    ds_train,
    epochs=params.epochs,
    validation_data=ds_valid,
    callbacks=[save_callback, tb_callback],
)

Training

We can now train this new model over the same patches as for the simple CNN.

python part_2_train_fcn.py \
  --model /data/models/model1_fcn.keras \
  --log_dir /data/logs/model1_fcn

Note

You can observe that the new model architecture and setup are more prone to overfitting than the previous one.

Question

  • Use a keras.callbacks.EarlyStopping callback to halt the optimization process when training stops improving,
  • Test various settings of the callback and observe their effects in TensorBoard (remember that you can set the number of epochs using the --epochs parameter of the CLI parser).
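
To build intuition for the `patience` and `min_delta` settings of keras.callbacks.EarlyStopping before experimenting, the stopping rule can be imitated in a few lines of plain Python. This is a simplified sketch, not the Keras implementation: the function name and the list of validation losses are made up for illustration, and only a loss to be minimized is handled.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 0-based epoch at which training would stop, or None.

    An epoch counts as an improvement only if the loss drops below the
    best value seen so far by more than `min_delta`.
    """
    best = float("inf")
    wait = 0  # epochs elapsed since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses: a plateau followed by a late, small improvement.
losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.69]

print(early_stopping_epoch(losses, patience=3))
print(early_stopping_epoch(losses, patience=5))
print(early_stopping_epoch(losses, patience=4, min_delta=0.05))
```

A short patience stops on the plateau (epoch 5), a longer one lets training reach the late improvement and never triggers, and a non-zero `min_delta` discards that small improvement so that training stops at epoch 6.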

Inference

We now want to produce the classification map over the entire image. We use the following script, which performs the inference in fully convolutional mode with a unit scale factor, meaning that the ratio between the input image pixel spacing and the output image pixel spacing is 1. The output image is saved to /data/map1_fcn.tif. Note that the box extended filename passed to write() restricts the written output to a 1000x1000 pixel region whose upper-left corner is at (4000, 4000); without it, the entire input image is processed.

part_2_inference_fcn.py
import pyotb
import argparse

parser = argparse.ArgumentParser(description="Apply the model")
parser.add_argument("--savedmodel", required=True, help="savedmodel directory")
params = parser.parse_args()


infer = pyotb.TensorflowModelServe(
    source1_il="/data/s2_tokyo_10m.tif",
    source1_rfieldx=16,
    source1_rfieldy=16,
    source1_placeholder="input",
    model_dir=params.savedmodel,
    output_names="estimated_labels",
    model_fullyconv=True,
)

infer.write(
    "/data/map1_fcn.tif", pixel_type="uint8", ext_fname="box=4000:4000:1000:1000"
)

Note

The optim parameter group of TensorflowModelServe lets you adjust the model execution settings to reach better performance. To speed up the process, you can force the application to produce the result using large tiles. In the following, we force the output to large tiles of 256x256 pixels, in order to reduce the redundant computation over partially overlapping input areas.

infer = pyotb.TensorflowModelServe(
    optim_tilesizex=256,
    optim_tilesizey=256,
    ...
)

Note that you must have enough memory to process large tiles. If there is not enough memory, TensorFlow will emit warning messages indicating that the efficiency of the computation is compromised. In this case, you can reduce the tile size, e.g. to 128 or 64.
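
The benefit of larger tiles can be estimated with a rough calculation. The sketch below assumes that producing an output tile of T pixels along one dimension requires T + 15 input pixels (receptive field of 16, unit scale factor), so neighboring tiles re-read overlapping margins. The helper names are hypothetical, and the figures ignore how the application actually schedules its tiles.

```python
def input_tile_size(output_tile, receptive_field=16):
    """Input pixels needed along one dimension for a given output tile size."""
    return output_tile + receptive_field - 1


def overhead(tile_x, tile_y, receptive_field=16):
    """Ratio of input pixels read to output pixels produced for one tile."""
    read = input_tile_size(tile_x, receptive_field) * input_tile_size(
        tile_y, receptive_field
    )
    return read / (tile_x * tile_y)


for tile in (64, 128, 256):
    print(tile, round(overhead(tile, tile), 3))
```

Under these assumptions the ratio falls from about 1.52 for 64x64 tiles to about 1.12 for 256x256 tiles, which is the trade-off against memory use described above.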

Question

  • Generate the map for the entire image,
  • Set the source1_nodata to 0 and generate the map (change the output image filename),
  • Open the first map (without no-data) and the second (with no-data specified) and compare the resulting maps.