Pixel-wise FCNN
A fully convolutional network runs faster than a patch-based one, fast enough to process the full image in a reasonable time. However, pooling decimates the resolution of the output (the output image spacing is 4 times greater than the input pixel spacing). We can slightly modify our original model to create a pixel-wise fully convolutional network that preserves the input image spacing.
We remove all pooling operators and add convolutional layers with unit strides. Our goal is to keep the same receptive field and expression field as the previously used architecture (this way, we can reuse the image patches that we generated previously).
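As a quick sanity check of the receptive field and expression field, here is a minimal sketch in plain Python. It assumes the kernel sizes used in the architecture of this section (5, 3, 3, 3, 3, 3, 2, then the 1x1 classifier), unit strides and "valid" padding. The helper names `receptive_field` and `trace_sizes` are illustrative and are not part of otbtf or Keras.

```python
# Kernel sizes of the stride-1, "valid" convolutions, in order
# (conv1 ... feats, then the 1x1 classifier).
KERNEL_SIZES = [5, 3, 3, 3, 3, 3, 2, 1]


def receptive_field(kernel_sizes):
    """Receptive field of a stack of stride-1 convolutions."""
    # Each k x k convolution widens the receptive field by k - 1 pixels.
    return 1 + sum(k - 1 for k in kernel_sizes)


def trace_sizes(input_size, kernel_sizes):
    """Spatial size after each 'valid', stride-1 convolution."""
    sizes = []
    size = input_size
    for k in kernel_sizes:
        size -= k - 1  # a 'valid' convolution trims k - 1 pixels
        sizes.append(size)
    return sizes


print(receptive_field(KERNEL_SIZES))  # 16
print(trace_sizes(16, KERNEL_SIZES))  # [12, 10, 8, 6, 4, 2, 1, 1]
```

A 16x16 patch is therefore reduced to a single output pixel, which is why the patches generated for the previous model can be reused as they are.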
The following figure summarizes our new architecture.
flowchart TD
i((input)) --> normalization -- 16x16x4 --> c1[conv 5x5 + ReLU]
c1 -- 12x12x16 --> c2[conv 3x3 + ReLU]
c2 -- 10x10x16 --> c3[conv 3x3 + ReLU]
c3 -- 8x8x16 --> c4[conv 3x3 + ReLU]
c4 -- 6x6x32 --> c5[conv 3x3 + ReLU]
c5 -- 4x4x32 --> c6[conv 3x3 + ReLU]
c6 -- 2x2x32 --> c7[conv 2x2 + ReLU]
c7 -- 1x1x32 --> c8[conv 1x1 + Softmax]
c8 -- 1x1x6 --> argmax -- 1x1x1 --> p((labels))
Note that all convolutions use a stride of 1, which preserves the physical spacing in the spatial dimensions. Now let's implement our fully convolutional model:
def get_outputs(self, normalized_inputs):
    """This function implements the model"""
    inp = normalized_inputs[inp_key]
    net = conv(inp, 16, 5, "conv1")  # 12x12x16
    net = conv(net, 16, 3, "conv2")  # 10x10x16
    net = conv(net, 16, 3, "conv3")  # 8x8x16
    net = conv(net, 32, 3, "conv4")  # 6x6x32
    net = conv(net, 32, 3, "conv5")  # 4x4x32
    net = conv(net, 32, 3, "conv6")  # 2x2x32
    net = conv(net, 32, 2, "feats")  # 1x1x32
    # Classifier
    estim = conv(net, class_nb, 1, "softmax_layer", activation="softmax")
    argmax_op = otbtf.layers.Argmax()
    return {
        tgt_key: estim,
        "estimated_labels": argmax_op(estim),  # additional output: class id
        "features": net,  # additional output: features
    }
As there is no stride in the convolutions and no pooling, our new model has an output scale factor of 1 and can produce the classification map at the same physical spacing as the input image.
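The scale factor argument can be written down as simple arithmetic: the output spacing is the input spacing multiplied by the product of all layer strides. The sketch below is only an illustration. The 10 m spacing is an assumption taken from the `_10m` suffix of the image file names, and `pooled_strides` is a hypothetical stand-in that reproduces the factor of 4 mentioned earlier, not the exact layout of the previous model.

```python
from math import prod


def output_spacing(input_spacing, strides):
    """Output pixel spacing = input spacing x product of all layer strides."""
    return input_spacing * prod(strides)


# New model: eight convolutions, all with stride 1, and no pooling.
fcn_strides = [1] * 8
# Hypothetical stand-in for a model with two stride-2 poolings (factor 4).
pooled_strides = [1, 2, 1, 2]

print(output_spacing(10.0, fcn_strides))     # 10.0
print(output_spacing(10.0, pooled_strides))  # 40.0
```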
Question
- Copy part_2_train.py to part_2_train_fcn.py and implement the model in this new file.
Solution
import argparse
import otbtf
import keras
import os
from mymetrics import FScore
class_nb = 6 # number of classes
inp_key = "input" # model input
tgt_key = "estimated" # model target
def dataset_preprocessing_fn(sample):
    return {
        inp_key: sample["img"],
        tgt_key: otbtf.ops.one_hot(labels=sample["labels"], nb_classes=class_nb),
    }


def create_dataset(img, labels, batch_size=8):
    otbtf_dataset = otbtf.DatasetFromPatchesImages(
        filenames_dict={"img": img, "labels": labels}
    )
    return otbtf_dataset.get_tf_dataset(
        batch_size=batch_size,
        preprocessing_fn=dataset_preprocessing_fn,
        targets_keys=[tgt_key],
    )
# Training dataset
ds_train = create_dataset(["/data/a_img_10m.tif"], ["/data/a_labels.tif"])
ds_train = ds_train.shuffle(buffer_size=100)
# Validation dataset
ds_valid = create_dataset(["/data/b_img_10m.tif"], ["/data/b_labels.tif"])
def conv(inp, depth, kernel_size, name, activation="relu"):
    conv_op = keras.layers.Conv2D(
        filters=depth,
        kernel_size=kernel_size,
        strides=1,
        activation=activation,
        padding="valid",
        name=name,
    )
    return conv_op(inp)


class FCNNModel(otbtf.ModelBase):
    """This is a subclass of `otbtf.ModelBase` to implement a CNN"""

    def normalize_inputs(self, inputs):
        """This function normalizes the input, scaling values by 1e-4"""
        return {inp_key: keras.ops.cast(inputs[inp_key], "float32") * 1e-4}

    def get_outputs(self, normalized_inputs):
        """This function implements the model"""
        inp = normalized_inputs[inp_key]
        net = conv(inp, 16, 5, "conv1")  # 12x12x16
        net = conv(net, 16, 3, "conv2")  # 10x10x16
        net = conv(net, 16, 3, "conv3")  # 8x8x16
        net = conv(net, 32, 3, "conv4")  # 6x6x32
        net = conv(net, 32, 3, "conv5")  # 4x4x32
        net = conv(net, 32, 3, "conv6")  # 2x2x32
        net = conv(net, 32, 2, "feats")  # 1x1x32
        # Classifier
        estim = conv(net, class_nb, 1, "softmax_layer", activation="softmax")
        argmax_op = otbtf.layers.Argmax()
        return {
            tgt_key: estim,
            "estimated_labels": argmax_op(estim),  # additional output: class id
            "features": net,  # additional output: features
        }
parser = argparse.ArgumentParser(description="Train a CNN model")
parser.add_argument("--model", required=True, help="model path (.keras file)")
parser.add_argument("--log_dir", required=True, help="logs directory")
parser.add_argument("--batch_size", type=int, default=4)
parser.add_argument("--learning_rate", type=float, default=0.0002)
parser.add_argument("--epochs", type=int, default=100)
params = parser.parse_args()
# Logs directory
log_dir = os.path.join(params.log_dir, "fcn")
model = FCNNModel(dataset_element_spec=ds_train.element_spec)
metrics = [
    cls(class_id=class_id)
    for class_id in range(class_nb)
    for cls in [keras.metrics.Precision, keras.metrics.Recall]
]
metrics += [
    FScore(class_id=class_id, name=f"fscore_cls{class_id}")
    for class_id in range(class_nb)
]
model.compile(
    loss={tgt_key: keras.losses.CategoricalCrossentropy()},
    optimizer=keras.optimizers.Adam(params.learning_rate),
    metrics={tgt_key: metrics},  # compute the metrics for `tgt_key`
)
model.summary()
save_callback = keras.callbacks.ModelCheckpoint(
    params.model,  # model file path
    save_best_only=True,  # save only the best models
    monitor="val_loss",  # metric or loss to monitor
    mode="min",  # when a new min is reached
    verbose=2,  # log something when saving
)
tb_callback = keras.callbacks.TensorBoard(log_dir=log_dir)
model.fit(
    ds_train,
    epochs=params.epochs,
    validation_data=ds_valid,
    callbacks=[save_callback, tb_callback],
)
Training
We can now train this new model over the same patches as for the simple CNN.
python part_2_train_fcn.py \
    --model /data/models/model1_fcn.keras \
    --log_dir /data/logs/model1_fcn
Note
You can observe that the new model architecture and setup are more prone to overfitting than the previous one.
Question
- Use a keras.callbacks.EarlyStopping to halt the optimization process when the training stops improving,
- Test various settings of the callback and observe their effects on TensorBoard (remember that you can set the number of epochs using the --epochs parameter of the CLI parser).
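To build intuition for the `patience` and `min_delta` settings before experimenting, here is a simplified, plain-Python sketch of a patience-based stopping rule. It is not the Keras implementation, only an approximation of the idea behind `keras.callbacks.EarlyStopping`. The function name `stopping_epoch` and the loss values are made up for illustration.

```python
def stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return the index of the epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0  # number of consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Improvement: remember the new best value and reset the counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses: improvement stops after the third epoch.
losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.63, 0.64]
print(stopping_epoch(losses, patience=3))  # 5
print(stopping_epoch(losses, patience=5))  # None
```

In the training script, the actual callback would be created with arguments such as `monitor="val_loss"` and `patience`, then appended to the `callbacks` list passed to `model.fit`.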
Inference
We now want to produce the classification map over the entire image. We use the following script that performs the inference in fully convolutional mode with a unit scale factor, meaning that the ratio between the input image pixel spacing and the output image pixel spacing is 1. The output image is saved to /data/map1_fcn.tif. The entire input image is processed.
import pyotb
import argparse
parser = argparse.ArgumentParser(description="Apply the model")
parser.add_argument("--savedmodel", required=True, help="savedmodel directory")
params = parser.parse_args()
infer = pyotb.TensorflowModelServe(
    source1_il="/data/s2_tokyo_10m.tif",
    source1_rfieldx=16,
    source1_rfieldy=16,
    source1_placeholder="input",
    model_dir=params.savedmodel,
    output_names="estimated_labels",
    model_fullyconv=True,
)
infer.write(
    "/data/map1_fcn.tif", pixel_type="uint8", ext_fname="box=4000:4000:1000:1000"
)
Note
The optim parameter group of TensorflowModelServe lets you adjust settings for the model execution to improve performance. To speed up the process, you can force the application to produce the result using large tiles. In the following, we force the output tiles to be large rectangular tiles 128 pixels high, in order to reduce the redundant computation from partially overlapping input areas.
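To see why larger tiles reduce redundant computation, the sketch below estimates how many input pixels must be read per output pixel for different tile heights. This is arithmetic only, not the TensorflowModelServe configuration. It assumes a receptive field of 16, a scale factor of 1 and "valid" convolutions, so each output tile needs an input region 15 pixels larger in each dimension. The tile width of 1000 pixels and the helper names are illustrative.

```python
RECEPTIVE_FIELD = 16  # matches source1_rfieldx / source1_rfieldy


def input_pixels_per_tile(tile_w, tile_h, rfield=RECEPTIVE_FIELD):
    """Input pixels needed to produce one tile_w x tile_h output tile."""
    margin = rfield - 1  # extra input pixels along each dimension
    return (tile_w + margin) * (tile_h + margin)


def overhead(tile_w, tile_h, rfield=RECEPTIVE_FIELD):
    """Ratio of input pixels read to output pixels produced."""
    return input_pixels_per_tile(tile_w, tile_h, rfield) / (tile_w * tile_h)


# Illustrative tile width of 1000 pixels; only the height varies.
for tile_h in (16, 64, 128):
    print(tile_h, round(overhead(1000, tile_h), 3))
```

The ratio gets closer to 1 as the tile height grows, which is the saving the larger tiles are meant to provide, at the cost of more memory per tile.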
Note that you must have enough memory to process large tiles. If there is not enough memory, TensorFlow will emit warning messages indicating that the efficiency of the computation is compromised. In this case, you can reduce the tile height, e.g. 64 or 128.
Question
- Generate the map for the entire image,
- Set the source1_nodata to 0 and generate the map (change the output image filename),
- Open the first map (without no-data) and the second (with no-data specified) and compare the resulting maps.