Overview

This tutorial presents an approach to perform semantic segmentation on very high resolution remote sensing images using a deep convolutional neural network. In previous sections, we introduced a patch based classification approach, that is suited to sparsely annotated data. However, this kind of approach is limited in term of semantic precision in spatial location. Indeed, a patch based network is designed to attribute one single class to an input patch, and does not make use of the context of the actual classes labels. On the contrary, semantic segmentation methods, also known as dense prediction, use the semantic spatial context. These approaches train networks estimating the semantic of patches of pixels, rather that just a single pixel. Semantic segmentation methods allows to tackle the spatial precision of the classification. However, the training of networks require densely annotated terrain truth. In this section, we will use terrain truth data over the city of Paris, France, to train a model to map buildings footprints from Spot 6/7 products. The terrain truth data was created from the French Institute of Geography BD TOPO© at the UMR TETIS Lab.