Classification

Set directory

Set parameters

Set filenames

Input files

Output files

1. Prepare classification features associated with in situ data

1.1 Rasterize in situ data calibration shapefile

1.2 List all the classification features

Create an empty list and append the feature rasters one by one

1 NDVI image per month

S1 monthly mean composite (obtained with Google Earth Engine)

Merge all the 2D matrices from the list into one 3D matrix
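The merge step above can be sketched with NumPy's `dstack`, which stacks 2D layers along a third axis. The feature layers here are small synthetic stand-ins for the monthly NDVI and S1 rasters read from disk:

```python
import numpy as np

# Hypothetical stand-ins for the feature rasters read from disk:
# three synthetic 2D layers (e.g. monthly NDVI, S1 composites)
rows, cols = 4, 5
feature_list = [np.random.rand(rows, cols).astype(np.float32) for _ in range(3)]

# Merge the list of 2D matrices into one 3D matrix (rows, cols, n_features)
img = np.dstack(feature_list)
print(img.shape)  # (4, 5, 3)
```

Each band keeps its position in the list, so `img[:, :, 0]` is the first feature appended.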

1.3 Pairing in situ data (Y) with EO classification features (X)

Now that we have the image we want to classify (our X feature inputs) and the ROI with the land cover labels (our Y labeled data), we need to pair them up in NumPy arrays so we can feed them to the Random Forest classifier.

What are our classification labels?

We need:

These will have n_samples rows.
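The pairing can be done with a boolean mask over the rasterized ROI. A minimal sketch, assuming the ROI uses 0 for "no label" and positive integers for classes (both arrays below are synthetic stand-ins):

```python
import numpy as np

# Hypothetical inputs: the stacked feature image and the rasterized ROI,
# where 0 means "no label" and positive integers are land cover classes
img = np.random.rand(4, 5, 3).astype(np.float32)  # (rows, cols, n_features)
roi = np.zeros((4, 5), dtype=np.uint8)
roi[1, 1] = 1
roi[2, 3] = 2

# Keep only the pixels that carry an in situ label
X = img[roi > 0, :]  # shape (n_samples, n_features)
y = roi[roi > 0]     # shape (n_samples,)
print(X.shape, y.shape)  # (2, 3) (2,)
```

Boolean indexing over the first two axes keeps X and y aligned pixel for pixel, so row i of X is the feature vector of the pixel labeled y[i].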

2. Train the Random Forest

Now that we have our X 2D matrix of feature inputs and our y 1D array containing the labels, we can train our model.

See the scikit-learn documentation for the usage of RandomForestClassifier.
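Training boils down to constructing the classifier and calling `fit`. A minimal sketch with toy data standing in for the paired samples (the hyperparameter values are illustrative, not the notebook's):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy X/y standing in for the paired samples built in step 1.3
rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = rng.integers(1, 4, size=200)  # class codes 1..3

# oob_score=True asks the forest to score itself on the samples
# each tree did not see during bootstrapping ("out-of-bag" samples)
rf = RandomForestClassifier(n_estimators=100, oob_score=True,
                            n_jobs=-1, random_state=0)
rf = rf.fit(X, y)
print(rf.oob_score_)
```

The OOB score is a cheap internal accuracy estimate; with the purely random toy labels above it will be low, which is exactly what an honest estimate should report.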

With our Random Forest model fit, we can check out the "Out-of-Bag" (OOB) prediction score.

Score of the training dataset obtained using an out-of-bag estimate. This attribute exists only when oob_score is True.

To help us get an idea of which feature bands were important, we can look at the feature importance scores.

The impurity-based feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.
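The importances are exposed as the fitted model's `feature_importances_` attribute, one value per band, summing to 1. A sketch where the label depends only on the first feature, so that band should dominate (the band names are hypothetical):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = (X[:, 0] > 0.5).astype(int)  # label depends only on the first band

rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Gini importances sum to 1; the informative band should dominate
band_names = ["NDVI_jan", "NDVI_feb", "S1_VV"]  # hypothetical names
for name, imp in zip(band_names, rf.feature_importances_):
    print(f"{name}: {imp:.3f}")
```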

Let's look at a crosstabulation to see the class confusion.
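One way to build the crosstabulation is `pandas.crosstab` on the true labels versus the model's predictions on the training pixels; the diagonal holds the agreements. A sketch with toy data:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = rng.integers(1, 4, size=200)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
pred = rf.predict(X)  # predicting on the *training* data

# Cross-tabulate truth vs. prediction: the diagonal holds the agreements
df = pd.DataFrame({"truth": y, "predict": pred})
print(pd.crosstab(df["truth"], df["predict"], margins=True))
```

Because the forest is scored on the very pixels it was fit on, the off-diagonal cells will be almost empty, which motivates the warning below.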

Unbelievable? It should be: the real confusion matrix will almost certainly not show 100% accuracy. What is likely going on is that we used a large number of trees, and, given enough information and effort, the algorithm learned our training data essentially by heart. Validating a machine learning algorithm on its own training data is a useless exercise that overinflates the accuracy.

Instead, we could have used a cross-validation approach, where we train on a subset of the dataset, then predict and assess the accuracy on the sections we didn't train on.
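scikit-learn's `cross_val_score` does exactly this: it repeatedly fits on part of the samples and scores on the held-out fold. A sketch with the same kind of toy data as above:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = (X[:, 0] > 0.5).astype(int)

# 5-fold cross-validation: fit on 4/5 of the samples,
# score on the held-out 1/5, and repeat for each fold
clf = RandomForestClassifier(n_estimators=50, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())
```

Note that spatially autocorrelated pixels can still leak information between folds; grouping samples by field or plot before splitting is a common refinement.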

3. Predict the rest of the image

With our Random Forest classifier fit, we can now classify the entire image.
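The trick is to flatten the (rows, cols, n_features) image into a (n_pixels, n_features) matrix, predict, and reshape the result back to the image grid. A sketch with a small synthetic image and a toy model standing in for the one trained in step 2:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
rows, cols, n_features = 10, 12, 3
img = rng.random((rows, cols, n_features)).astype(np.float32)

# Toy model standing in for the forest trained in step 2
X_train = rng.random((100, n_features))
y_train = rng.integers(1, 4, size=100)
rf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X_train, y_train)

# Flatten the image to (n_pixels, n_features), predict, reshape back
flat = img.reshape(-1, n_features)
classified = rf.predict(flat).reshape(rows, cols)
print(classified.shape)  # (10, 12)
```

Both `reshape` calls use the same C-order flattening, so each predicted label lands back on the pixel it came from.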

4. Reclassify the classification

4.1 Open LUT and sort values

4.2 Reclassify prediction
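With the LUT keys sorted (step 4.1), the mapping can be applied to every pixel at once with `np.searchsorted`. A sketch with a hypothetical LUT that groups four detailed classes into two broader ones:

```python
import numpy as np

# Hypothetical look-up table: original class code -> grouped class code
lut = {1: 10, 2: 10, 3: 20, 4: 20}

classification = np.array([[1, 2],
                           [3, 4]])

# Vectorized reclassification: sort the LUT keys once, then
# look up the position of every pixel's code and map it
keys = np.array(sorted(lut))
values = np.array([lut[k] for k in keys])
reclass = values[np.searchsorted(keys, classification)]
print(reclass)  # [[10 10]
                #  [20 20]]
```

This assumes every code present in the classification appears in the LUT; unknown codes would need a fallback value first.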

5. Filter classification with moving window
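A common choice for this step is a majority (modal) filter: each pixel takes the most frequent class inside a moving window, which removes isolated "salt-and-pepper" pixels. A sketch with `scipy.ndimage.generic_filter` and a 3×3 window (the window size and edge handling are assumptions, not the notebook's settings):

```python
import numpy as np
from scipy import ndimage

classification = np.array([
    [1, 1, 1, 2],
    [1, 2, 1, 2],  # isolated "2" surrounded by "1"
    [1, 1, 1, 2],
])

def majority(values):
    # Most frequent class code inside the window
    return np.bincount(values.astype(int)).argmax()

# 3x3 moving window; edges handled by nearest-neighbour padding
filtered = ndimage.generic_filter(classification, majority, size=3, mode="nearest")
print(filtered)
```

The lone 2 in the middle is replaced by 1, while the solid column of 2s on the right survives.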

6. Write classification products into GeoTIFF files

Open template image to get metadata

6.1 Write classification

6.2 Write re-classification

6.3 Write re-classification with moving window filtering