L12 – Unsupervised classification
Introduction to Photogrammetry and Remote Sensing (SGHG 1473)
Dr. Muhammad Zulkarnain Abdul Rahman
Unsupervised Classification
• Unsupervised classification (commonly referred to as clustering) is
an effective method of partitioning remote sensor image data in
multispectral feature space and extracting land-cover information.
• Compared to supervised classification, unsupervised classification
normally requires only a minimal amount of initial input from the
analyst.
• This is because clustering does not normally require training data.
• Clustering performs numerical operations that search for natural groupings of the spectral properties of pixels, as examined in multispectral feature space.
• The clustering process results in a classification map consisting of m
spectral classes. The analyst then attempts a posteriori (after the
fact) to assign or transform the spectral classes into thematic
information classes of interest (e.g., forest, agriculture).
Unsupervised Classification
• This may be difficult. Some spectral clusters may be
meaningless because they represent mixed classes of Earth
surface materials.
• The analyst must understand the spectral characteristics of the
terrain well enough to be able to label certain clusters as specific
information classes.
• Hundreds of clustering algorithms have been developed. Two examples of conceptually simple but not necessarily efficient clustering algorithms will be used to demonstrate the fundamental logic of unsupervised classification of remote sensor data:
• Clustering using the Chain Method
• Clustering using the Iterative Self-Organizing Data Analysis Technique (ISODATA)
Clustering Using the Chain Method
• The Chain Method clustering algorithm operates in a two-pass
mode (i.e., it passes through the multispectral dataset two times).
• Pass #1: The program reads through the dataset and sequentially
builds clusters (groups of points in spectral space). A mean vector is
then associated with each cluster.
• Pass #2: A minimum distance to means classification algorithm is
applied to the whole dataset on a pixel-by-pixel basis whereby each
pixel is assigned to one of the mean vectors created in pass #1. The
first pass, therefore, automatically creates the cluster signatures
(class mean vectors) to be used by the minimum distance to means
classifier.
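The two-pass logic above can be sketched in Python. This is a hypothetical illustration, not the full algorithm: the pixel values and R are assumptions, and the periodic C-distance merge performed every N pixels is omitted.

```python
import math

def chain_cluster(pixels, R=15.0):
    """Pass 1: build clusters sequentially. A pixel within R of an existing
    cluster mean is merged into it (updating the running mean); otherwise it
    seeds a new cluster. The periodic C-distance merge at every N pixels is
    omitted from this sketch."""
    means, counts = [], []
    for p in pixels:
        dists = [math.dist(p, m) for m in means]
        if dists and min(dists) <= R:
            i = dists.index(min(dists))
            n = counts[i]
            # the new mean is weighted by the number of pixels already in the cluster
            means[i] = tuple((m * n + v) / (n + 1) for m, v in zip(means[i], p))
            counts[i] = n + 1
        else:
            means.append(tuple(p))
            counts.append(1)
    return means

def assign(pixels, means):
    """Pass 2: minimum distance to means classification of every pixel."""
    return [min(range(len(means)), key=lambda i: math.dist(p, means[i]))
            for p in pixels]

pixels = [(10, 10), (20, 20), (30, 20)]   # hypothetical band values for pixels 1-3
means = chain_cluster(pixels)
labels = assign(pixels, means)
```

With these values, pixels 1 and 2 merge into one cluster whose mean migrates to (15, 15), and pixel 3 seeds a second cluster, matching the worked example that follows.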
Clustering Using the Chain Method
Phase 1: Cluster Building
• R, a radius distance in spectral space used to determine when a new cluster should be formed (e.g., when raw remote sensor data are used, it might be set at 15 brightness value units).
• C, a spectral space distance parameter used when merging clusters (e.g., 30 units) when N is reached.
• N, the number of pixels to be evaluated between each major merging of the clusters (e.g., 2000 pixels).
• Cmax, the maximum number of clusters to be identified by the clustering algorithm (e.g., 20 clusters).
Phase 2: Assignment of pixels to one of the Cmax clusters using minimum distance to means classification logic.
Jensen, 2011
Clustering Using the Chain Method
Starting at the origin of the multispectral dataset (i.e.,
line 1, column 1), pixels are evaluated sequentially
from left to right as if in a chain. After one line is
processed, the next line of data is considered. We will
analyze the clustering of only the first three pixels in a
hypothetical image and label them pixels 1, 2, and 3.
Original brightness values of pixels 1, 2, and 3 as measured in Bands 4 and 5 of the hypothetical remotely sensed dataset.
The distance (D) in 2-dimensional spectral space
between pixel 1 (cluster 1) and pixel 2 (potential cluster
2) in the first iteration is computed and tested against
the value of R=15, the minimum acceptable radius. In
this case, D does not exceed R. Therefore, clusters 1
and 2 are merged as shown in the next illustration.
Pixels 1 and 2 now represent cluster #1. Note that the
location of cluster 1 has migrated from 10,10 to 15,15
after the first iteration. Now, pixel 3 distance (D=15.81)
is computed to see if it is greater than the minimum
threshold, R=15. It is, so pixel location 3 becomes
cluster #2. This process continues until all 20 clusters
are identified. Then the 20 clusters are evaluated using
a distance measure, C (not shown), to merge the
clusters that are closest to one another.
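The distance quoted in this example can be verified directly. Pixel 3's band values of (30, 20) are an assumption chosen to reproduce the quoted D = 15.81 from the migrated cluster mean:

```python
import math

R = 15.0                       # minimum acceptable radius from the slides
cluster1_mean = (15.0, 15.0)   # cluster 1 after pixels 1 and 2 were merged
pixel3 = (30, 20)              # assumed band values reproducing D = 15.81

D = math.dist(cluster1_mean, pixel3)   # Euclidean distance in spectral space
starts_new_cluster = D > R             # True: pixel 3 becomes cluster #2
```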
Clusters migrate during the many iterations of a
clustering algorithm. The final ending point represents
the mean vector that would be used in Phase #2 of the
clustering process when the minimum distance
classification is performed.
• Note: As more points are added to a cluster, the
mean shifts less dramatically since the new
computed mean is weighted by the number of
pixels currently in a cluster. The ending point is
the spectral location of the final mean vector that
is used as a signature in the minimum distance
classifier applied in Pass #2.
• Some clustering algorithms allow the analyst to
initially seed the mean vector for several of the
important classes. The seed data are usually
obtained in a supervised fashion, as discussed
previously. Others allow the analyst to use a priori
information to direct the clustering process.
Pass 2: Assignment of Pixels to One of the Cmax Clusters Using Minimum
Distance Classification Logic
The final cluster mean data vectors are used in a minimum
distance to means classification algorithm to classify all
the pixels in the image into one of the Cmax clusters. The
analyst usually produces a co-spectral plot display to
document where the clusters reside in 3-dimensional
feature space. It is then necessary to evaluate the location
of the clusters in the image, label them if possible, and see
if any clusters should be combined. It is usually necessary
to combine some clusters. This is where an intimate
knowledge of the terrain is critical.
The mean vectors of the 20 clusters displayed using only
bands 2 and 3 (e.g., green and red). The mean vector
values are summarized in Table 9-11. Notice the
substantial amount of overlap among clusters 1 through 5
and 19.
The mean vectors of the 20 clusters displayed using
only bands 3 and 4 (red and near-infrared) data. The
mean vector values are summarized in Table 9-11.
Results of Clustering on Thematic Mapper Bands 2, 3, and 4 of the Charleston, SC, Landsat TM scene.
Grouping (labeling) of the original 20 spectral clusters
into information classes. The labeling was performed
by analyzing the mean vector locations in bands 3 and
4.
ISODATA Clustering
The Iterative Self-Organizing Data Analysis Technique (ISODATA)
represents a comprehensive set of heuristic (rules of thumb)
procedures that have been incorporated into an iterative classification
algorithm. Many of the steps incorporated into the algorithm are a
result of experience gained through experimentation.
The ISODATA algorithm is a modification of the k-means clustering
algorithm, which includes a) merging clusters if their separation
distance in multispectral feature space is below a user-specified
threshold, and b) rules for splitting a single cluster into two clusters.
ISODATA Clustering
• ISODATA is iterative because it makes a large number of passes
through the remote sensing dataset until specified results are
obtained, instead of just two passes.
• ISODATA does not allocate its initial mean vectors based on the
analysis of pixels in the first line of data the way the two-pass chain
algorithm does. Rather, an initial arbitrary assignment of all Cmax
clusters takes place along an n-dimensional vector that runs between
very specific points in feature space. The region in feature space is
defined using the mean, µk, and standard deviation, σk, of each band
in the analysis. This method of automatically seeding the original Cmax
vectors makes sure that the first few lines of data do not bias the
creation of clusters.
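A minimal sketch of this seeding step, assuming the Cmax seeds are spaced evenly along the µk ± 1σk diagonal in feature space (the band means and standard deviations below are hypothetical):

```python
def seed_means(mu, sigma, c_max):
    """Place c_max seed vectors evenly along the line from (mu - sigma)
    to (mu + sigma) in every band (requires c_max >= 2)."""
    seeds = []
    for i in range(c_max):
        t = i / (c_max - 1)   # position along the diagonal, 0 .. 1
        seeds.append(tuple(m - s + 2 * s * t for m, s in zip(mu, sigma)))
    return seeds

# hypothetical band statistics for a two-band dataset
seeds = seed_means(mu=[60.0, 45.0], sigma=[20.0, 15.0], c_max=5)
```

The endpoints sit at µ − σ and µ + σ in each band, so the seeds depend only on whole-image statistics, never on the first few lines of data.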
ISODATA Clustering
ISODATA is self-organizing because it requires relatively little human
input. Typical ISODATA algorithms normally require the analyst to
specify the following criteria:
• Cmax: the maximum number of clusters to be identified by the
algorithm (e.g., 20 clusters). However, it is not uncommon for fewer to
be found in the final classification map after splitting and merging
take place.
• T: the maximum percentage of pixels whose class values are allowed
to be unchanged between iterations. When this number is reached,
the ISODATA algorithm terminates. Some datasets may never reach
the desired percentage unchanged. If this happens, it is necessary to
interrupt processing and edit the parameter.
ISODATA Clustering
• M: the maximum number of times ISODATA is to classify pixels and recalculate cluster mean vectors. The ISODATA algorithm terminates when this number is reached.
• Minimum members in a cluster (%): If a cluster contains less than the minimum percentage of members, it is deleted and the members are assigned to an alternative cluster. This also affects whether a class is going to be split (see maximum standard deviation). The default minimum percentage of members is often set to 0.01.
ISODATA Clustering
• Maximum standard deviation (σmax): When the standard deviation for a cluster exceeds the specified maximum standard deviation and the number of members in the class is greater than twice the specified minimum members in a class, the cluster is split into two clusters. The mean vectors for the two new clusters are the old class centers ±1σ. Maximum standard deviation values between 4.5 and 7 are typical.
• Split separation value: If this value is changed from 0.0, it takes the place of the standard deviation in determining the locations of the new mean vectors plus and minus the split separation value.
• Minimum distance between cluster means (C): Clusters with a weighted distance less than this value are merged. A default of 3.0 is often used.
ISODATA Clustering
Phase 1: ISODATA Cluster Building using many
passes through the dataset.
Phase 2: Assignment of pixels to one of the Cmax
clusters using minimum distance to means
classification logic.
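These two phases can be sketched as a minimal ISODATA-style loop: assign each pixel to the nearest mean, recompute the means, merge clusters closer than C, and stop when fewer than T percent of pixels change class or after M iterations. Splitting on σmax and the minimum-members rule are omitted for brevity, and the data are hypothetical:

```python
import math

def isodata_sketch(pixels, seeds, M=10, T=2.0, C=3.0):
    """Minimal ISODATA-style loop; parameter names follow the slides.
    Cluster splitting and minimum-member deletion are omitted."""
    means = list(seeds)
    labels = [None] * len(pixels)
    for _ in range(M):                              # at most M passes
        new_labels, changed = [], 0
        for p, old in zip(pixels, labels):
            i = min(range(len(means)), key=lambda j: math.dist(p, means[j]))
            new_labels.append(i)
            changed += (i != old)
        labels = new_labels
        # recompute each cluster mean from its assigned pixels
        for i in range(len(means)):
            members = [p for p, lab in zip(pixels, labels) if lab == i]
            if members:
                means[i] = tuple(sum(c) / len(members) for c in zip(*members))
        # merge any mean closer than C to an already-kept mean (single pass)
        merged = []
        for m in means:
            close = [j for j, q in enumerate(merged) if math.dist(m, q) < C]
            if close:
                j = close[0]
                merged[j] = tuple((a + b) / 2 for a, b in zip(merged[j], m))
            else:
                merged.append(m)
        means = merged
        # terminate when fewer than T percent of pixels changed class
        if 100.0 * changed / len(pixels) < T:
            break
    return means, labels

pixels = [(0, 0), (1, 1), (9, 9), (10, 10)]         # two tiny spectral blobs
means, labels = isodata_sketch(pixels, seeds=[(0.0, 0.0), (10.0, 10.0)])
```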
a) ISODATA initial distribution of five (5) hypothetical mean vectors using ±1σ standard deviations in both bands as beginning and ending points. b) In the first iteration, each candidate pixel is compared to each cluster mean and assigned to the cluster whose mean is closest in Euclidean distance. c) During the second iteration, a new mean is calculated for each cluster based on the actual spectral locations of the pixels assigned to each cluster, instead of the initial arbitrary calculation. This involves analysis of several parameters to merge or split clusters. After the new cluster mean vectors are selected, every pixel in the scene is assigned to one of the new clusters. d) This split–merge–assign process continues until there is little change in class assignment between iterations (the T threshold is reached) or the maximum number of iterations (M) is reached.
a) Distribution of 20 ISODATA
mean vectors after just one
iteration using Landsat TM
band 3 and 4 data of
Charleston, SC. Notice that the
initial mean vectors are
distributed along a diagonal in
2-dimensional feature space
according to the ±2σ standard
deviation logic discussed. b)
Distribution of 20 ISODATA
mean vectors after 20
iterations. The bulk of the
important feature space (the
gray background) is partitioned
rather well after just 20
iterations.
Classification Based on ISODATA Clustering
ISODATA Clustering Logic
Unsupervised Cluster Busting
It is common when performing unsupervised classification using the
chain algorithm or ISODATA to generate n clusters (e.g., 100) and have
no confidence in labeling q of them to an appropriate information
class (let us say 30 in this example). This is because (1) the terrain
within the IFOV of the sensor system contained at least two types of
terrain, causing the pixel to exhibit spectral characteristics unlike
either of the two terrain components, or (2) the distribution of the
mean vectors generated during the unsupervised classification
process was not good enough to partition certain important portions
of feature space. When this occurs, it may be possible to perform
cluster busting if in fact there is still some unextracted information of
value in the dataset.
Unsupervised Cluster Busting
First, all the pixels associated with the q clusters (30 in a hypothetical
example) that are difficult to label (e.g., mixed clusters 13, 22, 45, 92,
etc.) are all recoded to a value of 1 and a binary mask file is created. A
mask program is then run using (1) the binary mask file, and (2) the
original remote sensor data file. The output of the mask program is a
new multiband image file consisting of only the pixels that could not
be adequately labeled during the initial unsupervised classification.
The analyst then performs a new unsupervised classification on this
file, perhaps requesting an additional 25 clusters. The analyst displays
these clusters using standard techniques and keeps as many of these
new clusters as possible (e.g., 15). Usually, there are still some clusters
that contain mixed pixels, but the proportion definitely goes down.
The analyst may want to iterate the process one more time to see if
an additional unsupervised classification breaks out additional
clusters. Perhaps five good clusters are extracted during the final
iteration.
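The masking step described above can be sketched as follows. The cluster IDs and band values are illustrative, not from the text:

```python
# Pixels belonging to the hard-to-label clusters are recoded to 1 in a binary
# mask; the mask is then applied to the original bands so that only those
# pixels survive to be re-clustered.
unlabelable = {13, 22, 45, 92}                 # the q mixed clusters (toy IDs)

cluster_map = [13, 7, 22, 3, 45]               # per-pixel cluster IDs (toy image)
bands = [(10, 20), (30, 40), (50, 60), (70, 80), (90, 100)]

mask = [1 if c in unlabelable else 0 for c in cluster_map]
masked = [px for px, m in zip(bands, mask) if m == 1]   # pixels to re-cluster
```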
Unsupervised Cluster Busting
In this hypothetical example, the final cluster map would be
composed of :
• 70 good clusters from the initial classification,
• 15 good clusters from the first cluster-busting pass (recoded as values 71 to 85), and
• 5 good clusters from the second cluster-busting pass (recoded as
values 86 to 90).
The final cluster map file may be put together using a simple GIS
maximum dominate function. The final cluster map is then recoded to
create the final classification map.
Analog and Digital Image Analysis Tasks
Single-pixel Classification versus Object-oriented Image Segmentation
Classification algorithms based on single-pixel analysis often are not capable of extracting the information we desire from high-spatial-resolution remote sensor data (e.g., QuickBird 61 × 61 cm). For
example, the spectral complexity of urban land-cover materials results
in specific limitations using per-pixel analysis for the separation of
human-made materials such as roads and roofs and natural materials
such as vegetation, soil, and water. Furthermore, a significant but
usually ignored problem with per-pixel characterization of land cover
is that a substantial proportion of the signal apparently coming from
the land area represented by a pixel comes from the surrounding
terrain. Improved algorithms are needed that take into account not
only the spectral characteristics of a single pixel but those of the
surrounding (contextual) pixels. We need information about the
spatial characteristics of the surrounding pixels so that we can identify
areas (often called segments or patches) of pixels that are
homogeneous.
Object-oriented Image Segmentation
This need has given rise to the creation of image classification
algorithms based on object-oriented image segmentation. The
algorithms incorporate both spectral and spatial information in the
image segmentation phase. The result is the creation of image objects
defined as individual areas with shape and spectral homogeneity
which one may recognize as segments or patches in the landscape. In
many instances, carefully extracted image objects can provide a
greater number of meaningful features for image classification. In
addition, objects don’t have to be derived from just image data but
can also be developed from any spatially distributed variable (e.g.,
elevation, slope, aspect, population density). Homogeneous image
objects are then analyzed using traditional classification algorithms
(e.g., nearest-neighbor, minimum distance, maximum likelihood) or
knowledge-based approaches and fuzzy classification logic.
Object-oriented Image Segmentation
There are many algorithms that can be used to segment an image into
relatively homogeneous image objects. Most can be grouped into two
classes:
• edge-based algorithms, and
• area-based algorithms.
Unfortunately, the majority do not incorporate both spectral and
spatial information, and very few have been used for remote sensing
digital image classification.
Object-oriented Image Segmentation
One of the most promising approaches to remote sensing image
segmentation was developed by Baatz and Schape (2000). The image
segmentation involves looking at individual pixel values and their
neighbors to compute (Baatz et al., 2001):
• a color criterion (h_color), and
• a shape or spatial criterion (h_shape).
Object-oriented Image Segmentation
These two criteria are then used to create image objects
(patches) of relatively homogeneous pixels in the remote sensing
dataset using the general segmentation function (Sf):
S_f = w_color · h_color + (1 − w_color) · h_shape

where the user-defined weight for spectral color versus shape is 0 < w_color < 1. If the user wants to place greater emphasis on the spectral (color) characteristics in the creation of homogeneous objects (patches) in the dataset, then w_color is weighted more heavily (e.g., w_color = 0.8). Conversely, if the spatial characteristics of the dataset are believed to be more important in the creation of the homogeneous patches, then shape should be weighted more heavily.
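The segmentation function transcribes directly into code. The h values below are arbitrary placeholders, not results from the text:

```python
def segmentation_criterion(h_color, h_shape, w_color=0.8):
    """General segmentation function S_f = w_color*h_color + (1 - w_color)*h_shape.
    w_color = 0.8 emphasizes spectral homogeneity, as in the slides' example."""
    assert 0 < w_color < 1     # the weight must lie strictly between 0 and 1
    return w_color * h_color + (1 - w_color) * h_shape

s = segmentation_criterion(h_color=2.0, h_shape=4.0)
```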
Object-oriented Image Segmentation
Spectral (i.e., color) heterogeneity (h) of an image object is computed as the sum of the standard deviations of the spectral values of each layer (σ_k) (i.e., band) multiplied by the weights for each layer (w_k):

h = Σ_{k=1}^{m} w_k · σ_k
Object-oriented Image Segmentation
The color criterion is computed as the weighted mean of all changes in standard deviation for each channel k of the m-band remote sensing dataset. The standard deviations σ_k are weighted by the object sizes n_ob (Definiens, 2003):

h = Σ_{k=1}^{m} w_k [ n_mg · σ_k^mg − ( n_ob1 · σ_k^ob1 + n_ob2 · σ_k^ob2 ) ]

where mg means merge.
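A sketch of this color criterion for a hypothetical merge of two single-band image objects (the band values are invented; merging two spectrally dissimilar objects yields a large positive change in weighted standard deviation):

```python
import statistics

def color_criterion(ob1, ob2, weights):
    """Change in size-weighted standard deviation caused by merging two
    objects: h = sum_k w_k * (n_mg*sigma_mg - (n_ob1*sigma_ob1 + n_ob2*sigma_ob2)).
    Objects are lists of per-pixel band tuples; weights is one w_k per band."""
    merged = ob1 + ob2
    h = 0.0
    for k, w in enumerate(weights):
        s_mg = statistics.pstdev(p[k] for p in merged)   # sigma of merged object
        s1 = statistics.pstdev(p[k] for p in ob1)
        s2 = statistics.pstdev(p[k] for p in ob2)
        h += w * (len(merged) * s_mg - (len(ob1) * s1 + len(ob2) * s2))
    return h

# two dark pixels vs. two bright pixels in a single band (toy values)
h = color_criterion(ob1=[(10,), (12,)], ob2=[(20,), (22,)], weights=[1.0])
```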
Object-oriented Image Segmentation
The shape criterion is computed using two landscape ecology metrics: compactness and smoothness. Heterogeneity as deviation from a compact shape (cpt) is described by the ratio of the pixel perimeter length l and the square root of the number of pixels n forming an image object (i.e., a patch):

cpt = l / √n
Object-oriented Image Segmentation
Shape heterogeneity may also be described as smoothness, which
is the ratio of the pixel perimeter length l and the shortest
possible border length b of a box bounding the image object (i.e.,
a patch) parallel to the raster:
smooth = l / b
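Both shape metrics are simple ratios. As a worked illustration (the object geometry is invented), a 4 × 4 square of pixels has perimeter l = 16, n = 16 pixels, and a bounding-box border of b = 16:

```python
import math

def compactness(perimeter, n_pixels):
    """cpt = l / sqrt(n): deviation from a compact shape."""
    return perimeter / math.sqrt(n_pixels)

def smoothness(perimeter, bbox_border):
    """smooth = l / b: perimeter relative to the bounding-box border length."""
    return perimeter / bbox_border

cpt = compactness(16, 16)      # square object
smooth = smoothness(16, 16)    # perimeter coincides with the bounding box
```

A smoothness of 1.0 means the object's perimeter follows its bounding box exactly; ragged objects score higher on both metrics.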
Object-oriented Image Segmentation
The shape criterion incorporates these two measurements using
the equation (Definiens, 2003):
h_shape = w_cpt · h_cpt + (1 − w_cpt) · h_smooth

where 0 < w_cpt < 1 is the user-defined weight for the compactness criterion.
Object-oriented Image Segmentation
The change in shape heterogeneity caused by each merge is evaluated by calculating the difference between the situation after and before image objects (ob) are merged. This results in the following algorithms for computing compactness and smoothness (Definiens, 2003):

h_cpt = n_mg · l_mg/√n_mg − ( n_ob1 · l_ob1/√n_ob1 + n_ob2 · l_ob2/√n_ob2 )

h_smooth = n_mg · l_mg/b_mg − ( n_ob1 · l_ob1/b_ob1 + n_ob2 · l_ob2/b_ob2 )

where n is the object size in pixels.
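A sketch of these merge formulas for a hypothetical merge of two 2 × 2 squares into a 2 × 4 rectangle. The √n form follows from cpt = l/√n, and the object geometry is invented for illustration:

```python
import math

def h_cpt(n_mg, l_mg, n1, l1, n2, l2):
    """Compactness change: n_mg*l_mg/sqrt(n_mg) - (n1*l1/sqrt(n1) + n2*l2/sqrt(n2))."""
    return (n_mg * l_mg / math.sqrt(n_mg)
            - (n1 * l1 / math.sqrt(n1) + n2 * l2 / math.sqrt(n2)))

def h_smooth(n_mg, l_mg, b_mg, n1, l1, b1, n2, l2, b2):
    """Smoothness change: n_mg*l_mg/b_mg - (n1*l1/b1 + n2*l2/b2)."""
    return n_mg * l_mg / b_mg - (n1 * l1 / b1 + n2 * l2 / b2)

# two 2x2 squares (n=4, l=8, b=8 each) merged side by side into a
# 2x4 rectangle (n=8, l=12, b=12)
hc = h_cpt(8, 12, 4, 8, 4, 8)
hs = h_smooth(8, 12, 12, 4, 8, 8, 4, 8, 8)
```

The smoothness change is zero because every object here already follows its bounding box, while the compactness term registers a small increase.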
Object-oriented Image Segmentation
One criterion used to segment a remotely sensed image into image objects is a pixel neighborhood function, which compares an image object being grown with adjacent pixels. The information is used to determine if the adjacent pixel should be merged with the existing image object or be part of a new image object. a) If a plane 4 neighborhood function is selected, then two image objects would be created because the pixels under investigation are not connected at their plane borders. b) Pixels and objects are defined as neighbors in a diagonal 8 neighborhood if they are connected at a plane border or a corner point. In this example, image object 1 can be expanded because it connects at a diagonal corner point. This results in a larger image object 1. Other types of neighborhood functions could be used.
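The 4- versus 8-neighborhood distinction can be sketched as an offset test. This is a minimal illustration of the connectivity rule, not any segmentation package's actual implementation:

```python
def neighbors(pixel, connectivity=4):
    """Coordinates considered connected to pixel under a plane-4 or
    diagonal-8 neighborhood."""
    r, c = pixel
    four = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    diag = [(r - 1, c - 1), (r - 1, c + 1), (r + 1, c - 1), (r + 1, c + 1)]
    return four if connectivity == 4 else four + diag

# two pixels touching only at a corner: connected under 8- but not 4-neighborhood
a, b = (0, 0), (1, 1)
touch4 = b in neighbors(a, 4)    # False -> two separate image objects
touch8 = b in neighbors(a, 8)    # True  -> object 1 can absorb the pixel
```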
Classification based on Image Segmentation Logic (takes into account spatial and spectral characteristics)
Object-oriented Image Segmentation
The object-oriented classification of a segmented image is
substantially different from performing a per-pixel classification.
First, the analyst is not constrained to using just spectral
information. He or she may choose to use a) the mean spectral
information in conjunction with b) various shape measures
associated with each image object (polygon) in the dataset. This
introduces flexibility and robustness. Once selected, the spectral
and spatial attributes of each polygon can be input to a variety of
classification algorithms for analysis (e.g., nearest-neighbor,
minimum distance, maximum likelihood).