Unsupervised Classification
Introduction to Photogrammetry and Remote Sensing (SGHG 1473)
Dr. Muhammad Zulkarnain Abdul Rahman

Unsupervised Classification
• Unsupervised classification (commonly referred to as clustering) is an effective method of partitioning remote sensor image data in multispectral feature space and extracting land-cover information.
• Compared to supervised classification, unsupervised classification normally requires only a minimal amount of initial input from the analyst, because clustering does not normally require training data.
• Clustering performs numerical operations that search for natural groupings of the spectral properties of pixels in multispectral feature space.
• The clustering process results in a classification map consisting of m spectral classes. The analyst then attempts a posteriori (after the fact) to assign or transform the spectral classes into thematic information classes of interest (e.g., forest, agriculture).

Unsupervised Classification
• This may be difficult. Some spectral clusters may be meaningless because they represent mixed classes of Earth surface materials.
• The analyst must understand the spectral characteristics of the terrain well enough to label certain clusters as specific information classes.
• Hundreds of clustering algorithms have been developed. Two conceptually simple, though not necessarily efficient, clustering algorithms will be used to demonstrate the fundamental logic of unsupervised classification of remote sensor data:
• clustering using the Chain Method, and
• clustering using the Iterative Self-Organizing Data Analysis Technique (ISODATA).

Clustering Using the Chain Method
• The Chain Method clustering algorithm operates in a two-pass mode (i.e., it passes through the multispectral dataset two times).
• Pass #1: The program reads through the dataset and sequentially builds clusters (groups of points in spectral space).
A mean vector is then associated with each cluster.
• Pass #2: A minimum distance to means classification algorithm is applied to the whole dataset on a pixel-by-pixel basis, whereby each pixel is assigned to one of the mean vectors created in Pass #1. The first pass, therefore, automatically creates the cluster signatures (class mean vectors) to be used by the minimum distance to means classifier.

Clustering Using the Chain Method
Phase 1: Cluster Building –
• R, a radius distance in spectral space used to determine when a new cluster should be formed (e.g., when raw remote sensor data are used, it might be set at 15 brightness value units).
• C, a spectral space distance parameter used when merging clusters (e.g., 30 units) when N is reached.
• N, the number of pixels to be evaluated between each major merging of the clusters (e.g., 2,000 pixels).
• Cmax, the maximum number of clusters to be identified by the clustering algorithm (e.g., 20 clusters).
Phase 2: Assignment of pixels to one of the Cmax clusters using minimum distance to means classification logic (Jensen, 2011).

Clustering Using the Chain Method
Starting at the origin of the multispectral dataset (i.e., line 1, column 1), pixels are evaluated sequentially from left to right as if in a chain.
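As a minimal sketch of the two-pass logic, the following hedged Python example builds clusters sequentially with a radius threshold R and then assigns every pixel to its nearest cluster mean. The pixel values are hypothetical 2-band brightness values, and the sketch omits the periodic C-distance merging that the full algorithm performs every N pixels.

```python
import math

def chain_pass1(pixels, R=15.0):
    """Pass 1: sequentially build clusters. A pixel joins the nearest
    existing cluster when its distance to that cluster's mean does not
    exceed R; otherwise it starts a new cluster."""
    means, counts = [], []
    for px in pixels:
        if means:
            d, i = min((math.dist(px, m), i) for i, m in enumerate(means))
            if d <= R:
                # Update the running mean, weighted by current cluster size,
                # so the mean shifts less as the cluster grows.
                n = counts[i]
                means[i] = tuple((m * n + v) / (n + 1)
                                 for m, v in zip(means[i], px))
                counts[i] = n + 1
                continue
        means.append(tuple(float(v) for v in px))
        counts.append(1)
    return means

def chain_pass2(pixels, means):
    """Pass 2: minimum distance to means classification of every pixel."""
    return [min(range(len(means)), key=lambda i: math.dist(px, means[i]))
            for px in pixels]

# Three hypothetical pixels: the first two merge into one cluster
# (their distance is about 14.1 < R), the third starts a second cluster.
means = chain_pass1([(10, 10), (20, 20), (30, 20)])
labels = chain_pass2([(10, 10), (20, 20), (30, 20)], means)
```

This is a sketch under the stated assumptions, not the exact production algorithm; it simply illustrates how Pass 1 creates the class mean vectors that Pass 2 consumes.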
After one line is processed, the next line of data is considered. We will analyze the clustering of only the first three pixels in a hypothetical image and label them pixels 1, 2, and 3 (Jensen, 2011).

Original brightness values of pixels 1, 2, and 3 as measured in Bands 4 and 5 of the hypothetical remotely sensed dataset (Jensen, 2011).

The distance (D) in 2-dimensional spectral space between pixel 1 (cluster 1) and pixel 2 (potential cluster 2) in the first iteration is computed and tested against the value of R = 15, the minimum acceptable radius. In this case, D does not exceed R; therefore, clusters 1 and 2 are merged, as shown in the next illustration (Jensen, 2011).

Pixels 1 and 2 now represent cluster #1. Note that the location of cluster 1 has migrated from (10, 10) to (15, 15) after the first iteration. Next, the distance from pixel 3 to the cluster 1 mean (D = 15.81) is computed to see if it is greater than the minimum threshold, R = 15. It is, so pixel location 3 becomes cluster #2. This process continues until all 20 clusters are identified. The 20 clusters are then evaluated using a distance measure, C (not shown), to merge the clusters that are closest to one another (Jensen, 2011).

Clusters migrate during the many iterations of a clustering algorithm. The final ending point represents the mean vector that would be used in Phase #2 of the clustering process when the minimum distance classification is performed (Jensen, 2011).

• Note: As more points are added to a cluster, the mean shifts less dramatically, since the new computed mean is weighted by the number of pixels currently in the cluster. The ending point is the spectral location of the final mean vector that is used as a signature in the minimum distance classifier applied in Pass #2.
• Some clustering algorithms allow the analyst to initially seed the mean vector for several of the important classes. The seed data are usually obtained in a supervised fashion, as discussed previously.
Others allow the analyst to use a priori information to direct the clustering process (Jensen, 2011).

Pass 2: Assignment of Pixels to One of the Cmax Clusters Using Minimum Distance Classification Logic
The final cluster mean data vectors are used in a minimum distance to means classification algorithm to classify all the pixels in the image into one of the Cmax clusters. The analyst usually produces a co-spectral plot display to document where the clusters reside in 3-dimensional feature space. It is then necessary to evaluate the location of the clusters in the image, label them if possible, and see if any clusters should be combined. It is usually necessary to combine some clusters; this is where an intimate knowledge of the terrain is critical (Jensen, 2011).

The mean vectors of the 20 clusters displayed using only bands 2 and 3 (e.g., green and red). The mean vector values are summarized in Table 9-11. Notice the substantial amount of overlap among clusters 1 through 5 and 19 (Jensen, 2011).

The mean vectors of the 20 clusters displayed using only bands 3 and 4 (red and near-infrared) data. The mean vector values are summarized in Table 9-11 (Jensen, 2011).

Results of clustering on Thematic Mapper Bands 2, 3, and 4 of the Charleston, SC, Landsat TM scene (Jensen, 2011).

Grouping (labeling) of the original 20 spectral clusters into information classes. The labeling was performed by analyzing the mean vector locations in bands 3 and 4 (Jensen, 2011).

ISODATA Clustering
The Iterative Self-Organizing Data Analysis Technique (ISODATA) represents a comprehensive set of heuristic (rule-of-thumb) procedures that have been incorporated into an iterative classification algorithm. Many of the steps incorporated into the algorithm are a result of experience gained through experimentation.
The ISODATA algorithm is a modification of the k-means clustering algorithm, which includes a) merging clusters if their separation distance in multispectral feature space is below a user-specified threshold, and b) rules for splitting a single cluster into two clusters (Jensen, 2011).

ISODATA Clustering
• ISODATA is iterative because it makes a large number of passes through the remote sensing dataset until specified results are obtained, instead of just two passes.
• ISODATA does not allocate its initial mean vectors based on the analysis of pixels in the first line of data the way the two-pass chain algorithm does. Rather, an initial arbitrary assignment of all Cmax clusters takes place along an n-dimensional vector that runs between very specific points in feature space. The region in feature space is defined using the mean, µk, and standard deviation, σk, of each band in the analysis. This method of automatically seeding the original Cmax vectors ensures that the first few lines of data do not bias the creation of clusters (Jensen, 2011).

ISODATA Clustering
ISODATA is self-organizing because it requires relatively little human input. Typical ISODATA algorithms normally require the analyst to specify the following criteria:
• Cmax: the maximum number of clusters to be identified by the algorithm (e.g., 20 clusters). However, it is not uncommon for fewer to be found in the final classification map after splitting and merging take place.
• T: the maximum percentage of pixels whose class values are allowed to be unchanged between iterations. When this number is reached, the ISODATA algorithm terminates. Some datasets may never reach the desired percentage unchanged; if this happens, it is necessary to interrupt processing and edit the parameter (Jensen, 2011).
• M: the maximum number of times ISODATA is to classify pixels and recalculate cluster mean vectors. The ISODATA algorithm terminates when this number is reached.
• Minimum members in a cluster (%): if a cluster contains less than the minimum percentage of members, it is deleted and the members are assigned to an alternative cluster. This also affects whether a class is going to be split (see maximum standard deviation). The default minimum percentage of members is often set to 0.01 (Jensen, 2011).
• Maximum standard deviation (σmax): when the standard deviation for a cluster exceeds the specified maximum standard deviation and the number of members in the class is greater than twice the specified minimum members in a class, the cluster is split into two clusters. The mean vectors for the two new clusters are the old class centers ± 1σ. Maximum standard deviation values between 4.5 and 7 are typical.
• Split separation value: if this value is changed from 0.0, it takes the place of the ± 1σ standard deviation in determining the locations of the two new mean vectors, which become the old class center plus and minus the split separation value.
• Minimum distance between cluster means (C): clusters with a weighted distance less than this value are merged. A default of 3.0 is often used (Jensen, 2011).

ISODATA Clustering
Phase 1: ISODATA cluster building using many passes through the dataset.
Phase 2: Assignment of pixels to one of the Cmax clusters using minimum distance to means classification logic (Jensen, 2011).

a) ISODATA initial distribution of five (5) hypothetical mean vectors using ±1σ standard deviations in both bands as beginning and ending points.
b) In the first iteration, each candidate pixel is compared to each cluster mean and assigned to the cluster whose mean is closest in Euclidean distance.
c) During the second iteration, a new mean is calculated for each cluster based on the actual spectral locations of the pixels assigned to each cluster, instead of the initial arbitrary calculation. This involves analysis of several parameters to merge or split clusters.
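The split and merge tests governed by σmax and C can be sketched as follows. This is a simplified illustration under stated assumptions: the split is made along the single highest-variance band, and the merge uses a plain Euclidean distance and an unweighted average of the two means, whereas real implementations weight by cluster size.

```python
import math

def maybe_split(mean, sd, n, sigma_max=5.0, n_min=20):
    """Split a cluster when any band's standard deviation exceeds
    sigma_max and membership is more than twice the minimum; the two
    new means are the old center +/- 1 sigma in the offending band."""
    k = max(range(len(sd)), key=lambda i: sd[i])  # most variable band
    if sd[k] > sigma_max and n > 2 * n_min:
        lo, hi = list(mean), list(mean)
        lo[k] -= sd[k]
        hi[k] += sd[k]
        return [tuple(lo), tuple(hi)]
    return [tuple(mean)]

def maybe_merge(m1, m2, c_min=3.0):
    """Merge two cluster means when they are closer than C (simple
    unweighted average used here for illustration)."""
    if math.dist(m1, m2) < c_min:
        return [tuple((a + b) / 2 for a, b in zip(m1, m2))]
    return [tuple(m1), tuple(m2)]
```

For example, a 50-member cluster at (100, 80) with band standard deviations (6.0, 2.0) exceeds σmax = 5 in band 1, so it splits into means at (94, 80) and (106, 80).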
After the new cluster mean vectors are selected, every pixel in the scene is assigned to one of the new clusters.
d) This split–merge–assign process continues until there is little change in class assignment between iterations (the T threshold is reached) or the maximum number of iterations (M) is reached (Jensen, 2011).

a) Distribution of 20 ISODATA mean vectors after just one iteration using Landsat TM band 3 and 4 data of Charleston, SC. Notice that the initial mean vectors are distributed along a diagonal in 2-dimensional feature space according to the ±2σ standard deviation logic discussed.
b) Distribution of 20 ISODATA mean vectors after 20 iterations. The bulk of the important feature space (the gray background) is partitioned rather well after just 20 iterations (Jensen, 2011).

Classification Based on ISODATA Clustering (Jensen, 2011)

ISODATA Clustering Logic (Jensen, 2011)

Unsupervised Cluster Busting
It is common when performing unsupervised classification using the chain algorithm or ISODATA to generate n clusters (e.g., 100) and to have no confidence in labeling q of them to an appropriate information class (say 30 in this example). This is because (1) the terrain within the IFOV of the sensor system contained at least two types of terrain, causing the pixel to exhibit spectral characteristics unlike either of the two terrain components, or (2) the distribution of the mean vectors generated during the unsupervised classification process was not good enough to partition certain important portions of feature space. When this occurs, it may be possible to perform cluster busting if in fact there is still some unextracted information of value in the dataset (Jensen, 2011).

Unsupervised Cluster Busting
First, all the pixels associated with the q clusters (30 in this hypothetical example) that are difficult to label (e.g., mixed clusters 13, 22, 45, 92, etc.) are recoded to a value of 1 and a binary mask file is created.
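The recoding step just described can be sketched in plain Python. The cluster numbers and brightness values below are hypothetical; the point is simply that the unlabeled clusters become 1 in a binary mask and everything else becomes 0.

```python
# Hypothetical 4 x 4 cluster map from an initial unsupervised
# classification; assume clusters 13 and 22 could not be labeled.
cluster_map = [[11, 11, 13, 13],
               [11, 22, 22, 13],
               [40, 40, 22, 13],
               [40, 40, 11, 11]]
unlabeled = {13, 22}

# Recode the problem clusters to 1 and all other clusters to 0.
mask = [[1 if c in unlabeled else 0 for c in row] for row in cluster_map]

# Masking one band of the original image keeps only the pixels that
# need re-clustering; all other pixels are zeroed out.
band = [[10, 12, 55, 60],
        [11, 48, 52, 61],
        [80, 82, 50, 62],
        [81, 83, 12, 11]]
masked_band = [[v * m for v, m in zip(brow, mrow)]
               for brow, mrow in zip(band, mask)]
```

In practice this would be done with raster tools on full image files, but the logic of the mask program described next is the same.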
A mask program is then run using (1) the binary mask file and (2) the original remote sensor data file. The output of the mask program is a new multiband image file consisting of only the pixels that could not be adequately labeled during the initial unsupervised classification. The analyst then performs a new unsupervised classification on this file, perhaps requesting an additional 25 clusters. The analyst displays these clusters using standard techniques and keeps as many of these new clusters as possible (e.g., 15). Usually, there are still some clusters that contain mixed pixels, but the proportion definitely goes down. The analyst may want to iterate the process one more time to see if an additional unsupervised classification breaks out additional clusters. Perhaps five good clusters are extracted during the final iteration (Jensen, 2011).

Unsupervised Cluster Busting
In this hypothetical example, the final cluster map would be composed of:
• 70 good clusters from the initial classification,
• 15 good clusters from the first cluster-busting pass (recoded as values 71 to 85), and
• 5 good clusters from the second cluster-busting pass (recoded as values 86 to 90).
The final cluster map file may be put together using a simple GIS maximum dominate function. The final cluster map is then recoded to create the final classification map (Jensen, 2011).

Analog and Digital Image Analysis Tasks (Jensen, 2011)

Single-pixel Classification versus Object-oriented Image Segmentation
Classification algorithms based on single-pixel analysis often are not capable of extracting the information we desire from high-spatial-resolution remote sensor data (e.g., QuickBird 61 × 61 cm). For example, the spectral complexity of urban land-cover materials results in specific limitations when using per-pixel analysis to separate human-made materials, such as roads and roofs, from natural materials, such as vegetation, soil, and water.
Furthermore, a significant but usually ignored problem with per-pixel characterization of land cover is that a substantial proportion of the signal apparently coming from the land area represented by a pixel actually comes from the surrounding terrain. Improved algorithms are needed that take into account not only the spectral characteristics of a single pixel but also those of the surrounding (contextual) pixels. We need information about the spatial characteristics of the surrounding pixels so that we can identify areas (often called segments or patches) of pixels that are homogeneous (Jensen, 2011).

Object-oriented Image Segmentation
This need has given rise to image classification algorithms based on object-oriented image segmentation. These algorithms incorporate both spectral and spatial information in the image segmentation phase. The result is the creation of image objects, defined as individual areas with shape and spectral homogeneity, which one may recognize as segments or patches in the landscape. In many instances, carefully extracted image objects can provide a greater number of meaningful features for image classification. In addition, objects do not have to be derived from just image data but can also be developed from any spatially distributed variable (e.g., elevation, slope, aspect, population density). Homogeneous image objects are then analyzed using traditional classification algorithms (e.g., nearest-neighbor, minimum distance, maximum likelihood) or knowledge-based approaches and fuzzy classification logic (Jensen, 2011).

Object-oriented Image Segmentation
There are many algorithms that can be used to segment an image into relatively homogeneous image objects. Most can be grouped into two classes:
• edge-based algorithms, and
• area-based algorithms.
Unfortunately, the majority do not incorporate both spectral and spatial information, and very few have been used for remote sensing digital image classification.
(Jensen, 2011)

Object-oriented Image Segmentation
One of the most promising approaches to remote sensing image segmentation was developed by Baatz and Schape (2000). The image segmentation involves looking at individual pixel values and their neighbors to compute (Baatz et al., 2001):
• a color criterion (hcolor), and
• a shape or spatial criterion (hshape).

Object-oriented Image Segmentation
These two criteria are then used to create image objects (patches) of relatively homogeneous pixels in the remote sensing dataset using the general segmentation function (Sf):

    Sf = wcolor · hcolor + (1 − wcolor) · hshape

where the user-defined weight for spectral color versus shape is 0 < wcolor < 1. If the user wants to place greater emphasis on the spectral (color) characteristics in the creation of homogeneous objects (patches) in the dataset, then wcolor is weighted more heavily (e.g., wcolor = 0.8). Conversely, if the spatial characteristics of the dataset are believed to be more important in the creation of the homogeneous patches, then shape should be weighted more heavily (Jensen, 2011).

Object-oriented Image Segmentation
Spectral (i.e., color) heterogeneity (h) of an image object is computed as the sum of the standard deviations of the spectral values in each layer (i.e., band) k, σk, multiplied by the weight for each layer, wk:

    h = Σ (k = 1 to m) wk · σk

Object-oriented Image Segmentation
The color criterion is computed as the weighted mean of all changes in standard deviation for each channel k of the m-band remote sensing dataset, with the standard deviations σk weighted by the object sizes n (Definiens, 2003):

    hcolor = Σ (k = 1 to m) wk · [ nmg · σk,mg − ( nob1 · σk,ob1 + nob2 · σk,ob2 ) ]

where mg denotes the merged object and ob1 and ob2 denote the two objects before merging (Jensen, 2011).

Object-oriented Image Segmentation
The shape criterion is computed using two landscape ecology metrics: compactness and smoothness.
Heterogeneity as deviation from a compact shape (cpt) is described by the ratio of the perimeter length l of an image object (i.e., a patch) and the square root of the number of pixels n forming it:

    cpt = l / √n

(Jensen, 2011)

Object-oriented Image Segmentation
Shape heterogeneity may also be described as smoothness, the ratio of the perimeter length l and the shortest possible border length b of a box bounding the image object (i.e., a patch) parallel to the raster:

    smooth = l / b

(Jensen, 2011)

Object-oriented Image Segmentation
The shape criterion incorporates these two measurements using the equation (Definiens, 2003):

    hshape = wcpt · hcpt + (1 − wcpt) · hsmooth

where 0 < wcpt < 1 is the user-defined weight for the compactness criterion (Jensen, 2011).

Object-oriented Image Segmentation
The change in shape heterogeneity caused by each merge is evaluated by calculating the difference between the situation after and before the image objects (ob1, ob2) are merged (mg). This results in the following equations for computing compactness and smoothness heterogeneity (Definiens, 2003):

    hcpt = nmg · (lmg / √nmg) − [ nob1 · (lob1 / √nob1) + nob2 · (lob2 / √nob2) ]

    hsmooth = nmg · (lmg / bmg) − [ nob1 · (lob1 / bob1) + nob2 · (lob2 / bob2) ]

where n is the object size in pixels (Jensen, 2011).

Object-oriented Image Segmentation
One criterion used to segment a remotely sensed image into image objects is a pixel neighborhood function, which compares an image object being grown with adjacent pixels. The information is used to determine whether the adjacent pixel should be merged with the existing image object or become part of a new image object.
a) If a plane 4 neighborhood function is selected, then two image objects would be created, because the pixels under investigation are not connected at their plane borders.
b) Pixels and objects are defined as neighbors in a diagonal 8 neighborhood if they are connected at a plane border or a corner point.
In this example, image object 1 can be expanded because it connects at a diagonal corner point. This results in a larger image object 1. Other types of neighborhood functions could be used (Jensen, 2011).

Classification based on image segmentation logic takes into account both spatial and spectral characteristics (Jensen, 2011).

Object-oriented Image Segmentation
The object-oriented classification of a segmented image is substantially different from performing a per-pixel classification. First, the analyst is not constrained to using just spectral information. He or she may choose to use a) the mean spectral information in conjunction with b) various shape measures associated with each image object (polygon) in the dataset. This introduces flexibility and robustness. Once selected, the spectral and spatial attributes of each polygon can be input to a variety of classification algorithms for analysis (e.g., nearest-neighbor, minimum distance, maximum likelihood) (Jensen, 2011).
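The segmentation criteria described in this section (hcolor, hcpt, hsmooth, and the combining function Sf) can be combined in a short sketch. The merge scenario below is hypothetical: two 2 × 2 objects (4 pixels, perimeter 8, bounding-box border 8) merging into one 2 × 4 object (8 pixels, perimeter 12, bounding-box border 12), with a single band of weight 1.

```python
import math

def h_color(w, n_mg, sd_mg, n1, sd1, n2, sd2):
    """Color criterion: size-weighted change in per-band standard
    deviation caused by merging objects ob1 and ob2."""
    return sum(wk * (n_mg * s_mg - (n1 * s1 + n2 * s2))
               for wk, s_mg, s1, s2 in zip(w, sd_mg, sd1, sd2))

def h_cpt(n_mg, l_mg, n1, l1, n2, l2):
    """Compactness heterogeneity change, using cpt = l / sqrt(n)."""
    cpt = lambda l, n: l / math.sqrt(n)
    return n_mg * cpt(l_mg, n_mg) - (n1 * cpt(l1, n1) + n2 * cpt(l2, n2))

def h_smooth(n_mg, l_mg, b_mg, n1, l1, b1, n2, l2, b2):
    """Smoothness heterogeneity change, using smooth = l / b."""
    return n_mg * l_mg / b_mg - (n1 * l1 / b1 + n2 * l2 / b2)

def merge_cost(w_color, w_cpt, hc, hcp, hsm):
    """General segmentation function Sf combining color and shape."""
    h_shape = w_cpt * hcp + (1 - w_cpt) * hsm
    return w_color * hc + (1 - w_color) * h_shape

# Hypothetical merge of two 4-pixel square objects (one band, weight 1):
hc = h_color([1.0], 8, [3.0], 4, [2.0], 4, [2.0])
hcp = h_cpt(8, 12, 4, 8, 4, 8)
hsm = h_smooth(8, 12, 12, 4, 8, 8, 4, 8, 8)
sf = merge_cost(0.8, 0.5, hc, hcp, hsm)
```

With wcolor = 0.8, the spectral term dominates the cost, mirroring the weighting discussion above; a low Sf would favor performing the merge.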
© Copyright 2024