Applied Soft Computing 11 (2011) 5745–5754
doi:10.1016/j.asoc.2011.03.005

A hybrid ensemble approach for the Steiner tree problem in large graphs: A geographical application

Abdelhamid Bouchachia a,*, Markus Prossegger b

a University of Klagenfurt, Department of Informatics, Group of Software Engineering and Soft Computing, Klagenfurt 9020, Austria
b Carinthia University of Applied Sciences, School of Network Engineering and Communication, Klagenfurt 9020, Austria
* Corresponding author. E-mail addresses: [email protected] (A. Bouchachia), [email protected] (M. Prossegger).

Article history: Received 16 June 2010; received in revised form 26 December 2010; accepted 12 March 2011; available online 27 March 2011.

Keywords: Parallel ant colony optimization; Spectral clustering; Steiner tree problem; Ensemble clustering; Divide-and-conquer

Abstract

Hybrid approaches are often recommended for dealing efficiently with complex problems that require considerable computational time. In this study, we follow such an approach, combining spectral clustering and ant colony optimization in a two-stage algorithm for efficiently solving the Steiner tree problem in large graphs. The idea of the two-stage approach, called ESC–IAC, is to apply a divide-and-conquer strategy that breaks the problem down into sub-problems and finds local solutions before combining them. In the first stage, graph segments (clusters) are generated using an ensemble spectral clustering method to enhance the quality of the segmentation; in the second stage, parallel independent ant colonies are used to find local and global minima of the Steiner tree. To illustrate its efficiency and accuracy, ESC–IAC is applied in the context of a geographical application relying on real-world as well as artificial benchmarks.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

Combining various computational models when building systems aims at compensating for the insufficiencies and shortcomings of the individual models in order to achieve highly efficient systems. Hybridization assumes that the models are complementary. Viewed from the perspective of optimization in complex systems, the goal is to tackle the multifaceted complexity via a divide-and-conquer strategy that decomposes large problems into smaller, tractable sub-problems. A bottom-up approach is then usually adopted to assemble the final solution.

In this paper, we investigate a hybrid two-phase approach relying on a divide-and-conquer strategy to deal with a combinatorial optimization problem arising from large geographical data sets. Applications like wiring and pipelining in urban areas are typically complex problems: they require searching for the well-known minimal Steiner tree in the huge graphs that model the real-world topology of the urban areas. Because optimization for hard problems is usually accomplished by means of search heuristics, the optimal solution may only be approximated.

The present paper suggests the application of an instance of swarm intelligence algorithms, namely ant colony optimization (ACO). This metaheuristic relies on a natural metaphor inspired by the behavior of real ant colonies.
We are interested in investigating the application of divide-and-conquer intertwined with ant colony systems to the Steiner tree problem (STP) [13,15]. In general terms, we investigate a multi-colony strategy stemming from the divide-and-conquer concept to solve the STP modeling the problem of wiring and pipelining in an urban area (e.g., a city). Given that the geographical data representing a map forms a huge graph whose vertices are the topological elements of the area, the application of ACO is well justified by the fact that the STP in most of its versions is NP-complete [16]. However, ACO alone may not be able to cope with such complexity, hence the application of clustering. The graph is segmented using spectral clustering, producing a set of subgraphs (regions or clusters). With the aim of enhancing the quality of the resulting clusters, we apply an ensemble method for clustering: three spectral clustering algorithms are used, and their combination generates the final segmentation of the data, which serves as input to the next stage. During that stage, the ACO algorithm attempts to find local minimal Steiner trees on the subgraphs and to compute a global minimal Steiner tree on the hypergraph resulting from combining the clusters. We apply a parallel version of ACO, namely parallel independent ant colonies (IAC), to handle the optimization problem efficiently. In all, the approach combines ensemble spectral clustering (ESC) and IAC to cope with the optimization complexity, hence the name ESC–IAC.

Before delving into the details of ESC–IAC, the structure of the paper is as follows. Section 2 introduces some preliminaries and the context of the present research. Section 3 provides the description of ESC–IAC, highlighting the spectral clustering algorithms in Section 3.1 and the independent ant colonies strategy in Section 3.3. In Section 4, a set of experiments is discussed to show the effectiveness of the proposed approach. Section 5 highlights the contributions and future work.

2. Preliminaries

The problem is formulated on an undirected graph G = (V, E, d), consisting of n = |V| vertices and m = |E| edges. The cost of an edge e ∈ E between two vertices i ∈ V and j ∈ V is given by a cost function d_ij : E → R. The Steiner tree problem consists of calculating a minimum-cost tree in G that connects a given set of terminals T ⊂ V. Any non-terminals V \ T spanned by the Steiner tree are called Steiner points.
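To make the formulation above concrete, the following minimal Python sketch encodes a toy instance of G = (V, E, d) together with a terminal set T, and checks the feasibility and cost of a candidate tree. The instance, vertex names, and helper function are illustrative assumptions, not part of the benchmark data used later in the paper.

# Toy encoding of the Steiner tree setting: an undirected graph G = (V, E, d),
# a terminal set T, and a helper that returns the cost of a candidate edge set
# if it connects all terminals (and None otherwise).
from collections import defaultdict

edges = {                      # cost function d, keyed by frozenset({i, j})
    frozenset({"a", "b"}): 2.0, frozenset({"a", "s1"}): 1.0,
    frozenset({"s1", "c"}): 1.0, frozenset({"b", "c"}): 3.0,
    frozenset({"c", "d"}): 2.0, frozenset({"s1", "d"}): 4.0,
}
terminals = {"a", "c", "d"}    # T subset of V; "s1" and "b" are potential Steiner points

def steiner_cost(tree_edges):
    """Cost of tree_edges if they connect all terminals, else None."""
    adj = defaultdict(set)
    for e in tree_edges:
        i, j = tuple(e)
        adj[i].add(j)
        adj[j].add(i)
    # depth-first search from one terminal over the candidate edges
    start = next(iter(terminals))
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    if not terminals <= seen:
        return None                      # not all terminals are connected
    return sum(edges[e] for e in tree_edges)

# a feasible solution that uses the non-terminal "s1" as a Steiner point
candidate = [frozenset({"a", "s1"}), frozenset({"s1", "c"}), frozenset({"c", "d"})]
print(steiner_cost(candidate))           # 4.0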
This is known to be a challenging and hard problem [16]. The Steiner tree problem is encountered in various applications such as electrical power supply networks, telecommunication networks, routing, and VLSI design. In our case, the problem is to minimize the cable routing (connection) costs such that all buildings (i.e., the terminals T of graph G) get connected (i.e., included in the tree).

Clearly, our approach is intended to deal with complex graphs resulting from geographical data. The initial step of our investigation is to construct the graph from the geographical data (map). The vertices of such a graph represent the topological elements of the map, whereas the edges represent the connections between these topological elements. The vertices to be connected by the Steiner tree are marked as terminals. The resulting graph, as a modeling instrument, allows us to simulate and optimize routes in the presence of a real-world topology. A sample of such a graph is illustrated in Fig. 1, showing the order of complexity we are facing in our investigations. Our approach therefore aims at handling graphs characterized by n > 9000 and m > 10,000, with at least |T| = 500 terminals. It is worth mentioning that, because finding the minimal Steiner tree is an offline task, we have implemented the approach to handle huge data structures rather than focusing solely on time constraints.

Fig. 1. Topology of an urban area.

3. ESC–IAC approach

The investigated approach, ESC–IAC, aims at computing minimal Steiner trees in complex graphs. Relying on the concept of divide and conquer, the approach is hybrid in the sense of involving two different mechanisms: spectral clustering and ant colony optimization. The first mechanism segments large graphs into subgraphs. Using ACO, we then obtain local minimal Steiner trees from the subgraphs. These trees are combined and refined to obtain a final solution for the original graph. Because the subgraphs are independent, parallelism can be used to compute a minimal Steiner tree in each of them. This is achieved by parallel independent ant colonies. In a nutshell, the required steps of ESC–IAC are highlighted in Algorithm 1.

Algorithm 1. Algorithmic steps
1: The graph is clustered using each of the algorithms described in Section 3.1 to obtain k clusters. There are three alternatives: (1) use the individual results of the algorithms, (2) use the best result, or (3) use the combination of the individual clustering results.
2: Once the clusters are obtained, the proposed independent ant colony system is used to calculate the local Steiner trees. These local solutions are then compressed in the form of a hypergraph to allow the calculation of the global Steiner tree.
3: Once the global solution is computed, an expansion (reconstruction) is applied to obtain the minimal Steiner tree in the original graph.

The stages of the approach are described below in Sections 3.1 and 3.3.

3.1. Spectral clustering

Clustering aims at partitioning data into compact clusters. In general, this is achieved by minimizing the intra-cluster distances and maximizing the between-cluster distances: data points lying inside the same cluster are closer to each other than to those lying in other clusters. This criterion also applies to graph clustering. Partitions of a graph correspond to disconnected subgraphs (clusters), such that each cluster is strongly connected internally and only weakly connected to the outside. There exist two popular implementations of this criterion, relying on the notions of min-cut and max-flow, and many variants thereof [9]. However, the problem of obtaining the optimal cut is in general NP-hard. To overcome this difficulty, spectral relaxation techniques relying on the matrix representation of graphs are often applied [8,26]. These techniques relate graph partitions to the eigenvectors of the graph matrix L or of its Laplacian (D − L). It is well established in the literature (see for instance [4,25,27]) that spectral clustering is the most efficient graph partitioning technique compared to other techniques such as (i) recursive bisection, (ii) geometry-based partitioning (coordinate bisection, inertial bisection, geometric partitioning), and (iii) greedy algorithms.
It is also important to note that there exist many variants, like multilevel partitioning, which in principle can rely on any partitioning algorithm at each level; for efficiency reasons, however, authors typically prefer to apply spectral clustering [12,1].

On the other hand, classical clustering techniques depart from the assumption that objects are described by feature vectors, and the dissimilarity between objects is computed by means of a distance measure. In contrast, graphs are objects that cannot easily be described using a feature vector representation [31]; they involve notions like connectivity, reachability, and degree, and in general edge weights do not correspond to a distance. It becomes clear that classical clustering algorithms like FCM do not fit graph partitioning well: such algorithms do not exploit the characteristics of graphs and, if applied, generate very poor partitions that are mostly useless. There have been some attempts to apply fuzzy clustering, but in conjunction with spectral clustering [14].

In the present paper, we use three spectral graph partitioning algorithms, namely updated versions of the algorithms in [24], [21], and [19], respectively. A brief description of each of these algorithms follows.

The algorithm proposed by Ng et al. [24] relies on the computation of the eigenvectors of the normalized affinity matrix. The idea of the algorithm is to infer the partitions of the original data by clustering the eigenvectors associated with the largest eigenvalues of the affinity matrix. While the original algorithm uses k-means, in this study we rely on the kernelized fuzzy c-means clustering algorithm proposed by Bouchachia and Pedrycz in [3]. The steps of the algorithm are portrayed in Algorithm 2.

Algorithm 2. First spectral clustering algorithm
1: Let X be the set of points (i.e., graph vertices) to be clustered, X = {x_1, ..., x_n}, and k the number of clusters.
2: Compute the weight (or affinity) matrix S ∈ R^{n×n} using a similarity measure (for instance S_{ij} = exp(−||x_i − x_j||^2 / (2σ^2)) if i ≠ j, and S_{ii} = 0).
3: Define D to be the diagonal matrix whose (i, i)-element is the sum of S's i-th row, and compute the matrix L = D^{−1/2} S D^{−1/2}.
4: Compute the k largest eigenvalues (e_1, ..., e_k) of L.
5: Form the matrix V = [v_1, ..., v_k] containing the corresponding eigenvectors (arranged column-wise).
6: Form the matrix Y ∈ R^{n×k} from V by normalizing each of V's rows to have norm 1: y_{ij} = v_{ij} / (Σ_j v_{ij}^2)^{1/2}.
7: Apply the kernelized fuzzy c-means to cluster the rows of Y.
8: Assign vertex x_i to cluster j if and only if row y_i of Y was assigned to cluster j.
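For illustration, the following numpy sketch follows the steps of Algorithm 2. As a simplification it uses scikit-learn's standard k-means in Step 7 instead of the kernelized fuzzy c-means of [3], and the Gaussian affinity with parameter sigma is an assumed choice; it is meant as a sketch of the procedure, not the implementation used in the experiments.

# Minimal numpy sketch of Algorithm 2 (Ng et al.-style spectral clustering),
# with KMeans as a stand-in for the kernelized fuzzy c-means.
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0):
    # Step 2: Gaussian affinity matrix with zero diagonal
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    S = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(S, 0.0)

    # Step 3: normalized matrix L = D^{-1/2} S D^{-1/2}
    d = S.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = D_inv_sqrt @ S @ D_inv_sqrt

    # Steps 4-5: eigenvectors of the k largest eigenvalues (columns of V)
    eigvals, eigvecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    V = eigvecs[:, np.argsort(eigvals)[-k:]]

    # Step 6: normalize the rows of V to unit norm
    Y = V / np.maximum(np.linalg.norm(V, axis=1, keepdims=True), 1e-12)

    # Steps 7-8: cluster the rows of Y and map the labels back to the vertices
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Y)

# toy usage: two well-separated groups of 2-D points
X = np.vstack([np.random.randn(20, 2), np.random.randn(20, 2) + 8.0])
print(spectral_clustering(X, k=2))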
The second algorithm applied in this study was proposed by Meila [21]. It is similar to the previous algorithm up to a few details: in the previous algorithm, the normalized matrix L = D^{−1/2} S D^{−1/2} is used as input and the rows of the selected k eigenvectors are normalized. The main steps of the second algorithm are given in Algorithm 3.

Algorithm 3. Second spectral clustering algorithm
1: Let X be the set of points (i.e., graph vertices) to be clustered, X = {x_1, ..., x_n}, and k the number of clusters.
2: Compute the edge weight matrix S ∈ R^{n×n}.
3: Compute the k largest eigenvalues (e_1, ..., e_k) of S.
4: Form the matrix V = [v_1, ..., v_k] containing the corresponding eigenvectors (arranged column-wise).
5: Apply k-means to cluster the rows of V.
6: Assign vertex x_i to cluster j if and only if the i-th row of V was assigned to cluster j.

The third algorithm was proposed by Lim et al. [19] and follows the same idea as the previous algorithms. However, it differs in the sense that it requires the matrix S to be doubly stochastic (see Algorithm 4) and does not need the computation of the Laplacian matrix or a normalization of the rows of the selected k eigenvectors.

Algorithm 4. Third spectral clustering algorithm
1: Let X be the set of points (i.e., graph vertices) to be clustered, X = {x_1, ..., x_n}, and k the number of clusters.
2: Compute the edge weight matrix S ∈ R^{n×n}.
3: Make the matrix S doubly stochastic (that is, all its eigenvalues are real and smaller than or equal to one, with one of them exactly equal to one) by normalizing the costs of the edges so that Σ_{x'∈X} cost(x, x') = 1 for all nodes x ∈ X.
4: Compute the k largest eigenvalues (e_1, ..., e_k) of S.
5: Form the matrix V = [v_1, ..., v_k] containing the corresponding eigenvectors (arranged column-wise).
6: Apply k-means to cluster the rows of V.
7: Assign vertex x_i to cluster j if and only if the i-th row of V was assigned to cluster j.

Because we are targeting quite complex problems represented as huge graphs, one could apply an approach similar to multilevel clustering [2], where the original graph G(V, E) is approximated by a less complex but coarser graph G_c(V_c, E_c). The latter is then partitioned before mapping back (expanding) the obtained clusters to the original graph. If G_c is also large, an approximation of G_c is in turn computed and clustered, and this procedure is executed recursively as long as the approximation remains large. We proceed the other way round: we cluster the graph into a set of subgraphs and, once these are processed, the subgraphs are compressed and connected to produce a hypergraph, which is then processed in its turn. The similarity between our approach and multilevel clustering is nevertheless worth mentioning, since it will be the focus of our future investigations.

To illustrate the clustering procedure, let us consider an urban area. Using spectral clustering (Algorithm 2) and setting k to 6, we obtain the result shown in Fig. 2.

Fig. 2. Segmentation of an urban area using clustering with k = 6.

3.2. Ensemble spectral clustering

Ensemble clustering methods have been the subject of intensive research, along with ensemble classification methods [10,11]. The idea consists of generating several partitions of the data by changing:
• the input of the algorithm (various instances or various subsets of features),
• the clustering algorithm,
• the parameter setting of the algorithm.

The motivation of ensemble clustering methods is to take advantage of the diversity of clusterings in order to enhance the quality of the clustering results. As shown in Fig. 3, ensemble clustering consists of two stages: (i) generation of clusterings by varying different aspects, and (ii) combination of the clusterings by a consensus function that finds the commonalities of the base clusterings.

Fig. 3. Ensemble clustering.

Among ensemble methods, there are three major classes: graph-based methods, greedy optimization-based methods, and matrix-based methods [30]. In the first class, the clusters represent the hyperedges of a hypergraph whose nodes are the data points. Three such methods were developed by Strehl et al. [28]: the cluster-based similarity partitioning algorithm (CSPA), the hyper-graph partitioning algorithm (HGPA), and the meta-clustering algorithm (MCLA).
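To make the consensus stage tangible, the sketch below implements a simplified, CSPA-flavoured consensus function: the co-association similarity of two points is the fraction of base clusterings that group them together, and the consensus partition is obtained by cutting an average-linkage dendrogram on that similarity. The hierarchical cut is our illustrative stand-in for the METIS-based graph partitioning used in [28]; it is an assumption made for compactness, not the consensus function of the paper.

# Simplified co-association consensus over several base clusterings.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def consensus_labels(base_labelings, k):
    base = np.asarray(base_labelings)            # shape: (n_clusterings, n_points)
    n = base.shape[1]
    coassoc = np.zeros((n, n))
    for labels in base:                          # co-association matrix
        coassoc += (labels[:, None] == labels[None, :]).astype(float)
    coassoc /= base.shape[0]
    dist = 1.0 - coassoc                         # turn similarity into a distance
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=k, criterion="maxclust")

# three base clusterings of six points (e.g., produced by Algorithms 2-4)
base = [[0, 0, 0, 1, 1, 1],
        [0, 0, 1, 1, 1, 1],
        [1, 1, 1, 0, 0, 0]]
print(consensus_labels(base, k=2))               # e.g. [1 1 1 2 2 2]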
Other methods, based on bipartite graphs, are proposed in [7,6]. In the second class of methods, the goal is to find the optimal consensus between clusterings. As proposed in [28], the best combination of clusterings is first found based on the average normalized mutual information (see Eq. (4)); then an attempt is made to find a better labeling by randomly moving data points to other clusters. In the third class of methods [18,23], the idea is to combine the clustering results (the clustering matrices of the base clusterings) into other matrices, such as co-association, consensus, and nonnegative matrices, in order to determine the final labeling.

To compare two clusterings C = {c_1, c_2, ...} and K = {k_1, k_2, ...} of n data points, we use the normalized mutual information (NMI) [28], which is also used in this study. NMI measures the overlap between clusterings and is based on two measures, the mutual information (I) and the entropy (H), which are given as follows:

I(C, K) = \sum_{c \in C} \sum_{k \in K} \frac{|c \cap k|}{n} \log_2 \frac{n\,|c \cap k|}{|c|\,|k|}    (1)

H(C) = -\sum_{c \in C} \frac{|c|}{n} \log_2 \frac{|c|}{n}    (2)

NMI is then expressed as:

NMI(C, K) = \frac{I(C, K)}{\sqrt{H(C)\,H(K)}}    (3)

If NMI = 1, then both clusterings are the same. Moreover, if we want to check the similarity of the base clusterings to the ensemble's result, we may use NMI (Eq. (3)). Another way to do this is to rely on the averaged NMI measure, which is given as:

ANMI(\mathcal{E}, C) = \frac{1}{|\mathcal{E}|} \sum_{E \in \mathcal{E}} NMI(E, C)    (4)

where \mathcal{E} is the set of individual clusterings. In this paper we rely on the graph-based methods described in [28], which refer to Eq. (4). The base clusterings are generated using the three algorithms presented in the previous section (Algorithms 2, 3 and 4).
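The following short Python functions are a direct transcription of Eqs. (1)–(4); the label encodings and the toy clusterings in the usage lines are illustrative only.

# NMI between two labelings and average NMI of an ensemble result against
# the base clusterings, written from Eqs. (1)-(4).
from collections import Counter
from math import log2, sqrt

def nmi(C, K):
    n = len(C)
    sizes_c, sizes_k = Counter(C), Counter(K)
    joint = Counter(zip(C, K))                      # |c ∩ k| for every pair (c, k)
    I = sum(m / n * log2(n * m / (sizes_c[c] * sizes_k[k]))
            for (c, k), m in joint.items())         # Eq. (1)
    H = lambda sizes: -sum(s / n * log2(s / n) for s in sizes.values())  # Eq. (2)
    return I / sqrt(H(sizes_c) * H(sizes_k))        # Eq. (3)

def anmi(bases, consensus):
    return sum(nmi(E, consensus) for E in bases) / len(bases)            # Eq. (4)

C = [0, 0, 1, 1, 2, 2]
print(nmi(C, C))                                    # 1.0: identical clusterings
print(anmi([[0, 0, 1, 1, 2, 2], [0, 0, 0, 1, 1, 1]], C))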
3.3. Parallel ant colony optimization

Step 2 of Algorithm 1 is realized using independent ant colonies (IAC). In this scheme of parallel ant colony optimization, colonies correspond to the clusters produced by the ensemble spectral clustering (Section 3.2) resulting from Step 1 of Algorithm 1. During the simulation, each ant colony is assigned to one processing unit, and each colony computes a local minimal Steiner tree on its subgraph (i.e., a cluster of the urban area). Interestingly enough, the application of IAC is well motivated by the nature of the problem, but also by its performance (less communication overhead).

To enable optimization in reasonable time, the graph G is clustered into k subgraphs G_c = (V_c, E_c), where c = 1, ..., k and G_c ⊂ G. To find a minimal Steiner tree in each subgraph, a multiple-ant-colonies approach is applied. The ants of the colony associated with a given subgraph are split into sub-colonies in order to tackle the complexity of the graph and to enhance the search efficiency. Hence, the ant colony optimization consists of two levels: (i) each subgraph is assigned a colony acting independently, and (ii) each colony is divided into sub-colonies that communicate at the end of an execution cycle. In precise terms, each sub-colony picks a random vertex i of the subgraph as its own nest (i.e., starting point) before the conventional ACO is applied to find the minimal Steiner tree connecting all terminals T_c ⊂ T of the respective cluster (all terminals in the subgraph have to be included in the minimal tree). This methodology differs from the known parallel ACO schemes discussed by several authors [5,20,22,29]. Independent sub-colonies run in parallel (on different processors), but at the end of each execution cycle, if the partial solutions obtained by two sub-colonies share a vertex, these sub-colonies are merged. Because this merge operation might yield a subgraph that contains a cycle, a single sub-colony is then re-initialized to solve the STP on the merged subgraph again.

Once the local minimal Steiner trees are found, the clusters are compressed, yielding a hypergraph (where nodes represent clusters and edges represent cluster neighborhood). A (hyper) ant colony is then applied to the derived hypergraph G_h = (V_h, E_h) to compute a (hyper) minimal Steiner tree, designated as the global minimal Steiner tree. As indicated by Step 3 of Algorithm 1, an expansion (reconstruction) is applied to obtain a minimal Steiner tree of the original graph; this expansion is simply the aggregation of the local optimal Steiner trees into the global tree.

A colony c is characterized by a number of parameters that correlate with the actual graph size and in particular with the number of terminals. Before describing the algorithmic steps of IAC, we define the symbols to be used and their default values for all benchmarks in our experimental evaluation:

• n_ants: number of ants within one colony (500)
• min_ants: minimum number of ants per nest (50)
• p_ants: percentage of ants on the best path required to enlarge the colony space (2/3)
• n_cycles: number of cycles until an ant's move is forced (30)
• α: order of the trace (pheromone) effect (2)
• β: order of the effect of the ant's sight (0.1)
• e: evaporation coefficient (0.2)

In a colony c, an ant a moves from a vertex i to a vertex j with a probability expressed by the following transition rule:

P_{ij}^{a} = \frac{\tau_{ij}^{\alpha}\, \eta_{ij}^{\beta}}{\sum_{k \notin Tour_a} \tau_{ik}^{\alpha}\, \eta_{ik}^{\beta}}    (5)

where \tau_{ij} represents the intensity of the pheromone between vertex i and vertex j. The parameters α and β indicate the influence of \tau_{ij} and \eta_{ij}, respectively. The parameter \eta_{ij} = 1/d_{ij} is the visibility of vertex j from vertex i, which is inversely proportional to the cost d_{ij} between i and j. Tour_a indicates the tour made by ant a. Once a cycle is completed, the pheromone matrix is updated according to Eq. (6):

\tau_{ij}(t + 1) = (1 - e)\,\tau_{ij}(t) + \Delta\tau_{ij} + b    (6)

such that

\Delta\tau_{ij} = \sum_{a=1}^{n_{ants}} \Delta\tau_{ij}^{a}    (7)

where \Delta\tau_{ij}^{a} indicates the change of the pheromone intensity on edge (i, j) produced by the a-th ant. If the current tour is the best one, an additional bonus b = 0.2 is added (otherwise b = 0). This change is quantified as follows:

\Delta\tau_{ij}^{a} = \begin{cases} 1/L_a & \text{if ant } a \text{ traverses the edge } (i, j) \\ 0 & \text{otherwise} \end{cases}    (8)

where L_a is the length (i.e., the cumulated edge costs) of the tour found by the a-th ant.
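As an illustration of Eqs. (5)–(8), the sketch below implements the transition rule and the pheromone update for a colony stored as nested dictionaries. The data layout and the way the bonus b is credited to the edges of the best tour are our assumptions, and the snippet is a stand-in for the full IAC-COPY procedure of Algorithm 5, not its implementation.

# Transition rule (Eq. 5) and pheromone update (Eqs. 6-8) on small dictionaries.
import random

ALPHA, BETA, EVAP, BONUS = 2.0, 0.1, 0.2, 0.2

def choose_next(i, tour, tau, eta):
    """Eq. (5): pick a vertex j not yet in the tour, proportionally to tau^alpha * eta^beta."""
    candidates = [j for j in eta[i] if j not in tour]
    weights = [tau[i][j] ** ALPHA * eta[i][j] ** BETA for j in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

def update_pheromone(tau, tours, costs, best_idx):
    """Eqs. (6)-(8): evaporation, per-ant deposits 1/L_a, bonus b on the best tour's edges."""
    deposit = {}                                       # accumulated delta tau, Eqs. (7)-(8)
    for tour, L in zip(tours, costs):
        for i, j in zip(tour, tour[1:]):
            e = frozenset((i, j))
            deposit[e] = deposit.get(e, 0.0) + 1.0 / L
    best = {frozenset(e) for e in zip(tours[best_idx], tours[best_idx][1:])}
    for i in tau:
        for j in tau[i]:
            e = frozenset((i, j))
            b = BONUS if e in best else 0.0
            tau[i][j] = (1 - EVAP) * tau[i][j] + deposit.get(e, 0.0) + b   # Eq. (6)

# toy usage on a triangle graph
dist = {"u": {"v": 1.0, "w": 2.0}, "v": {"u": 1.0, "w": 1.0}, "w": {"u": 2.0, "v": 1.0}}
eta = {i: {j: 1.0 / d for j, d in nb.items()} for i, nb in dist.items()}   # visibility 1/d_ij
tau = {i: {j: 1e-4 for j in nb} for i, nb in dist.items()}                 # initial pheromone
tour = ["u", choose_next("u", {"u"}, tau, eta)]
update_pheromone(tau, [tour], [sum(dist[i][j] for i, j in zip(tour, tour[1:]))], best_idx=0)
print(tour, tau["u"])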
Relying on the given expressions, the local Steiner tree is computed for each of the graph clusters G_c using the IAC-COPY algorithm (shown in Algorithm 5). Each ant a of a colony c starts its tour from a random terminal t ∈ T, which stands for a nest (Tour_a = {random(T)}). Each colony c is enlarged by adding the vertices of the best tours Tour^c_best to its nests V_c. If two colonies c_1 and c_2 share a vertex v_1 ∈ V_{c_1} ∩ V_{c_2}, they are merged, and a single sub-colony solves the STP on the merged subgraph to get rid of potential cycles. This procedure is repeated until only one colony remains in the cluster. Once these local optimization problems are solved, the global minimal Steiner tree is calculated, again using Algorithm 5. Note that IAC-COPY relies on the functions INITIALIZE() and CYCLE() (Algorithms 6 and 7): the former initializes the various parameters, especially the pheromone matrix, while the latter carries out the conventional steps of ACO.

Algorithm 5. IAC-COPY()
1: /* initialization */
2: for each colony c ∈ C do
3:   Choose a random terminal as nest V_c
4:   INITIALIZE(V_c, n_ants, min_ants)
5: end for
6: repeat
7:   for each colony c ∈ C do
8:     for m = 1 to n_cycles do
9:       Build the tour Tour^c_m using CYCLE(A_c)
10:      Find the colony's best tour Tour^c_best with min(cost(Tour^c_m))
11:      if the percentage of ants running on the best tour ≥ p_ants then
12:        Break the loop (i.e., that tour is supposed to be the optimal solution)
13:      end if
14:    end for
15:    Add all vertices of Tour^c_best to the colony c (V_c ← V_c ∪ Tour^c_best)
16:  end for
17:  if two sub-colonies c_i and c_j share some vertices, i.e., V_{c_i} ∩ V_{c_j} ≠ ∅, then
18:    Merge c_i and c_j to obtain a new sub-colony c_k (V_{c_k} = V_{c_i} ∪ V_{c_j}), then remove c_i and c_j
19:  end if
20:  /* initialize the next step */
21:  for each colony c ∈ C do
22:    INITIALIZE(V_c, n_ants, min_ants)
23:  end for
24: until one colony c owns all terminals (T ⊆ V_c)

Algorithm 6. INITIALIZE(V_c, n_ants, min_ants)
Require: The nests V_c of the colony c (i.e., the starting nodes of the sub-colonies)
Require: Number of ants in the colony n_ants > 0
Require: Minimum number of ants per vertex min_ants > 0
1: Initialize the local pheromone matrix with 10^{-4}
2: Initialize the local sight (visibility) matrix with the inverse edge costs
3: for each vertex v ∈ V_c do
4:   Place max(n_ants / |V_c|, min_ants) ants on v
5:   Initialize the tours Tour_a = {v} of the placed ants
6: end for

Algorithm 7. CYCLE(A_c)
Require: The ants A_c of colony c
1: for each ant a ∈ A_c do
2:   repeat
3:     Apply the transition rule (Eq. (5)) to choose a vertex v such that v ∉ V_c and v is not a member of the tour of ant a (v ∉ Tour_a)
4:     Add vertex v to the ant's tour Tour_a
5:   until v ∈ T   // T is the set of terminals
6: end for
7: Update the pheromone matrix (Eq. (6))

4. Empirical evaluation

To evaluate the proposed ESC–IAC, we use a number of graph instances available from the SteinLib Testdata Library [17]. For the sake of illustration, we use the benchmarks es500fst01 and es500fst02 obtained from SteinLib and one real-world benchmark, urban500, representing an area in an Austrian city. The details of these three benchmark data sets are shown in Tables 1 and 2, respectively.

Table 1. Graphs from the SteinLib library.
Name         |V|     |E|     |T|   Optimum
es500fst01   1.250   1.763   500   162.978.810
es500fst02   1.408   2.056   500   160.756.854

Table 2. Real-world graph.
Name       |V|     |E|      |T|   Optimum
urban500   9.128   12.409   569   –

While the cost of the minimal Steiner tree is known for es500fst01 and es500fst02, it is not known for urban500. One can also notice that the number of terminals in these benchmarks is very high, which matches the purpose of our ESC–IAC approach. Because of the huge size of the real-world graph urban500, computing the optimum Steiner tree would require very high computational time; we therefore use the minimum spanning tree heuristic (MST) to obtain comparable results. This heuristic finds the shortest paths between the terminals in order to build up a minimum spanning tree MST(T).
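For reference, the following is a minimal sketch of such an MST-based baseline under the common "distance network" reading: build the complete graph on the terminals weighted by shortest-path costs, take its minimum spanning tree, and expand every MST edge back into the corresponding shortest path of G. networkx and the toy instance are assumptions, and this is not the implementation used in the experiments below.

# Distance-network (MST) heuristic sketch for the Steiner tree problem.
import networkx as nx

def mst_heuristic(G, terminals):
    # complete "distance network" on the terminals
    D = nx.Graph()
    for idx, u in enumerate(terminals):
        for v in terminals[idx + 1:]:
            D.add_edge(u, v, weight=nx.shortest_path_length(G, u, v, weight="weight"))
    # expand the MST edges of the distance network into paths of G
    tree = nx.Graph()
    for u, v in nx.minimum_spanning_tree(D, weight="weight").edges():
        path = nx.shortest_path(G, u, v, weight="weight")
        for i, j in zip(path, path[1:]):
            tree.add_edge(i, j, weight=G[i][j]["weight"])
    return tree

G = nx.Graph()
G.add_weighted_edges_from([("a", "b", 2), ("a", "s1", 1), ("s1", "c", 1),
                           ("b", "c", 3), ("c", "d", 2), ("s1", "d", 4)])
T = mst_heuristic(G, terminals=["a", "c", "d"])
print(sorted(T.edges()), T.size(weight="weight"))   # the Steiner point s1 appears in the tree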
To explore the performance of ESC–IAC, three sets of experiments are conducted. The first explores the effect of changing the number of clusters on the overall execution time of ESC–IAC; the second deals with the quality of the results obtained by the algorithm on each of the benchmarks described earlier; and the last compares ESC–IAC with conventional ACO.

4.1. Effect of clustering

Given that the benchmark graphs are large, it is desirable to check the effect of graph segmentation via clustering. Recall that the number of subgraphs corresponds to the number of clusters and that each cluster contains a number of colonies. In this experiment, the effect of clustering is observed through the computational time of the whole ESC–IAC algorithm, that is, the ensemble spectral clustering followed by the IAC optimization, when executed on the graphs. But first, the ensemble clustering results are presented. Figs. 4(a)–(d), 5(a)–(d), and 6(a)–(h) show the results of the ensemble clustering on two data sets, es500fst02 and urban500, with the number of clusters set to 6 and 12.

Fig. 4. Ensemble clustering of the es500fst02 data set (6 clusters).
Fig. 5. Ensemble clustering of the es500fst02 data set (12 clusters).
Fig. 6. Ensemble clustering of the urban500 data set (6 and 12 clusters).

Moreover, the similarity of the individual clusterings to the final clustering obtained by consensus is displayed in Table 3. The columns labeled Algorithm 2, Algorithm 3, and Algorithm 4 give the NMI between the ensemble clustering and the corresponding individual clusterings. One can notice that Lim et al.'s algorithm (Algorithm 4) offers in most cases the closest results to the consensus results. Based on this, one could use either the best individual algorithm or the ensemble.

Table 3. Similarity of the individual clusterings to the ensemble clustering.
Instance     # Clusters   Algorithm 2   Algorithm 3   Algorithm 4
es500fst01   06           0.89406       1             0.85873
es500fst01   12           0.95618       0.88000       0.85595
es500fst02   06           1.0000        1.0000        0.77981
es500fst02   12           0.9728        0.9728        0.87096
urban500     06           0.80597       0.87244       0.63059
urban500     12           0.61835       0.66017       0.60881

Coming to the computational time, Table 4 shows the execution time of the algorithm when the number of clusters is set to 6 and 12 for each of the graphs. The first number of clusters corresponds to the number of clusters used by the communication network, while the second corresponds to the number of electoral districts and is used for comparison purposes.

Table 4. Effect of the cluster number on the execution time.
Instance     # Clusters   Time [s]
urban500     1            10,000
urban500     6            1500
urban500     12           700
es500fst01   1            930
es500fst01   6            120
es500fst01   12           90
es500fst02   1            1100
es500fst02   6            150
es500fst02   12           80

The results clearly show that a significant improvement of ESC–IAC's efficiency is achieved by the spectral clustering ensemble. The time required to compute the minimal Steiner tree decreases sharply as the number of clusters increases. For instance, in the case of es500fst01, the ESC–IAC algorithm saves 90.3% of the time (compare the results with 1 cluster against those with 12 clusters), whereas the optimization result is less than 3.1% worse, as will be discussed in the next section. In the case of es500fst02 and urban500, the time gain is 92.7% and more than 93%, respectively.
4.2. Performance of ESC–IAC

The optimization results for the instances es500fst01 and es500fst02 are displayed in Tables 5 and 6 respectively, whereas those related to the urban graph stemming from the real-world geoinformation data are shown in Table 7. These results cover the numerical optimization results (cost of the solution), the total time needed for clustering and sequential optimization, and the difference to the known cost of the optimum Steiner tree.

Table 5. Results related to es500fst01 [optimum = 162.978.810].
Approach   # Clusters   Average result   Obtained−Optimum [%]
MST        1            171.542.977      +5.25 ± 0
ESC–IAC    1            167.602.151      +2.84 ± 0.17
ESC–IAC    6            171.039.501      +4.95 ± 0.27
ESC–IAC    12           172.793.480      +6.02 ± 0.31

Table 6. Results related to es500fst02 [optimum = 160.756.854].
Approach   # Clusters   Average result   Obtained−Optimum [%]
MST        1            170.945.745      +6.34 ± 0
ESC–IAC    1            167.196.505      +4.01 ± 0.27
ESC–IAC    6            174.114.834      +8.31 ± 0.41
ESC–IAC    12           175.622.291      +7.76 ± 0.38

Table 7. Results related to urban500 with unknown optimum.
Approach   # Clusters   Average result   Obtained−MST value [%]
MST        1            19.608           0
ESC–IAC    1            19.350           −1.32 ± 0.28
ESC–IAC    6            19.724           +0.59 ± 0.33
ESC–IAC    12           19.886           +1.42 ± 0.27

The first outcome of this set of experiments is that the more clusters are used, the less time is needed. The most important outcome, however, concerns the quality of the optimization result. Considering es500fst01 and es500fst02, ESC–IAC produces results close to the known optimum, but as the number of clusters increases, the performance decreases. ESC–IAC performs better than the minimum spanning tree (MST) heuristic, which is the standard in this context, when the number of clusters is set to 1 (under the same conditions). ESC–IAC can also outperform MST when the number of clusters is small compared to the size of the graph (e.g., for es500fst01, ESC–IAC is better even when the number of clusters is set to 6). This performance, coupled with the execution time, allows us to state that ESC–IAC performs reasonably well. One might wonder why, in some cases (i.e., when the number of clusters exceeds a certain limit), MST outperforms ESC–IAC. The reason is that MST has access to the whole graph and can therefore come closer to the optimum, whereas ESC–IAC does not use the whole graph, and local optima may not lead to the global optimum when the number of colonies (i.e., the number of clusters) increases.

In the case of the real-world urban500 data, the optimum is not known; we therefore compare the results against the standard MST. Again, the execution time decreases as the number of clusters increases, and the performance of ESC–IAC is less than 2% worse in the case of 12 clusters.

4.3. Comparison against sequential ACO

The proposed algorithm behaves as a sequential ant colony optimization if the number of clusters is set to 1 and the number of colonies is limited to |C| = 1. We ran all benchmarks using sequential ACO instead of ESC–IAC. The average cost of the minimal Steiner tree obtained with the sequential approach is about 0.3% lower, but the average time needed for the optimization is about 550% higher.

5. Conclusions

The present paper introduces a new approach to computing minimal Steiner trees. Methodologically, the novelty concerns (1) parallelism in ant colony optimization, which is enhanced by ensemble spectral clustering, and (2) handling large and complex problems with ant colony systems. The ESC–IAC approach consists of three main steps: spectral clustering to segment large graphs, application of multiple colonies on each graph segment to find local solutions, and application of an ant colony to the hypergraph obtained by compressing the graph segments. The empirical studies show that ESC–IAC can be successfully applied to real-world complex problems and compares very well to standard algorithms. As future work, it would be interesting to extend the ESC–IAC algorithm to handle real-world constraints, especially in the context of the spectral clustering algorithms. The current version is general for all applications modeled as graphs and might therefore be seen as "naive", whereas in geographical applications various constraints are encountered. Another interesting aspect is multilevel clustering, which is worth applying in the context of such applications.

References
[1] S. Barnard, H. Simon, Fast multilevel implementation of recursive spectral bisection for partitioning unstructured problems, Concurrency: Pract. Exp. 6 (2) (1994) 101–117.
[2] S. Barnard, H. Simon, A parallel implementation of multilevel recursive spectral bisection for application to adaptive unstructured meshes, in: Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, 1995, pp. 627–632.
[3] A. Bouchachia, W. Pedrycz, Enhancement of fuzzy clustering by mechanisms of partial supervision, Fuzzy Sets Syst. 735 (13) (2006) 776–786.
[4] B. Chamberlain, Graph partitioning algorithms for distributing workloads of parallel computations, Tech. Rep. TR-98-10-03, Univ. of Washington, Dept. of Computer Science & Engineering, 1998.
[5] S. Chu, J. Roddick, J. Pan, C. Su, Parallelization strategies for ant colony optimization, in: Proceedings of the 5th International Symposium on Methodologies for Intelligent Systems, Springer, 2003, pp. 279–284.
[6] C. Domeniconi, M. Al-Razgan, Weighted cluster ensembles: methods and analysis, ACM Trans. Knowl. Discov. Data 2 (4) (2009) 1–40.
[7] X. Fern, C. Brodley, Solving cluster ensemble problems by bipartite graph partitioning, in: Proceedings of the Twenty-first International Conference on Machine Learning, 2004, p. 36.
[8] M. Fiedler, A property of eigenvectors of non-negative symmetric matrices and its application to graph theory, Czech. Math. J. 25 (1975) 619–633.
[9] G. Flake, R. Tarjan, K. Tsioutsiouliklis, Graph clustering and minimum cut trees, Internet Math. 1 (3) (2004) 355–378.
[10] D. Greene, A. Tsymbal, N. Bolshakova, P. Cunningham, Ensemble clustering in medical diagnostics, in: 17th IEEE Symposium on Computer-Based Medical Systems, 2004, pp. 576–581.
[11] S. Hadjitodorov, L. Kuncheva, L. Todorova, Moderate diversity for better cluster ensembles, Inf. Fusion 7 (3) (2006) 264–275.
[12] B. Hendrickson, R. Leland, A multilevel algorithm for partitioning graphs, in: Proceedings of the 1995 ACM/IEEE Conference on Supercomputing (CD-ROM), Supercomputing '95, ACM, New York, NY, USA, 1995.
[13] F. Hwang, D. Richards, P. Winter, The Steiner Tree Problem, North-Holland, 1992.
[14] K. Inoue, K. Urahama, Sequential fuzzy cluster extraction by a graph spectral method, Pattern Recognit. Lett. 20 (7) (1999) 699–705.
[15] A. Ivanov, A. Tuzhilin, Minimal Networks: The Steiner Problem and its Generalizations, CRC Press, 1994.
[16] R. Karp, Reducibility among combinatorial problems, in: Complexity of Computer Computations, Plenum Press, New York, 1972, pp. 85–103.
[17] T. Koch, A. Martin, S. Voß, SteinLib: an updated library on Steiner tree problems in graphs, Tech. Rep. ZIB-Report 00-37, Konrad-Zuse-Zentrum für Informationstechnik Berlin, Takustr. 7, Berlin, 2000. http://elib.zib.de/steinlib.
[18] T. Li, C. Ding, M. Jordan, Solving consensus and semi-supervised clustering problems using nonnegative matrix factorization, in: Proceedings of the 2007 Seventh IEEE International Conference on Data Mining, 2007, pp. 577–582.
[19] C. Lim, S. Bohacek, J. Hespanha, K. Obraczka, Hierarchical max-flow routing, in: IEEE Global Telecommunications Conference, 2005, pp. 550–556.
[20] M. Manfrin, M. Birattari, T. Stützle, M. Dorigo, Parallel ant colony optimization for the traveling salesman problem, in: ANTS Workshop, 2006, pp. 224–234.
[21] M. Meila, J. Shi, Learning segmentation by random walks, in: Neural Information Processing Systems (NIPS), 2001, pp. 873–879.
[22] M. Middendorf, F. Reischle, H. Schmeck, Multi colony ant algorithms, J. Heuristics 8 (2002) 305–320.
[23] S. Monti, P. Tamayo, J. Mesirov, T. Golub, Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data, Mach. Learn. 52 (1–2) (2003) 91–118.
[24] A. Ng, M. Jordan, Y. Weiss, On spectral clustering: analysis and an algorithm, in: Proceedings of Advances in Neural Information Processing Systems (14), MIT Press, 2001, pp. 849–856.
[25] A. Pothen, Graph partitioning with application to scientific computing, in: Parallel Numerical Algorithms, Kluwer Academic Press, 1995.
[26] A. Pothen, D. Simon, K. Liou, Partitioning sparse matrices with eigenvectors of graphs, SIAM J. Matrix Anal. Appl. 11 (3) (1990) 430–452.
[27] S. Schaeffer, Graph clustering, Comput. Sci. Rev. 1 (1) (2007) 27–64.
[28] A. Strehl, J. Ghosh, C. Cardie, Cluster ensembles—a knowledge reuse framework for combining multiple partitions, J. Mach. Learn. Res. 3 (2002) 583–617.
[29] T. Stuetzle, Parallelization strategies for ant colony optimization, in: Proceedings of the 5th International Conference on Parallel Problem Solving from Nature, Springer, 1998, pp. 722–731.
[30] H. Wang, H. Shan, A. Banerjee, Bayesian cluster ensembles, Stat. Anal. Data Min. 4 (2011) 57–70.
[31] Y. Zhou, H. Cheng, J. Yu, Graph clustering based on structural/attribute similarities, Proceedings of VLDB '09 2 (2009) 718–729.