|Title:||On the parallel complexity of hierarchical clustering and CC-complete problems|
|Abstract:||Complex data sets are often unmanageable unless they can be subdivided and simplified in an intelligent manner. Clustering is a technique that is used in data mining and scientific analysis for partitioning a data set into groups of similar or nearby items. Hierarchical clustering is an important and well-studied clustering method involving both top-down and bottom-up subdivisions of data. In this article we address the parallel complexity of hierarchical clustering. We describe known sequential algorithms for top-down and bottom-up hierarchical clustering. The top-down algorithm can be parallelized, and when there are n points to be clustered, we provide an O(log n)-time, n²-processor CREW PRAM algorithm that computes the same output as its corresponding sequential algorithm. We define a natural decision problem based on bottom-up hierarchical clustering, and add this HIERARCHICAL CLUSTERING PROBLEM (HCP) to the slowly growing list of CC-complete problems, thereby showing that HCP is one of the computationally most difficult problems in the COMPARATOR CIRCUIT VALUE PROBLEM class. This class contains a variety of interesting problems, and now for the first time a problem from data mining as well. By proving that HCP is CC-complete, we have demonstrated that HCP is very unlikely to have an NC algorithm. This result is in sharp contrast to the NC algorithm that we give for the top-down sequential approach, and the result surprisingly shows that the parallel complexities of the top-down and bottom-up approaches are different, unless CC equals NC. In addition, we provide a compendium of all known CC-complete problems. © 2008 Wiley Periodicals, Inc.|
|Appears in Collections:||CMUL: Journal Articles|
Items in CMUIR are protected by copyright, with all rights reserved, unless otherwise indicated.