当前位置:首页 >> 机械/仪表 >>

Clustering by fast search and find ofdensity peaks


Clustering by fast search and find ofdensity peaks
Alex Rodriguez and Alessandro Laio

Cluster analysis is aimed at classifying elements into categories on the basis of their similarity. 聚类分析是在分类元素进行分类的基础上发掘他们的相似之处 Its applications range from astronomy to bioinformatics, bibliometrics, and pattern recognition. 其应用范围从天文学到生物信息学、文献计量学和模式识别 We propose an approach based on the idea that cluster centers are characterized by a higher density than their neighbors and by a relatively large distance from points with higher densities. 该算法基于这样的假设:类簇中心被具有较低局部密度的邻居点包围,且与具 有更高密度的任何点有相对较大的距离。 This idea forms the basis of a clustering procedure in which the number of clusters arises intuitively, outliers are automatically spotted and excluded fromthe analysis, and clusters are recognized regardless of their shape and of the dimensionality of the space in which they are embedded.We demonstrate the power of the algorithm on several test cases. Clustering algorithms attempt to classify elements into categories, or clusters, on the basis of their similarity. Several different clustering strategies have been proposed (1), but no consensus has been reached even on the definition of a cluster. In K-means (2) and K-medoids (3) methods, clusters are groups of data characterized by a small distance to the cluster center. An objective function, typically the sum of the distance to a set of putative cluster centers, is optimized (3–6) until the best cluster centers candidates are found. However, because a data point is always assigned to the nearest center, these approaches are not able to detect nonspherical clusters (7). In distribution-based algorithms,one attempts to reproduce the observed realization of data points as a mix of predefined probability distribution functions (8); the accuracy of such methods depends on the capability of the trial probability to represent the data. Clusters with an arbitrary shape are easily detected by approaches based on the local density of data points. In density-based spatial clustering of applications with noise (DBSCAN) (9), one chooses a density threshold, discards as noise the points in regions with densities lower than this threshold, and assigns to different clusters disconnected regions of high density. However,choosing an appropriate threshold can be nontrivial,a drawback not present in the mean-shift clustering method (10, 11). There a cluster is defined as a set of points that converge to the same local maximum of the density distribution function.This method allows the finding of nonsphericalclusters but works only for data defined by a set of coordinates and is computationally costly. Here, we propose an alternative approach.Similar to the K-medoids method, it has its basis only in the distance between data points.Like DBSCAN and the mean-shift method, it is able to detect nonspherical clusters and to automatically find the correct number of clusters.

The cluster centers are defined, as in the meanshift method, as local maxima in the density of data points. However, unlike the mean-shift method,our procedure does not require embeddingthe data in a vector space and maximizing explicitlythe density field for each data point. The algorithm has its basis in the assumptions that cluster centers are surrounded by neighbors with lower local density and that they are at a relatively large distance from any points with ahigher local density. For each data point i, we compute two quantities: its local density ri and its distance di from points of higher density. Both these quantities depend only on the distances between data points, which are assumed to satisfy the triangular

inequality. The local density ri of data point i is defined as


相关文章:
matlab学习笔记
FindMinimum[f, {x,x0}] 以 x0 为初值,寻找...of the peaks function shading interp 命令:...The fast Finite Fourier Transform optimset Create ...
英语四级最新预测卷二01
D) To find out how monkeys search food. 34. A) To give the monkey a...snowy peaks of the Andes the world is heating up right now, and fast. ...
航海英语历届考试易错题
fast while she is swinging D 33.A line ...A.Design of individual search patterns C.Use of...A.deeps B.domestics C.peaks D.settlers C 39....
Do you find getting up in the morning so difficult that it_...
For some people the peak comes during the ...Avoid the troublesome search for clean clothes by...(1 分) In today’s age of fast travel, the...
分析软件JADE一点使用心得
打开 analyze/find peaks,或右键单击快捷工具栏中的图标。出现一个悬浮框,在“search”里面可以对寻峰 的判据进行设置,大家可以改动不同的限制条件,然后按 “apply...
更多相关标签: