en:non-hier

Differences

This shows you the differences between two versions of the page.

en:non-hier [2019/01/26 20:20]
David Zelený
en:non-hier [2019/04/06 18:53] (current)
David Zelený
Line 1:
-====== Numerical classification ======
+Section: [[en:classification|Numerical classification]]
 ===== K-means (non-hierarchical classification) =====
  
Line 7:
 [[{|width: 7em; background-color: white; color: navy}non-hier_exercise|Exercise {{::lock-icon.png?nolink|}}]]
  
-==== kmeans ====
-Non-hierarchical classification, using the method of //k// means. Non-hierarchical methods are overlooked, even if they give ecologically interesting and relevant results. You need to set a priori the number of clusters you want the data divided into.
+**K-means** is a non-hierarchical agglomerative clustering algorithm, based on Euclidean distances among samples, which uses an iterative procedure to find the solution. It minimizes the total error sum of squares (TESS), the same objective function as Ward's algorithm. The number of clusters (//k//) is defined by the user. Distances other than Euclidean can be used, but they need to be converted into metric distances and submitted to PCoA. For example, in the case of Bray-Curtis distance, which is not metric, one may calculate square-rooted Bray-Curtis distances (which are metric), submit them to PCoA, and then use all PCoA axes as the input matrix for the K-means method instead of the raw data. The K-means algorithm, like other iterative methods (such as NMDS), can get trapped in local minima, so it may be useful to repeat the analysis many times and choose the solution with the lowest overall TESS.
+
+The method can run in two modes, unsupervised or supervised. In the unsupervised mode, it searches for the optimal clustering of samples into a predefined number of clusters; in the supervised mode, the user supplies the //k// centroids (e.g. typical samples) and the method searches for the optimal way to cluster the samples in the dataset around these centroids.
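The workflow described above can be sketched in base R roughly as follows. This is a minimal illustration rather than the page's own example: the matrix ''spe'', the grouping in ''groups'' and the choice of "typical" samples are hypothetical, Bray-Curtis is computed by hand to avoid extra packages, and ''nstart = 50'' is an arbitrary number of random restarts.

<code rsplus>
set.seed(42)  # reproducible illustration only

# Hypothetical community data: 30 samples x 6 species in three groups
groups <- rep(1:3, each = 10)
spe <- matrix(rpois(30 * 6, lambda = 5), nrow = 30)
spe[groups == 1, 1:2] <- spe[groups == 1, 1:2] + 20
spe[groups == 2, 3:4] <- spe[groups == 2, 3:4] + 20
spe[groups == 3, 5:6] <- spe[groups == 3, 5:6] + 20

# Bray-Curtis: sum |x_i - x_j| / sum (x_i + x_j), computed in base R
row.tot <- rowSums(spe)
bc <- as.matrix(dist(spe, method = "manhattan")) / outer(row.tot, row.tot, "+")

# Square-rooted Bray-Curtis, then PCoA keeping all available axes
sqrt.bc <- as.dist(sqrt(bc))
# cmdscale() may warn that some trailing eigenvalues are not positive;
# it then returns only the axes with positive eigenvalues
pcoa <- suppressWarnings(cmdscale(sqrt.bc, k = nrow(spe) - 1, eig = TRUE))
axes <- pcoa$points

# Unsupervised mode: k = 3, repeated from 50 random starts;
# kmeans() keeps the run with the lowest total within-cluster sum of squares
km <- kmeans(axes, centers = 3, nstart = 50)
km$tot.withinss   # the TESS of the best solution
km$cluster        # cluster membership of each sample

# Supervised mode: the user supplies the initial centroids,
# here the PCoA coordinates of three (hypothetical) typical samples
typical <- c(1, 11, 21)
km.sup <- kmeans(axes, centers = axes[typical, , drop = FALSE])
km.sup$cluster
</code>

Note that ''kmeans()'' treats the supplied centroids only as starting positions and updates them during the iterations, so the final centroids need not coincide with the typical samples.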
  
-<code rsplus> 
-cluster.kmeans <- kmeans (dis, centers = 5) 
-cluster.kmeans$cluster 
-</​code>​ 
-<​file>​ 
- ​1 ​ 2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26  
- ​4 ​ 4  4  4  5  5  1  2  4  4  5  4  4  4  5  2  1  2  4  1  4  5  2  2  4  3  
-27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52  
- ​2 ​ 2  2  5  5  1  1  1  1  1  2  4  4  4  2  2  2  5  4  4  4  1  5  1  1  2  
-53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78  
- ​2 ​ 3  5  3  3  3  3  3  3  3  3  5  3  3  3  3  3  2  2  2  2  2  3  4  4  5  
-79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97  
- ​3 ​ 3  3  5  3  3  3  5  4  4  4  5  2  1  2  5  2  1  1 
-</​file>​ 
en/non-hier.1548505252.txt.gz · Last modified: 2019/01/26 20:20 by David Zelený