Introduction
Theory, R functions & Examples
Section: Numerical classification
hclust - performs hierarchical cluster analysis. Main arguments: d for the distance matrix, and method for the agglomerative algorithm (default complete), one of ward.D, ward.D2, single, complete, average (= UPGMA), mcquitty (= WPGMA), median (= WPGMC) or centroid (= UPGMC). Has its own plot method.
rect.hclust - divides the dendrogram into a given number of groups (argument k) and draws rectangles around the samples in these groups (argument border specifies the color of the rectangles).
cutree - cuts the tree (dendrogram) into a given number of clusters (argument k) or at a given height, i.e. level of dissimilarity (argument h). Returns a vector with the assignment of samples to groups.
agnes (library cluster) - contains six agglomerative algorithms, some of them not included in hclust. Has its own plot method.
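The functions above can be combined as in the following minimal sketch. The data matrix x is randomly generated purely for illustration, and the choices of average linkage, k = 3 and h = 3 are arbitrary assumptions, not recommendations.

```r
# Toy data: 20 samples, 5 variables (illustrative only)
set.seed(1)
x <- matrix(rnorm(100), nrow = 20,
            dimnames = list(paste0("s", 1:20), NULL))
d <- dist(x)                            # Euclidean distances between samples

cl <- hclust(d, method = "average")     # UPGMA
plot(cl)                                # dendrogram drawn by the plot method for hclust
rect.hclust(cl, k = 3, border = "red")  # frame three groups on the open dendrogram

groups_k <- cutree(cl, k = 3)           # membership for exactly three clusters
groups_h <- cutree(cl, h = 3)           # membership after cutting at height 3
table(groups_k)                         # number of samples in each group
```

Note that rect.hclust draws on an existing dendrogram, so plot has to be called first.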
Murtagh & Legendre (2014) have shown that what the literature refers to as Ward's clustering algorithm is in fact two slightly different methods, only one of which is identical to the algorithm originally described by Ward. Both hclust and agnes accept method = 'ward', but it does not mean the same thing in each. While hclust implements both Ward algorithms (the genuine one, named ward.D2, as well as the second one, called ward.D), agnes implements only the genuine one. For historical reasons, the argument method = 'ward' in hclust calls the ward.D algorithm instead of ward.D2. This means that hclust and agnes, if both are set to method = 'ward', return slightly different results. To calculate the “genuine” Ward algorithm in both functions, you need to set method = 'ward.D2' in hclust (and method = 'ward' in agnes, but there is no other option for the Ward algorithm there anyway).
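The difference can be inspected directly, as in the sketch below. It again uses randomly generated data (an assumption for illustration) and converts the agnes result with as.hclust so that all three trees can be compared in the same way. No output is shown, since the point is to check the agreement yourself.

```r
library(cluster)

set.seed(1)
x <- matrix(rnorm(100), nrow = 20)   # toy data, illustrative only
d <- dist(x)

hc_d  <- hclust(d, method = "ward.D")            # the "second" Ward variant
hc_d2 <- hclust(d, method = "ward.D2")           # genuine Ward in hclust
ag    <- as.hclust(agnes(d, method = "ward"))    # genuine Ward in agnes

# Merge heights: ward.D2 and agnes should agree according to the text above,
# while ward.D is expected to differ.
all.equal(sort(hc_d2$height), sort(ag$height))
all.equal(sort(hc_d$height),  sort(ag$height))

# Cross-tabulate group membership for three clusters
table(ward.D2 = cutree(hc_d2, k = 3), agnes = cutree(ag, k = 3))
table(ward.D  = cutree(hc_d,  k = 3), agnes = cutree(ag, k = 3))
```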