en:hier-divisive

Differences

This shows you the differences between two versions of the page.

en:hier-divisive [2019/05/22 10:19]
David Zelený
en:hier-divisive [2019/05/22 10:25]
David Zelený
Line 9: Line 9:
 <imgcaption twinspan-orig-modif|>{{ :obrazky:twinspan-original-modified.jpg?direct|}}</imgcaption>
  
-**TWINSPAN** (an abbreviation of __T__wo-__w__ay __in__dicator __sp__ecies __an__alysis) is a hierarchical and divisive method of numerical classification, which uses the results of ordination (namely CA) to divide the whole dataset into subdivisions. The method was introduced by Mark O. Hill in 1979. It is not the only divisive algorithm available (others, such as DIANA or COINSPAN, exist), but it is by far the most commonly used one. The algorithm itself is rather complex, and consists of the following steps: 1) ordinate the samples along the first axis of correspondence analysis (CA1) and split the axis near the middle; 2) identify the indicator species which have high fidelity to each side (negative and positive) of the axis, and use them to refine the classification of samples near the middle, to avoid misclassifying them; 3) take the samples in each subdivision and apply steps 1 and 2 to them. Two stopping rules are applied to stop the division: the minimum size of a subdivision (for example 5, meaning that groups with five or fewer samples are not divided further) and the number of levels to which the subdivision advances (for example 3, meaning that only three levels of division are used). At each level of division, all groups of samples are divided (unless they are too small), which means that the number of resulting clusters is at most 2, 4, 8, 16, 32, ... 2<sup>n</sup> for one, two, three, four, five ... //n// levels of division. A simple modification of the original algorithm allows the user to choose the desired number of clusters: in step 3, instead of dividing all subdivisions, divide only the one that is the most compositionally heterogeneous (i.e. has the highest cluster heterogeneity, measured by a chosen beta diversity metric). This **modified TWINSPAN** (Roleček et al. 2009) allows choosing any number of clusters (<imgref twinspan-orig-modif>).
 +**TWINSPAN** (an abbreviation of __T__wo-__w__ay __in__dicator __sp__ecies __an__alysis) is a hierarchical and divisive method of numerical classification, which uses the results of ordination (namely CA) to divide the whole dataset into subdivisions. The method was introduced by Mark O. Hill in 1979. It is not the only divisive algorithm available (others, such as DIANA or COINSPAN, exist), but it is by far the most commonly used one.
  
-Because the concept of indicator species works with species presences and absences, the whole TWINSPAN algorithm uses only presence-absence data. To also include quantitative information about species abundances, species with higher abundances are multiplied in the matrix by being converted into pseudospecies (i.e. the more abundant a species is in the original data, the more pseudospecies represent it in the pre-processed data); the conversion is done using pre-defined cut-levels (see FIG 1). The result of TWINSPAN is a hierarchical dendrogram showing the relationship between individual subdivisions, a list of (one or several) indicator species for each split, and, optionally, a two-way ordered table in which both sites and species are ordered; sites are ordered according to the splits of the hierarchical classification, while species are ordered to form blocks within the groups.
 +The algorithm itself is rather complex, and consists of the following steps:
 +  - ordinate the samples along the first axis of correspondence analysis (CA1) and split the axis near the middle;
 +  - identify the indicator species which have high fidelity to each side (negative and positive) of the axis, and use them to refine the classification of samples near the middle, to avoid misclassifying them;
 +  - take the samples in each subdivision and apply steps 1 and 2 to them.
 +Two stopping rules are applied to stop the division: the minimum size of a subdivision (for example 5, meaning that groups with five or fewer samples are not divided further) and the number of levels to which the subdivision advances (for example 3, meaning that only three levels of division are used). At each level of division, all groups of samples are divided (unless they are too small), which means that the number of resulting clusters is at most 2, 4, 8, 16, 32, ... 2<sup>n</sup> for one, two, three, four, five ... //n// levels of division. A simple modification of the original algorithm allows the user to choose the desired number of clusters: in step 3, instead of dividing all subdivisions, divide only the one that is the most compositionally heterogeneous (i.e. has the highest cluster heterogeneity, measured by a chosen beta diversity metric). This **modified TWINSPAN** (Roleček et al. 2009) allows choosing any number of clusters (<imgref twinspan-orig-modif>).
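The modified procedure described above can be sketched in a few lines of Python. This is only an illustration under strong assumptions, not the actual TWINSPAN program: step 2 (refinement by indicator species) is left out, CA1 is obtained by plain reciprocal averaging, each split is made at the weighted centroid of the axis, and heterogeneity is taken as the mean pairwise Jaccard dissimilarity. All function names and the toy table are hypothetical.

```python
from itertools import combinations


def ca1_site_scores(rows, n_iter=200):
    """Site scores on the first CA axis, found by reciprocal averaging.

    Assumes every site (row) contains at least one species.
    """
    # Drop species absent from this subset to avoid dividing by zero.
    cols = [j for j in range(len(rows[0])) if any(r[j] for r in rows)]
    row_tot = [sum(r[j] for j in cols) for r in rows]
    col_tot = {j: sum(r[j] for r in rows) for j in cols}
    total = sum(row_tot)
    scores = [float(i) for i in range(len(rows))]  # arbitrary start
    for _ in range(n_iter):
        # Species score = weighted average of the scores of its sites.
        spp = {j: sum(r[j] * s for r, s in zip(rows, scores)) / col_tot[j]
               for j in cols}
        # Site score = weighted average of the scores of its species.
        scores = [sum(r[j] * spp[j] for j in cols) / t
                  for r, t in zip(rows, row_tot)]
        # Centre and rescale so the trivial constant axis is removed.
        mean = sum(t * s for t, s in zip(row_tot, scores)) / total
        scores = [s - mean for s in scores]
        norm = (sum(t * s * s for t, s in zip(row_tot, scores)) / total) ** 0.5
        if norm == 0:
            break  # all sites identical, no axis to find
        scores = [s / norm for s in scores]
    return scores


def heterogeneity(rows, cluster):
    """Mean pairwise Jaccard dissimilarity among the sites of a cluster."""
    sets = [{j for j, v in enumerate(rows[i]) if v} for i in cluster]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 0.0
    return sum(1 - len(a & b) / len(a | b) for a, b in pairs) / len(pairs)


def modified_twinspan_sketch(rows, n_clusters):
    """Repeatedly split the most heterogeneous cluster along CA1."""
    clusters = [list(range(len(rows)))]
    while len(clusters) < n_clusters:
        target = max(clusters, key=lambda c: heterogeneity(rows, c))
        if heterogeneity(rows, target) == 0:
            break  # nothing left that can be split
        scores = ca1_site_scores([rows[i] for i in target])
        left = [i for i, s in zip(target, scores) if s < 0]
        right = [i for i, s in zip(target, scores) if s >= 0]
        if not left or not right:
            break
        clusters.remove(target)
        clusters += [left, right]
    return clusters


# Toy presence-absence table: 6 sites x 6 species in three blocks.
sites = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
groups = modified_twinspan_sketch(sites, n_clusters=3)
print(sorted(sorted(g) for g in groups))
```

With this toy table, asking for three clusters should recover the three blocks of sites. The original algorithm could not return exactly three groups here, because it divides every group at each level and so goes from two groups to four.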
 + 
 +Because the concept of indicator species works with species presences and absences, the whole TWINSPAN algorithm uses only presence-absence data. To also include quantitative information about species abundances, species with higher abundances are multiplied in the matrix by being converted into pseudospecies (i.e. the more abundant a species is in the original data, the more pseudospecies represent it in the pre-processed data); the conversion is done using pre-defined cut-levels. The result of TWINSPAN is a hierarchical dendrogram showing the relationship between individual subdivisions, a list of (one or several) indicator species for each split, and, optionally, a two-way ordered table in which both sites and species are ordered; sites are ordered according to the splits of the hierarchical classification, while species are ordered to form blocks within the groups (<imgref twinspan-table>).
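The conversion to pseudospecies can be illustrated with a minimal Python sketch. The cut levels (0, 2, 5, 10, 20), the rule that a pseudospecies is present when the abundance is above zero and at or above its cut level, and the ''species_level'' naming are assumptions made for this example; actual programs may use other conventions.

```python
def to_pseudospecies(abundances, cut_levels=(0, 2, 5, 10, 20)):
    """Convert one sample's abundances into a set of present pseudospecies.

    abundances: dict mapping species name -> abundance (e.g. percentage cover).
    cut_levels: illustrative thresholds, lowest first.
    """
    present = set()
    for species, value in abundances.items():
        if value <= 0:
            continue  # absent species yield no pseudospecies
        for level, cut in enumerate(cut_levels, start=1):
            if value >= cut:
                # One pseudospecies per cut level reached by the abundance.
                present.add(f"{species}_{level}")
    return present


sample = {"Quercus": 12, "Poa": 1, "Carex": 0}
print(sorted(to_pseudospecies(sample)))
```

Here the abundant species is represented by four pseudospecies and the rare one by a single pseudospecies, so the presence-absence table still carries information about abundance.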
  
 TWINSPAN is often criticized by statisticians for being an inelegant sequence of arbitrary and not fully documented steps, but is favoured by ecologists for often returning ecologically intuitive results. Vegetation ecologists in particular have a long tradition of using TWINSPAN, which is explained by the fact that the author (M. O. Hill) is himself a vegetation ecologist and designed the method to closely resemble the traditional Braun-Blanquet approach of classifying vegetation (e.g. it produces the two-way ordered table and emphasizes the use of indicator species with high fidelity to individual groups, <imgref twinspan-table>).
Line 17: Line 23:
 <imgcaption twinspan-table|>{{:obrazky:twinspan-table-vltava.jpg?direct|}}</imgcaption>
  
-The true algorithm is actually much more complex, and even the original description by Hill (1979) does not contain all details (some changes have been introduced later by other authors directly in the FORTRAN code of the TWINSPAN program). Perhaps the most detailed description of the algorithm with attention to some of the details is given in Kent (2012). Several software offer TWINSPAN (note that implementations in each of them actually slightly differ, since some are using a different version of the FORTRAN code): TWINSPAN for Windows, PC-ORD, CAP and JUICE. In R, I created a simple experimental package ''twinspanR'', which is an R-wrapper around the twinspan.exe program and works only on Windows platform (this implementation includes both original and modified TWINSPAN algorithm).
 +The true algorithm is actually much more complex, and even the original description by Hill (1979) does not contain all details (some changes have been introduced later by other authors directly in the FORTRAN code of the TWINSPAN program). Perhaps the most detailed description of the algorithm with attention to some of the details is given in Kent (2012). Some software offers TWINSPAN (note that the implementation in each of them actually slightly differs, since some are using a different version of the FORTRAN code): TWINSPAN for Windows, PC-ORD, CAP and JUICE. In R, I created a simple experimental package ''twinspanR'', which is an R-wrapper around the twinspan.exe program and works only on Windows platform (this implementation includes both original and modified TWINSPAN algorithm).
  
  
en/hier-divisive.txt · Last modified: 2019/05/23 09:59 by David Zelený