en:hier-divisive

Differences

This shows you the differences between two versions of the page.

en:hier-divisive [2015/07/30 03:44]
David Zelený
en:hier-divisive [2019/05/23 09:59] (current)
David Zelený
Line 1: Line 1:
-====== Numerical classification ======
 +Section: [[en:classification|Numerical classification]]
-===== TWINSPAN hierarchical divisive classification =====
 +===== TWINSPAN (hierarchical divisive classification) =====
  
-==== library (twinspanR) ====
-I created experimental library with TWINSPAN algorithm - you may install it from GitHub repository (note: this library is currently in beta stage under development, and some parts may not be functional). To install any library from GitHub, you will need to first install package ''devtools'' written by Wickham Hadley, which contains a set of tools for development of R packages. After installing ''devtools'', use the function ''install_github''. Note that the use of the library has some limitations: it can be installed only on Windows platform (since the engine of the library is based on running *.exe file externally) and you need permanent access to the folder where the library is installed (usually in Program Files/R/R-x.x.x/library, but could be also in some other personalized place). Without the access to this folder the function ''twinspan'' cannot run correctly.
 +[[{|width: 7em; background-color: light; color: firebrick}hier-divisive|**Theory**]]
 +[[{|width: 7em; background-color: white; color: navy}hier-divisive_R|functions]]
 +[[{|width: 7em; background-color: white; color: navy}hier-divisive_examples|Examples]]
 +[[{|width: 7em; background-color: white; color: navy}hier-divisive_exercise|Exercise {{::lock-icon.png?nolink|}}]]
 + 
 +<​imgcaption twinspan-orig-modif|Dendrogram of the original (a) and modified (b) TWINSPAN algorithm. While in the original TWINSPAN, at each level of the division each cluster is divided into two clusters ​(unless the cluster contains too few samples), in the modified TWINSPAN only the most compositionally heterogeneous cluster is divided into two clusters.>​{{ ​:obrazky:​twinspan-original-modified.jpg?​direct|}}</​imgcaption>​ 
 + 
 +**TWINSPAN** (an abbreviation of __T__wo-__w__ay __in__dicator __sp__ecies __an__alysis) is a hierarchical and divisive method of numerical classification, which uses the results of ordination (namely CA) to divide the whole dataset into subdivisions. The method was introduced by Mark O. Hill in 1979. It is not the only divisive algorithm available (others, such as DIANA or COINSPAN, exist), but it is by far the most commonly used one.
 + 
 +The algorithm itself is rather complex, and consists of the following steps:
 +  - ordinate the samples along the first axis of correspondence analysis (CA1) and split the axis near the middle;
 +  - identify the indicator species that have high fidelity to each side (negative and positive) of the axis, and use them to refine the classification of samples lying near the middle, to avoid their misclassification;
 +  - take the samples in each subdivision and apply steps 1 and 2 to them.
 +Two stopping rules end the division: the minimum size of a subdivision (for example 5, meaning that groups with five or fewer samples are not divided further) and the number of levels to which the subdivision proceeds (for example 3, meaning that only three levels of division are used). At each level of division, all groups of samples are divided (unless they are too small), which means that the number of resulting clusters is at most 2, 4, 8, 16, 32, ... 2<sup>//n//</sup> for one, two, three, four, five ... //n// levels of division. A simple modification of the original algorithm allows the user to choose the desired number of clusters: in step 3, instead of dividing all subdivisions, divide only the one that is the most compositionally heterogeneous (has the highest cluster heterogeneity, measured by one of several beta diversity metrics). This **modified TWINSPAN** (Roleček et al. 2009) allows choosing any number of clusters (<imgref twinspan-orig-modif>).
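
As a rough illustration of step 1 only, the following base R sketch computes first-axis CA sample scores and splits the samples into a negative and a positive side. The toy matrix ''spe'' and the helper ''ca1_scores'' are invented for this example, and taking zero (the weighted centroid of the axis) as "the middle" is an assumption. The indicator-species refinement of step 2 and the stopping rules are omitted, so this is not the TWINSPAN algorithm itself.

<code r>
# Toy presence-absence matrix: 6 samples (rows) x 5 species (columns),
# invented purely for illustration
spe <- matrix(c(1, 1, 1, 0, 0,
                1, 1, 0, 0, 0,
                1, 1, 1, 0, 0,
                0, 0, 1, 1, 1,
                0, 0, 0, 1, 1,
                0, 0, 1, 1, 1),
              nrow = 6, byrow = TRUE,
              dimnames = list(paste0("s", 1:6), paste0("sp", 1:5)))

# Hypothetical helper: sample scores on the first CA axis, obtained by
# singular value decomposition of the standardized residuals
ca1_scores <- function(x) {
  P <- x / sum(x)            # relative frequencies
  r <- rowSums(P)            # sample (row) masses
  k <- colSums(P)            # species (column) masses
  E <- outer(r, k)           # expected values under independence
  S <- (P - E) / sqrt(E)     # standardized residuals
  u1 <- svd(S)$u[, 1]        # first left singular vector
  setNames(u1 / sqrt(r), rownames(x))  # the sign of the axis is arbitrary
}

ca1 <- ca1_scores(spe)

# Assumption: "the middle" of the axis is taken as zero
side <- ifelse(ca1 > 0, "positive", "negative")
split(names(side), side)     # the two groups of samples after one division
</code>

Applying the same split again within each group (step 3) would give the second level of the hierarchy.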
 + 
 +Because the concept of indicator species works with species presences and absences, the whole TWINSPAN algorithm uses only presence-absence data. To include quantitative information about species abundances as well, each species is converted into one or more pseudospecies (i.e. the more abundant the species is in the original data, the more pseudospecies represent it in the pre-processed data); the conversion uses pre-defined cut levels. The result of TWINSPAN is a hierarchical dendrogram showing the relationship between individual subdivisions, a list of (one or several) indicator species for each split, and, optionally, a two-way ordered table in which both sites and species are ordered: sites are ordered according to the splits of the hierarchical classification, while species are ordered to form blocks within the groups (<imgref twinspan-table>).
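
The pseudospecies conversion can be sketched in a few lines of base R. The cover values, the helper ''to_pseudospecies'' and the cut levels 0, 2, 5, 10 and 20 are assumptions made for this illustration, as is the rule that a pseudospecies is present when the abundance is positive and reaches the cut level; an actual TWINSPAN implementation may treat the boundaries differently.

<code r>
# Toy percentage-cover matrix: 3 samples x 2 species (invented values)
cover <- matrix(c( 1, 0,
                   6, 3,
                  25, 0),
                nrow = 3, byrow = TRUE,
                dimnames = list(paste0("s", 1:3), c("spA", "spB")))

# Assumed cut levels (not taken from the text above)
cut_levels <- c(0, 2, 5, 10, 20)

# Hypothetical helper: one presence-absence block per cut level;
# pseudospecies k is present where abundance is > 0 and >= cut level k
to_pseudospecies <- function(x, cuts) {
  blocks <- lapply(seq_along(cuts), function(k) {
    pa <- (x > 0 & x >= cuts[k]) * 1
    colnames(pa) <- paste0(colnames(x), "_", k)
    pa
  })
  do.call(cbind, blocks)
}

pseudo <- to_pseudospecies(cover, cut_levels)

# Number of pseudospecies representing spA in each sample
rowSums(pseudo[, grep("^spA_", colnames(pseudo))])
</code>

The more abundant a species is in a sample, the more presence-absence columns record it, which is how abundance enters an otherwise presence-absence analysis.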
 + 
 +TWINSPAN is often criticized by statisticians for being an inelegant sequence of arbitrary and not fully documented steps, but it is favoured by ecologists because it often returns ecologically intuitive results. Vegetation ecologists in particular have a long tradition of using TWINSPAN, which is understandable given that the author (M. O. Hill) is himself a vegetation ecologist and designed the method to closely resemble the traditional Braun-Blanquet approach to classifying vegetation (e.g. it produces the two-way ordered table and emphasizes the use of indicator species with high fidelity to individual groups, <imgref twinspan-table>).
 + 
 +<imgcaption twinspan-table|Two-way ordered table resulting from the TWINSPAN algorithm (this example is based on the Vltava dataset, reduced to contain only species occurring in at least 30 samples (out of 97) so that it contains fewer species; the calculation was done in the twinspanR library).>{{:obrazky:twinspan-table-vltava.jpg?direct|}}</imgcaption>
 + 
 +The true algorithm is actually much more complex, and even the original description by Hill (1979) does not contain all the details (some changes were introduced later by other authors directly in the FORTRAN code of the TWINSPAN program). Perhaps the most detailed description of the algorithm is given in Kent (2012). Several programs offer TWINSPAN: TWINSPAN for Windows, PC-ORD, CAP and JUICE (note that their implementations differ slightly, since some use a different version of the FORTRAN code). In R, I created a simple experimental package ''twinspanR'', which is an R wrapper around the twinspan.exe program and works only on the Windows platform (this implementation includes both the original and the modified TWINSPAN algorithm).
  
-=== Install the library === 
  
-<code r> 
-install.packages ('​devtools'​) 
-devtools::​install_github("​zdealveindy/​twinspanR"​) 
-</​code>​ 
  
-Run TWINSPAN example((You would get the same result as the script below if you run ''​example (twinspan)''​ - this will run the example which comes with the help file of ''​twinspan''​ function (see the section //​Examples//​ in ''?​twinspan''​).)),​ which shows modified TWINSPAN on traditional Ellenberg'​s Danube meadow dataset, projected on DCA ordination diagram and compared with original classification into three vegetation types made by tabular sorting: 
-<code r> 
-library (twinspanR) 
-library (vegan) 
-data (danube) 
-res <- twinspan (danube$spe,​ modif = TRUE, clusters = 4) 
-k <- cut (res) 
-dca <- decorana (danube$spe) 
-par (mfrow = c(1,2)) 
-ordiplot (dca, type = '​n',​ display = '​si',​ main = '​Modified TWINSPAN'​) 
-points (dca, col = k) 
-for (i in c(1,2,4)) ordihull (dca, groups = k, show.group = i, col = i, 
- draw = '​polygon',​ label = TRUE) 
-ordiplot (dca, type = '​n',​ display = '​si',​ main = '​Original assignment\n (Ellenberg 1954)'​) 
-points (dca, col = danube$env$veg.type) 
-for (i in c(1:3)) ordihull (dca, groups = danube$env$veg.type,​ 
- ​show.group = unique (danube$env$veg.type)[i],​ col = i, 
- draw = '​polygon',​ label = TRUE) 
-</​code>​ 
  
-{{youtube>​MriaKa1wRfI}} 
-{{youtube>​7211v3jfS8E}} 
  
en/hier-divisive.1438199081.txt.gz · Last modified: 2017/10/11 20:36 (external edit)