This shows you the differences between two versions of the page.

en:hier-divisive [2015/07/30 03:48]
David Zelený
en:hier-divisive [2020/03/29 11:59] (current)
David Zelený [TWINSPAN (hierarchical divisive classification)]
Line 1: Line 1:
-====== Numerical classification ======
+Section: [[en:classification|Numerical classification]]
-===== TWINSPAN hierarchical divisive classification =====
+===== TWINSPAN (hierarchical divisive classification) =====
-==== library (twinspanR) ====
-I created an experimental R library with the TWINSPAN algorithm; you may install it from the GitHub repository (note: this library is currently in beta stage, under development, and some parts may not be functional). To install any library from GitHub, you first need to install the package ''devtools'' written by Hadley Wickham, which contains a set of tools for the development of R packages. After installing ''devtools'', use the function ''install_github''. Note that the use of the library has some limitations: it can be installed only on the Windows platform (since the engine of the library is based on running an *.exe file externally), and you need permanent access to the folder where the library is installed (usually Program Files/R/R-x.x.x/library, but it could also be some other personalized place). Without access to this folder, the function ''twinspan'' cannot run correctly.
+[[{|width: 7em; background-color: light; color: firebrick}hier-divisive|**Theory**]]
+[[{|width: 7em; background-color: white; color: navy}hier-divisive_R|functions]]
+[[{|width: 7em; background-color: white; color: navy}hier-divisive_examples|Examples]]
+[[{|width: 7em; background-color: white; color: navy}hier-divisive_exercise|Exercise {{::lock-icon.png?nolink|}}]]
-=== Install the library ===
+<imgcaption twinspan-orig-modif|Dendrogram of the original (a) and modified (b) TWINSPAN algorithm. While in the original TWINSPAN, at each level of the division each cluster is divided into two clusters (unless the cluster contains too few samples), in the modified TWINSPAN only the most compositionally heterogeneous cluster is divided into two clusters.>{{ :obrazky:twinspan-original-modified.jpg?direct|}}</imgcaption>
-<code r>
-install.packages ('devtools')
-devtools::install_github ("zdealveindy/twinspanR")
-</code>
+**TWINSPAN** (an abbreviation of __T__wo-__w__ay __in__dicator __sp__ecies __an__alysis) is a hierarchical, divisive method of numerical classification that uses the results of ordination (namely CA) to divide the whole dataset into subdivisions. The method was introduced by Mark O. Hill in 1979. It is not the only divisive algorithm available (others, such as DIANA or COINSPAN, exist), but it is by far the most commonly used one.
+
+The algorithm itself is rather complex and consists of the following steps:
+  - ordinate the samples along the first axis of correspondence analysis (CA1) and split the axis near the middle;
+  - identify the indicator species which have high fidelity to each side (negative and positive) of the axis, and use them to refine the classification of samples lying near the middle, to avoid their misclassification;
+  - take the samples in each subdivision and apply steps 1 and 2 to them.
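Step 1 of the list above can be sketched in base R. This is only an illustration under simplifying assumptions, not the TWINSPAN implementation: the function name ''ca1_split'' and the toy data are hypothetical, the axis is split at zero (the weighted centroid of the sample scores), and the indicator-species refinement of step 2 is left out entirely.

<code r>
# Hypothetical helper: one TWINSPAN-like division of a presence-absence matrix.
# Only the CA1 ordination and split are sketched; the refinement by
# indicator species is omitted.
ca1_split <- function (spe) {
  spe <- as.matrix (spe)
  P <- spe / sum (spe)                   # relative frequencies
  r <- rowSums (P)                       # sample (row) masses
  cc <- colSums (P)                      # species (column) masses
  # standardized residuals; their SVD gives the CA solution
  S <- (P - r %o% cc) / sqrt (r %o% cc)
  ca1 <- svd (S)$u[, 1] / sqrt (r)       # sample scores on the first CA axis
  # split at zero, the weighted centroid of the sample scores
  ifelse (ca1 < 0, 'negative', 'positive')
}

# toy data: six samples arranged along a single compositional gradient
spe <- rbind (c(1,1,1,0,0,0), c(1,1,0,0,0,0), c(1,1,1,1,0,0),
              c(0,0,1,1,1,1), c(0,0,0,1,1,1), c(0,0,0,0,1,1))
dimnames (spe) <- list (paste0 ('s', 1:6), LETTERS[1:6])
groups <- ca1_split (spe)
split (rownames (spe), groups)
</code>

The sign of a CA axis is arbitrary, so the labels 'negative' and 'positive' may swap between runs or platforms; only the grouping of the samples is meaningful. Applying the function again to each resulting group would mimic step 3.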
+Two stopping rules halt the division: the minimum size of a subdivision (for example 5, meaning that groups with five or fewer samples are not divided further) and the maximum number of division levels (for example 3, meaning that only three levels of division are used). At each level of division, all groups of samples are divided (unless they are too small), which means that the number of resulting clusters is at most 2, 4, 8, 16, 32, ... 2<sup>//n//</sup> for one, two, three, four, five ... //n// levels of division. A simple modification of the original algorithm allows the user to choose the desired number of clusters: in step 3, instead of dividing all subdivisions, divide only the one that is the most compositionally heterogeneous (has the highest cluster heterogeneity, measured by one of several beta diversity metrics). This **modified TWINSPAN** (Roleček et al. 2009) allows choosing any number of clusters (<imgref twinspan-orig-modif>).
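The selection rule of the modified algorithm can be illustrated with a short base R sketch. The function name ''most_heterogeneous'' and the toy data are hypothetical, and the heterogeneity measure used here (mean pairwise Jaccard dissimilarity within a cluster) is just one assumed choice of beta diversity metric, not necessarily the one used by any TWINSPAN program.

<code r>
# Hypothetical helper: return the label of the cluster that a
# modified-TWINSPAN-like procedure would divide next.
most_heterogeneous <- function (spe, cluster) {
  het <- sapply (split (as.data.frame (spe), cluster), function (x) {
    if (nrow (x) < 2) return (0)         # a single sample has no heterogeneity
    # method 'binary' treats non-zero values as presences (Jaccard dissimilarity)
    mean (dist (x, method = 'binary'))
  })
  names (het)[which.max (het)]
}

# toy data: cluster 'a' holds two identical samples, cluster 'b' three different ones
spe <- rbind (c(1,1,0,0), c(1,1,0,0), c(1,0,1,0), c(0,0,1,1), c(0,1,0,1))
cluster <- c('a', 'a', 'b', 'b', 'b')
next_to_divide <- most_heterogeneous (spe, cluster)
next_to_divide
</code>

Repeating this choice after every division, and stopping once the requested number of clusters is reached, is what lets the modified algorithm return any number of clusters rather than only powers of two.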
+Because the concept of indicator species works with species presences and absences, the whole TWINSPAN algorithm uses only presence-absence data. To also include quantitative information about species abundances, each species is converted into one or several pseudospecies using pre-defined cut levels: the more abundant the species is in the original data, the more pseudospecies represent it in the pre-processed data. The result of TWINSPAN is a hierarchical dendrogram showing the relationships between individual subdivisions, a list of (one or several) indicator species for each split, and optionally a two-way ordered table in which both sites and species are ordered: sites according to the splits of the hierarchical classification, and species so that they form blocks within the groups (<imgref twinspan-table>).
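The conversion into pseudospecies can be sketched as follows. The function name ''pseudospecies'', the cut levels, and the use of a strict comparison are all assumptions made for illustration; the actual programs may use different defaults and rules.

<code r>
# Hypothetical helper: expand an abundance matrix into presence-absence
# pseudospecies. The cut levels are illustrative values only.
pseudospecies <- function (spe, cut_levels = c(0, 2, 5, 10, 20)) {
  spe <- as.matrix (spe)
  out <- lapply (seq_along (cut_levels), function (i) {
    ps <- (spe > cut_levels[i]) * 1      # present if abundance exceeds the cut level
    colnames (ps) <- paste0 (colnames (spe), i)
    ps
  })
  out <- do.call (cbind, out)
  out[, colSums (out) > 0, drop = FALSE] # drop pseudospecies that never occur
}

# toy data: abundances of three species in two samples
spe <- matrix (c(1, 0, 30, 8, 3, 0), nrow = 2,
               dimnames = list (c('s1', 's2'), c('A', 'B', 'C')))
ps <- pseudospecies (spe)
ps
</code>

In this toy example, the abundant species B is represented by five pseudospecies in sample s1 (B1 to B5), while the rare species A is represented by A1 only, so abundance is carried into the presence-absence matrix as the number of pseudospecies.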
+TWINSPAN is often criticized for being an inelegant sequence of arbitrary and not fully documented steps. On the other hand, ecologists often love it, since it returns results which are ecologically quite intuitive. Vegetation ecologists in particular have a long tradition of using TWINSPAN, mainly because its author (M. O. Hill) is himself a vegetation ecologist and designed the method to closely resemble the traditional Braun-Blanquet approach to classifying vegetation (e.g. it produces a two-way ordered table and emphasizes the use of indicator species with high fidelity to individual groups, <imgref twinspan-table>).
+<imgcaption twinspan-table|Two-way ordered table resulting from the TWINSPAN algorithm. This example is based on the Vltava dataset, reduced to species occurring in at least 30 samples (out of 97) to keep the number of species low; the calculation was done in the twinspanR library.>{{:obrazky:twinspan-table-vltava.jpg?direct|}}</imgcaption>
+The true algorithm is actually much more complex, and even the original description by Hill (1979) does not contain all the details (some changes were introduced later by other authors directly in the FORTRAN code of the TWINSPAN program). Perhaps the most detailed description of the algorithm is given in Kent (2012). Several programs offer TWINSPAN: TWINSPAN for Windows, PC-ORD, CAP and JUICE (note that the implementations differ slightly, since some use a different version of the FORTRAN code). In R, I created a simple experimental package ''twinspanR'', which is an R wrapper around the twinspan.exe program and works only on the Windows platform (this implementation includes both the original and the modified TWINSPAN algorithm).
-=== Example === 
-Run the TWINSPAN example((You would get the same result as from the script below by running ''example (twinspan)'', which runs the example that comes with the help file of the ''twinspan'' function (see the section //Examples// in ''?twinspan'').)), which applies the modified TWINSPAN to Ellenberg's traditional Danube meadow dataset, projects the result onto a DCA ordination diagram, and compares it with the original classification into three vegetation types made by tabular sorting:
-<code r>
-library (twinspanR)
-library (vegan)
-data (danube)
-# modified TWINSPAN, stopped at four clusters
-res <- twinspan (danube$spe, modif = TRUE, clusters = 4)
-k <- cut (res)  # cluster membership of each sample
-dca <- decorana (danube$spe)
-par (mfrow = c(1,2))
-# left panel: TWINSPAN clusters projected onto the DCA diagram
-ordiplot (dca, type = 'n', display = 'si', main = 'Modified TWINSPAN')
-points (dca, col = k)
-for (i in c(1,2,4)) ordihull (dca, groups = k, show.groups = i, col = i,
- draw = 'polygon', label = TRUE)
-# right panel: original assignment into three vegetation types
-ordiplot (dca, type = 'n', display = 'si', main = 'Original assignment\n (Ellenberg 1954)')
-points (dca, col = danube$env$veg.type)
-for (i in c(1:3)) ordihull (dca, groups = danube$env$veg.type,
- show.groups = unique (danube$env$veg.type)[i], col = i,
- draw = 'polygon', label = TRUE)
-</code>
en/hier-divisive.1438199308.txt.gz · Last modified: 2017/10/11 20:36 (external edit)