en:similarity

Differences

This shows you the differences between two versions of the page.

en:similarity [2019/02/09 19:42] David Zelený
en:similarity [2019/02/26 22:08] (current) David Zelený [Double-zero problem]
Line 13:
 There are a number of measures of similarity or distance ([[references|Legendre & Legendre 2012]] list around 30 of them). The first decision one has to make is whether the aim is R- or Q-mode analysis (R-mode focuses on differences among species, Q-mode on differences among samples), since some of the measures differ between the two modes (e.g. Pearson's //r// correlation coefficient makes sense for association between species (R-mode), but not for association between samples (Q-mode); in contrast, the Sørensen index can be used in both Q- and R-mode analysis, and is called the Dice index in R-mode analysis). Further, if focusing on differences between samples (Q-mode), the most relevant measures in ecology are asymmetric indices ignoring double zeros (more about the //double-zero problem// below). Then, it also depends on whether the data are qualitative (i.e. binary, presence-absence) or quantitative (species abundances). In the case of distance indices, an important criterion is whether they are metric (they can be displayed in Euclidean space) or not, since this influences the choice of the index for some ordination or clustering methods.
  
-[[references|Legendre & Legendre (2012)]] offers a kind of "key" how to select an appropriate measure for given data and problem (Tables 7.4-7.6). Generally, as a rule of thumb, Bray-Curtis and Hellinger distances are better choices than Euclidean or Chi-square distances.
+[[references|Legendre & Legendre (2012)]] offers a key how to select an appropriate measure for given data and problem (check their Tables 7.4-7.6). Generally, as a rule of thumb, Bray-Curtis and Hellinger distances are better choices than Euclidean or Chi-square distances.
  
 ===== Double-zero problem =====
Line 25:
 <imgcaption double-zero-table |For details see the text.>{{ :obrazky:double-zero-table.jpg?direct&400|}}</imgcaption>
  
-<imgref double-zero-table> shows an ecological example of the double-zero problem. Samples 1 to 3 are sorted according to the wetness of their habitat – sample 1 is the wettest and sample 3 is the driest. In samples 1 and 3, no mesic species occur, since sample 1 is too wet and sample 3 too dry - this is the double zero. The fact that the mesic species is missing does not say anything about ecological similarity or difference between both samples; simply there is no information, and it is better to ignore it. In the case of symmetrical indices of similarity, the absence of mesic species in sample 1 and sample 3 (0-0, double zero) will increase similarity of sample 1 and 2; in asymmetrical indices, double zeros will be ignored and only presences (1-1, 1-0, 0-1) will be considered.
+<imgref double-zero-table> shows an ecological example of the double-zero problem. Samples 1 to 3 are sorted according to the wetness of their habitat – sample 1 is the wettest and sample 3 is the driest. In samples 1 and 3, no mesic species occur, since sample 1 is too wet and sample 3 too dry - this is the double zero. The fact that the mesic species is missing does not say anything about ecological similarity or difference between both samples; simply there is no information, and it is better to ignore it. In the case of symmetrical indices of similarity, the absence of mesic species in sample 1 and sample 3 (0-0, double zero) will increase similarity of sample 1 and 3; in asymmetrical indices, double zeros will be ignored and only presences (1-1, 1-0, 0-1) will be considered.
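
The effect of double zeros can be checked with a small calculation. The Python sketch below uses hypothetical presence-absence data (three species with wet, mesic and dry optima; these are not the values from the figure) and compares the simple matching coefficient, a symmetrical index that counts double zeros, with the Jaccard and Sørensen indices, which ignore them. All function and variable names are illustrative assumptions.

```python
# Hypothetical presence-absence data, NOT the values from the figure.
# Species order: wet-habitat species, mesic species, dry-habitat species.
sample1 = [1, 0, 0]  # wettest sample
sample2 = [0, 1, 0]  # mesic sample
sample3 = [0, 0, 1]  # driest sample


def contingency(x, y):
    """Return (a, b, c, d) for two binary samples:
    a = present in both (1-1), b = only in x (1-0),
    c = only in y (0-1), d = absent in both (0-0, double zero)."""
    a = sum(1 for i, j in zip(x, y) if i and j)
    b = sum(1 for i, j in zip(x, y) if i and not j)
    c = sum(1 for i, j in zip(x, y) if not i and j)
    d = sum(1 for i, j in zip(x, y) if not i and not j)
    return a, b, c, d


def simple_matching(x, y):
    """Symmetrical index: double zeros (d) count as agreement."""
    a, b, c, d = contingency(x, y)
    return (a + d) / (a + b + c + d)


def jaccard(x, y):
    """Asymmetrical index: double zeros are ignored."""
    a, b, c, _ = contingency(x, y)
    return a / (a + b + c) if (a + b + c) else float("nan")


def sorensen(x, y):
    """Asymmetrical index: double zeros ignored, joint presences weighted twice."""
    a, b, c, _ = contingency(x, y)
    return 2 * a / (2 * a + b + c) if (2 * a + b + c) else float("nan")


if __name__ == "__main__":
    print("simple matching (1 vs 3):", simple_matching(sample1, sample3))
    print("Jaccard         (1 vs 3):", jaccard(sample1, sample3))
    print("Sørensen        (1 vs 3):", sorensen(sample1, sample3))
```

In this toy data set samples 1 and 3 share no species, so both asymmetrical indices return zero similarity, while the simple matching coefficient is positive only because the mesic species is absent from both samples.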
  
  
en/similarity.1549712540.txt.gz · Last modified: 2019/02/09 19:42 by David Zelený