en:data_preparation [2019/05/23 09:44]
David Zelený [Missing values]
en:data_preparation [2019/05/23 09:54]
David Zelený [Special case: transformation and standardisation of species composition matrix]
While the variables in the environmental or trait matrix are often of very different types (qualitative, quantitative, ordinal) and measured in very different units, the species composition matrix is homogeneous, with all variables (species) measured in the same units (frequencies, abundances, covers, presence-absence).
  
It is always good to check which units and what range of values are used to quantify the occurrence of species in the samples, and to **transform the data** accordingly. For example, if the values are percentage estimates of plant cover (often used in vegetation studies), a log or square-root transformation may be necessary, since such covers often have a strongly right-skewed distribution (covers between 1 and 15% are far more common than covers above 25%).
However, if plant cover has been estimated on the Braun-Blanquet scale (//r// = 0.01% cover, //+// = 0.1%, //1// = 1%, //2m// = 5%, //2a// = 10%, //2b// = 20%, //3// = 37.5%, //4// = 62.5%, //5// = 87.5%) and these values are then converted into an ordinal scale (//r// -> 1, //+// -> 2, //1// -> 3, ..., //5// -> 9), the 1-9 ordinal values already contain an implicit log-transformation compared to percentage cover and do not need to be transformed further. In some cases, transforming the data into presence-absence may be useful, e.g. if the estimates of species abundances are inaccurate, or if the data are merged from different sources using different scales or estimation methods.
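A minimal Python sketch of these transformations, assuming the species matrix is stored as a list of rows (samples), with one Braun-Blanquet code per species and ''None'' for an absent species. The percentage value of each code is taken from the text; the consecutive numbering (//r// -> 1 up to //5// -> 9) assumes the codes are ranked in the order listed. The helper names (''recode'', ''log_transform'', ''sqrt_transform'', ''presence_absence'') and the example relevés are hypothetical, and log(x + 1) (''math.log1p'') is used as the log variant only because log(0) is undefined; the text does not prescribe a particular variant.

```python
import math

# Percentage cover for each Braun-Blanquet code, as listed in the text.
BB_COVER = {"r": 0.01, "+": 0.1, "1": 1.0, "2m": 5.0, "2a": 10.0,
            "2b": 20.0, "3": 37.5, "4": 62.5, "5": 87.5}

# Assumption: the ordinal scale numbers the codes consecutively in this order
# (r -> 1, + -> 2, 1 -> 3, ..., 5 -> 9).
BB_ORDER = ["r", "+", "1", "2m", "2a", "2b", "3", "4", "5"]
BB_ORDINAL = {code: rank for rank, code in enumerate(BB_ORDER, start=1)}


def recode(matrix, mapping):
    """Replace each Braun-Blanquet code by a number; None (species absent) becomes 0."""
    return [[0.0 if code is None else float(mapping[code]) for code in row]
            for row in matrix]


def log_transform(matrix):
    """log(x + 1): keeps absences (0) at 0 and compresses large covers."""
    return [[math.log1p(x) for x in row] for row in matrix]


def sqrt_transform(matrix):
    """Square root: compresses large values less strongly than the log."""
    return [[math.sqrt(x) for x in row] for row in matrix]


def presence_absence(matrix):
    """1 where the species is present (value > 0), otherwise 0."""
    return [[1 if x > 0 else 0 for x in row] for row in matrix]


# Hypothetical releves: rows = samples, columns = species, None = absent.
releves = [["2b", None, "+"],
           ["5", "1", None]]
cover = recode(releves, BB_COVER)      # percentage cover
ordinal = recode(releves, BB_ORDINAL)  # 1-9 ordinal values
log_cover = log_transform(cover)
sqrt_cover = sqrt_transform(cover)
pa = presence_absence(cover)

# Compare each ordinal rank with log10 of the corresponding percentage cover.
for code in BB_ORDER:
    print(code, BB_COVER[code], round(math.log10(BB_COVER[code]), 2), BB_ORDINAL[code])
```

The final loop prints each code next to log10 of its percentage cover and its ordinal rank, which is one way to check how closely the 1-9 values follow a log scale before deciding whether any further transformation is needed.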
  
Species composition data are also often subjected to standardization, either by species (columns) or by samples (rows) (<imgref stand-row-col>). **Standardization by species** gives all species the same importance (i.e. species with overall lower abundances become as important as species with overall higher abundances). This may not always be meaningful: e.g. if a species occurs in only one sample, standardization by species will put a high weight on this sample, which will then become very different from the others. **Standardization by samples** is useful when the analysis focuses on the relative proportions of species rather than their absolute abundances, e.g. because recorded abundances depend on sampling effort and this effort differs between samples (the effort may be related to the time spent at the plot or the number of traps, or may be influenced by bad weather affecting the mobility of the sampled organisms).
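A minimal Python sketch of both standardizations, assuming a list-of-rows matrix of non-negative abundances (rows = samples, columns = species). The text does not fix a formula, so two common choices are assumed here: dividing each species by its maximum, and dividing each sample by its total (roughly what ''decostand()'' in the R package vegan offers as ''method = "max"'' and ''method = "total"''). The function names and the example abundances are hypothetical.

```python
def standardize_by_species(matrix):
    """Divide each column (species) by its maximum, so every species peaks at 1.
    All-zero columns stay zero."""
    n_species = len(matrix[0])
    col_max = [max(row[j] for row in matrix) for j in range(n_species)]
    return [[x / m if m > 0 else 0.0 for x, m in zip(row, col_max)]
            for row in matrix]


def standardize_by_samples(matrix):
    """Divide each row (sample) by its total, giving relative proportions.
    Empty samples (total 0) stay zero."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([x / total if total > 0 else 0.0 for x in row])
    return result


# Hypothetical abundances: rows = samples 1-3, columns = species A, B, C.
# Sample 2 is assumed to have been sampled with twice the effort of sample 1;
# species C occurs only in sample 3, with a single individual.
abundances = [[10, 5, 0],
              [20, 10, 0],
              [8, 4, 1]]

by_species = standardize_by_species(abundances)
by_samples = standardize_by_samples(abundances)
print(by_species)
print(by_samples)
```

After standardization by samples, samples 1 and 2 become identical, because in this made-up example they differ only in sampling effort. After standardization by species, species C, recorded with a single individual in sample 3, reaches the same value (1.0) as the dominant species A in sample 2, which is the overweighting of a single sample described above.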
en/data_preparation.txt · Last modified: 2019/05/23 09:54 by David Zelený