  
==== Missing values ====
This is not as trivial as it may sound. Missing data are elements in the matrix with no value; in R they are usually represented by ''NA'' (not available). Note that there is an important difference between ''0'' and ''NA''. It makes sense to replace a missing value by zero if the entity is really absent (e.g. the species was searched for but not found, so its cover or abundance is zero), but it makes no sense to replace it by zero if the value was simply not recorded (e.g. if I didn't measure pH in some samples because the pH meter broke, I should not replace these values by 0, since the pH of those samples was certainly not that low).
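To make the difference between ''0'' and ''NA'' concrete, here is a minimal Python sketch with made-up numbers (the plot values are hypothetical, and Python's ''None'' stands in for R's ''NA''). It shows how filling unmeasured pH values with zero drags the mean down, while a zero cover for an absent species is a legitimate observation that belongs in the summary.

```python
import statistics

# Hypothetical data (not from the original page): soil pH in five plots.
# None marks a value that was never measured, the equivalent of R's NA
# (e.g. the pH meter broke before these plots were sampled).
ph = [6.1, 5.8, None, 6.4, None]

# Correct: treat unmeasured pH as missing and summarise only observed values.
observed_ph = [v for v in ph if v is not None]
mean_ignoring_missing = statistics.mean(observed_ph)

# Wrong: replacing NA by 0 pretends that two plots had pH 0.
ph_zero_filled = [0.0 if v is None else v for v in ph]
mean_zero_filled = statistics.mean(ph_zero_filled)

# A species that was searched for but not found is genuinely absent,
# so its cover of 0 is a real observation and is kept in the summary.
species_cover = [12.0, 0.0, 3.5]  # hypothetical cover (%) in three plots
mean_cover = statistics.mean(species_cover)

print(f"mean pH, NA ignored:  {mean_ignoring_missing:.2f}")  # 6.10
print(f"mean pH, NA set to 0: {mean_zero_filled:.2f}")       # 3.66
print(f"mean species cover:   {mean_cover:.2f}")              # 5.17
```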
Samples with missing values are usually removed from the analysis (often silently, without any warning message), and if there are many missing values scattered across different variables, the analysis will be based on rather few samples. One way to reduce this effect is to remove the variables with the highest proportion of missing values from the analysis. Another option is to replace the missing values by estimates, if these can be reasonably accurate (e.g. interpolated from similar or neighbouring plots, taken from values measured at the same time somewhere close, or predicted by a model).
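The consequences of dropping incomplete samples, and the two remedies mentioned above, can be sketched in a few lines of Python. Everything here is illustrative: the table is invented, ''None'' plays the role of ''NA'', the ''complete_cases'' helper only mimics what R's ''complete.cases()'' and ''na.omit()'' do, and filling a gap with the value from the plot closest in elevation (the hypothetical ''impute_nearest'' helper) is just one simple stand-in for "values from similar plots"; in a real study a model-based estimate may be more appropriate.

```python
# Hypothetical environmental table (not from the original page):
# six samples, four variables; None marks a missing value (R's NA).
env = {
    "pH":        [6.1, 5.8, None, 6.4, 6.0, 5.9],
    "moisture":  [30.0, None, 28.0, 35.0, None, 33.0],
    "elevation": [420.0, 450.0, 425.0, 470.0, 440.0, 460.0],
    "nitrogen":  [None, 1.2, None, None, 1.1, None],
}


def complete_cases(table):
    """Indices of samples with no missing value in any variable
    (the samples that listwise deletion would keep)."""
    n_samples = len(next(iter(table.values())))
    return [i for i in range(n_samples)
            if all(col[i] is not None for col in table.values())]


def missing_proportion(table):
    """Proportion of missing values in each variable."""
    return {name: sum(v is None for v in col) / len(col)
            for name, col in table.items()}


def impute_nearest(col, by):
    """Hypothetical helper: replace each missing value in `col` by the observed
    value of the sample whose `by` value (here elevation) is closest.
    Assumes `col` has at least one observed value."""
    observed = [i for i, v in enumerate(col) if v is not None]
    filled = list(col)
    for i, v in enumerate(col):
        if v is None:
            nearest = min(observed, key=lambda k: abs(by[k] - by[i]))
            filled[i] = col[nearest]
    return filled


# 1) Listwise deletion: missing values scattered over variables leave no sample.
kept_all = complete_cases(env)

# 2) Remove the variable with the highest proportion of missing values.
props = missing_proportion(env)
worst = max(props, key=props.get)
reduced = {name: col for name, col in env.items() if name != worst}
kept_reduced = complete_cases(reduced)

# 3) Additionally fill the remaining gaps from the plot most similar in elevation.
imputed = dict(reduced)
for name in ("pH", "moisture"):
    imputed[name] = impute_nearest(reduced[name], reduced["elevation"])

print("complete samples, all variables:", kept_all)                    # []
print("dropped variable:", worst)                                      # nitrogen
print("complete samples after dropping it:", kept_reduced)             # [0, 3, 5]
print("complete samples after imputation:", complete_cases(imputed))   # [0, 1, 2, 3, 4, 5]
```

Dropping the sparsest variable recovers three of the six samples here; imputing the few remaining gaps recovers all six, but only under the assumption that plots at similar elevation have similar pH and moisture.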
  
==== Outliers ====
en/data_preparation.txt · Last modified: 2019/05/23 09:54 by David Zelený