
en:suppl_vars

This shows you the differences between two versions of the page.


en:suppl_vars [2019/02/25 20:56] David Zelený |
en:suppl_vars [2019/03/16 06:14] David Zelený [Supplementary variables (unconstrained ordination)] |
||
---|---|---|---|

Line 7: | Line 7: | ||

[[{|width: 7em; background-color: white; color: navy}suppl_vars_exercise|Exercise {{::lock-icon.png?nolink|}}]] | [[{|width: 7em; background-color: white; color: navy}suppl_vars_exercise|Exercise {{::lock-icon.png?nolink|}}]] | ||

- | The ecological meaning of the axes in unconstrained ordination can be interpreted by relating the sample scores on these axes to external supplementary variables (usually measured or estimated environmental variables). This relationship can be done either by correlating the supplementary variable to the first two or few main axes using Pearson’s correlation coefficient or by regressing the supplementary on the sample scores of selected ordination axes using (weighted) multiple regression. The **correlation** method is more intuitive to understand, but the application is limited only to linear ordination methods, while the **(weighted) multiple regression** is less intuitive, but more general, applicable to both linear and unimodal ordination methods. The results are often used to project supplementary variables passively onto the ordination diagram while reporting the strength of the relationship with ordination axes (correlation coefficient in the case of correlation, r<sup>2</sup> in the case of multiple regression) and possibly also the test of significance. There is a difference between linear and unimodal ordination method; while the linear method all samples have the same weight, in the unimodal method the sample weight (its importance in the analysis) is proportional to the sum of species abundances in this sample. This has to be reflected when the supplementary variables are related to ordination axes, and the weights need to be included in the calculation (that’s why weighted multiple regression is used). | + | The ecological meaning of the axes in unconstrained ordination can be interpreted by relating the sample scores on these axes to external supplementary variables (usually measured or estimated environmental variables). This relationship can be assessed either by correlating the supplementary variable with the first two or few main axes using Pearson’s correlation coefficient or by regressing the supplementary variable on the sample scores of selected ordination axes using (weighted) multiple regression. The **correlation** method is more intuitive to understand, but its application is limited to linear ordination methods, while the **(weighted) multiple regression** is less intuitive, but more general, applicable to both linear and unimodal ordination methods. The results are often used to project supplementary variables passively onto the ordination diagram while reporting the strength of the relationship with ordination axes (correlation coefficient in the case of correlation, r<sup>2</sup> in the case of multiple regression) and possibly also the test of significance. There is a difference between linear and unimodal ordination methods: while in the linear method all samples have the same weight, in the unimodal method the sample weight (its importance in the analysis) is proportional to the sum of species abundances in this sample. This has to be reflected when the supplementary variables are related to ordination axes, and the weights need to be included in the calculation (that’s why weighted multiple regression is used). |
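The weighted multiple regression described above can be sketched as follows. This is only a minimal illustration, not the implementation used by any ordination software: the function names, the sample scores, the pH values and the abundance totals are all made up for the example, and the weights are simply taken as the sum of species abundances per sample, as the text states for unimodal methods. Setting all weights to 1 gives the unweighted regression that applies to linear methods.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, copied so the inputs are not modified
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def weighted_regression_r2(axis_scores, variable, weights):
    """Regress a supplementary variable on sample scores of ordination axes.

    Returns (coefficients, r2); the first coefficient is the intercept.
    """
    rows = [[1.0] + list(scores) for scores in axis_scores]  # add intercept
    k = len(rows[0])
    # Weighted normal equations: (X' W X) b = X' W y
    xtwx = [[sum(w * row[i] * row[j] for row, w in zip(rows, weights))
             for j in range(k)] for i in range(k)]
    xtwy = [sum(w * row[i] * y for row, y, w in zip(rows, variable, weights))
            for i in range(k)]
    coef = solve_linear(xtwx, xtwy)
    fitted = [sum(c * x for c, x in zip(coef, row)) for row in rows]
    mean_y = sum(w * y for y, w in zip(variable, weights)) / sum(weights)
    ss_res = sum(w * (y - f) ** 2 for y, f, w in zip(variable, fitted, weights))
    ss_tot = sum(w * (y - mean_y) ** 2 for y, w in zip(variable, weights))
    return coef, 1.0 - ss_res / ss_tot


# Hypothetical data: scores of five samples on the first two axes,
# a supplementary variable (soil pH) and total species abundance per sample.
scores = [(-1.2, 0.3), (-0.4, -0.8), (0.1, 0.9), (0.7, -0.2), (1.5, 0.4)]
ph = [4.1, 4.9, 5.0, 5.8, 6.3]
totals = [12, 30, 8, 21, 17]

coef_w, r2_w = weighted_regression_r2(scores, ph, totals)      # unimodal case
coef_eq, r2_eq = weighted_regression_r2(scores, ph, [1] * 5)   # linear case
print("weighted r2:", round(r2_w, 3), "unweighted r2:", round(r2_eq, 3))
```

The two regression coefficients (without the intercept) give the direction in which the variable would be drawn as an arrow in the ordination diagram, and r<sup>2</sup> is the reported strength of the relationship.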

==== Correlation of supplementary variable with selected ordination axes ==== | ==== Correlation of supplementary variable with selected ordination axes ==== | ||

Line 45: | Line 45: | ||

==== Multiple testing issue and available corrections ==== | ==== Multiple testing issue and available corrections ==== | ||

- | The more tests of significance we are doing, the higher is the chance to observe the significant result. This rule is called //multiple testing issue// and can be illustrated in a simple example. I generated two random variables with normal distribution, calculated their regression, and tested it (using parametric F-test). One would expect that the test will not return a significant result since the variables are generated randomly. But if I repeat this 100 times (<imgref multiple-testing>), you can see that some of the results turn to be significant. The proportion of significant results depends on the threshold value you use to deem result significant; e.g., if you consider as significant results with P-value lower than 5% (alpha = 0.05), then 5% of the tests may appear as significant even though the variables are random. | + | The more tests of significance we do, the higher the chance of observing a significant result, even if the null hypothesis is true (no relationship). This is called the //multiple testing issue// and can be illustrated with a simple example. I generated two random variables with normal distribution, calculated their regression, and tested it (using parametric F-test). One would expect that the test will not return a significant result since the variables are generated randomly. But if I repeat this 100 times (<imgref multiple-testing>), you can see that some of the results turn out to be significant. The proportion of significant results depends on the threshold value you use to deem a result significant; e.g., if you consider as significant the results with P-value lower than 5% (alpha = 0.05), then about 5% of the tests may appear significant even though the variables are random (Type I error). |

<imgcaption multiple-testing|Multiple testing issue. I generated two random variables (normally distributed) and tested the significance of their regression with parametric F-test. I replicated this 100 times, each with newly generated random variables. Significant regressions (P < 0.05) are displayed with a red regression line. From a total of 100 analyses, four are significant at the level of 0.05 (almost 5% of all analyses).>{{ :obrazky:multiple-testing-issue.jpg?direct |}}</imgcaption> | <imgcaption multiple-testing|Multiple testing issue. I generated two random variables (normally distributed) and tested the significance of their regression with parametric F-test. I replicated this 100 times, each with newly generated random variables. Significant regressions (P < 0.05) are displayed with a red regression line. From a total of 100 analyses, four are significant at the level of 0.05 (almost 5% of all analyses).>{{ :obrazky:multiple-testing-issue.jpg?direct |}}</imgcaption> | ||
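The arithmetic behind the multiple testing issue can be sketched in a few lines. The snippet assumes that the tests are independent (as in the simulation with newly generated random variables); the function names are illustrative only. With alpha = 0.05 and 100 tests, about five significant results are expected by chance alone, and observing at least one of them is almost certain.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive among independent tests
    when the null hypothesis is true in all of them."""
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of tests significant by chance alone."""
    return n_tests * alpha


for m in (1, 10, 100):
    print(m, "tests:",
          "P(at least one significant) =", round(familywise_error_rate(m), 3),
          "| expected significant =", round(expected_false_positives(m), 2))
```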

- | The solution is to avoid doing multiple tests if possible, and if not, apply some of the corrections methods. The most known is perhaps Bonferroni correction, which is however also very conservative (you simply multiply the resulting P-values by the overall number of tests you did in the analysis, which becomes detrimental in case that the number of tests is high). More useful (and less conservative) are Holm or false discovery rate corrections. | + | The solution is to either avoid doing multiple tests or apply one of the correction methods. Perhaps the best known is the Bonferroni correction, which is, however, very conservative: you simply multiply each resulting P-value by the overall number of tests //m// done in the analysis (P<sub>adj</sub> = P * m), which becomes detrimental when the number of tests is high, since it reduces the power of the test. Less conservative are the Holm and false discovery rate (FDR) corrections. More about the multiple testing issue can be found in my [[https://davidzeleny.net/blog/2019/03/15/about-p-values-and-multiple-testing-issue/|blog post]]. |
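The three corrections mentioned above can be sketched as follows. This is a minimal illustration with made-up P-values and illustrative function names. It assumes that "false discovery rate correction" refers to the Benjamini-Hochberg procedure, which is the most common FDR method but is not named in the text. Adjusted P-values are capped at 1, a usual convention that is also an addition here.

```python
def bonferroni(p_values):
    """P_adj = P * m, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Holm step-down: the smallest P is multiplied by m, the next by m - 1,
    and so on; a running maximum keeps the adjusted values monotone."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg FDR (assumed here to be the FDR method meant):
    P * m / rank, with a running minimum taken from the largest P downwards."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for step, i in enumerate(order):
        rank = m - step  # ascending rank of this P-value, from m down to 1
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P-values from four tests
p = [0.01, 0.04, 0.03, 0.005]
print("Bonferroni:", [round(x, 3) for x in bonferroni(p)])
print("Holm:      ", [round(x, 3) for x in holm(p)])
print("FDR (BH):  ", [round(x, 3) for x in benjamini_hochberg(p)])
```

For the same input, the Bonferroni-adjusted values are never smaller than the Holm-adjusted ones, and the Holm-adjusted values are never smaller than the FDR-adjusted ones, which is what "less conservative" means in practice.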

- | In the case of example above using nine random and real supplementary variables and relating them to unconstrained axes of ordination, if we apply the multiple testing correction (here Bonferroni, <imgref envfit_adj>), all results in the case of random variables become insignificant (in case of the real variables, one more result become insignificant compared to the not-corrected results). Since in this case, the test is permutational and the minimal P-value depends on the number of permutations, in case that there are many supplementary variables (and many tests), it may be necessary to increase the number of permutations to decrease the minimum P-value which can be calculated. For example, if the number of permutations is set to 199 (e.g. due to the calculation time), the minimum P-value which can be reached is P<sub>min</sub> = 1/(199+1) = 0.005; if there are ten variables and the correction for multiple testing is done by Bonferroni (P-value * number of tests), the best resulting corrected P-value would be 0.005*10 = 0.05, which means that we would be unable to reject the null hypothesis on P < 0.05. | + | In the example above, which uses nine random and nine real supplementary variables and relates them to unconstrained ordination axes, if we apply the multiple testing correction (here Bonferroni, <imgref envfit_adj>), all results in the case of random variables become insignificant (in the case of the real variables, one more result becomes insignificant compared to the uncorrected results). Since in this case the test is permutational and the minimum P-value depends on the number of permutations, if there are many supplementary variables (and many tests), it may be necessary to increase the number of permutations to decrease the minimum P-value which can be calculated. For example, if the number of permutations is set to 199 (e.g. due to the calculation time), the minimum P-value which can be reached is P<sub>min</sub> = 1/(199+1) = 0.005; if there are ten variables and the correction for multiple testing is done by Bonferroni (P-value * number of tests), the best resulting corrected P-value would be 0.005*10 = 0.05, which means that we would be unable to reject the null hypothesis at P < 0.05. |

<imgcaption envfit_adj|Results of multiple regression of random (left) and real (right) supplementary variables with first two axes of unconstrained ordination, with P-values adjusted by Bonferroni correction to acknowledge the multiple testing issue.>{{:obrazky:envfit_random_real_adjusted.jpg?direct|}}</imgcaption> | <imgcaption envfit_adj|Results of multiple regression of random (left) and real (right) supplementary variables with first two axes of unconstrained ordination, with P-values adjusted by Bonferroni correction to acknowledge the multiple testing issue.>{{:obrazky:envfit_random_real_adjusted.jpg?direct|}}</imgcaption> |
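The permutation arithmetic from the example above (199 permutations, ten variables, Bonferroni correction) can be checked with a short sketch. The function names are illustrative, and exact fractions are used only to avoid floating-point rounding exactly at the 0.05 boundary. The last function returns the smallest number of permutations for which the best Bonferroni-corrected P-value can still fall below alpha; in practice one would probably round it up to a conventional value such as 999.

```python
import math
from fractions import Fraction


def min_p_value(n_permutations):
    """Smallest P-value a permutation test can return: 1 / (n_perm + 1)."""
    return Fraction(1, n_permutations + 1)


def min_bonferroni_p(n_permutations, n_tests):
    """Best possible Bonferroni-corrected P-value (P * number of tests)."""
    return min(Fraction(1), min_p_value(n_permutations) * n_tests)


def permutations_needed(n_tests, alpha=Fraction(5, 100)):
    """Smallest n_perm for which n_tests / (n_perm + 1) < alpha,
    i.e. n_perm + 1 > n_tests / alpha."""
    return math.floor(Fraction(n_tests) / alpha)


alpha = Fraction(5, 100)
best = min_bonferroni_p(199, 10)
print("P_min with 199 permutations:", float(min_p_value(199)))
print("best corrected P for 10 tests:", float(best))
print("can reach P < 0.05:", best < alpha)
print("permutations needed for 10 tests:", permutations_needed(10))
```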

en/suppl_vars.txt · Last modified: 2019/03/16 06:20 by David Zelený