

en:suppl_vars [2019/03/16 06:13] David Zelený [Multiple testing issue and available corrections]

en:suppl_vars [2019/03/16 06:20] (current) David Zelený [Multiple testing issue and available corrections]


[[{|width: 7em; background-color: white; color: navy}suppl_vars_exercise|Exercise {{::lock-icon.png?nolink|}}]]

The ecological meaning of the axes in unconstrained ordination can be interpreted by relating the sample scores on these axes to external supplementary variables (usually measured or estimated environmental variables). This relationship can be quantified either by correlating the supplementary variable with the first two (or first few) main ordination axes using Pearson’s correlation coefficient, or by regressing the supplementary variable on the sample scores of selected ordination axes using (weighted) multiple regression. The **correlation** method is more intuitive, but it is applicable only to linear ordination methods; **(weighted) multiple regression** is less intuitive but more general, applicable to both linear and unimodal ordination methods. The results are often used to project supplementary variables passively onto the ordination diagram, while reporting the strength of the relationship with the ordination axes (the correlation coefficient in the case of correlation, r<sup>2</sup> in the case of multiple regression) and possibly also a test of significance.

Linear and unimodal ordination methods differ in how they weight samples: in linear methods all samples have the same weight, while in unimodal methods the weight of a sample (its importance in the analysis) is proportional to the sum of species abundances in that sample. This has to be reflected when relating supplementary variables to ordination axes: the weights need to be included in the calculation, which is why weighted multiple regression is used.
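A minimal sketch of both approaches, written in plain Python for illustration. It assumes the sample scores on the ordination axes have already been computed by the ordination; the axis scores, the supplementary variable (pH) and the sample totals below are made-up numbers, and the helper names `pearson_r`, `solve` and `weighted_r2` are hypothetical. For a linear method all weights are set to 1; for a unimodal method the weights are the sample totals of species abundances, as described above.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between a supplementary variable and one axis."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def solve(a, b):
    """Solve a @ beta = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= f * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def weighted_r2(env, axes, weights):
    """Weighted multiple regression of env on the scores of selected axes.

    Returns (coefficients [intercept, axis1, axis2, ...], weighted r^2).
    """
    rows = [[1.0] + [ax[i] for ax in axes] for i in range(len(env))]
    p = len(rows[0])
    # weighted normal equations: (X'WX) beta = X'Wy
    xtwx = [[sum(w * r[j] * r[k] for w, r in zip(weights, rows))
             for k in range(p)] for j in range(p)]
    xtwy = [sum(w * r[j] * y for w, r, y in zip(weights, rows, env))
            for j in range(p)]
    beta = solve(xtwx, xtwy)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    ybar = sum(w * y for w, y in zip(weights, env)) / sum(weights)  # weighted mean
    sse = sum(w * (y - f) ** 2 for w, y, f in zip(weights, env, fitted))
    sst = sum(w * (y - ybar) ** 2 for w, y in zip(weights, env))
    return beta, 1.0 - sse / sst


# Hypothetical data: 6 samples, scores on the first two axes,
# one supplementary variable and sample totals of species abundances.
axis1 = [-1.2, -0.8, -0.1, 0.3, 0.9, 1.4]
axis2 = [0.5, -0.6, 0.2, -0.4, 0.7, -0.3]
ph = [4.9, 5.3, 5.8, 6.1, 6.6, 7.0]
totals = [12, 30, 18, 25, 9, 40]

print("r(pH, axis 1) =", round(pearson_r(ph, axis1), 3))
print("r(pH, axis 2) =", round(pearson_r(ph, axis2), 3))
# linear method: all samples have the same weight
print("r2, equal weights =", round(weighted_r2(ph, [axis1, axis2], [1] * 6)[1], 3))
# unimodal method: weight = sum of species abundances in the sample
print("r2, sample-total weights =", round(weighted_r2(ph, [axis1, axis2], totals)[1], 3))
```

With equal weights and a single axis, the r<sup>2</sup> of the regression equals the squared Pearson correlation, which is one way to see how the two approaches relate in the linear case.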

==== Correlation of supplementary variable with selected ordination axes ====


==== Multiple testing issue and available corrections ====

The more significance tests we conduct, the higher the chance of observing a significant result even if the null hypothesis (no relationship) is true. This is called the //multiple testing issue// and can be illustrated with a simple example. I generated two normally distributed random variables, calculated their regression, and tested it using a parametric F-test. One would expect the test not to return a significant result, since the variables are generated randomly. But if I repeat this 100 times (<imgref multiple-testing>), you can see that some of the results turn out to be significant. The proportion of significant results depends on the threshold used to deem a result significant; e.g., if you consider results with a P-value lower than 5% (alpha = 0.05) as significant, then on average about 5% of the tests will appear significant even though the variables are random (Type I error). Put another way, the probability that at least one of m independent tests will be significant at P < alpha is 1 - (1 - alpha)<sup>m</sup>. This is called the //family-wise Type I error rate//, i.e., the probability of committing at least one Type I error when the results of multiple tests are interpreted without any correction.
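The family-wise Type I error rate can be computed directly from this formula. A short Python sketch, assuming independent tests and alpha = 0.05 (the function name `familywise_error_rate` is hypothetical):

```python
def familywise_error_rate(alpha, m):
    """Probability of at least one P < alpha among m independent tests
    when all null hypotheses are true: 1 - (1 - alpha)^m."""
    return 1.0 - (1.0 - alpha) ** m


for m in (1, 5, 10, 100):
    print(f"m = {m:3d}: family-wise Type I error rate = {familywise_error_rate(0.05, m):.3f}")
```

For 5, 10 and 100 tests the rate is about 0.23, 0.40 and 0.99, so with 100 independent tests on random data, at least one significant result is almost guaranteed.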

<imgcaption multiple-testing|Multiple testing issue. I generated two random variables (normally distributed) and tested the significance of their regression with a parametric F-test. I replicated this 100 times, each time with newly generated random variables. Significant regressions (P < 0.05) are displayed with a red regression line. Of the 100 analyses, four are significant at the 0.05 level (almost 5% of all analyses).>{{ :obrazky:multiple-testing-issue.jpg?direct |}}</imgcaption>
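The simulation behind the figure can be sketched in plain Python as follows. This is not the code used to produce the figure: the number of observations per replicate (`n_obs = 20`) and the random seed are assumptions, since the figure does not state them, and the function names are hypothetical. To stay within the standard library, the F-test P-value is computed from the closed-form Student t distribution for integer degrees of freedom; with a single predictor F = t<sup>2</sup>, so both tests give the same P-value.

```python
import math
import random


def t_two_sided_p(t, df):
    """Two-sided P-value of Student's t with integer df (closed-form series)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 0:
        # even df: A = sin(theta) * (1 + 1/2 cos^2 + 1*3/(2*4) cos^4 + ...)
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (2 * k - 1) / (2 * k) * c * c
            total += term
        a = s * total
    else:
        # odd df: A = 2/pi * (theta + sin*cos * (1 + 2/3 cos^2 + ...)); df = 1 has no series
        total = 0.0
        if df > 1:
            term = total = 1.0
            for k in range(1, (df - 1) // 2):
                term *= (2 * k) / (2 * k + 1) * c * c
                total += term
        a = 2.0 / math.pi * (theta + s * c * total)
    # A = P(|T| < t), so the two-sided P-value is 1 - A
    return min(1.0, max(0.0, 1.0 - a))


def regression_f_test(x, y):
    """Parametric F-test of the simple linear regression y ~ x; returns (F, P)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    r2 = sxy * sxy / (sxx * syy)
    df_resid = n - 2
    if r2 >= 1.0:
        return math.inf, 0.0
    f = r2 / (1.0 - r2) * df_resid  # F = (SSR / 1) / (SSE / (n - 2))
    return f, t_two_sided_p(math.sqrt(f), df_resid)  # F = t^2 with one predictor


def count_significant(n_reps=100, n_obs=20, alpha=0.05, seed=1):
    """Regress pairs of independent normal random variables n_reps times
    and count how many regressions come out significant at P < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        y = [rng.gauss(0, 1) for _ in range(n_obs)]
        _, p = regression_f_test(x, y)
        if p < alpha:
            hits += 1
    return hits


if __name__ == "__main__":
    print("significant regressions out of 100:", count_significant())
```

The exact count depends on the seed; over many replicates, the proportion of results with P < 0.05 approaches 5%, as in the figure.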

en/suppl_vars.1552688001.txt.gz · Last modified: 2019/03/16 06:13 by David Zelený