

==== Multiple testing issue and available corrections ====

The more significance tests we perform, the higher the chance of observing a significant result, even if the null hypothesis is true (i.e. there is no relationship). This phenomenon is called the //multiple testing issue// and can be illustrated with a simple example. I generated two random variables with normal distribution, calculated their regression, and tested it (using the parametric F-test). One would expect the test not to return a significant result, since the variables were generated randomly. But if I repeat this 100 times (<imgref multiple-testing>), you can see that some of the results turn out to be significant. The proportion of significant results depends on the threshold value used to deem a result significant; e.g., if you consider as significant those results with a P-value lower than 5% (alpha = 0.05), then about 5% of the tests may appear significant even though the variables are random (Type I error).

<imgcaption multiple-testing|Multiple testing issue. I generated two random variables (normally distributed) and tested the significance of their regression with the parametric F-test. I replicated this 100 times, each time with newly generated random variables. Significant regressions (P < 0.05) are displayed with a red regression line. From a total of 100 analyses, four are significant at the level of 0.05 (almost 5% of all analyses).>{{ :obrazky:multiple-testing-issue.jpg?direct |}}</imgcaption>
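The simulation described above can be sketched in a few lines. Below is an illustrative Python version (the original figure was produced in R); for simplicity it assesses the significance of each random regression with a permutation test on the correlation coefficient rather than the parametric F-test, which conveys the same message:

```python
# A sketch of the multiple-testing simulation: regress one random normal
# variable on another, test it, repeat 100 times, and count how often
# the result is "significant" even though both variables are pure noise.
import math
import random

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def perm_pvalue(x, y, n_perm=99, rng=random):
    """Permutation P-value for the correlation between x and y."""
    r_obs = abs(pearson_r(x, y))
    y = list(y)                         # work on a copy, shuffle in place
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y)
        if abs(pearson_r(x, y)) >= r_obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)    # minimum possible: 1/(n_perm + 1)

rng = random.Random(1)
n_obs, n_tests, alpha = 30, 100, 0.05
significant = sum(
    perm_pvalue([rng.gauss(0, 1) for _ in range(n_obs)],
                [rng.gauss(0, 1) for _ in range(n_obs)], rng=rng) < alpha
    for _ in range(n_tests)
)
# Roughly alpha * 100 = ~5 of the 100 tests typically come out significant.
print(significant, "of", n_tests, "tests significant at alpha =", alpha)
```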

The solution is to either avoid doing multiple tests or to apply one of the correction methods. Perhaps the best known is the Bonferroni correction, which is, however, also very conservative: you simply multiply the resulting P-values by the overall number of tests //m// done in the analysis (P<sub>adj</sub> = P * m), which becomes detrimental when the number of tests is high, since it reduces the power of the test. Less conservative alternatives are the Holm or false discovery rate (FDR) corrections. More about the multiple testing issue can be found in my [[https://davidzeleny.net/blog/2019/03/15/about-p-values-and-multiple-testing-issue/|blog post]].
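The Bonferroni and Holm adjustments are simple enough to write out explicitly (in R, ''p.adjust(p, method = "bonferroni")'' or ''method = "holm"'' does the same). The following Python sketch is only an illustration; the function names are ad hoc, not from any package:

```python
# Explicit implementations of the Bonferroni and Holm P-value adjustments.

def bonferroni(pvals):
    """Multiply every P-value by the number of tests m, capping at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def holm(pvals):
    """Holm's step-down method: the k-th smallest P-value (k = 1..m) is
    multiplied by (m - k + 1); monotonicity is enforced and values capped at 1."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):        # rank = k - 1
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

raw = [0.005, 0.011, 0.02, 0.04, 0.3]
adj_bonf = bonferroni(raw)  # ~ [0.025, 0.055, 0.1, 0.2, 1.0] (up to rounding)
adj_holm = holm(raw)        # ~ [0.025, 0.044, 0.06, 0.08, 0.3]
print(adj_bonf)
print(adj_holm)
```

Note that Holm's adjusted P-values are never larger than Bonferroni's, so every result Bonferroni deems significant is also significant under Holm; that is why Holm is the less conservative of the two.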

In the case of the example above, using nine random and real supplementary variables related to unconstrained ordination axes, applying the multiple testing correction (here Bonferroni, <imgref envfit_adj>) makes all results for the random variables insignificant (for the real variables, one more result becomes insignificant compared to the uncorrected results). Since the test here is permutational and the minimal P-value depends on the number of permutations, if there are many supplementary variables (and hence many tests), it may be necessary to increase the number of permutations to decrease the minimum P-value which can be obtained. For example, if the number of permutations is set to 199 (e.g. due to calculation time), the minimum P-value which can be reached is P<sub>min</sub> = 1/(199+1) = 0.005; if there are ten variables and the correction for multiple testing is done by Bonferroni (P-value * number of tests), the best resulting corrected P-value would be 0.005*10 = 0.05, which means that we would be unable to reject the null hypothesis at P < 0.05.
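The arithmetic of that worst case can be restated in a couple of lines (an illustrative Python snippet; the numbers follow directly from the text):

```python
# Minimum achievable P-value of a permutation test, before and after
# Bonferroni correction over several tests.
n_perm = 199                          # number of permutations
n_tests = 10                          # number of supplementary variables
p_min = 1 / (n_perm + 1)              # smallest possible P-value: 1/200 = 0.005
p_min_adj = min(1.0, p_min * n_tests) # best Bonferroni-adjusted value: 0.05
print(p_min, p_min_adj)
# Even the best case cannot fall below 0.05, so no test could be declared
# significant; with 999 permutations, p_min = 0.001 and p_min_adj = 0.01.
```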

<imgcaption envfit_adj|Results of multiple regression of random (left) and real (right) supplementary variables with first two axes of unconstrained ordination, with P-values adjusted by Bonferroni correction to acknowledge the multiple testing issue.>{{:obrazky:envfit_random_real_adjusted.jpg?direct|}}</imgcaption>

en/suppl_vars.txt · Last modified: 2019/03/16 06:20 by David Zelený