User Tools

Site Tools


en:monte_carlo_examples

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
en:monte_carlo_examples [2019/01/26 20:00]
David Zelený
en:monte_carlo_examples [2019/02/10 16:04] (current)
David Zelený
Line 8: Line 8:
  
  
- 
-==== Example 1: Is the variance explained by environmental variables significant?​ ==== 
- 
-This example directly follows the [[en:​rda_examples|Example 1 in RDA & tb-RDA section]] and [[en:​expl_var_examples|Example 1 in Explained variance section]], consider checking them first. We used Vltava river valley dataset, and two field-measured environmental variables, soil pH and soil depth (''​pH''​ and ''​SOILDPT''​),​ to explain variance in species composition. We found that they can explain 8.9% of overall variance; when compared to the variance which can be maximally explained by two explanatory variables (21.7% explained by two tb-PCA axes, see here) this sounds not bad (it is more than 40% of variance we can maximally explain in this dataset with two variables). But is the result significant?​ By significant,​ I mean: is the variance considerably higher than the variance explained (in average) by two random variables not related to species composition?​ This is the task for Monte Carlo permutation test (check Theory part to see how it works). 
- 
-First, get the data and calculate tb-RDA on them (this is esentially repeating beginning of [[en:​expl_var_examples|Example 1 in the section Explained variance]]):​ 
-<code rsplus> 
-vltava.spe <- read.delim ('​https://​raw.githubusercontent.com/​zdealveindy/​anadat-r/​master/​data/​vltava-spe.txt',​ row.names = 1) 
-vltava.env <- read.delim ('​https://​raw.githubusercontent.com/​zdealveindy/​anadat-r/​master/​data/​vltava-env.txt'​) 
-spe <- vltava.spe 
-env <- vltava.env[,​ c('​pH',​ '​SOILDPT'​)] 
-library (vegan) 
-spe.log <- log1p (spe)  
-spe.hell <- decostand (spe.log, '​hell'​) 
-tbRDA <- rda (spe.hell ~ pH + SOILDPT, data = env) 
-R2.obs <- RsquareAdj (tbRDA)$r.squared 
-R2.obs 
-</​code>​ 
-<​code>​ 
-[1] 0.08868879 
-</​code>​ 
- 
-In the next step, calculate variance explained by randomized env. variables 
-<code rsplus> 
-env.rand <- env[sample (1:​97),​] ​ # the function "​sample"​ will reshuffle the rows with environmental variabels 
-tbRDA.rand <- rda (spe.hell ~ pH + SOILDPT, data = env.rand) 
-RsquareAdj (tbRDA.rand)$r.squared 
-</​code>​ 
-This value represents the variance explained by two random explanatory variables. My result was 0.01854349, but this value will change in each run. We need to do enough repetitions (permutations) to get an idea about the distribution of this values (null model). We can use the "​for"​ loop for it, or, as in this case, function "​replicate",​ with two arguments: ''​n''​ = number of replicates, and ''​expr''​ = expression to be replicated (if more lines of script are involved, this expression needs to be enclosed in curly brackets {}): 
- 
-<code rsplus> 
-n.perm <- 99  # set the number of permutations 
-R2.rand <- replicate (n = n.perm, expr = { 
-  env.rand <- env[sample (1:97),] 
-  tbRDA.rand <- rda (spe.hell ~ pH + SOILDPT, data = env.rand) 
-  RsquareAdj (tbRDA.rand)$r.squared 
-  }) 
-<​code>​ 
-The vector ''​R2.rand''​ contains 99 values of variance explained by random variables. In the next step, we will merge them with the observed R2 (''​R2.obs''​),​ since this is considered to be also the part of null distribution (and this is also the reason why the numbers of permutation are usually ending with 9): 
-<code rsplus> 
-R2 <- c (R2.rand, R2.obs) 
-</​code>​ 
-Draw the histogram of values and highlight the observed R2 there: 
-<code rsplus> 
-hist (R2, nclass = 100)  # ; argument "​nclass"​ separates the bars into 100 categories (compare with hist without this argument) 
-abline (v = R2.obs, col = '​red'​) ​ # red line to indicate where in the histogram is the observed value 
-</​code>​ 
-{{:​obrazky:​perm-test-histogram-r2.png?​direct|}} 
- 
-Already now we can see that our observed value (0.089, red vertical line) is far higher than all values generated by randomized env. variables (''​range (R2.rand)''​ shows that these values are between 0.014 and 0.033, but again, this range will change in another run of the analysis, and will likely increase if the number of permutations increases, since there will be higher chance to get values more different from the mean). 
- 
-To calculate the significance,​ we need to know what is the number of cases in which the variance explained by random explanatory variables was higher or equal to the observed one (explained by real variables). Remember that R.obs is the part of our null distribution (we added it there), so we always obtain at least one value in such comparison (the R.obs itself). 
- 
-<code rsplus> 
-P <- sum (R2 >= R2.obs)/​(n.perm + 1)  # 0.01 
-</​code>​ 
- 
-You can see that our observed R<​sup>​2</​sup>​ (R2.obs) is far higher than any R2 generated by a random variable, so the calculation of P-value contains only one value equal or higher than observed R2 (the observed R2 itself). The resulting P-value is 0.01, which is the lowest P-value we can get with Monte Carlo permutation test based on 99 permutations:​ 
- 
-<​code>​ 
-1/(99 + 1)  # 0.01 
-</​code>​ 
- 
-If we are happy with rejecting the null hypothesis at the alpha level of 5% (P < 0.05), than this could be sufficient. But if we need to get lower P-value (e.g. because we are going to correct them for multiple testing, as in the case of forward selection later), we need to increase the number of permutations. As we saw in the theoretical part, the relationship between the number of permutations and the minimal P-value we can obtain is P<​sub>​min</​sub>​ = 1/​(number_of_permutations + 1). If you hope to be able reject the null hypothesis at P < 0.001, you need at least 1/0.001-1 = 999 permutations (in that case the lowest P value will be 0.001) or better more (e.g. 1499 permutations,​ with the lowest P-value 1/(1499+1) = 0.00067). 
- 
-In ''​vegan'',​ the test of the significance for constrained ordination is done by function ''​anova''​ (this may be a bit confusing name, since it is not really calculating ANOVA) 
-<code rsplus> 
-anova (tbRDA, permutations = 99) 
-</​code>​ 
-<​code>​ 
-Permutation test for rda under reduced model 
-Permutation:​ free 
-Number of permutations:​ 99 
- 
-Model: rda(formula = spe.hell ~ pH + SOILDPT, data = env) 
-         Df Variance ​    F Pr(>​F) ​   
-Model     ​2 ​ 0.06250 4.574   0.01 ** 
-Residual 94  0.64226 ​               ​ 
---- 
-  Signif. codes: ​ 0 '​***'​ 0.001 '​**'​ 0.01 '​*'​ 0.05 '​.'​ 0.1 ' ' 1 
-</​code>​ 
-The P-value is the same (P = 0.01) as in our manual Monte-Carlo permutation test above. Still, 99 permutations is really low number; consider using at least the default 999 permutations to get P-values close to 0.001. 
-==== Example 2: Effect of environmental variables on species composition of plants in Carpathian wetlands tested by tb-RDA==== 
- 
-The dataset [[en:​data:​wetlands|Carpathian spring fen meadows]] consists of three data matrices - species matrix with bryophytes, species matrix with vascular plants and environmental matrix with chemical variables and slope. We ignore bryophytes here, and focus on relationship between vascular plants (''​vasc''​) and chemical variables (''​chem''​). 
- 
-<code rsplus> 
-# Carpathian wetlands - import data 
-vasc <- read.delim ('​https://​raw.githubusercontent.com/​zdealveindy/​anadat-r/​master/​data/​vasc_plants.txt',​ row.names = 1) 
-chem <- read.delim ('​https://​raw.githubusercontent.com/​zdealveindy/​anadat-r/​master/​data/​chemistry.txt',​ row.names = 1) 
-  
-</​code>​ 
-For the purpose of tb-RDA, we first pre-transform the species data using Hellinger'​s transformation (tb-RDA), and remove ''​slope''​ from the matrix of evironmental variabels (slope of the site is not a chemical variable): 
- 
-<code rsplus> 
-library (vegan) ​ # if you haven'​t use it up to now 
-vasc.hell <- decostand (vasc, '​hell'​) ​ # pre-transform species data by Hellinger transformation 
-chem <- chem[, -15]  # remove slope 
-rda.vasc <- rda (vasc.hell ~ ., chem) 
-</​code>​ 
- 
-To see the output, we can either directly print the ''​rda.vasc''​ object, or apply the function ''​summary''​ on it to get more extended information (it prints also all the scores, which may get quite long, so wrap it into ''​header''​ function to shorten those tables into only first six rows): 
-<code rsplus> 
-head (summary (rda.vasc)) 
-</​code>​ 
-<​code>​ 
-Call: 
-rda(formula = vasc.hell ~ Ca + Mg + Fe + K + Na + Si + SO4 +      PO4 + NO3 + NH3 + Cl + Corg + pH + conduct, data = chem)  
- 
-Partitioning of variance: 
-              Inertia Proportion 
-Total          0.5674 ​    ​1.0000 
-Constrained ​   0.2061 ​    ​0.3632 
-Unconstrained ​ 0.3613 ​    ​0.6368 
- 
-Eigenvalues,​ and their contribution to the variance ​ 
- 
-Importance of components: 
-                        RDA1    RDA2    RDA3    RDA4    RDA5     ​RDA6 ​    ​RDA7 ​   RDA8     ​RDA9 ​   RDA10    RDA11    RDA12    RDA13    RDA14     ​PC1 ​    ​PC2 ​    PC3 
-Eigenvalue ​           0.0978 0.02165 0.01501 0.01077 0.01060 0.009272 0.008199 0.00615 0.005908 0.005756 0.004406 0.004055 0.003383 0.003123 0.03260 0.03204 0.02084 
-Proportion Explained ​ 0.1724 0.03815 0.02645 0.01899 0.01868 0.016340 0.014450 0.01084 0.010410 0.010140 0.007770 0.007150 0.005960 0.005500 0.05746 0.05647 0.03672 
-Cumulative Proportion 0.1724 0.21051 0.23696 0.25595 0.27463 0.290970 0.305420 0.31626 0.326680 0.336820 0.344590 0.351730 0.357700 0.363200 0.42066 0.47713 0.51386 
-                          PC4     ​PC5 ​    ​PC6 ​    ​PC7 ​    ​PC8 ​    ​PC9 ​   PC10    PC11     ​PC12 ​    ​PC13 ​    ​PC14 ​    ​PC15 ​    ​PC16 ​   PC17     ​PC18 ​    ​PC19 ​    PC20 
-Eigenvalue ​           0.01711 0.01638 0.01426 0.01370 0.01295 0.01166 0.01106 0.01057 0.009716 0.008781 0.008481 0.008224 0.007753 0.00759 0.007313 0.007163 0.006564 
-Proportion Explained ​ 0.03015 0.02887 0.02513 0.02415 0.02282 0.02055 0.01950 0.01862 0.017120 0.015480 0.014950 0.014490 0.013660 0.01338 0.012890 0.012620 0.011570 
-Cumulative Proportion 0.54401 0.57287 0.59800 0.62215 0.64497 0.66552 0.68502 0.70364 0.720760 0.736240 0.751180 0.765680 0.779340 0.79272 0.805610 0.818230 0.829800 
-                          PC21     ​PC22 ​    ​PC23 ​   PC24     ​PC25 ​    ​PC26 ​   PC27     ​PC28 ​   PC29     ​PC30 ​    ​PC31 ​    ​PC32 ​    ​PC33 ​    ​PC34 ​    ​PC35 ​   PC36 
-Eigenvalue ​           0.006371 0.005808 0.005689 0.00547 0.005307 0.004855 0.00475 0.004443 0.00403 0.003576 0.003554 0.003397 0.003139 0.002931 0.002827 0.00276 
-Proportion Explained ​ 0.011230 0.010240 0.010030 0.00964 0.009350 0.008560 0.00837 0.007830 0.00710 0.006300 0.006260 0.005990 0.005530 0.005160 0.004980 0.00486 
-Cumulative Proportion 0.841030 0.851260 0.861290 0.87093 0.880280 0.888840 0.89721 0.905040 0.91215 0.918450 0.924710 0.930700 0.936230 0.941400 0.946380 0.95124 
-                          PC37     ​PC38 ​    ​PC39 ​    ​PC40 ​    ​PC41 ​    ​PC42 ​    ​PC43 ​   PC44     ​PC45 ​    ​PC46 ​    ​PC47 ​    ​PC48 ​   PC49    PC50      PC51      PC52 
-Eigenvalue ​           0.002552 0.002361 0.002284 0.002046 0.001903 0.001807 0.001691 0.00157 0.001559 0.001434 0.001328 0.001304 0.00110 0.00103 0.0009236 0.0008693 
-Proportion Explained ​ 0.004500 0.004160 0.004030 0.003610 0.003350 0.003180 0.002980 0.00277 0.002750 0.002530 0.002340 0.002300 0.00194 0.00182 0.0016300 0.0015300 
-Cumulative Proportion 0.955740 0.959900 0.963930 0.967530 0.970890 0.974070 0.977050 0.97982 0.982570 0.985090 0.987430 0.989730 0.99167 0.99349 0.9951100 0.9966500 
-                           ​PC53 ​     PC54      PC55 
-Eigenvalue ​           0.0007434 0.0006739 0.0004857 
-Proportion Explained ​ 0.0013100 0.0011900 0.0008600 
-Cumulative Proportion 0.9979600 0.9991400 1.0000000 
- 
-Accumulated constrained eigenvalues 
-Importance of components: 
-                        RDA1    RDA2    RDA3    RDA4    RDA5     ​RDA6 ​    ​RDA7 ​   RDA8     ​RDA9 ​   RDA10    RDA11    RDA12    RDA13    RDA14 
-Eigenvalue ​           0.0978 0.02165 0.01501 0.01077 0.01060 0.009272 0.008199 0.00615 0.005908 0.005756 0.004406 0.004055 0.003383 0.003123 
-Proportion Explained ​ 0.4746 0.10504 0.07282 0.05228 0.05144 0.044990 0.039790 0.02984 0.028670 0.027930 0.021380 0.019680 0.016420 0.015150 
-Cumulative Proportion 0.4746 0.57961 0.65243 0.70471 0.75615 0.801140 0.840920 0.87077 0.899440 0.927370 0.948750 0.968430 0.984850 1.000000 
- 
-Scaling 2 for species and site scores 
-* Species are scaled proportional to eigenvalues 
-* Sites are unscaled: weighted dispersion equal on all dimensions 
-* General scaling constant of scores: ​ 2.501418 ​ 
- 
- 
-Species scores 
- 
-               ​RDA1 ​      ​RDA2 ​      ​RDA3 ​     RDA4      RDA5      RDA6 
-Acer_sp. -0.0154645 -0.0007631 -0.0075229 -0.008245 ​ 0.001732 ​ 0.007874 
-AchiMill ​ 0.0003371 ​ 0.0002172 ​ 0.0099623 -0.023240 -0.010564 ​ 0.017989 
-AgroCani ​ 0.2989206 -0.0534899 -0.0003443 ​ 0.019328 -0.002615 -0.012205 
-AgroCapi ​ 0.0439209 ​ 0.0196134 ​ 0.0014274 -0.007778 -0.018888 ​ 0.021651 
-AgroStol -0.1074882 -0.0278701 ​ 0.0070877 -0.050120 -0.016408 -0.003469 
-AjugRept -0.0607449 ​ 0.0359534 -0.0898934 -0.005017 -0.001849 -0.002417 
-....                                                                    
- 
- 
-Site scores (weighted sums of species scores) 
- 
-             ​RDA1 ​    ​RDA2 ​     RDA3    RDA4     ​RDA5 ​    RDA6 
-S1       ​-0.13618 ​ 0.34012 -0.648298 ​ 0.3345 -0.08732 ​ 0.29732 
-S2       ​-0.02351 ​ 0.50122 ​ 0.483985 -0.4592 -0.34310 ​ 0.01176 
-S3       ​-0.36203 -0.29731 ​ 0.187990 -0.2157 ​ 0.37425 ​ 0.24345 
-S4       ​-0.45847 -0.75706 ​ 0.028869 ​ 0.1514 -0.12375 ​ 0.24845 
-S5       ​-0.34538 ​ 0.01186 -0.543172 ​ 0.0689 -0.09529 -0.01918 
-S6       ​-0.38319 -0.12405 ​ 0.008091 ​ 0.3563 ​ 0.37374 ​ 0.10958 
-....                                                          ​ 
- 
- 
-Site constraints (linear combinations of constraining variables) 
- 
-             ​RDA1 ​    ​RDA2 ​     RDA3    RDA4     ​RDA5 ​     RDA6 
-S1       ​-0.29599 ​ 0.51952 -0.676450 ​ 0.3432 -0.11594 ​ 0.041619 
-S2        0.13970 ​ 0.17128 ​ 0.057787 -0.2379 -0.43640 ​ 0.078915 
-S3       ​-0.37947 -0.26887 -0.021117 ​ 0.1957 ​ 0.09659 ​ 0.112267 
-S4       ​-0.42693 -0.41047 ​ 0.057856 ​ 0.2340 ​ 0.15766 ​ 0.017807 
-S5       ​-0.32293 ​ 0.13820 -0.465399 ​ 0.3952 ​ 0.13787 -0.020922 
-S6       ​-0.09608 -0.05028 ​ 0.002395 ​ 0.1068 ​ 0.12036 ​ 0.002839 
-....                                                            
- 
- 
-Biplot scores for constraining variables 
- 
-            RDA1       ​RDA2 ​    ​RDA3 ​    ​RDA4 ​    ​RDA5 ​     RDA6 
-Ca      -0.88402 -0.2568958 -0.10774 ​ 0.09646 ​ 0.02262 -0.022930 
-Mg      -0.81588 -0.0021299 -0.22725 -0.33887 ​ 0.08347 -0.034787 
-Fe       ​0.12061 ​ 0.4707166 ​ 0.03481 ​ 0.12330 ​ 0.24032 -0.144807 
-K       ​-0.32015 ​ 0.3987824 -0.07212 ​ 0.02156 -0.19731 -0.273450 
-Na      -0.51577 ​ 0.4008411 ​ 0.03295 -0.06688 -0.13851 -0.227552 
-Si      -0.29676 ​ 0.4289340 -0.67314 ​ 0.06990 -0.21501 -0.089828 
-SO4     ​-0.34191 -0.3049719 -0.25253 -0.19042 -0.24876 ​ 0.064491 
-PO4      0.21537 -0.0008569 ​ 0.21286 ​ 0.11162 ​ 0.12566 -0.044834 
-NO3      0.15926 -0.4121772 -0.09585 ​ 0.23946 -0.48853 ​ 0.245962 
-NH3      0.44013 -0.2471674 -0.22154 -0.02665 ​ 0.38679 ​ 0.488924 
-Cl      -0.03537 -0.2772850 -0.08676 ​ 0.18063 ​ 0.31496 -0.002797 
-Corg     ​0.69131 -0.0833783 ​ 0.05578 ​ 0.28857 ​ 0.25577 -0.260926 
-pH      -0.67211 ​ 0.0469155 ​ 0.35972 ​ 0.16870 -0.03229 ​ 0.552316 
-conduct -0.82805 ​ 0.0162761 ​ 0.06142 ​ 0.17723 ​ 0.14450 ​ 0.280628 
-</​code>​ 
- 
-There are 14 environmental variables, which means that we got 14 constrained ordination axes (named RDA1 to RDA14). Unconstrained axes are named as PC axes. The table ''​Proportion Explained''​ with the partitioning of the variance between constrained axes (i.e. environmental variables) and unconstrained axes (i.e. variance not explained by environmental factors) shows that the proportion of the total variance, which is explained by all environmental factors, is 36.3% (''​0.3632''​ at the row ''​Constrained''​ and column ''​Proportion''​). The first constrained axis explains 17.2%, while the second just 3.8% (''​Importance of components''​ > ''​Proportion Explained'',​ columns ''​RDA1''​ and ''​RDA2''​);​ it seems that those 14 variables represent mostly a single strong environmental gradient represented by the first constrained axis. 
- 
-What we calculated here is a global model - it includes all available environmental variables. The next question is whether the global model is significant:​ 
-<code rsplus> 
-anova (rda.vasc) 
-</​code>​ 
-<​code>​ 
-Permutation test for rda under reduced model 
-Permutation:​ free 
-Number of permutations:​ 999 
- 
-Model: rda(formula = vasc.hell ~ Ca + Mg + Fe + K + Na + Si + SO4 + PO4 + NO3 + NH3 + Cl + Corg + pH + conduct, data = chem) 
-         Df Variance ​     F Pr(>​F) ​   ​ 
-Model    14  0.20608 2.2407 ​ 0.001 *** 
-Residual 55  0.36133 ​                 ​ 
---- 
-Signif. codes: ​ 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
-</​code>​ 
- 
-Indeed it is significant (P = 0.001, which is the lowest P-value we can obtain with the default number of permutations,​ 999). Alternatively,​ we may be interested in the significance of only the first constrained axis. This is achieved by adding the argument ''​first = TRUE'':​ 
-<code rsplus> 
-anova (rda.vasc, first = TRUE) 
-</​code>​ 
-<​code>​ 
-Permutation test for rda under reduced model 
-Permutation:​ free 
-Number of permutations:​ 999 
- 
-Model: rda(formula = vasc.hell ~ Ca + Mg + Fe + K + Na + Si + SO4 + PO4 + NO3 + NH3 + Cl + Corg + pH + conduct, data = chem) 
-         Df Variance ​     F Pr(>​F) ​   ​ 
-RDA1      1  0.09780 14.887 ​ 0.001 *** 
-Residual 55  0.36133 ​                 ​ 
---- 
-Signif. codes: ​ 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
-</​code>​ 
- 
-Or, we may calculate the significance of each constrained axis independently. This can be done by adding argument ''​by = "​axis"''​ (by default, the argument ''​by''​ is set to ''​NULL'',​ see the helpfile ''?​anova.cca''​). The calculation takes rather long (there are 14 constrained axes, each tested separately);​ you may speed it up using parallel calculation if your computer has more than one core (use the argument ''​parallel''​ with the number of available cores, four in the case of my computer). Also, we will directly store the results into an object (''​test_axis''​) so as we can call it later for further processing: ​ 
-<code rsplus> 
-test_axis <- anova (rda.vasc, by = '​axis',​ parallel = 4) 
-</​code>​ 
-<​code>​ 
-Permutation test for rda under reduced model 
-Forward tests for axes 
-Permutation:​ free 
-Number of permutations:​ 999 
- 
-Model: rda(formula = vasc.hell ~ Ca + Mg + Fe + K + Na + Si + SO4 + PO4 + NO3 + NH3 + Cl + Corg + pH + conduct, data = chem) 
-         Df Variance ​      F Pr(>​F) ​   ​ 
-RDA1      1  0.09780 14.8869 ​ 0.001 *** 
-RDA2      1  0.02165 ​ 3.2950 ​ 0.008 **  
-RDA3      1  0.01501 ​ 2.2844 ​ 0.213    ​ 
-RDA4      1  0.01077 ​ 1.6399 ​ 0.930    ​ 
-RDA5      1  0.01060 ​ 1.6137 ​ 0.913    ​ 
-RDA6      1  0.00927 ​ 1.4113 ​ 0.975    ​ 
-RDA7      1  0.00820 ​ 1.2481 ​ 0.997    ​ 
-RDA8      1  0.00615 ​ 0.9362 ​ 1.000    ​ 
-RDA9      1  0.00591 ​ 0.8993 ​ 1.000    ​ 
-RDA10     ​1 ​ 0.00576 ​ 0.8761 ​ 1.000    ​ 
-RDA11     ​1 ​ 0.00441 ​ 0.6707 ​ 1.000    ​ 
-RDA12     ​1 ​ 0.00406 ​ 0.6173 ​ 1.000    ​ 
-RDA13     ​1 ​ 0.00338 ​ 0.5150 ​ 1.000    ​ 
-RDA14     ​1 ​ 0.00312 ​ 0.4754 ​ 1.000    ​ 
-Residual 55  0.36133 ​                   
---- 
-Signif. codes: ​ 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
-</​code>​ 
-Our impression from above was right - here, only first two axes are significant,​ second not so strongly as the first one. In fact, the test above is running 14 independent tests of significance,​ the number high enough to have a problem with multiple testing issue (more tests = more significant results). To correct for multiple testing, consider applying some correction (e.g. Bonferroni, or, as in this case, Holm's correction). Significances are stored in the component ''​Pr(>​F)''​ of ''​test_axis''​ object (a list), and the function conducting correction is ''​p.adjust'':​ 
-<code rsplus> 
-test_axis.adj <- test_axis 
-test_axis.adj$`Pr(>​F)` <- p.adjust (test_axis$`Pr(>​F)`,​ method = '​holm'​) 
-test_axis.adj 
-</​code>​ 
-<​code>​ 
-Permutation test for rda under reduced model 
-Forward tests for axes 
-Permutation:​ free 
-Number of permutations:​ 999 
- 
-Model: rda(formula = vasc.hell ~ Ca + Mg + Fe + K + Na + Si + SO4 + PO4 + NO3 + NH3 + Cl + Corg + pH + conduct, data = chem) 
-         Df Variance ​      F Pr(>​F)  ​ 
-RDA1      1  0.09780 14.8869 ​ 0.014 * 
-RDA2      1  0.02165 ​ 3.2950 ​ 0.052 . 
-RDA3      1  0.01501 ​ 2.2844 ​ 1.000  ​ 
-RDA4      1  0.01077 ​ 1.6399 ​ 1.000  ​ 
-RDA5      1  0.01060 ​ 1.6137 ​ 1.000  ​ 
-RDA6      1  0.00927 ​ 1.4113 ​ 1.000  ​ 
-RDA7      1  0.00820 ​ 1.2481 ​ 1.000  ​ 
-RDA8      1  0.00615 ​ 0.9362 ​ 1.000  ​ 
-RDA9      1  0.00591 ​ 0.8993 ​ 1.000  ​ 
-RDA10     ​1 ​ 0.00576 ​ 0.8761 ​ 1.000  ​ 
-RDA11     ​1 ​ 0.00441 ​ 0.6707 ​ 1.000  ​ 
-RDA12     ​1 ​ 0.00406 ​ 0.6173 ​ 1.000  ​ 
-RDA13     ​1 ​ 0.00338 ​ 0.5150 ​ 1.000  ​ 
-RDA14     ​1 ​ 0.00312 ​ 0.4754 ​ 1.000  ​ 
-Residual 55  0.36133 ​                 
---- 
-Signif. codes: ​ 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
-</​code>​ 
-After adjusting the P-values for multiple testing issue, the second axis became only marginally significant (note that your P-values may slightly differ, since this is permutation test, each time running with different set of permuted environmental variables). 
- 
-Other testing option is to test each term (constraining variable) separately in the sequence of adding them into the model - this is done adding the argument ''​by = "​terms"''​. It sequentially adds each variable one by one in order in which they enter the model formula, and for each calculates partial explained variation and its significance with previous variables as covariables (i.e. for the first variable it is marginal variation explained by this variable and its significance,​ for the second variable it is partial variation explained by the second variable after removing the variation of the first variable as covariable, etc.). Since variables in the dataset are ordered without exact meaning, this option is not useful here. More meaningful is the last option, ''​by = "​margin"'',​ testing the variance explained by each explanatory variable with all the others used as covariables (independently from their order in the model): 
-<code rsplus> 
-test_margin <- anova (rda.vasc, by = '​margin',​ parallel = 4) 
-</​code>​ 
-<​code>​ 
-Permutation test for rda under reduced model 
-Marginal effects of terms 
-Permutation:​ free 
-Number of permutations:​ 999 
- 
-Model: rda(formula = vasc.hell ~ Ca + Mg + Fe + K + Na + Si + SO4 + PO4 + NO3 + NH3 + Cl + Corg + pH + conduct, data = chem) 
-         Df Variance ​     F Pr(>​F) ​   
-Ca        1  0.01460 2.2228 ​ 0.007 ** 
-Mg        1  0.00973 1.4816 ​ 0.084 .  
-Fe        1  0.00729 1.1096 ​ 0.258    
-K         ​1 ​ 0.00668 1.0162 ​ 0.375    
-Na        1  0.00565 0.8603 ​ 0.651    
-Si        1  0.01237 1.8828 ​ 0.026 *  
-SO4       ​1 ​ 0.00739 1.1246 ​ 0.291    
-PO4       ​1 ​ 0.00467 0.7108 ​ 0.872    
-NO3       ​1 ​ 0.00908 1.3828 ​ 0.108    
-NH3       ​1 ​ 0.01066 1.6231 ​ 0.042 *  
-Cl        1  0.00584 0.8887 ​ 0.617    
-Corg      1  0.00781 1.1881 ​ 0.217    
-pH        1  0.00850 1.2938 ​ 0.146    
-conduct ​  ​1 ​ 0.00959 1.4604 ​ 0.078 .  
-Residual 55  0.36133 ​                 
---- 
-Signif. codes: ​ 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
-</​code>​ 
-If we correct for multiple testing (not done here), we will see that only calcium (''​Ca''​) remains marginally significant,​ i.e. it can explain some important extra variance which other variables cannot. ​ 
- 
-And if we want to know the variance each variable can explain independently from others (i.e. the variance each variable will explain if it is the only variable in the model, without any covariables)?​ What I do below is a bit complex R coding exercise and I won't explain it in detail, but what I did is to make a loop (using ''​apply''​) which will take each of the 14 variables one by one, calculate tb-RDA on it using ''​vasc.hell''​ data, test it's significance,​ return all values in one summary table, correct them for the multiple-testing issue by Holm's correction, and order the table by decreasing value of variance explained by each variable: 
- 
-<code rsplus> 
-test_each <- t(apply (chem, 2, FUN = function (x) as.matrix (anova (rda (vasc.hell ~ x))[1:​4][1,​]))) 
-test_each <- as.data.frame (test_each) 
-names (test_each) <- c("​Df",​ "​Variance",​ "​F",​ "​Pr(>​F)"​) 
-test_each.adj <- test_each 
-test_each.adj$`Pr(>​F)` <- p.adjust (test_each$`Pr(>​F)`,​ method = '​holm'​) 
-test_each.adj[order (test_each.adj$Variance,​ decreasing = TRUE),] 
-</​code>​ 
-<​code>​ 
-        Df    Variance ​         F Pr(>F) 
-Ca       1 0.078859882 10.9763368 ​ 0.014 
-conduct ​ 1 0.069408621 ​ 9.4774885 ​ 0.014 
-Mg       1 0.068014761 ​ 9.2612406 ​ 0.014 
-Corg     1 0.050866878 ​ 6.6963607 ​ 0.014 
-pH       1 0.049697951 ​ 6.5277054 ​ 0.014 
-Na       1 0.032285586 ​ 4.1026474 ​ 0.014 
-NH3      1 0.026047765 ​ 3.2718455 ​ 0.016 
-Si       1 0.021008459 ​ 2.6145234 ​ 0.016 
-SO4      1 0.018808570 ​ 2.3313586 ​ 0.036 
-K        1 0.017124964 ​ 2.1161781 ​ 0.040 
-NO3      1 0.012759735 ​ 1.5643455 ​ 0.236 
-Fe       1 0.010942632 ​ 1.3371875 ​ 0.384 
-PO4      1 0.009863593 ​ 1.2029965 ​ 0.384 
-Cl       1 0.006910560 ​ 0.8383942 ​ 0.736 
-</​code>​ 
-Some variables (at the bottom of the table) do not explain significant part of variance, some do (with calcium, ''​Ca'',​ explaining the most). What I did is in fact the first step of forward selection, which aims to select the subset of variables that explain the most variance in the model (in the first step the variables are sorted by explained variance, the first of them is tested, and if significant,​ it is included as the first selected variables into the model). This will be the topic for [[en:​forward_sel_examples|Example 1 in Forward selection section]]. 
  
  
en/monte_carlo_examples.txt · Last modified: 2019/02/10 16:04 by David Zelený