Section: Ordination analysis

Explained variation and Monte Carlo permutation test (constrained ordination)

Example 1: Is the explained variance high enough to be considered interesting?

This is a direct continuation of Example 1 in the tb-RDA section. We used the Vltava river valley dataset to calculate tb-RDA (on log-transformed and then Hellinger-transformed species composition data) with two measured variables (pH and SOILDPT) as explanatory variables. They explained 8.9% of the variance, and we may ask: if the variables explain this amount of variance (which may seem rather low, less than 10%), is it enough to report the result?

To decide, we need to do two more things. One is to test whether the analysis is significant, i.e. whether this amount of variance is high enough compared to the variance explained (on average) by two random variables (this is the topic of Example 2 below). The other is to compare the observed explained variation (R2 = 8.9%) with the variation which would be explained by the two best possible variables, i.e. variables representing the strongest gradients along which species composition changes. As explained in the theoretical part, we can create such variables for our dataset by first applying unconstrained ordination to the data and extracting the first few unconstrained ordination axes (i.e. the sample scores along them). These axes represent the main directions along which species composition changes most rapidly, which is exactly what we need. We can then use these axes as explanatory variables in the constrained version of the same ordination (here tb-RDA) to see how much variance they explain. The number of axes to take equals the number of real variables we want to compare the variance with (in this case two).

Let's read the data in and calculate tb-RDA with the two explanatory variables (this essentially repeats the steps of Example 1 in the tb-RDA section):

vltava.spe <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/vltava-spe.txt', row.names = 1)
vltava.env <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/vltava-env.txt')
spe <- vltava.spe
env <- vltava.env[, c('pH', 'SOILDPT')]
library (vegan)
spe.log <- log1p (spe)  # species data are on a percentage scale which is strongly right-skewed, better to transform them
spe.hell <- decostand (spe.log, 'hell')  # we are planning to do tb-RDA, this is Hellinger pre-transformation
tbRDA <- rda (spe.hell ~ pH + SOILDPT, data = env)  # calculate tb-RDA with two explanatory variables

We can use the function RsquareAdj from vegan to extract the explained variance directly from the ordination object:

R2.obs <- RsquareAdj (tbRDA)$r.squared
R2.obs
[1] 0.08868879

The result is 0.089 = 8.9%. Note that the function RsquareAdj returns a list with two components, r.squared and adj.r.squared, the first referring to the R2 we are interested in now, and the second to the adjusted R2, i.e. R2 corrected for the number of samples and the number of explanatory variables (we will use this later).
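
For illustration, both components can be displayed like this (the value of the adjusted R2 is not shown here, since it is not needed yet):

RsquareAdj (tbRDA)                 # returns a list with components r.squared and adj.r.squared
RsquareAdj (tbRDA)$adj.r.squared   # adjusted R2, used later in this chapter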

Now we need to calculate the unconstrained version of tb-RDA on the same data and extract the first two ordination axes. The unconstrained version of tb-RDA is tb-PCA (i.e. PCA calculated on pre-transformed data, where the pre-transformation must be the same in both analyses):

tbPCA <- rda (spe.hell)
tbPCA
Call: rda(X = spe.hell)

              Inertia Rank
Total          0.7048     
Unconstrained  0.7048   96
Inertia is variance 

Eigenvalues for unconstrained axes:
    PC1     PC2     PC3     PC4     PC5     PC6     PC7     PC8 
0.09197 0.06075 0.04684 0.03537 0.02650 0.02361 0.02093 0.02035 
(Showed only 8 of all 96 unconstrained eigenvalues)

For comparison, let's also look at the output of the tb-RDA:

tbRDA
Call: rda(formula = spe.hell ~ pH + SOILDPT, data = env)

              Inertia Proportion Rank
Total         0.70476    1.00000     
Constrained   0.06250    0.08869    2
Unconstrained 0.64226    0.91131   94
Inertia is variance 

Eigenvalues for constrained axes:
   RDA1    RDA2 
0.04023 0.02227 

Eigenvalues for unconstrained axes:
    PC1     PC2     PC3     PC4     PC5     PC6     PC7     PC8 
0.07321 0.04857 0.04074 0.03144 0.02604 0.02152 0.01917 0.01715 
(Showed only 8 of all 94 unconstrained eigenvalues)

You can see that the total inertia (variance of the whole dataset) is identical in both analyses (0.70476). This is the variance we aim to explain.
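
We can check this directly: the total inertia is stored in the component tot.chi of the vegan ordination objects (a quick check, not part of the original workflow):

tbPCA$tot.chi   # total variance of the Hellinger-transformed data (0.70476)
tbRDA$tot.chi   # the same value, since the dataset is identical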

Extract the first two PCA axes from the tbPCA object and use them as explanatory variables in a tb-RDA on the same data to see how much variance they explain:

PCA12 <- scores (tbPCA, display = 'sites', choices = 1:2)
tbRDA_PCA12 <- rda (spe.hell ~ PCA12)
RsquareAdj (tbRDA_PCA12)$r.squared
[1] 0.2167075

The first two tb-PCA axes, if used as explanatory variables in tb-RDA on the same dataset, explain 21.7% of the variance. Note that this is the same number we would get by simply checking the amount of variance represented by the first two ordination axes of the tb-PCA in the summary above: eigenvalue of PC1 = 0.09197, eigenvalue of PC2 = 0.06075, total variance (inertia) = 0.70476, recalculated into a percentage: (0.09197+0.06075)/0.70476 = 21.7%. In fact, we don't really need to do the whole procedure (extract tb-PCA axes and use them as explanatory variables in tb-RDA); we can simply check the variance of the first n axes of the unconstrained ordination (n = number of explanatory variables) and compare it with the variance actually explained by the environmental variables.
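
The same proportion can be calculated directly from the tb-PCA eigenvalues, without refitting the constrained model (a minimal sketch using the vegan function eigenvals and the tot.chi component):

sum (eigenvals (tbPCA)[1:2]) / tbPCA$tot.chi  # (0.09197 + 0.06075) / 0.70476 = 0.217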

The comparison here is: 8.9% explained by the real variables (pH and SOILDPT) vs 21.7% which would be explained by the two best possible, uncorrelated variables (if we had them). We see that the measured variables explain somewhat over 40% of the variation they possibly could (8.9/21.7 = 0.41), which is not bad. Remember an important difference: PCA axes are (by definition) uncorrelated, while our real variables often will be (as in this case: cor.test (~ pH + SOILDPT, data = env) shows that Pearson's correlation coefficient between pH and SOILDPT is r = 0.273, and this correlation is significant at P = 0.0069).
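
For completeness, this is the correlation test mentioned above:

cor.test (~ pH + SOILDPT, data = env)  # Pearson's r = 0.273, P = 0.0069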

The next step is to test whether the variation explained by our variables is significant - this is a topic for Example 1 in the Permutation test section.

Example 2: Is the variance explained by environmental variables significant?

This example directly follows Example 1 in the RDA & tb-RDA section and Example 1 in the Explained variation section; consider checking them first. We used the Vltava river valley dataset and two field-measured environmental variables, soil pH and soil depth (pH and SOILDPT), to explain variance in species composition. We found that they explain 8.9% of the overall variance; compared with the variance which can maximally be explained by two explanatory variables (21.7% explained by two tb-PCA axes, see Example 1), this is not bad (it is more than 40% of the variance we could maximally explain in this dataset with two variables). But is the result significant? By significant, we mean: is the variance considerably higher than the variance explained (on average) by two random variables not related to species composition? This is the task for the Monte Carlo permutation test (check the Theory part to see how it works).

First, get the data and calculate tb-RDA on them (this essentially repeats the beginning of Example 1 in the Explained variation section):

vltava.spe <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/vltava-spe.txt', row.names = 1)
vltava.env <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/vltava-env.txt')
spe <- vltava.spe
env <- vltava.env[, c('pH', 'SOILDPT')]
library (vegan)
spe.log <- log1p (spe) 
spe.hell <- decostand (spe.log, 'hell')
tbRDA <- rda (spe.hell ~ pH + SOILDPT, data = env)
R2.obs <- RsquareAdj (tbRDA)$r.squared
R2.obs
[1] 0.08868879

In the next step, calculate the variance explained by randomized environmental variables:

env.rand <- env[sample (1:97),]  # the function "sample" will reshuffle the rows with environmental variables (the dataset has 97 samples)
tbRDA.rand <- rda (spe.hell ~ pH + SOILDPT, data = env.rand)
RsquareAdj (tbRDA.rand)$r.squared

This value represents the variance explained by two random explanatory variables. My result was 0.01854349, but this value will change in each run. We need to do enough repetitions (permutations) to get an idea about the distribution of these values (the null model). We can use a "for" loop for this, or, as here, the function "replicate" with two arguments: n = number of replicates, and expr = the expression to be replicated (if more lines of script are involved, this expression needs to be enclosed in curly brackets {}):

n.perm <- 99  # set the number of permutations
R2.rand <- replicate (n = n.perm, expr = {
  env.rand <- env[sample (1:97),]
  tbRDA.rand <- rda (spe.hell ~ pH + SOILDPT, data = env.rand)
  RsquareAdj (tbRDA.rand)$r.squared
  })
The vector R2.rand contains 99 values of variance explained by random variables. In the next step, we merge them with the observed R2 (R2.obs), since this is also considered to be part of the null distribution (and this is also the reason why the numbers of permutations usually end with 9):
R2 <- c (R2.rand, R2.obs)

Draw the histogram of values and highlight the observed R2 there:

hist (R2, nclass = 100)  # argument "nclass" separates the bars into 100 categories (compare with hist without this argument)
abline (v = R2.obs, col = 'red')  # red line to indicate where in the histogram is the observed value

Already we can see that our observed value (0.089, the red vertical line) is far higher than all values generated by the randomized environmental variables (range (R2.rand) shows that these values are between 0.014 and 0.033; again, this range will change in another run of the analysis, and will likely widen if the number of permutations increases, since there is a higher chance of getting values more distant from the mean).
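
To check the range of the null values in your own run:

range (R2.rand)  # in the run above approximately 0.014 to 0.033; your values will differ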

To calculate the significance, we need to know the number of cases in which the variance explained by random explanatory variables was higher than or equal to the observed one (explained by the real variables). Remember that R2.obs is part of our null distribution (we added it there), so we always obtain at least one value in this comparison (R2.obs itself).

P <- sum (R2 >= R2.obs)/(n.perm + 1)  # 0.01

You can see that our observed R2 (R2.obs) is far higher than any R2 generated by random variables, so the calculation of the P-value counts only one value equal to or higher than the observed R2 (the observed R2 itself). The resulting P-value is 0.01, which is the lowest P-value we can get with a Monte Carlo permutation test based on 99 permutations:

1/(99 + 1)  # 0.01

If we are happy with rejecting the null hypothesis at the alpha level of 5% (P < 0.05), then this could be sufficient. But if we need lower P-values (e.g. because we are going to correct them for multiple testing, as in the case of forward selection later), we need to increase the number of permutations. As we saw in the theoretical part, the relationship between the number of permutations and the minimal P-value we can obtain is Pmin = 1/(number_of_permutations + 1). If you hope to be able to reject the null hypothesis at P < 0.001, you need at least 1/0.001 - 1 = 999 permutations (in that case the lowest P-value will be 0.001), or better more (e.g. 1499 permutations, with the lowest P-value 1/(1499+1) = 0.00067).
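
The same arithmetic in R:

1/(999 + 1)    # 0.001   - lowest P-value obtainable with 999 permutations
1/(1499 + 1)   # 0.00067 - lowest P-value obtainable with 1499 permutations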

In vegan, the test of significance for constrained ordination is done by the function anova (the name may be a bit confusing, since it is not really calculating an ANOVA):

anova (tbRDA, permutations = 99)
Permutation test for rda under reduced model
Permutation: free
Number of permutations: 99

Model: rda(formula = spe.hell ~ pH + SOILDPT, data = env)
         Df Variance     F Pr(>F)   
Model     2  0.06250 4.574   0.01 **
Residual 94  0.64226                
---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The P-value is the same (P = 0.01) as in our manual Monte Carlo permutation test above. Still, 99 permutations is a really low number; consider using at least the default 999 permutations, which allows P-values as low as 0.001.
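
For example (output not shown here, since the exact permutation results vary slightly between runs):

anova (tbRDA)                        # default: 999 permutations
anova (tbRDA, permutations = 9999)   # even finer resolution of the P-value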

Example 3: Effect of environmental variables on species composition of plants in Carpathian wetlands tested by tb-RDA

The Carpathian spring fen meadows dataset consists of three data matrices - a species matrix with bryophytes, a species matrix with vascular plants, and an environmental matrix with chemical variables and slope. We ignore bryophytes here and focus on the relationship between the vascular plants (vasc) and the chemical variables (chem).

# Carpathian wetlands - import data
vasc <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/vasc_plants.txt', row.names = 1)
chem <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/chemistry.txt', row.names = 1)
 

For the purpose of tb-RDA, we first pre-transform the species data using the Hellinger transformation, and remove slope from the matrix of environmental variables (the slope of the site is not a chemical variable):

library (vegan)  # if you haven't loaded it yet
vasc.hell <- decostand (vasc, 'hell')  # pre-transform species data by Hellinger transformation
chem <- chem[, -15]  # remove slope
rda.vasc <- rda (vasc.hell ~ ., chem)

To see the output, we can either directly print the rda.vasc object, or apply the function summary on it to get more extended information (it also prints all the scores, which can get quite long, so we wrap it into the head function to shorten those tables to only the first six rows):

head (summary (rda.vasc))
Call:
rda(formula = vasc.hell ~ Ca + Mg + Fe + K + Na + Si + SO4 +      PO4 + NO3 + NH3 + Cl + Corg + pH + conduct, data = chem) 

Partitioning of variance:
              Inertia Proportion
Total          0.5674     1.0000
Constrained    0.2061     0.3632
Unconstrained  0.3613     0.6368

Eigenvalues, and their contribution to the variance 

Importance of components:
                        RDA1    RDA2    RDA3    RDA4    RDA5     RDA6     RDA7    RDA8     RDA9    RDA10    RDA11    RDA12    RDA13    RDA14     PC1     PC2     PC3
Eigenvalue            0.0978 0.02165 0.01501 0.01077 0.01060 0.009272 0.008199 0.00615 0.005908 0.005756 0.004406 0.004055 0.003383 0.003123 0.03260 0.03204 0.02084
Proportion Explained  0.1724 0.03815 0.02645 0.01899 0.01868 0.016340 0.014450 0.01084 0.010410 0.010140 0.007770 0.007150 0.005960 0.005500 0.05746 0.05647 0.03672
Cumulative Proportion 0.1724 0.21051 0.23696 0.25595 0.27463 0.290970 0.305420 0.31626 0.326680 0.336820 0.344590 0.351730 0.357700 0.363200 0.42066 0.47713 0.51386
                          PC4     PC5     PC6     PC7     PC8     PC9    PC10    PC11     PC12     PC13     PC14     PC15     PC16    PC17     PC18     PC19     PC20
Eigenvalue            0.01711 0.01638 0.01426 0.01370 0.01295 0.01166 0.01106 0.01057 0.009716 0.008781 0.008481 0.008224 0.007753 0.00759 0.007313 0.007163 0.006564
Proportion Explained  0.03015 0.02887 0.02513 0.02415 0.02282 0.02055 0.01950 0.01862 0.017120 0.015480 0.014950 0.014490 0.013660 0.01338 0.012890 0.012620 0.011570
Cumulative Proportion 0.54401 0.57287 0.59800 0.62215 0.64497 0.66552 0.68502 0.70364 0.720760 0.736240 0.751180 0.765680 0.779340 0.79272 0.805610 0.818230 0.829800
                          PC21     PC22     PC23    PC24     PC25     PC26    PC27     PC28    PC29     PC30     PC31     PC32     PC33     PC34     PC35    PC36
Eigenvalue            0.006371 0.005808 0.005689 0.00547 0.005307 0.004855 0.00475 0.004443 0.00403 0.003576 0.003554 0.003397 0.003139 0.002931 0.002827 0.00276
Proportion Explained  0.011230 0.010240 0.010030 0.00964 0.009350 0.008560 0.00837 0.007830 0.00710 0.006300 0.006260 0.005990 0.005530 0.005160 0.004980 0.00486
Cumulative Proportion 0.841030 0.851260 0.861290 0.87093 0.880280 0.888840 0.89721 0.905040 0.91215 0.918450 0.924710 0.930700 0.936230 0.941400 0.946380 0.95124
                          PC37     PC38     PC39     PC40     PC41     PC42     PC43    PC44     PC45     PC46     PC47     PC48    PC49    PC50      PC51      PC52
Eigenvalue            0.002552 0.002361 0.002284 0.002046 0.001903 0.001807 0.001691 0.00157 0.001559 0.001434 0.001328 0.001304 0.00110 0.00103 0.0009236 0.0008693
Proportion Explained  0.004500 0.004160 0.004030 0.003610 0.003350 0.003180 0.002980 0.00277 0.002750 0.002530 0.002340 0.002300 0.00194 0.00182 0.0016300 0.0015300
Cumulative Proportion 0.955740 0.959900 0.963930 0.967530 0.970890 0.974070 0.977050 0.97982 0.982570 0.985090 0.987430 0.989730 0.99167 0.99349 0.9951100 0.9966500
                           PC53      PC54      PC55
Eigenvalue            0.0007434 0.0006739 0.0004857
Proportion Explained  0.0013100 0.0011900 0.0008600
Cumulative Proportion 0.9979600 0.9991400 1.0000000

Accumulated constrained eigenvalues
Importance of components:
                        RDA1    RDA2    RDA3    RDA4    RDA5     RDA6     RDA7    RDA8     RDA9    RDA10    RDA11    RDA12    RDA13    RDA14
Eigenvalue            0.0978 0.02165 0.01501 0.01077 0.01060 0.009272 0.008199 0.00615 0.005908 0.005756 0.004406 0.004055 0.003383 0.003123
Proportion Explained  0.4746 0.10504 0.07282 0.05228 0.05144 0.044990 0.039790 0.02984 0.028670 0.027930 0.021380 0.019680 0.016420 0.015150
Cumulative Proportion 0.4746 0.57961 0.65243 0.70471 0.75615 0.801140 0.840920 0.87077 0.899440 0.927370 0.948750 0.968430 0.984850 1.000000

Scaling 2 for species and site scores
* Species are scaled proportional to eigenvalues
* Sites are unscaled: weighted dispersion equal on all dimensions
* General scaling constant of scores:  2.501418 


Species scores

               RDA1       RDA2       RDA3      RDA4      RDA5      RDA6
Acer_sp. -0.0154645 -0.0007631 -0.0075229 -0.008245  0.001732  0.007874
AchiMill  0.0003371  0.0002172  0.0099623 -0.023240 -0.010564  0.017989
AgroCani  0.2989206 -0.0534899 -0.0003443  0.019328 -0.002615 -0.012205
AgroCapi  0.0439209  0.0196134  0.0014274 -0.007778 -0.018888  0.021651
AgroStol -0.1074882 -0.0278701  0.0070877 -0.050120 -0.016408 -0.003469
AjugRept -0.0607449  0.0359534 -0.0898934 -0.005017 -0.001849 -0.002417
....                                                                   


Site scores (weighted sums of species scores)

             RDA1     RDA2      RDA3    RDA4     RDA5     RDA6
S1       -0.13618  0.34012 -0.648298  0.3345 -0.08732  0.29732
S2       -0.02351  0.50122  0.483985 -0.4592 -0.34310  0.01176
S3       -0.36203 -0.29731  0.187990 -0.2157  0.37425  0.24345
S4       -0.45847 -0.75706  0.028869  0.1514 -0.12375  0.24845
S5       -0.34538  0.01186 -0.543172  0.0689 -0.09529 -0.01918
S6       -0.38319 -0.12405  0.008091  0.3563  0.37374  0.10958
....                                                          


Site constraints (linear combinations of constraining variables)

             RDA1     RDA2      RDA3    RDA4     RDA5      RDA6
S1       -0.29599  0.51952 -0.676450  0.3432 -0.11594  0.041619
S2        0.13970  0.17128  0.057787 -0.2379 -0.43640  0.078915
S3       -0.37947 -0.26887 -0.021117  0.1957  0.09659  0.112267
S4       -0.42693 -0.41047  0.057856  0.2340  0.15766  0.017807
S5       -0.32293  0.13820 -0.465399  0.3952  0.13787 -0.020922
S6       -0.09608 -0.05028  0.002395  0.1068  0.12036  0.002839
....                                                           


Biplot scores for constraining variables

            RDA1       RDA2     RDA3     RDA4     RDA5      RDA6
Ca      -0.88402 -0.2568958 -0.10774  0.09646  0.02262 -0.022930
Mg      -0.81588 -0.0021299 -0.22725 -0.33887  0.08347 -0.034787
Fe       0.12061  0.4707166  0.03481  0.12330  0.24032 -0.144807
K       -0.32015  0.3987824 -0.07212  0.02156 -0.19731 -0.273450
Na      -0.51577  0.4008411  0.03295 -0.06688 -0.13851 -0.227552
Si      -0.29676  0.4289340 -0.67314  0.06990 -0.21501 -0.089828
SO4     -0.34191 -0.3049719 -0.25253 -0.19042 -0.24876  0.064491
PO4      0.21537 -0.0008569  0.21286  0.11162  0.12566 -0.044834
NO3      0.15926 -0.4121772 -0.09585  0.23946 -0.48853  0.245962
NH3      0.44013 -0.2471674 -0.22154 -0.02665  0.38679  0.488924
Cl      -0.03537 -0.2772850 -0.08676  0.18063  0.31496 -0.002797
Corg     0.69131 -0.0833783  0.05578  0.28857  0.25577 -0.260926
pH      -0.67211  0.0469155  0.35972  0.16870 -0.03229  0.552316
conduct -0.82805  0.0162761  0.06142  0.17723  0.14450  0.280628

There are 14 environmental variables, which means that we get 14 constrained ordination axes (named RDA1 to RDA14); unconstrained axes are named PC axes. The Partitioning of variance table, which splits the variance between constrained axes (i.e. environmental variables) and unconstrained axes (i.e. variance not explained by environmental variables), shows that the proportion of the total variance explained by all environmental variables is 36.3% (0.3632 in the row Constrained, column Proportion). The first constrained axis explains 17.2%, while the second only 3.8% (Importance of components > Proportion Explained, columns RDA1 and RDA2); it seems that those 14 variables mostly represent a single strong environmental gradient captured by the first constrained axis.
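
The same proportion can also be extracted with RsquareAdj (a quick check; it equals constrained/total inertia, i.e. 0.2061/0.5674):

RsquareAdj (rda.vasc)$r.squared  # approximately 0.363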

What we calculated here is a global model - it includes all available environmental variables. The next question is whether the global model is significant:

anova (rda.vasc)
Permutation test for rda under reduced model
Permutation: free
Number of permutations: 999

Model: rda(formula = vasc.hell ~ Ca + Mg + Fe + K + Na + Si + SO4 + PO4 + NO3 + NH3 + Cl + Corg + pH + conduct, data = chem)
         Df Variance      F Pr(>F)    
Model    14  0.20608 2.2407  0.001 ***
Residual 55  0.36133                  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Indeed it is significant (P = 0.001, which is the lowest P-value we can obtain with the default number of permutations, 999). Alternatively, we may be interested in the significance of only the first constrained axis. This is achieved by adding the argument first = TRUE:

anova (rda.vasc, first = TRUE)
Permutation test for rda under reduced model
Permutation: free
Number of permutations: 999

Model: rda(formula = vasc.hell ~ Ca + Mg + Fe + K + Na + Si + SO4 + PO4 + NO3 + NH3 + Cl + Corg + pH + conduct, data = chem)
         Df Variance      F Pr(>F)    
RDA1      1  0.09780 14.887  0.001 ***
Residual 55  0.36133                  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Alternatively, we may calculate the significance of each constrained axis independently. This can be done by adding the argument by = "axis" (by default, the argument by is set to NULL, see the help file ?anova.cca). The calculation takes rather long (there are 14 constrained axes, each tested separately); you may speed it up by parallel calculation if your computer has more than one core (use the argument parallel with the number of available cores, four in the case of my computer). We also store the results directly into an object (test_axis) so that we can use it later for further processing:

test_axis <- anova (rda.vasc, by = 'axis', parallel = 4)
Permutation test for rda under reduced model
Forward tests for axes
Permutation: free
Number of permutations: 999

Model: rda(formula = vasc.hell ~ Ca + Mg + Fe + K + Na + Si + SO4 + PO4 + NO3 + NH3 + Cl + Corg + pH + conduct, data = chem)
         Df Variance       F Pr(>F)    
RDA1      1  0.09780 14.8869  0.001 ***
RDA2      1  0.02165  3.2950  0.008 ** 
RDA3      1  0.01501  2.2844  0.213    
RDA4      1  0.01077  1.6399  0.930    
RDA5      1  0.01060  1.6137  0.913    
RDA6      1  0.00927  1.4113  0.975    
RDA7      1  0.00820  1.2481  0.997    
RDA8      1  0.00615  0.9362  1.000    
RDA9      1  0.00591  0.8993  1.000    
RDA10     1  0.00576  0.8761  1.000    
RDA11     1  0.00441  0.6707  1.000    
RDA12     1  0.00406  0.6173  1.000    
RDA13     1  0.00338  0.5150  1.000    
RDA14     1  0.00312  0.4754  1.000    
Residual 55  0.36133                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Our impression from above was right - only the first two axes are significant, the second not as strongly as the first. In fact, the test above runs 14 separate tests of significance, a number high enough to cause a multiple testing problem (more tests = more spuriously significant results). To correct for multiple testing, consider applying some correction (e.g. Bonferroni, or, as in this case, Holm's correction). The P-values are stored in the component Pr(>F) of the test_axis object (a list), and the function conducting the correction is p.adjust:

test_axis.adj <- test_axis
test_axis.adj$`Pr(>F)` <- p.adjust (test_axis$`Pr(>F)`, method = 'holm')
test_axis.adj
Permutation test for rda under reduced model
Forward tests for axes
Permutation: free
Number of permutations: 999

Model: rda(formula = vasc.hell ~ Ca + Mg + Fe + K + Na + Si + SO4 + PO4 + NO3 + NH3 + Cl + Corg + pH + conduct, data = chem)
         Df Variance       F Pr(>F)  
RDA1      1  0.09780 14.8869  0.014 *
RDA2      1  0.02165  3.2950  0.052 .
RDA3      1  0.01501  2.2844  1.000  
RDA4      1  0.01077  1.6399  1.000  
RDA5      1  0.01060  1.6137  1.000  
RDA6      1  0.00927  1.4113  1.000  
RDA7      1  0.00820  1.2481  1.000  
RDA8      1  0.00615  0.9362  1.000  
RDA9      1  0.00591  0.8993  1.000  
RDA10     1  0.00576  0.8761  1.000  
RDA11     1  0.00441  0.6707  1.000  
RDA12     1  0.00406  0.6173  1.000  
RDA13     1  0.00338  0.5150  1.000  
RDA14     1  0.00312  0.4754  1.000  
Residual 55  0.36133                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

After adjusting the P-values for multiple testing, the second axis becomes only marginally significant (note that your P-values may slightly differ, since this is a permutation test, each run using a different set of permutations of the environmental variables).

Another testing option is to test each term (constraining variable) sequentially, in the order in which they are added to the model - this is done by adding the argument by = "terms". The variables are added one by one, in the order in which they enter the model formula, and for each the partial explained variation and its significance are calculated with the previously added variables as covariables (i.e. for the first variable it is the marginal variation explained by this variable and its significance, for the second variable the partial variation explained after removing the variation of the first variable as a covariable, etc.). Since the order of the variables in this dataset has no particular meaning, this option is not useful here (its call is only sketched below). More meaningful is the last option, by = "margin", which tests the variance explained by each explanatory variable with all the others used as covariables, independently of their order in the model; this test follows right after the sketch.
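
For illustration, the sequential test would be called like this (a sketch only, not run here; its results depend on the arbitrary order of the variables in chem):

anova (rda.vasc, by = 'terms', parallel = 4)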

test_margin <- anova (rda.vasc, by = 'margin', parallel = 4)
Permutation test for rda under reduced model
Marginal effects of terms
Permutation: free
Number of permutations: 999

Model: rda(formula = vasc.hell ~ Ca + Mg + Fe + K + Na + Si + SO4 + PO4 + NO3 + NH3 + Cl + Corg + pH + conduct, data = chem)
         Df Variance      F Pr(>F)   
Ca        1  0.01460 2.2228  0.007 **
Mg        1  0.00973 1.4816  0.084 . 
Fe        1  0.00729 1.1096  0.258   
K         1  0.00668 1.0162  0.375   
Na        1  0.00565 0.8603  0.651   
Si        1  0.01237 1.8828  0.026 * 
SO4       1  0.00739 1.1246  0.291   
PO4       1  0.00467 0.7108  0.872   
NO3       1  0.00908 1.3828  0.108   
NH3       1  0.01066 1.6231  0.042 * 
Cl        1  0.00584 0.8887  0.617   
Corg      1  0.00781 1.1881  0.217   
pH        1  0.00850 1.2938  0.146   
conduct   1  0.00959 1.4604  0.078 . 
Residual 55  0.36133                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

If we correct for multiple testing (a sketch of the correction is shown below), we will see that only calcium (Ca) remains marginally significant, i.e. it can explain some important extra variance which the other variables cannot.
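
The correction itself is analogous to the one applied to the axis test above (adjusted P-values not shown here, since they vary between runs):

test_margin.adj <- test_margin
test_margin.adj$`Pr(>F)` <- p.adjust (test_margin$`Pr(>F)`, method = 'holm')  # Holm's correction
test_margin.adj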

And what if we want to know the variance each variable can explain independently of the others (i.e. the variance each variable would explain if it were the only variable in the model, without any covariables)? What I do below is a somewhat complex R coding exercise and I won't explain it in detail, but in short I make a loop (using apply) which takes each of the 14 variables one by one, calculates tb-RDA on it using the vasc.hell data, tests its significance, returns all values in one summary table, corrects them for multiple testing by Holm's correction, and orders the table by decreasing value of the variance explained by each variable:

test_each <- t(apply (chem, 2, FUN = function (x) as.matrix (anova (rda (vasc.hell ~ x))[1:4][1,])))  # for each variable: fit tb-RDA with it as the only predictor, test it, and keep the first row (Model) of the anova table
test_each <- as.data.frame (test_each)  # convert the resulting matrix into a data frame
names (test_each) <- c("Df", "Variance", "F", "Pr(>F)")  # restore the column names of the anova table
test_each.adj <- test_each
test_each.adj$`Pr(>F)` <- p.adjust (test_each$`Pr(>F)`, method = 'holm')  # correct P-values for multiple testing (Holm's correction)
test_each.adj[order (test_each.adj$Variance, decreasing = TRUE),]  # sort by decreasing explained variance
        Df    Variance          F Pr(>F)
Ca       1 0.078859882 10.9763368  0.014
conduct  1 0.069408621  9.4774885  0.014
Mg       1 0.068014761  9.2612406  0.014
Corg     1 0.050866878  6.6963607  0.014
pH       1 0.049697951  6.5277054  0.014
Na       1 0.032285586  4.1026474  0.014
NH3      1 0.026047765  3.2718455  0.016
Si       1 0.021008459  2.6145234  0.016
SO4      1 0.018808570  2.3313586  0.036
K        1 0.017124964  2.1161781  0.040
NO3      1 0.012759735  1.5643455  0.236
Fe       1 0.010942632  1.3371875  0.384
PO4      1 0.009863593  1.2029965  0.384
Cl       1 0.006910560  0.8383942  0.736

Some variables (at the bottom of the table) do not explain a significant part of the variance, some do (with calcium, Ca, explaining the most). What we did here is in fact the first step of forward selection, which aims to select the subset of variables explaining the most variance in the model (in the first step the variables are sorted by explained variance, the first of them is tested, and if significant, it is included as the first selected variable in the model). This will be the topic of Example 1 in the Forward selection section.
