Theory, Examples & Exercises
This is an old revision of the document!
In this example, we will apply constrained ordination on Vltava river valley dataset. We will ask how much variance in species composition can be explained by two variables, soil pH and soil depth. Both are important factors for plant growth, and moreover, in the study area, they are somewhat correlated (shallower soils have lower pH since the prevailing geological substrate is acid).
First, upload the Vltava river valley data:
vltava.spe <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/vltava-spe.txt', row.names = 1) vltava.env <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/vltava-env.txt') spe <- vltava.spe # rename variables to make them shorter env <- vltava.env[, c('pH', 'SOILDPT')] # select only two explanatory variables
vegan and calculate tb-RDA based on Hellinger pre-transformed species composition data. Note that since the original data represent estimates of percentage cover, it is better to log transform these values first before Hellinger transformation is done:
library (vegan) spe.log <- log1p (spe) # species data are in percentage scale which is strongly rightskewed, better to transform them spe.hell <- decostand (spe.log, 'hell') # we are planning to do tb-RDA, this is Hellinger pre-transformation tbRDA <- rda (spe.hell ~ pH + SOILDPT, data = env) # calculate tb-RDA with two explanatory variables tbRDA
Call: rda(formula = spe.hell ~ pH + SOILDPT, data = env) Inertia Proportion Rank Total 0.70476 1.00000 Constrained 0.06250 0.08869 2 Unconstrained 0.64226 0.91131 94 Inertia is variance Eigenvalues for constrained axes: RDA1 RDA2 0.04023 0.02227 Eigenvalues for unconstrained axes: PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 0.07321 0.04857 0.04074 0.03144 0.02604 0.02152 0.01917 0.01715 (Showed only 8 of all 94 unconstrained eigenvalues)
The two variables explain 8.9% of the variance (the row
Constrained and column
Proportion in the table above, can be calculated also as the sum of eigenvalues for the constrained axes divided by total variance (inertia): (0.04023+0.02227) /0.70476=0.08868. The first constrained axis (RDA1) explains 0.04023/0.70476=5.7% of the variance, while the second (RDA2) explains 0.02227/0.70476=3.2%. Note that the first unconstrained axis (PC1) represents 0.07321/0.70476=10.4% of total variance, which is more than both explanatory variables together; the first two unconstrained explain (0.07321+0.04857)/0.70476=17.3%. This means that the dataset may be structured by some strong environmental variable(s) different from pH and soil depth (we will check this below).
Let's see the ordination diagram:
What may be those environmental variables associated with unconstrained axes? The
vltava.env dataset contains a number of other measured variables which we may fit as supplementary to the first and second unconstrained axis to see which of them is most related to which of them. But here we will do an alternative thing: we will use mean Ellenberg indicator values (mEIV) calculated for each plot based on the species composition and tabulated Ellenberg species indicator values (ecological optima of species along several main environmental gradients). This approach will illustrate the situation as if in the field we measured only soil pH and depth (relatively easily obtained variables), and we use these indirect estimates to get an idea about which other factors may be important.
ordiplot (tbRDA, choices = c(3,4), type = 'n') points (tbRDA, choices = c(3,4), display = 'sites', pch = as.character (vltava.env$GROUP), col = vltava.env$GROUP) ef <- envfit (tbRDA, vltava.env[,23:28], choices = c(3,4), permutations = 0) plot (ef)
***VECTORS PC1 PC2 r2 LIGHT -0.93135 -0.36411 0.6282 TEMP -0.97246 -0.23305 0.2352 CONT -0.86643 -0.49929 0.0885 MOIST 0.44495 -0.89556 0.4706 REACT 0.95614 -0.29291 0.1166 NUTR 0.94383 -0.33044 0.4519
The highest R2 of regression with the first two axes have light and moisture, with light associated mostly with the first axis and the moisture mostly with the second. Seems that these two ecological factors, not related to soil pH and soil depth, are important for vegetation but not measured; light passing through the canopy of the forest has strong effect on the species composition of the herb understory (herbs makes most of the species in this analysis, since temperate forest is rather poor for woody species), and moisture also (flooded alluvial forests at the bottom parts of the valley, veg. type 2, have very different species composition from dry open forests on the upper parts of the valley slopes).
Note one more thing: when applying the function
envfit on mean Ellenberg indicator values, I did not test for the significance (I set the argument
permutation = 0). This has a meaning: both mean Ellenberg indicator values and sample scores on ordination axes are calculated from the same matrix of species composition, and to directly test their relationship would be wrong (they are not independent, and we get high probability to get significant result even if the species Ellenberg indicator values are randomly generated). Check the section Analysis of species attributes (e.g. traits or species indicator values) for detail explanation on how to solve this .