User Tools

Site Tools


en:rda_cca_examples

This is an old revision of the document!


Constrained ordination

Redundancy Analysis (RDA) & transformation-based version (tb-RDA), Canonical Correspondence Analysis (CCA)

Example 1: Vltava river valley dataset

In this example, we will apply constrained ordination on Vltava river valley dataset. We will ask how much variance in species composition can be explained by two variables, soil pH and soil depth. Both are important factors for plant growth, and moreover, in the study area, they are somewhat correlated (shallower soils have lower pH since the prevailing geological substrate is acid).

First, upload the Vltava river valley data:

vltava.spe <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/vltava-spe.txt', row.names = 1)
vltava.env <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/vltava-env.txt')
 
spe <- vltava.spe  # rename variables to make them shorter
env <- vltava.env[, c('pH', 'SOILDPT')]  # select only two explanatory variables

Upload library vegan and calculate tb-RDA based on Hellinger pre-transformed species composition data. Note that since the original data represent estimates of percentage cover, it is better to log transform these values first before Hellinger transformation is done:

library (vegan)
spe.log <- log1p (spe)  # species data are in percentage scale which is strongly rightskewed, better to transform them
spe.hell <- decostand (spe.log, 'hell')  # we are planning to do tb-RDA, this is Hellinger pre-transformation
tbRDA <- rda (spe.hell ~ pH + SOILDPT, data = env)  # calculate tb-RDA with two explanatory variables
tbRDA
Call: rda(formula = spe.hell ~ pH + SOILDPT, data = env)

              Inertia Proportion Rank
Total         0.70476    1.00000     
Constrained   0.06250    0.08869    2
Unconstrained 0.64226    0.91131   94
Inertia is variance 

Eigenvalues for constrained axes:
   RDA1    RDA2 
0.04023 0.02227 

Eigenvalues for unconstrained axes:
    PC1     PC2     PC3     PC4     PC5     PC6     PC7     PC8 
0.07321 0.04857 0.04074 0.03144 0.02604 0.02152 0.01917 0.01715 
(Showed only 8 of all 94 unconstrained eigenvalues)

The two variables explain 8.9% of the variance (the row Constrained and column Proportion in the table above, can be calculated also as the sum of eigenvalues for the constrained axes divided by total variance (inertia): (0.04023+0.02227) /0.70476=0.08868. The first constrained axis (RDA1) explains 0.04023/0.70476=5.7% of the variance, while the second (RDA2) explains 0.02227/0.70476=3.2%. Note that the first unconstrained axis (PC1) represents 0.07321/0.70476=10.4% of total variance, which is more than both explanatory variables together; the first two unconstrained explain (0.07321+0.04857)/0.70476=17.3%. This means that the dataset may be structured by some strong environmental variable(s) different from pH and soil depth (we will check this below).

Let's see the ordination diagram:

ordiplot (tbRDA)

What may be those environmental variables associated with unconstrained axes? The vltava.env dataset contains a number of other measured variables which we may fit as supplementary to the first and second unconstrained axis to see which of them is most related to which of them. But here we will do an alternative thing: we will use mean Ellenberg indicator values (mEIV) calculated for each plot based on the species composition and tabulated Ellenberg species indicator values (ecological optima of species along several main environmental gradients). This approach will illustrate the situation as if in the field we measured only soil pH and depth (relatively easily obtained variables), and we use these indirect estimates to get an idea about which other factors may be important.

ordiplot (tbRDA, choices = c(3,4), type = 'n')
points (tbRDA, choices = c(3,4), display = 'sites', pch = as.character (vltava.env$GROUP), col = vltava.env$GROUP)
ef <- envfit (tbRDA, vltava.env[,23:28], choices = c(3,4), permutations = 0)
plot (ef)

ef
***VECTORS

           PC1      PC2     r2
LIGHT -0.93135 -0.36411 0.6282
TEMP  -0.97246 -0.23305 0.2352
CONT  -0.86643 -0.49929 0.0885
MOIST  0.44495 -0.89556 0.4706
REACT  0.95614 -0.29291 0.1166
NUTR   0.94383 -0.33044 0.4519

The highest R2 of regression with the first two axes have light and moisture, with light associated mostly with the first axis and the moisture mostly with the second. Seems that these two ecological factors, not related to soil pH and soil depth, are important for vegetation but not measured; light passing through the canopy of the forest has strong effect on the species composition of the herb understory (herbs makes most of the species in this analysis, since temperate forest is rather poor for woody species), and moisture also (flooded alluvial forests at the bottom parts of the valley, veg. type 2, have very different species composition from dry open forests on the upper parts of the valley slopes).

Note one more thing: when applying the function envfit on mean Ellenberg indicator values, I did not test for the significance (I set the argument permutation = 0). This has a meaning: both mean Ellenberg indicator values and sample scores on ordination axes are calculated from the same matrix of species composition, and to directly test their relationship would be wrong (they are not independent, and we get high probability to get significant result even if the species Ellenberg indicator values are randomly generated). Check the section Analysis of species attributes (e.g. traits or species indicator values) for detail explanation on how to solve this FIXME.

en/rda_cca_examples.1548202151.txt.gz · Last modified: 2019/01/23 08:09 by David Zelený