

Unconstrained ordination

PCoA (Principal Coordinate Analysis)

This method is also known as MDS (Metric Multidimensional Scaling). While PCA preserves Euclidean distances among samples and CA preserves chi-square distances, PCoA provides a Euclidean representation of a set of objects whose relationships are measured by any similarity or distance measure chosen by the user. Like PCA and CA, PCoA returns a set of orthogonal axes whose importance is measured by eigenvalues. Because PCA itself preserves Euclidean distances, calculating PCoA on Euclidean distances among samples yields the same result as PCA calculated on the covariance matrix of the same dataset (if scaling 1 is used).
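The equivalence between PCoA on Euclidean distances and PCA can be checked numerically. The following is a minimal sketch, assuming randomly generated data (the matrix X is hypothetical). It compares sample scores from cmdscale with sample scores from prcomp; the prcomp scores preserve Euclidean distances among samples, which is what scaling 1 does. The sign of each axis is arbitrary, so the comparison uses absolute values.

```r
# Sketch: PCoA on Euclidean distances vs. PCA on the covariance matrix
# X is hypothetical random data (20 samples x 4 variables)
set.seed(1)
X <- matrix(rnorm(20 * 4), nrow = 20)

# PCoA (metric MDS) on Euclidean distances, using cmdscale from base R (stats)
pcoa <- cmdscale(dist(X), k = 2, eig = TRUE)

# PCA on the covariance matrix (centred, not standardized); $x holds sample scores
pca <- prcomp(X, center = TRUE, scale. = FALSE)

# Sample scores agree, up to the sign (orientation) of each axis
same_scores <- isTRUE(all.equal(abs(unname(pcoa$points)),
                                abs(unname(pca$x[, 1:2]))))

# PCoA eigenvalues equal PCA variances multiplied by (n - 1)
same_eig <- isTRUE(all.equal(pcoa$eig[1:2], pca$sdev[1:2]^2 * (nrow(X) - 1)))

print(c(same_scores = same_scores, same_eig = same_eig))
```

Both checks should print TRUE. The factor (n - 1) appears because prcomp reports variances, while cmdscale reports eigenvalues of the double-centred cross-product matrix.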

R functions

  • cmdscale (library stats, part of base R) - calculates PCoA on a matrix of distances among samples (the distances can be calculated e.g. by the function vegdist from library vegan). Use the function ordiplot (library vegan) to draw the ordination diagram.
  • pcoa (library ape) - another way to calculate PCoA. Use the function biplot.pcoa to draw the ordination diagram.
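A minimal sketch of PCoA on a non-Euclidean measure with cmdscale, using only base R. To keep it runnable without vegan, the Bray-Curtis dissimilarity is computed by a hypothetical helper bray_curtis(), which stands in for vegdist (x, method = 'bray'); the small community table spe is also made up for illustration. With non-Euclidean measures, some eigenvalues may come out negative, so it is worth inspecting them.

```r
# Hypothetical helper: Bray-Curtis dissimilarity in base R
# (stand-in for vegan's vegdist(x, method = "bray"))
bray_curtis <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(abs(x[i, ] - x[j, ])) / sum(x[i, ] + x[j, ])
    }
  }
  as.dist(d)
}

# Hypothetical community table: 5 samples x 4 species (abundances)
spe <- matrix(c(5, 0, 2, 1,
                3, 1, 0, 4,
                0, 6, 1, 0,
                2, 2, 2, 2,
                1, 0, 7, 3),
              nrow = 5, byrow = TRUE,
              dimnames = list(paste0("s", 1:5), paste0("sp", 1:4)))

D <- bray_curtis(spe)
pcoa <- cmdscale(D, k = 2, eig = TRUE)

pcoa$points   # sample scores on the first two PCoA axes
pcoa$eig      # all eigenvalues; non-Euclidean measures may give negative ones

# Simple labelled ordination diagram in base graphics
# (with vegan loaded, ordiplot(pcoa, type = 't') draws a similar plot)
plot(pcoa$points, type = "n", xlab = "PCoA 1", ylab = "PCoA 2")
text(pcoa$points, labels = rownames(pcoa$points))
```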

NMDS (Non-metric Multidimensional Scaling)

Non-metric alternative to PCoA - it can use any distance measure among samples, and its main focus is on projecting the relative positions of sample points into a low-dimensional ordination space (two or three axes). It is non-metric because it tries to preserve only the rank order of the dissimilarities among samples, not their actual values. The method is distance based, not eigenvalue based: it does not attempt to maximize the variance preserved by particular ordination axes, and the resulting projection can therefore be rotated in any direction.
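Both properties mentioned above (no preferred orientation of the solution, and only the rank order of dissimilarities matters) can be illustrated with a few lines of R. The configuration conf and the dissimilarities diss below are hypothetical values chosen for illustration only.

```r
# Hypothetical 2-D configuration of 4 samples (rows = samples)
conf <- matrix(c(0, 0,
                 1, 0,
                 0, 2,
                 3, 1), ncol = 2, byrow = TRUE)

# Rotate the whole configuration by 30 degrees (orthogonal rotation matrix)
theta <- pi / 6
rot <- matrix(c(cos(theta), sin(theta),
                -sin(theta), cos(theta)), nrow = 2)
conf_rot <- conf %*% rot

# Interpoint distances, which are all that NMDS fits, do not change
same_dist <- isTRUE(all.equal(c(dist(conf)), c(dist(conf_rot))))

# A monotone transformation of dissimilarities (here sqrt) keeps their ranks,
# so NMDS would "see" the same information
diss <- c(0.2, 0.9, 0.5, 0.7, 0.1, 0.4)
same_ranks <- identical(rank(diss), rank(sqrt(diss)))

print(c(same_dist = same_dist, same_ranks = same_ranks))
```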

The algorithm goes like this (simplified):

  1. Specify the number of dimensions m into which you want to scale down the distribution of samples from the multidimensional space (that's why it's called scaling).
  2. Construct an initial configuration of all samples in m dimensions as the starting point of the iterative process. The result of the whole iterative procedure may depend on this step, so it is rather crucial: the initial configuration can be generated randomly, but a better way is to help it a bit, e.g. by using a PCoA ordination as the starting position.
  3. An iterative procedure reshuffles the objects in the given number of dimensions so that the distances among objects in the ordination space best reflect their compositional dissimilarity. The fit between these two quantities is expressed as the so-called stress value - the lower the stress value, the better.
  4. The algorithm stops when a new iteration cannot lower the stress value any further - the solution has been reached.
  5. After the algorithm has finished, the final solution is rotated using PCA to ease its interpretation (that's why the final ordination diagram has ordination axes, even though the original algorithm doesn't produce any).
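The steps above can be sketched with isoMDS from the MASS library (Kruskal's NMDS), which ships with standard R installations. This is a swapped-in implementation, not the monoMDS engine used by metaMDS, and the built-in USArrests data serve only as a convenient example. The sketch uses a single PCoA starting configuration, whereas metaMDS repeats the analysis several times (see the Run 0, Run 1, ... lines in the output of the example below).

```r
library(MASS)  # isoMDS: Kruskal's non-metric MDS

# Example dissimilarities: Euclidean distances on the standardized USArrests data
d <- dist(scale(USArrests))

# Step 1: number of dimensions
m <- 2

# Step 2: starting configuration from PCoA (cmdscale)
start <- cmdscale(d, k = m)

# Steps 3-4: iterate until stress no longer decreases appreciably
# (controlled by tol) or maxit iterations are reached
fit <- isoMDS(d, y = start, k = m, maxit = 100, trace = FALSE)
fit$stress  # note: isoMDS reports stress in percent

# Step 5: rotate the final configuration by PCA, so that axis 1 carries most variation
final <- prcomp(fit$points, center = TRUE)$x

# Centring and rotation leave the interpoint distances unchanged
isTRUE(all.equal(c(dist(fit$points)), c(dist(final))))
```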

R functions

  • metaMDS (library vegan) - a rather advanced function, composed of many subroutine steps. See the example below for details.
  • stressplot (library vegan) - draws Shepard's stress plot, i.e. the relationship between the real distances among samples in the resulting m-dimensional ordination solution and their compositional dissimilarities expressed by the selected dissimilarity measure.
  • goodness (library vegan) - returns the goodness of fit of particular samples. The example below shows how this result can be visualized (inspired by Borcard et al. 2011).
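What stressplot shows can also be reproduced by hand. The sketch below swaps in isoMDS and Shepard from the MASS library (not vegan) and again uses the built-in USArrests data only as an example. It plots the ordination distances against the original dissimilarities together with the monotone (isotonic) regression line, and computes Kruskal's stress-1, sqrt(sum((d - d̂)^2) / sum(d^2)), where d are the ordination distances and d̂ their monotone fit; this is the stress type 1 that metaMDS reports.

```r
library(MASS)  # isoMDS, Shepard

d <- dist(scale(USArrests))
fit <- isoMDS(d, k = 2, trace = FALSE)

# Shepard() returns the dissimilarities (x), the ordination distances (y)
# and the monotone regression fit (yf), all sorted by dissimilarity
sh <- Shepard(d, fit$points)

plot(sh, pch = ".", xlab = "Observed dissimilarity",
     ylab = "Ordination distance", main = "Shepard diagram")
lines(sh$x, sh$yf, type = "S", col = "red")

# Kruskal's stress-1 computed from the Shepard diagram
stress1 <- sqrt(sum((sh$y - sh$yf)^2) / sum(sh$y^2))
stress1  # compare with fit$stress / 100 (isoMDS reports percent)
```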

Example of use

NMDS of river valley data

library (vegan)
vltava.spe <- read.delim ('http://www.davidzeleny.net/anadat-r/data-download/vltava-spe.txt', row.names = 1)
NMDS <- metaMDS (vltava.spe)
Square root transformation
Wisconsin double standardization
Run 0 stress 0.2022791 
Run 1 stress 0.2193042 
Run 2 stress 0.2130607 
Run 3 stress 0.208742 
Run 4 stress 0.2022791 
... procrustes: rmse 9.278716e-06  max resid 3.31574e-05 
*** Solution reached
NMDS
Call:
metaMDS(comm = vltava.spe) 

global Multidimensional Scaling using monoMDS

Data:     wisconsin(sqrt(vltava.spe)) 
Distance: bray 

Dimensions: 2 
Stress:     0.2022791 
Stress type 1, weak ties
Two convergent solutions found after 4 tries
Scaling: centring, PC rotation, halfchange scaling 
Species: expanded scores based on ‘wisconsin(sqrt(vltava.spe))’ 

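If the default settings of the metaMDS function are used, the data are automatically transformed if necessary (in this case, a square-root transformation followed by Wisconsin double standardization was used), and Bray-Curtis dissimilarity is used as the distance measure (see Distance: bray in the output above). In this case, the stress value is 0.202 (i.e. 20.2 %).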
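The automatic transformation reported in the output above (square root, then Wisconsin double standardization) can be written out in a few lines. This is a sketch assuming the usual definition of Wisconsin standardization (species divided by their maxima, then samples divided by their totals); wisconsin_sqrt() is a hypothetical helper, vegan's own equivalent being wisconsin (sqrt (x)), and the matrix spe is made up. Samples with no species at all would produce division by zero, so they are assumed absent.

```r
# Hypothetical helper: square root + Wisconsin double standardization
wisconsin_sqrt <- function(x) {
  x <- sqrt(as.matrix(x))
  x <- x[, colSums(x) > 0, drop = FALSE]   # drop species absent from all samples
  x <- sweep(x, 2, apply(x, 2, max), "/")  # species divided by their maxima
  sweep(x, 1, rowSums(x), "/")             # samples divided by their totals
}

# Hypothetical community table: 3 samples x 4 species
spe <- matrix(c(16, 0, 4, 1,
                9, 1, 0, 25,
                0, 36, 1, 0), nrow = 3, byrow = TRUE)

tr <- wisconsin_sqrt(spe)
rowSums(tr)  # every sample now sums to 1
```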

To draw the result, use the function ordiplot. Using type = 't' adds text labels (the default setting draws only points):

ordiplot (NMDS, type = 't')

par (mfrow = c(1,2)) # divide the plotting window into two columns
stressplot (NMDS) # Shepard diagram
plot (NMDS, display = 'sites', type = 't', main = 'Goodness of fit') # draw the NMDS ordination diagram with site labels
points (NMDS, display = 'sites', cex = goodness (NMDS)*200) # add points whose size reflects goodness of fit (bigger = worse fit)


Exercise 1

Use data from the variable eurodist, which is available in base R (you don't need to install any library, just type eurodist). This variable contains road distances (in km) between 21 European cities.

  1. Using this distance matrix, calculate PCoA and draw the PCoA ordination diagram - the result will look somewhat like a map of Europe.
  2. Also draw a screeplot of the eigenvalues of individual PCoA axes.


Hints

  1. cmdscale, ordiplot with argument type = 't'. To make the illusion perfect, you will probably need to flip the scores on the second ordination axis (the vertical one) to put Stockholm in the north and Rome in the south (multiply these scores by -1).
  2. barplot. You need to calculate the eigenvalues first - in cmdscale, use the argument eig = TRUE and extract the eigenvalues from the resulting object (a list) as $eig.


Exercise 2

Use data about the confusion of different Morse codes, originating from Rothkopf's experiment with Morse codes. This is a classical dataset, used by Shepard (1962) to demonstrate the use of NMDS.

  1. After importing the dataset into R, the column names contain letters while the row names contain Morse codes - this needs to be unified, so that the column names also contain Morse codes. In R, you will need to copy the row names into the column names.
  2. Use the distance matrix between the Morse codes (the so-called confusion matrix; each number represents the number of cases in which respondents considered the given pair of codes to be different) to calculate NMDS.
  3. What is the stress value of the resulting analysis?
  4. Draw the ordination diagram and the Shepard diagram.


Hints

  1. colnames or names, rownames
  2. metaMDS from vegan
  3. check the results of metaMDS
  4. ordiplot, stressplot


Exercise 3

Check the example "Betadiversity of coral reefs after disturbance" to see NMDS applied to community data from coral reefs.

References
Shepard, R. N. (1962): The Analysis of Proximities: Multidimensional Scaling with an Unknown Distance Function, I and II. Psychometrika, 27: 125-139 and 219-246.
Last modified: 2017/10/11 20:36 (external edit)