This shows you the differences between two versions of the page.


en:pcoa_nmds [2016/06/28 16:29] 127.0.0.1 external edit |
en:pcoa_nmds [2019/02/25 20:56] (current) David Zelený |
||
---|---|---|---|

Line 1: | Line 1: | ||

- | ====== Unconstrained ordination ====== | + | Section: [[en:ordination]] |

- | ===== PCoA (Principal Coordinate Analysis) ===== | + | ===== PCoA & NMDS (distance-based unconstrained ordination) ===== |

- | This method is also known as MDS (Metric Multidimensional Scaling). While PCA preserves Euclidean distances among samples and CA chi-square distances, PCoA provides Euclidean representation of a set of objects whose relationship is measured by any similarity or distance measure chosen by the user. As well as PCA and CA, PCoA returns a set of orthogonal axes whose importance is measured by eigenvalues. This means that calculating PCoA on Euclidean distances among samples yields the same results as PCA calculated on covariance matrix of the same dataset (if scaling 1 is used). | + | |

- | <WRAP left round box 96%> | + | |

- | ==== R functions ==== | + | |

- | * **''cmdscale''** (package ''stats'', part of base R) - calculates PCoA on a matrix of distances among samples (this can be calculated e.g. by the function ''vegdist'' from library ''vegan''). Use the function ''ordiplot'' to draw the ordination diagram. | + | |

- | * **''pcoa''** (library ''ape'') - another way to calculate PCoA. Use the ''biplot.pcoa'' function to draw the ordination diagram. | + | |

- | </WRAP> | + | |

+ | [[{|width: 7em; background-color: light; color: firebrick}pcoa_nmds|**Theory**]] | ||

+ | [[{|width: 7em; background-color: white; color: navy}pcoa_nmds_R|R functions]] | ||

+ | [[{|width: 7em; background-color: white; color: navy}pcoa_nmds_examples|Examples]] | ||

+ | [[{|width: 7em; background-color: white; color: navy}pcoa_nmds_exercise|Exercise {{::lock-icon.png?nolink|}}]] | ||

+ | ==== Principal Coordinate Analysis (PCoA) ==== | ||

+ | This method is also known as metric multidimensional scaling (MDS). While PCA preserves Euclidean distances among samples and CA preserves chi-square distances, PCoA provides a Euclidean representation of a set of objects whose relationships are measured by any dissimilarity index. Like PCA and CA, PCoA returns a set of orthogonal axes whose importance is measured by eigenvalues. Consequently, PCoA calculated on Euclidean distances among samples yields the same results as PCA calculated on the covariance matrix of the same dataset (if scaling 1 is used), and PCoA on chi-square distances yields results similar to CA (but not identical, because CA applies weights in the calculation). If the dissimilarity index is non-metric (or, more generally, non-Euclidean), PCoA may produce axes with negative eigenvalues, which cannot be plotted. The solution is either to convert the non-metric dissimilarity index into a metric one (e.g. Bray-Curtis dissimilarity is non-metric, but becomes metric after square-root transformation) or to apply a specific correction (Lingoes or Cailliez). Since the PCoA algorithm is based on the matrix of dissimilarities between samples, species scores are not calculated; however, species can be projected onto the ordination diagram by weighted averaging or correlations, in the same way as supplementary environmental variables. | ||
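Two of the statements above can be checked numerically. The following base R sketch is an illustration only, not part of the original page: it assumes the built-in ''USArrests'' table as a stand-in dataset and a made-up three-object dissimilarity matrix ''d_bad'' that violates the triangle inequality.

<code rsplus>
# (1) PCoA on Euclidean distances reproduces PCA on the covariance matrix
X    <- as.matrix(USArrests)                      # stand-in data (assumption), samples in rows
n    <- nrow(X)
pcoa <- cmdscale(dist(X), k = 2, eig = TRUE)      # dist() is Euclidean by default
pca  <- prcomp(X, center = TRUE, scale. = FALSE)  # PCA on the covariance matrix

# sample scores agree up to the arbitrary sign of each axis
same_scores <- all.equal(abs(unname(pcoa$points)), abs(unname(pca$x[, 1:2])))
# PCoA eigenvalues equal (n - 1) times the variances of the PCA axes
same_eig <- all.equal(pcoa$eig[1:4] / (n - 1), pca$sdev^2)

# (2) a non-metric dissimilarity gives a negative eigenvalue
# made-up matrix: d(1,3) = 3 is larger than d(1,2) + d(2,3) = 2
d_bad <- as.dist(matrix(c(0, 1, 3,
                          1, 0, 1,
                          3, 1, 0), nrow = 3))
eig_raw  <- cmdscale(d_bad, k = 1, eig = TRUE)$eig        # contains a negative value
eig_sqrt <- cmdscale(sqrt(d_bad), k = 1, eig = TRUE)$eig  # square root removes it here

same_scores; same_eig
round(eig_raw, 3); round(eig_sqrt, 3)
</code>

In this toy case the square-root transformation is enough to restore the triangle inequality; for real data, the ''add'' argument of ''cmdscale'' applies an additive-constant correction instead.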

- | ===== NMDS (Non-metric Multidimensional Scaling) ===== | + | ==== Non-metric Multidimensional Scaling (NMDS) ==== |

- | Non-metric alternative to PCoA analysis - it can use any distance measure among samples, and the main focus is on projecting the relative position of sample points into low dimensional ordination space (two or three axes). The method is distance based, not eigenvalue based - it means that it does not attempt to maximize the variance preserved by particular ordination axes and resulting projection could therefore be rotated in any direction. | + | Non-metric Multidimensional Scaling is a non-metric alternative to PCoA. It can use any dissimilarity measure among samples, and its main aim is to locate samples in a low-dimensional ordination space (two or three axes) so that the Euclidean distances between the samples correspond to the dissimilarities given by the original dissimilarity index. The method is non-metric because it does not use the raw dissimilarity values, but converts them into ranks and uses these ranks in the calculation. The algorithm is iterative: it starts from an initial distribution of samples in the ordination space and, by iteratively reshuffling the samples, searches for an optimal final distribution. Due to the iterative nature of the algorithm, each run may result in a different solution. |

The algorithm goes like this (simplified): | The algorithm goes like this (simplified): | ||

- | - Specify the number of dimensions //m// you want to use (into which you want to //scale// down the distribution of samples in multidimensional space - that's why it's scaling). | + | - Specify the number of dimensions //m// you want to use (into which you want to //scale// down the distribution of samples in multidimensional space - that's why it's scaling). |

- | - Construct initial configuration of all samples in //m// dimensions as a starting point of iterative process. The result of the whole iteration procedure may depend on this step, so it's somehow crucial - the initial configuration could be generated by random, but better way is to help it a bit, e.g. by using PCoA ordination as a starting position. | + | - Construct an initial configuration of all samples in //m// dimensions as the starting point of the iterative process. The result of the whole procedure may depend on this step, so it is rather important: the initial configuration can be generated randomly, but a better way is to use the results of a PCoA ordination on the same dissimilarity matrix as the starting positions. |

- | - An iterative procedure tries to reshuffle the objects in given number of dimension in such a way that the real distances among objects reflects best their compositional dissimilarity. Fit between these two parameters is expressed as so called //stress value// - the lower stress value the better. | + | - An iterative procedure reshuffles the samples in the given number of dimensions in such a way that the real (Euclidean) distances among samples in the ordination space reflect as well as possible their compositional dissimilarity, as measured by the chosen dissimilarity index. The fit between these two quantities is expressed by the so-called //stress value//: the lower the stress value, the better. |

- | - Algorithm stops when new iteration cannot lower the stress value - the solution has been reached. | + | - The algorithm stops when the new iteration cannot lower the stress value - the solution has been reached. |

- | - After the algorithm is finished, the final solution is rotated using PCA to ease its interpretation (that's why final ordination diagram has ordination axes, even if original algorithm doesn't produce any). | + | - After the algorithm is finished, the final solution is rotated using PCA to ease its interpretation (that's why the final ordination diagram has ordination axes, even if the original algorithm doesn't produce any). |
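The steps above can be sketched in a few lines of R. This is a minimal illustration, not the workflow of the page itself: it swaps in ''isoMDS'' from the ''MASS'' package (Kruskal's non-metric MDS) in place of ''metaMDS'' from ''vegan'', and uses the built-in ''eurodist'' distances between European cities as example dissimilarities. Both choices are assumptions made to keep the example self-contained.

<code rsplus>
library(MASS)                       # provides isoMDS()

d <- eurodist                       # example dissimilarities among 21 cities (assumption)
m <- 2                              # step 1: number of dimensions

init <- cmdscale(d, k = m)          # step 2: PCoA on the same matrix as initial configuration

# steps 3-4: iterate until the stress stops improving (or maxit is reached)
nmds <- isoMDS(d, y = init, k = m, maxit = 200, trace = FALSE)
nmds$stress                         # stress value, reported by isoMDS in per cent

# the same analysis from a random start may end in a different solution
set.seed(1)
rand      <- matrix(rnorm(attr(d, "Size") * m), ncol = m)
nmds_rand <- isoMDS(d, y = rand, k = m, maxit = 200, trace = FALSE)
nmds_rand$stress

# step 5: rotate the final configuration by PCA; distances between samples are unchanged
rotated <- prcomp(nmds$points)$x
</code>

Note that ''metaMDS'' automates the random restarts and the final rotation, which is why it is usually the more convenient choice for community data.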

+ | Similarly to PCoA, the NMDS solution does not include species scores; these need to be added to the final configuration of samples using weighted averaging. | ||
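A minimal base R sketch of how species scores can be added by weighted averaging. The community matrix ''comm'' is invented for illustration, and Bray-Curtis dissimilarity is computed by hand to keep the example self-contained; with ''vegan'' installed, ''vegdist'' and ''wascores'' cover the same steps.

<code rsplus>
# made-up abundances: 6 sites (rows) x 4 species (columns)
comm <- matrix(c(8, 2, 0, 0,
                 6, 4, 1, 0,
                 3, 6, 2, 0,
                 1, 5, 5, 1,
                 0, 2, 6, 4,
                 0, 0, 3, 8),
               nrow = 6, byrow = TRUE,
               dimnames = list(paste0("site", 1:6), paste0("sp", 1:4)))

# Bray-Curtis dissimilarity: sum(|x_i - x_j|) / sum(x_i + x_j)
num  <- as.matrix(dist(comm, method = "manhattan"))
den  <- outer(rowSums(comm), rowSums(comm), "+")
bray <- as.dist(num / den)

# sample scores from PCoA on square-rooted Bray-Curtis
sites <- cmdscale(sqrt(bray), k = 2)

# species score on each axis = mean of the site scores weighted by species abundance
species <- t(comm) %*% sites / colSums(comm)
species
</code>

Because each species score is a weighted mean of site scores, species always fall inside the range spanned by the sites on every axis.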

- | <WRAP left round box 96%> | + | Considering the algorithm, NMDS and PCoA have close to nothing in common. NMDS is an iterative method which may return a different solution on re-analysis of the same data, while PCoA has a unique analytical solution. The number of ordination axes (dimensions) in NMDS can be fixed by the user, while in PCoA the number of axes is given by the dataset properties (number of samples). If the initial configuration of samples in the NMDS algorithm is produced by PCoA on the same matrix, then the iterative NMDS algorithm may be seen as a way to further optimize the sample distribution so that more variation in species composition is represented by fewer ordination axes. |

- | ==== R functions ==== | + | |

- | * **''metaMDS''** (library ''vegan'') - rather advanced function, composed of many subroutine steps. See example below for details. | + | |

- | * **''stressplot''** (library ''vegan'') - draws Shepard's stress plot, which shows the relationship between the real distances between samples in the resulting //m//-dimensional ordination solution and their compositional dissimilarities expressed by the selected dissimilarity measure. | + | |

- | * **''goodness''** (library ''vegan'') - returns the goodness-of-fit of individual samples. See the example for how this result can be visualized (inspired by [[references|Borcard et al. 2011]]). | + | |

- | </WRAP> | + | |

+ | <imgcaption pcoa_nmds|Ordination diagrams of PCoA (left) and NMDS (right) calculated on the Bray-Curtis dissimilarity index (square-rooted to make it metric) using data from the Vltava river valley dataset. The classification of samples into one of the four vegetation groups (GROUP 1-4) is displayed by the colour and symbol of the individual site scores. Species are added to the ordination diagrams as weighted averages of their abundances in the sites; only species occurring in at least 20 sites are displayed.>{{:obrazky:pcoa_nmds.png?direct|}}</imgcaption> | ||

- | ==== Example of use ==== | ||

- | === NMDS of river valley data === | ||

- | |||

- | <code rsplus> | ||

- | vltava.spe <- read.delim ('http://www.davidzeleny.net/anadat-r/data-download/vltava-spe.txt', row.names = 1) | ||

- | NMDS <- metaMDS (vltava.spe) | ||

- | </code> | ||

- | <code> | ||

- | Square root transformation | ||

- | Wisconsin double standardization | ||

- | Run 0 stress 0.2022791 | ||

- | Run 1 stress 0.2193042 | ||

- | Run 2 stress 0.2130607 | ||

- | Run 3 stress 0.208742 | ||

- | Run 4 stress 0.2022791 | ||

- | ... procrustes: rmse 9.278716e-06 max resid 3.31574e-05 | ||

- | *** Solution reached | ||

- | </code> | ||

- | <code> | ||

- | NMDS | ||

- | </code> | ||

- | <code> | ||

- | Call: | ||

- | metaMDS(comm = vltava.spe) | ||

- | |||

- | global Multidimensional Scaling using monoMDS | ||

- | |||

- | Data: wisconsin(sqrt(vltava.spe)) | ||

- | Distance: bray | ||

- | |||

- | Dimensions: 2 | ||

- | Stress: 0.2022791 | ||

- | Stress type 1, weak ties | ||

- | Two convergent solutions found after 4 tries | ||

- | Scaling: centring, PC rotation, halfchange scaling | ||

- | Species: expanded scores based on ‘wisconsin(sqrt(vltava.spe))’ | ||

- | </code> | ||

- | |||

- | If the default setting of the ''metaMDS'' function is used, the data are automatically transformed if necessary (in this case, a combination of the wisconsin and sqrt transformations was used). In this case, the stress value is 0.202 (i.e. 20.2 when expressed in per cent). | ||

- | |||

- | To draw the result, use the function ''ordiplot''. In this case, using ''type = 't''' will add text labels (default setting adds only points): | ||

- | <code rsplus> | ||

- | ordiplot (NMDS, type = 't') | ||

- | </code> | ||

- | {{:obrazky:ordination_unc21.png?600|}} | ||

- | <code rsplus> | ||

- | par (mfrow = c(1,2)) # this function divides plotting window into two columns | ||

- | stressplot (NMDS) | ||

- | plot (NMDS, display = 'sites', type = 't', main = 'Goodness of fit') # this function draws NMDS ordination diagram with sites | ||

- | points (NMDS, display = 'sites', cex = goodness (NMDS)*200) # and this adds the points with size reflecting goodness of fit (bigger = worse fit) | ||

- | </code> | ||

- | {{:obrazky:ordination_unc20.png?900|}} | ||

- | |||

- | ---- | ||

- | ===== Exercise 1 ===== | ||

- | Use data from the variable ''eurodist'', which is available in R (you don't need to install any library, just type ''eurodist''). This variable contains real geographical distances among big European cities (in km). | ||

- | - Using this distance matrix, calculate PCoA analysis and draw the PCoA ordination diagram - result will look somehow like a map of Europe. | ||

- | - Draw also screeplot of eigenvalues for individual PCoA axes. | ||

- | |||

- | <hidden For hints click here ☛> | ||

- | - ''cmdscale'', ''ordiplot'' with argument ''type = 't'''. To make the illusion perfect, you will perhaps need to rotate scores on the second ordination axis (the vertical one) to put Stockholm at the north and Rome at the south (you need to multiply these scores by -1). | ||

- | - ''barplot''. You need to calculate these eigenvalues - in ''cmdscale'', use the argument ''eig = TRUE'', and extract the resulting eigenvalues in the resulting object (list) as $eig. | ||

- | </hidden> | ||

- | [[en:pcoa_nmds:solution_ex1|Solution]] | ||

- | ===== Exercise 2 ===== | ||

- | |||

- | Use data about confusion of different Morse codes, originating from [[en:data:morse|Rothkopf's experiment with Morse codes]]. This is a classical data set, used by Shepard (1962)((Shepard, R. N. (1962): The Analysis of Proximities: Multidimensional Scaling with an Unknown Distance Function, I and II. //Psychometrika//, 27: 125-139 and 219-246.)) to demonstrate the use of NMDS analysis. | ||

- | |||

- | - After importing the dataset into R, the column names contain letters and the row names contain Morse codes - this needs to be unified so that the column names also contain Morse codes. In R, you will need to copy the row names into the column names. | ||

- | - Use the distance matrix between the Morse codes (the so-called confusion matrix; each number represents the number of cases in which respondents considered the given pair of codes to be different) to calculate NMDS analysis. | ||

- | - What is the stress value of the resulting analysis? | ||

- | - Draw ordination diagram and Shepard diagram. | ||

- | |||

- | <hidden For hints click here ☛> | ||

- | - ''colnames'' or ''names'', ''rownames'' | ||

- | - ''metaMDS'' from ''vegan'' | ||

- | - check the results of ''metaMDS'' | ||

- | - ''ordiplot'', ''stressplot'' | ||

- | </hidden> | ||

- | |||

- | [[en:pcoa_nmds:solution_ex2|Solution]] | ||

- | |||

- | ===== Exercise 3 ===== | ||

- | Check the example [[en:mix:coral_reef_betadiversity|Betadiversity of coral reefs after disturbance]] to apply NMDS analysis on community data from coral reefs. | ||

en/pcoa_nmds.1467102546.txt.gz · Last modified: 2017/10/11 20:36 (external edit)