treespace: statistical exploration of landscapes of phylogenetic trees
treespace implements statistical tools for exploring sets of phylogenetic trees that describe the evolutionary relationships between the same taxa. This web interface provides easy access to the tools implemented in the package. Each tab is made of two panels: a sidebar used to choose inputs, analysis tools and aesthetics, and a main panel displaying results.
Tree landscape explorer
The Tree landscape explorer tab is where the whole tree space can be explored. Choose between a two- or three-dimensional plot to visualise the trees using Metric Multidimensional Scaling (MDS, a.k.a. Principal Coordinates Analysis, PCoA). MDS finds the low-dimensional representation that best approximates the distances between trees. The sidebar contains the following sections:
- Input: to upload the set of trees to analyse
- Analysis: to customize the analysis
- Aesthetics: to customize the graphics
Input
treespace takes a list of phylogenetic trees as input. You can either use the datasets distributed with the package or provide your own input files. Two types of input file can be used:
- R objects saved using the function save(x, file="x.RData"), where 'x' is a list of trees of class multiPhylo (from the ape package). Accepted extensions are ".RData" and ".rda".
- a list of trees saved in a NEXUS file, e.g. using ape's function write.nexus(x, file="x.nex") in R.
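The following is a minimal sketch of how to prepare both kinds of input file in R. It assumes the ape package is installed. The 50 random trees from rmtree() stand in for your own data, and the file names are placeholders.

```r
library(ape)

set.seed(1)
# 50 random rooted trees, all on the same 10 tips (labelled t1 ... t10)
x <- rmtree(50, 10)
class(x)  # "multiPhylo"

# Option 1: save the list of trees as an R object (.RData or .rda)
rdata_file <- file.path(tempdir(), "x.RData")
save(x, file = rdata_file)

# Option 2: write all the trees to a single NEXUS file
nexus_file <- file.path(tempdir(), "x.nex")
write.nexus(x, file = nexus_file)

# Quick check that both files read back correctly
from_nexus <- read.nexus(nexus_file)
restored <- load(rdata_file)  # restores 'x' and returns the name "x"
```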
Analysis
Tree summary / metric: the method used to measure distances between trees. Choose from:
- Kendall Colijn: the tree metric developed by Kendall & Colijn; used by default
- Billera, Holmes, Vogtmann: the Billera, Holmes & Vogtmann tree metric (also known as the 'geodesic distance')
- Robinson Foulds (unrooted): the Robinson Foulds tree metric. Note that this implementation of the metric (from the package phangorn) treats the trees as unrooted and uses the unweighted edge-count distance (Robinson & Foulds 1981).
- Tip-tip path distance (unrooted): the metric by Steel and Penny, which counts the number of internal nodes on the shortest path between each pair of tips. Along with its weighted version (below), this is also known as the tip distance, nodal distance, patristic distance and dissimilarity measure. Trees are treated as unrooted (see ?distTips, method "nNodes", in the package adephylo).
- Tip-tip branch-length distance (unrooted): similar to the tip-tip path distance, but using the branch lengths instead of counting the edges. (see ?distTips in the package adephylo)
- Abouheif test: the Abouheif test as presented in Pavoine et al. (2008) (see ?distTips in the package adephylo)
- Sum of direct descendants: another measure related to the Abouheif test (see ?distTips in the package adephylo)
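The same distances can be computed outside the app with the package's treespace() function. The sketch below assumes that treespace and ape are installed and uses random trees as placeholder data. Two caveats apply. First, the mapping from menu entries to method strings is our assumption. Second, the Billera, Holmes & Vogtmann option is left out.

```r
library(ape)
library(treespace)

set.seed(1)
trees <- rmtree(30, 8)

# Assumed correspondence between menu entries and treespace() 'method' strings;
# nf is the number of MDS axes to retain (given explicitly to avoid a prompt)
methods <- c(
  kendall_colijn  = "treeVec",    # package default
  robinson_foulds = "RF",
  path_nodes      = "nNodes",     # tip-tip path distance
  path_lengths    = "patristic",  # tip-tip branch-length distance
  abouheif        = "Abouheif",
  sum_dd          = "sumDD"
)

# Keep only the tree-to-tree distance matrix ($D) for each metric
dists <- lapply(methods, function(m) treespace(trees, method = m, nf = 2)$D)

# One 'dist' object per metric, each covering all pairs of the 30 trees
sapply(dists, attr, "Size")
```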
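Lambda: the value of lambda, between 0 and 1, used in Kendall & Colijn's metric. lambda = 0 compares tree topologies only, lambda = 1 compares branch lengths only, and intermediate values combine the two.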
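For reference, Kendall & Colijn's metric can be written as follows. Take a tree T with n tips. The vector m(T) lists, for every pair of tips, the number of edges between the root and their most recent common ancestor, followed by n ones for the pendant edges. The vector M(T) lists the corresponding branch lengths: the path length from the root to each most recent common ancestor, and then the pendant edge lengths. Both vectors therefore have length n(n+1)/2.

```latex
\begin{aligned}
v_\lambda(T) &= (1-\lambda)\, m(T) + \lambda\, M(T), \qquad 0 \le \lambda \le 1,\\
d_\lambda(T_a, T_b) &= \lVert v_\lambda(T_a) - v_\lambda(T_b) \rVert_2 .
\end{aligned}
```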
Number of MDS axes retained: the number of principal components to retain in the Metric Multidimensional Scaling (MDS).
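To judge how many axes are worth retaining, you can look at the share of the total positive eigenvalue captured by each axis. This base-R sketch uses cmdscale(), which performs classical metric MDS. Three assumptions apply:
- ape is installed.
- ape's dist.topo() topological distance stands in for the app's metrics.
- The trees are random placeholders.

```r
library(ape)

set.seed(1)
trees <- rmtree(40, 8, rooted = FALSE)
D <- dist.topo(trees)   # topological (Robinson-Foulds-type) distances

k <- 3                  # number of MDS axes retained
mds <- cmdscale(D, k = k, eig = TRUE)

# Proportion of the positive eigenvalue total captured by each retained axis
explained <- mds$eig[seq_len(k)] / sum(mds$eig[mds$eig > 0])
round(explained, 3)
dim(mds$points)         # one row per tree, one column per retained axis
```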
Assess quality of projection (Shepard plot)? It is important to know how faithfully the Multidimensional Scaling (MDS) plot represents the tree space. Euclidean metrics lend themselves to MDS plotting, whereas other metrics and summaries may be difficult to project accurately into a small number of dimensions. A Shepard plot is a scatter plot of the actual distances in tree space (x-axis) against the distances in the projected plot (y-axis). The stronger the correlation, the better the MDS plot represents the true distances.
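A Shepard plot can be reproduced in base R along the following lines. The assumptions are the same as before: ape is installed, the trees are random placeholders, dist.topo() stands in for the chosen metric, and cmdscale() performs the MDS. The correlation at the end summarises the plot as a single number.

```r
library(ape)

set.seed(1)
trees <- rmtree(40, 8, rooted = FALSE)
D <- dist.topo(trees)            # distances in tree space

coords <- cmdscale(D, k = 2)     # 2D MDS projection
projected <- dist(coords)        # Euclidean distances between points in the plot

# Shepard plot: true distances (x) against projected distances (y);
# points on the dashed y = x line are reproduced exactly
shepard_file <- file.path(tempdir(), "shepard.png")
png(shepard_file)
plot(as.vector(D), as.vector(projected),
     xlab = "Distance in tree space", ylab = "Distance in MDS plot")
abline(0, 1, lty = 2)
invisible(dev.off())

shepard_cor <- cor(as.vector(D), as.vector(projected))
```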
Identify clusters? Whether to identify clusters by different colours in the plot. Clusters may be found statistically using the findGroves function, which attempts to group nearby trees in the space into clusters. When this option is selected, you will also be able to choose the clustering method and the number of clusters.

Alternatively, trees may be coloured by metadata. For example, if the trees were inferred from different genes, or using different inference software, this can be shown on the plot. Upload a .RData or .csv file containing a list or vector with one entry per tree, in the same order as the trees. For example, if the trees were inferred from three different genes with 100 replicates per gene, create the character vector
treeTypes <- c(rep("Gene1",100),rep("Gene2",100),rep("Gene3",100))
save it as an .RData file, and upload it. The trees will then be coloured according to gene. This makes it possible to assess whether the genes carry different phylogenetic signals under the chosen inference method.
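The sketch below shows both colouring routes from an R session, assuming treespace and ape are installed. The 300 random trees are placeholders for three genes with 100 replicates each, so the clusters here are not expected to match the genes. With real data, the final cross-tabulation shows whether they do.

```r
library(ape)
library(treespace)

set.seed(1)
# Placeholder data: 300 random trees on the same 10 tips
trees <- rmtree(300, 10)

# Metadata: one label per tree, in the same order as the trees
treeTypes <- c(rep("Gene1", 100), rep("Gene2", 100), rep("Gene3", 100))
save(treeTypes, file = file.path(tempdir(), "treeTypes.RData"))

# Statistical clusters ("groves"): Kendall-Colijn distances, 3 MDS axes,
# Ward hierarchical clustering cut into 3 groups
groves <- findGroves(trees, method = "treeVec", nf = 3,
                     clustering = "ward.D2", nclust = 3)

# Compare the statistical clusters with the metadata labels
table(cluster = groves$groups, gene = treeTypes)
```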
Aesthetics
A number of graphical options are available.
View: The default is to view the entire "tree landscape", that is, a 2- or 3-dimensional map where the points correspond to trees and the distances between points approximate their relative distances in tree space, according to the chosen measure. The rest of the options detailed below correspond to this view. An alternative is to pick a single "reference tree" and plot the distances from it to each other tree. If this view is chosen, you will be given the option to select the reference tree of interest. When there are many trees in the set, it may be helpful to expand the plot by using the scale bar to increase the height, to view the tree labels more easily. Finally, if clusters are identified in the analysis then these will be marked on this plot in the corresponding colours. Note that this gives an indication of the quality of the clustering: highly scattered clusters correspond to poor resolution in the space. Clusters can overlap in this view: it is quite possible for multiple clusters of trees to be equidistant from the reference tree in different "directions" in the space.
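The reference-tree view amounts to reading one row of the tree-to-tree distance matrix. This base-R sketch relies on the following assumptions:
- ape's dist.topo() stands in for the chosen metric.
- The trees are random placeholders.
- The colours come from a hypothetical Ward clustering rather than from findGroves.

```r
library(ape)

set.seed(1)
trees <- rmtree(60, 8, rooted = FALSE)
D <- as.matrix(dist.topo(trees))

ref <- 1                          # index of the chosen reference tree
dist_to_ref <- D[ref, -ref]       # distance from the reference to every other tree

# Hypothetical cluster labels, used only to colour the points
groups <- cutree(hclust(as.dist(D), method = "ward.D2"), k = 3)[-ref]

ref_file <- file.path(tempdir(), "reference_view.png")
png(ref_file)
plot(dist_to_ref, seq_along(dist_to_ref), col = groups, pch = 19,
     xlab = "Distance from reference tree", ylab = "Tree")
invisible(dev.off())
```

Trees from different clusters can land at similar distances from the reference, which is why clusters may overlap in this view.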
Plot dimensions: If three or more axes have been retained, the option to view the space in three dimensions will be available. You can rotate the image by clicking on it and dragging the mouse, and zoom in and out with the mouse scroll wheel. Note that 3D plotting depends on the rglwidget package. At the time of writing, the latest CRAN version (0.1.1431) contains a bug. If you are running this version, you will receive a warning recommending that you install the latest development version:
install.packages('rglwidget', repos='http://R-Forge.R-project.org')
x/y/z axes: Used to select which principal components are represented as x and y axes (and, if viewing in 3D, the z axis) on the scatterplot.
Output
Beneath the scatterplot are output options: the scatterplot, trees and analysis results can be exported in various formats. Currently available options are:
- save the 2D scatterplot as a png file or the 3D scatterplot as an interactive html file. Saving a static snapshot of the 3D plot is not yet supported directly, but you can right-click (Mac: long-click) on the image and select 'Save image as ...'.
- (if plotted) save the Shepard plot as a png file
- save trees to a Nexus file (.nex)
- save results as a csv file (a spreadsheet-like text file, compatible with most systems); results are output as a table where each row is a tree label, and the columns contain the optional clustering results as well as the principal components (PC) of the Metric Multidimensional Scaling (MDS) of the tree space
- save results as an R object (.RData); in this case, two objects will be saved in the .RData: 'trees' will be a list of trees of class 'multiPhylo', and 'analysis' will be the results of the analysis; when clusters are not inferred, 'analysis' is the output of the function 'treespace'; when clusters are inferred, 'analysis' is the output of the function 'findGroves'.
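To reuse an .RData export in a later R session, load it into a separate environment so that 'trees' and 'analysis' do not overwrite existing objects. The helper read_treespace_export() is hypothetical. The demonstration writes a stand-in file because the real export is not available here; it assumes ape is installed.

```r
# Hypothetical helper: load an exported .RData into its own environment
# and return the two expected objects
read_treespace_export <- function(path) {
  env <- new.env()
  found <- load(path, envir = env)
  missing <- setdiff(c("trees", "analysis"), found)
  if (length(missing) > 0) {
    stop("missing object(s): ", paste(missing, collapse = ", "))
  }
  list(trees = env$trees, analysis = env$analysis)
}

# Demonstration with a stand-in file; replace with the file saved from the app
library(ape)
trees <- rmtree(5, 6)
analysis <- list(note = "placeholder for the treespace() or findGroves() output")
demo_file <- file.path(tempdir(), "treespace_export.RData")
save(trees, analysis, file = demo_file)

res <- read_treespace_export(demo_file)
class(res$trees)   # "multiPhylo"
# With a real export, names(res$analysis) lists the components of the analysis;
# when clusters were inferred, we expect it to include 'groups' (findGroves output)
```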
Tree viewer
The Tree viewer tab is where individual trees can be plotted, one or two at a time. The sidebar contains the following sections:
- Input: to pick the tree or trees to view
- Aesthetics: to customize the graphics
Input
The tree selection options are as follows:
- Single tree view: for plotting a single tree
- Two tree comparison: for plotting two trees side by side, using tip colour to highlight topological differences
Within each of these viewing modes, the tree or trees may be selected as follows:
- Median tree: select the overall median tree, or, if clusters have been identified, the median from each cluster
- General tree selection: select an individual tree by its name (if provided) or number in the list of trees supplied
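Outside the app, the package's medTree() function finds median trees under the Kendall-Colijn metric. This sketch assumes treespace and ape are installed, with random placeholder trees and hypothetical group labels. The component names used below ($trees, $treenumbers) reflect our understanding of the function's return value, so check ?medTree.

```r
library(ape)
library(treespace)

set.seed(1)
trees <- rmtree(50, 8)

# Overall median tree under the Kendall-Colijn metric (lambda = 0: topology only)
med <- medTree(trees, lambda = 0)
med$treenumbers                 # position(s) of the median tree(s) in 'trees'
median_tree <- med$trees[[1]]   # first median tree, if there are ties

# One median per cluster: a grouping factor with one entry per tree
groups <- factor(rep(c("A", "B"), each = 25))   # hypothetical cluster labels
med_by_group <- medTree(trees, groups = groups, lambda = 0)
```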
Aesthetics
A number of graphical options are available. See ?plot.phylo from the package ape for more details.
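The following is a minimal sketch of the kind of plot.phylo options this tab exposes, written to a .png file in the same way as the tab's save option. It assumes ape is installed; the random 12-tip tree and the file name are placeholders.

```r
library(ape)

set.seed(1)
tree <- rtree(12)

out <- file.path(tempdir(), "tree.png")
png(out, width = 800, height = 600)
plot(tree,
     type = "phylogram",       # other types: "cladogram", "fan", "unrooted", "radial"
     use.edge.length = TRUE,   # FALSE draws the topology only
     edge.width = 2,
     edge.color = "grey40",
     tip.color = "darkblue",
     cex = 0.9)                # size of the tip labels
add.scale.bar()
invisible(dev.off())
```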
Output
Beneath the tree(s) is the option to save the image as a .png file.
densiTree viewer
The densiTree viewer tab is where collections of trees can be viewed together using the densiTree function from the package phangorn. This function is based on the standalone software DensiTree, which provides considerably more functionality. The sidebar contains the following sections:
- Input: to pick the collection of trees to view
- Aesthetics: to customize the graphics
Input
The entire tree set can be viewed using the option 'All trees'. Note that this is likely to be slow for large sets of trees. If clusters have been detected in the Tree landscape explorer tab, the collection of trees from each cluster can also be selected. This can help to show the variation within each cluster.
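The same kind of plot can be drawn directly with phangorn's densiTree(). This sketch assumes ape and phangorn are installed. The coalescent trees from rcoal() are placeholders, and the two-group labels are hypothetical stand-ins for clusters found in the Tree landscape explorer tab.

```r
library(ape)
library(phangorn)

set.seed(1)
# 40 random ultrametric (coalescent) trees on the same 8 tips
trees <- lapply(1:40, function(i) rcoal(8))
class(trees) <- "multiPhylo"

# Hypothetical cluster labels, one per tree
groups <- rep(1:2, each = 20)

# All trees at once; alpha sets the transparency of each tree
all_file <- file.path(tempdir(), "densitree_all.png")
png(all_file, width = 800, height = 600)
densiTree(trees, type = "cladogram", alpha = 0.1)
invisible(dev.off())

# Only the trees from cluster 1, to show the variation within that cluster
cl1_file <- file.path(tempdir(), "densitree_cluster1.png")
png(cl1_file, width = 800, height = 600)
densiTree(trees[groups == 1], type = "cladogram", alpha = 0.1)
invisible(dev.off())
```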
Aesthetics
A number of graphical options are available; further options will be added soon.
Output
Beneath the densiTree plot is the option to save the image as a .png file.
More information
treespace is developed on GitHub. For questions, bug reports, feature requests and contributions, please use GitHub's issue system.