How to understand a phylogenetic tree

Phylogenetic Trees | Biological Principles

Learning Objectives

  1. Know and use the terminology required to describe and interpret a phylogenetic tree.
  2. Know the different types of data incorporated into phylogenetic trees and recognize how this data is used to construct phylogenetic trees
  3. Interpret the relatedness of extant species based on phylogenetic trees

What is a phylogenetic tree?

A phylogenetic tree is a visual representation of the relationship between different organisms, showing the path through evolutionary time from a common ancestor to different descendants.  Trees can represent relationships ranging from the entire history of life on earth, down to individuals in a population.

The diagram below shows a tree of 3 taxa (singular: taxon; a taxon is a taxonomic unit, such as a species or a gene).

Terminology of phylogenetic trees

This is a bifurcating tree. The vertical lines, called branches, represent lineages, and nodes are where they diverge, each representing a speciation event from a common ancestor. The trunk at the base of the tree is called the root. The root node represents the most recent common ancestor of all of the taxa represented on the tree. Time is also represented, proceeding from the oldest at the bottom to the most recent at the top. What this particular tree tells us is that taxon A and taxon B are more closely related to each other than either taxon is to taxon C. The reason is that taxon A and taxon B share a more recent common ancestor than they do with taxon C. A group of taxa that includes a common ancestor and all of its descendants is called a clade. A clade is also said to be monophyletic. A group that excludes one or more descendants is paraphyletic; a group that excludes the common ancestor is said to be polyphyletic.

The image below shows several monophyletic groups (top row) alongside a polyphyletic group (bottom left) and a paraphyletic group (bottom right). Notice how the clades include the common ancestor and all of its descendants (the green and blue examples), while those labeled “not a clade” leave out some common ancestors (polyphyletic in red) or some descendants (paraphyletic in orange).
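The clade definition above can be checked mechanically. The following is a minimal sketch, assuming the three-taxon tree from the text is encoded as nested tuples (an illustrative encoding chosen here, not a standard file format). It only tests whether a set of tips forms a clade; it does not distinguish paraphyletic from polyphyletic groups.

```python
# Tree from the text: A and B share a more recent ancestor than either does with C.
# Nested tuples are an illustrative encoding, not a standard format.
TREE = (("A", "B"), "C")

def leaves(node):
    """Return the set of tip labels below a node."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result

def clades(node):
    """Yield the tip set of every node (tips included) in the tree."""
    yield frozenset(leaves(node))
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)

def is_monophyletic(tree, group):
    """A set of tips is a clade if some node has exactly that set of descendants."""
    return frozenset(group) in set(clades(tree))

print(is_monophyletic(TREE, {"A", "B"}))  # True
print(is_monophyletic(TREE, {"B", "C"}))  # False
```

The group {B, C} fails because the only ancestor shared by B and C is the root, and the root also has A as a descendant.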


The video below focuses on terminology and explores some misconceptions about reading trees:

Misconceptions and how to correctly read a phylogenetic tree

Trees can be confusing to read. A common mistake is to read the tips of the trees and think their order has meaning. In the tree above, the closest relative to taxon C is not taxon B. Both A and B are equally distant from, or related to, taxon C. In fact, switching the labels of taxa A and B would result in a topologically equivalent tree. It is the order of branching along the time axis that matters. The illustration below shows that one can rotate branches and not affect the structure of the tree, much like a hanging mobile:

Hanging bird mobile by Charlie Harper

It can also be difficult to recognize how the trees model evolutionary relationships. One thing to remember is that any tree represents a minuscule subset of the tree of life.

Given just the 5-taxon tree (no dotted branches), it is tempting to think that taxon S is the most “primitive” or most like the common ancestor represented by the root node, because there are no additional nodes between S and the root. However, there were undoubtedly many branches off that lineage during the course of evolution, most leading to extinct taxa (99% of all species are thought to have gone extinct), and many to living taxa (like the purple dotted line) that are just not shown in the tree. What matters, then, is the total distance along the time axis (vertical axis, in this tree) – taxon S evolved for 5 million years, the same length of time as any of the other 4 taxa. As the tree is drawn, with the time axis vertical, the horizontal axis has no meaning, and serves only to separate the taxa and their lineages. So none of the currently living taxa are any more “primitive” nor any more “advanced” than any of the others; they have all evolved for the same length of time from their most recent common ancestor.

The time axis also allows us to measure evolutionary distances quantitatively. The distance between A and Q is 4 million years (A evolved for 2 million years since they split, and Q also evolved independently of A for 2 million years after the split). The distance between A and D is 6 million years, since they split from their common ancestor 3 million years ago.
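The arithmetic in this paragraph follows one rule: on a time-scaled tree, two living taxa have each evolved independently since their split, so the split time counts twice. A minimal sketch, using the split times quoted above (the function name is a hypothetical choice made here):

```python
def pairwise_distance(split_time_mya):
    """Evolutionary distance between two living taxa on a time-scaled tree.

    Each lineage has evolved on its own since the split, so the time
    since the common ancestor is counted once per lineage.
    """
    return 2 * split_time_mya

# Split times (millions of years ago) taken from the text.
split_times = {("A", "Q"): 2, ("A", "D"): 3}

for pair, t in split_times.items():
    print(pair, pairwise_distance(t), "million years")
```

This shortcut only holds when every tip sits at the same point on the time axis, as in the tree described here.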

Phylogenetic trees can have different forms – they may be oriented sideways, inverted (most recent at bottom), or the branches may be curved, or the tree may be radial (oldest at the center). Regardless of how the tree is drawn, the branching patterns all convey the same information: evolutionary ancestry and patterns of divergence.

This video does a great job of explaining how to interpret species relatedness using trees, including describing some of the common incorrect ways to read trees:

Constructing phylogenetic trees

Many different types of data can be used to construct phylogenetic trees, including morphological data, such as structural features, types of organs, and specific skeletal arrangements; and genetic data, such as mitochondrial DNA sequences, ribosomal RNA genes, and any genes of interest.

These types of data are used to identify homology, which means similarity due to common ancestry.   This is simply the idea that you inherit traits from your parents, only applied on a species level: all humans have large brains and opposable thumbs because our ancestors did; all mammals produce milk from mammary glands because their ancestors did.

Trees are constructed on the principle of parsimony, which is the idea that the most likely pattern is the one requiring the fewest changes.  For example, it is much more likely that all mammals produce milk because they all inherited mammary glands from a common ancestor that produced milk from mammary glands, versus multiple groups of organisms each independently evolving mammary glands.

Here is an excellent resource on phylogenetic trees:

Phylogenetic Trees | Biology for Majors I

Read and analyze a phylogenetic tree that documents evolutionary relationships

In scientific terms, the evolutionary history and relationship of an organism or group of organisms is called phylogeny. Phylogeny describes the relationships of an organism, such as from which organisms it is thought to have evolved, to which species it is most closely related, and so forth. Phylogenetic relationships provide information on shared ancestry but not necessarily on how organisms are similar or different.

Learning Objectives

  • Identify how and why scientists classify the organisms on earth
  • Differentiate between types of phylogenetic trees and what their structure tells us
  • Identify some limitations of phylogenetic trees
  • Relate the taxonomic classification system and binomial nomenclature

Scientific Classification

Figure 1. Only a few of the more than one million known species of insects are represented in this beetle collection. Beetles are a major subgroup of insects. They make up about 40 percent of all insect species and about 25 percent of all known species of organisms.

Why do biologists classify organisms? The major reason is to make sense of the incredible diversity of life on Earth. Scientists have identified millions of different species of organisms. Among animals, the most diverse group of organisms is the insects. More than one million different species of insects have already been described. An estimated nine million insect species have yet to be identified. A tiny fraction of insect species is shown in the beetle collection in Figure 1.

As diverse as insects are, there may be even more species of bacteria, another major group of organisms. Clearly, there is a need to organize the tremendous diversity of life. Classification allows scientists to organize and better understand the basic similarities and differences among organisms. This knowledge is necessary to understand the present diversity and the past evolutionary history of life on Earth.

Phylogenetic Trees

Scientists use a tool called a phylogenetic tree to show the evolutionary pathways and connections among organisms. A phylogenetic tree is a diagram used to reflect evolutionary relationships among organisms or groups of organisms. Scientists consider phylogenetic trees to be a hypothesis of the evolutionary past since one cannot go back to confirm the proposed relationships. In other words, a “tree of life” can be constructed to illustrate when different organisms evolved and to show the relationships among different organisms (Figure 2).

Each group of organisms went through its own evolutionary journey, called its phylogeny. Each organism shares relatedness with others, and based on morphologic and genetic evidence, scientists attempt to map the evolutionary pathways of all life on Earth. Many scientists build phylogenetic trees to illustrate evolutionary relationships.

Structure of Phylogenetic Trees

A phylogenetic tree can be read like a map of evolutionary history. Many phylogenetic trees have a single lineage at the base representing a common ancestor. Scientists call such trees rooted, which means there is a single ancestral lineage (typically drawn from the bottom or left) to which all organisms represented in the diagram relate. Notice in the rooted phylogenetic tree that the three domains (Bacteria, Archaea, and Eukarya) diverge from a single point and branch off. The small branch that plants and animals (including humans) occupy in this diagram shows how recent and minuscule these groups are compared with other organisms. Unrooted trees don’t show a common ancestor but do show relationships among species.

Figure 2. Both of these phylogenetic trees show the relationship of the three domains of life (Bacteria, Archaea, and Eukarya), but the (a) rooted tree attempts to identify when various species diverged from a common ancestor while the (b) unrooted tree does not. (credit a: modification of work by Eric Gaba)

In a rooted tree, the branching indicates evolutionary relationships (Figure 3). The point where a split occurs, called a branch point, represents where a single lineage evolved into a distinct new one. A lineage that evolved early from the root and remains unbranched is called a basal taxon. When two lineages stem from the same branch point, they are called sister taxa. A branch with more than two lineages is called a polytomy and serves to illustrate where scientists have not definitively determined all of the relationships. It is important to note that although sister taxa and the members of a polytomy do share an ancestor, this does not mean that the groups of organisms split or evolved from each other. Organisms in two taxa may have split apart at a specific branch point, but neither taxon gave rise to the other.

Figure 3. The root of a phylogenetic tree indicates that an ancestral lineage gave rise to all organisms on the tree. A branch point indicates where two lineages diverged. A lineage that evolved early and remains unbranched is a basal taxon. When two lineages stem from the same branch point, they are sister taxa. A branch with more than two lineages is a polytomy.

The diagrams above can serve as a pathway to understanding evolutionary history. The pathway can be traced from the origin of life to any individual species by navigating through the evolutionary branches between the two points. Also, by starting with a single species and tracing back towards the “trunk” of the tree, one can discover that species’ ancestors, as well as where lineages share a common ancestry. In addition, the tree can be used to study entire groups of organisms.

Another point to mention on phylogenetic tree structure is that rotation at branch points does not change the information. For example, if a branch point was rotated and the taxon order changed, this would not alter the information because the evolution of each taxon from the branch point was independent of the other.
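One way to see that rotation carries no information is to compare trees in a form that ignores the left-to-right order of children. The sketch below assumes trees are written as nested tuples with unique tip labels; the encoding and the function name are illustrative choices, not part of the original text.

```python
def canonical(node):
    """Order-free form of a tree: each branch point becomes a frozenset of its children."""
    if isinstance(node, str):
        return node
    return frozenset(canonical(child) for child in node)

original = (("A", "B"), "C")
rotated = ("C", ("B", "A"))     # same branch points, children drawn in another order
different = (("A", "C"), "B")   # a genuinely different branching pattern

print(canonical(original) == canonical(rotated))    # True
print(canonical(original) == canonical(different))  # False
```

Two drawings describe the same tree exactly when their order-free forms match, no matter how the tips are arranged on the page.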

Many disciplines within the study of biology contribute to understanding how past and present life evolved over time; these disciplines together contribute to building, updating, and maintaining the “tree of life.” Information is used to organize and classify organisms based on evolutionary relationships in a scientific field called systematics. Data may be collected from fossils, from studying the structure of body parts or molecules used by an organism, and by DNA analysis. By combining data from many sources, scientists can put together the phylogeny of an organism; since phylogenetic trees are hypotheses, they will continue to change as new types of life are discovered and new information is learned.

Limitations of Phylogenetic Trees

It may be easy to assume that more closely related organisms look more alike, and while this is often the case, it is not always true. If two closely related lineages evolved under significantly varied surroundings or after the evolution of a major new adaptation, it is possible for the two groups to appear more different than other groups that are not as closely related. For example, the phylogenetic tree in Figure 4 shows that lizards and rabbits both have amniotic eggs, whereas frogs do not; yet lizards and frogs appear more similar than lizards and rabbits.

Figure 4. This ladder-like phylogenetic tree of vertebrates is rooted by an organism that lacked a vertebral column. At each branch point, organisms with different characters are placed in different groups based on the characteristics they share.

Another aspect of phylogenetic trees is that, unless otherwise indicated, the branches do not account for length of time, only the evolutionary order. In other words, a long branch does not typically mean more time passed, nor does a short branch mean less time passed, unless specified on the diagram. For example, in Figure 4, the tree does not indicate how much time passed between the evolution of amniotic eggs and hair. What the tree does show is the order in which things took place. Again using Figure 4, the tree shows that the oldest trait is the vertebral column, followed by hinged jaws, and so forth. Remember that any phylogenetic tree is a part of the greater whole, and like a real tree, it does not grow in only one direction after a new branch develops.

So, for the organisms in Figure 4, just because a vertebral column evolved does not mean that invertebrate evolution ceased; it only means that a new branch formed. Also, groups that are not closely related, but evolve under similar conditions, may appear more phenotypically similar to each other than to a close relative.

Head to this website to see interactive exercises that allow you to explore the evolutionary relationships among species.

The Taxonomic Classification System

Taxonomy (which literally means “arrangement law”) is the science of classifying organisms to construct internationally shared classification systems with each organism placed into more and more inclusive groupings. Think about how a grocery store is organized. One large space is divided into departments, such as produce, dairy, and meats. Then each department further divides into aisles, then each aisle into categories and brands, and then finally a single product. This organization from larger to smaller, more specific categories is called a hierarchical system.

The taxonomic classification system (also called the Linnaean system after its inventor, Carl Linnaeus, a Swedish botanist, zoologist, and physician) uses a hierarchical model. Moving from the point of origin, the groups become more specific, until one branch ends as a single species. For example, after the common beginning of all life, scientists divide organisms into three large categories called a domain: Bacteria, Archaea, and Eukarya. Within each domain is a second category called a kingdom. After kingdoms, the subsequent categories of increasing specificity are: phylum, class, order, family, genus, and species (Figure 5).

Figure 5. The taxonomic classification system uses a hierarchical model to organize living organisms into increasingly specific categories. The common dog, Canis lupus familiaris, is a subspecies of Canis lupus, which also includes the wolf and dingo. (credit “dog”: modification of work by Janneke Vreugdenhil)

The kingdom Animalia stems from the Eukarya domain. For the common dog, the classification levels would be as shown in Figure 5. Therefore, the full name of an organism technically has eight terms. For the dog, it is: Eukarya, Animalia, Chordata, Mammalia, Carnivora, Canidae, Canis, and lupus. Notice that each name is capitalized except for species, and the genus and species names are italicized. Scientists generally refer to an organism only by its genus and species, which is its two-word scientific name, in what is called binomial nomenclature. Therefore, the scientific name of the dog is Canis lupus. The name at each level is also called a taxon. In other words, dogs are in order Carnivora. Carnivora is the name of the taxon at the order level; Canidae is the taxon at the family level, and so forth. Organisms also have a common name that people typically use, in this case, dog. Note that the dog is additionally a subspecies: the “familiaris” in Canis lupus familiaris. Subspecies are members of the same species that are capable of mating and reproducing viable offspring, but they are considered separate subspecies due to geographic or behavioral isolation or other factors.
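The eight-level hierarchy can be sketched as an ordered list, with the binomial name read off the last two entries. The dog classification below comes from the text; the red fox classification is added here purely for comparison and is not part of the original passage, and the function names are hypothetical.

```python
RANKS = ["domain", "kingdom", "phylum", "class", "order", "family", "genus", "species"]

# Classification of the dog, as listed in the text.
dog = ["Eukarya", "Animalia", "Chordata", "Mammalia", "Carnivora", "Canidae", "Canis", "lupus"]
# Red fox, added for comparison (not given in the text).
red_fox = ["Eukarya", "Animalia", "Chordata", "Mammalia", "Carnivora", "Canidae", "Vulpes", "vulpes"]

def binomial(classification):
    """Two-word scientific name: genus followed by species."""
    return f"{classification[6]} {classification[7]}"

def shared_ranks(a, b):
    """Ranks, from domain downward, at which two organisms fall in the same taxon."""
    shared = []
    for rank, taxon_a, taxon_b in zip(RANKS, a, b):
        if taxon_a != taxon_b:
            break
        shared.append(rank)
    return shared

print(binomial(dog))               # Canis lupus
print(shared_ranks(dog, red_fox))  # domain through family
```

Because the system is hierarchical, two organisms that share a taxon at one rank also share every rank above it, which is why the comparison can stop at the first mismatch.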

Figure 6 shows how the levels move toward specificity with other organisms. Notice how the dog shares a domain with the widest diversity of organisms, including plants and butterflies. At each sublevel, the organisms become more similar because they are more closely related. Historically, scientists classified organisms using characteristics, but as DNA technology developed, more precise phylogenies have been determined.

Practice Question

Figure 6. At each sublevel in the taxonomic classification system, organisms become more similar. Dogs and wolves are the same species because they can breed and produce viable offspring, but they are different enough to be classified as different subspecies. (credit “plant”: modification of work by “berduchwal”/Flickr; credit “insect”: modification of work by Jon Sullivan; credit “fish”: modification of work by Christian Mehlführer; credit “rabbit”: modification of work by Aidan Wojtas; credit “cat”: modification of work by Jonathan Lidbeck; credit “fox”: modification of work by Kevin Bacher, NPS; credit “jackal”: modification of work by Thomas A. Hermann, NBII, USGS; credit “wolf”: modification of work by Robert Dewar; credit “dog”: modification of work by “digital_image_fan”/Flickr)


At what levels are cats and dogs considered to be part of the same group?


Visit this website to classify three organisms—bear, orchid, and sea cucumber—from kingdom to species. To launch the game, under Classifying Life, click the picture of the bear or the Launch Interactive button.

Recent genetic analysis and other advancements have found that some earlier phylogenetic classifications do not align with the evolutionary past; therefore, changes and updates must be made as new discoveries occur. Recall that phylogenetic trees are hypotheses and are modified as data becomes available. In addition, classification historically has focused on grouping organisms mainly by shared characteristics and does not necessarily illustrate how the various groups relate to each other from an evolutionary perspective. For example, despite the fact that a hippopotamus resembles a pig more than a whale, the hippopotamus may be the closest living relative of the whale.

"Molecular archaeology" clarified the phylogenetic relationships of vertebrates

Scientists from the University of Konstanz refined the phylogenetic tree of jawed vertebrates using transcriptome data from a hundred different species. Among other things, they determined the position on the tree of a number of groups whose placement had long been questioned, including turtles, lungfish and salamanders. The study was published in the journal Ecology & Evolution.

The study of evolutionary relationships between different organisms has always attracted the attention of scientists. Phylogenetic trees help to explore important processes such as adaptive radiation and convergent evolution. Various data, including morphological and genetic data, are used to build trees of relatedness. Now, in addition to individual genes and genomes, phylogenetic analysis also uses transcriptomes (the set of RNA molecules in a cell). This approach is called phylotranscriptomics and, according to the authors of the article, it helps to resolve some disputes that phylogenomics could not.

The phylogenetic relationships of vertebrates are also part of the history of human origins, since humans are tetrapod vertebrates. Vertebrates include more than 68 thousand species. In this study, scientists examined jawed vertebrates (Gnathostomata). These comprise most vertebrates (with the exception of hagfish, lampreys and a number of extinct classes): fish, amphibians, reptiles, mammals and birds. Vertebrates appeared about 470 million years ago. From the very beginning, speciation among them was quite intense, so their relationships can be difficult to untangle. In addition, some species share features while sitting on different branches of the phylogenetic tree. For example, both birds and bats can fly, and both bats and whales use echolocation.

In the new work, scientists took transcriptomes from 100 species of jawed vertebrates, including 23 species whose transcriptomes had not been studied previously. They selected 7,189 genes, from which they built a phylogenetic tree calibrated against important geological events. The method of constructing trees from DNA or RNA is called "molecular archaeology" because it allows evolutionary changes to be tracked over time. By examining these changes, one can reconstruct events that happened millions of years ago. The scientists used a number of bioinformatics tools to optimize the method: for example, they were able to take into account sample contamination and the role of paralogous genes, ignore poorly sequenced regions, and, using hidden Markov models, minimize the role of incorrectly annotated genes. They note that this method can be used to reconstruct evolutionary relationships not only between vertebrates, but also between any other living organisms.

It turned out that the size of the genome did not affect the rate of evolution and speciation. In addition, it was previously believed that the number of small insertions and deletions (indels) affects the size of the genome, but in this study, no significant correlation was observed in this regard, either in the coding or non-coding regions.

The resulting tree suggested that the main groups of cartilaginous and ray-finned fishes appeared in the Ordovician period, earlier than previous phylogenetic studies showed, which is consistent with recent paleontological work. The same goes for turtles and birds from the early Cretaceous. Turtles turned out to be a sister group to crocodiles and birds, and their status as an independent group of anapsids was refuted. The researchers also managed to resolve the dispute about lungfish, which turned out to be close relatives of land vertebrates (at the same time, the hypothesis that lobe-finned fish are closer to them was not confirmed). Amphibians turned out to be a monophyletic group, and not paraphyletic, as some researchers thought. The oldest species of salamanders were representatives of the group Andrias, not sirens.

Many scientists set themselves the task of understanding and tracing the origin of species and their relationships. We recently wrote about a preliminary version of the "Tree of Life", presented as part of a large-scale phylogenetic project that includes data for more than 2.3 million species of living organisms.

Nadezhda Potapova and Anna Kaznadzey

Editor's Note: Initially, there were a number of inaccuracies in the note, which were later corrected. We apologize to our readers.

How is the course of evolution reconstructed? / Habr

The course of evolution is depicted in the form of trees

The most important task of biology is the reconstruction of the course of evolution. Paleontological data are sorely lacking, and all we have besides them is organisms living today. The paths of evolution are depicted as trees of life, or phylogenetic trees, which show the order in which the evolutionary paths of various organisms diverged. They are called phylogenetic because the evolutionary history of an organism is called phylogenesis.

Trees now have many important uses. Some are purely fundamental, such as working out the structural features of organisms that lived billions of years ago. Others are more applied: trees are used in assembling genomes, searching for genes important to the pathogenicity of pathogens, and much more.

Like much in evolutionary biology, the first phylogenetic tree was drawn by Darwin when he traveled on the Beagle (it is shown in the picture). At that point, however, it was only a concept illustrating the idea that, in the course of evolution, one species divides into several.

How do we know which tree suits us?

Let's say we have four organisms (a cow, a pig, a human and a lizard) and we want to know their evolutionary history. How do we work out which tree we need? After all, several different versions of their evolution can be drawn.

There is no universal answer to this question. One answer is a local variant of Occam's razor: choose the tree in which the same mutations occurred only once (or the fewest number of times). Each species has features by which it can be compared with others. For example, consider the presence of hooves. A pig and a cow have hooves, but a human and a lizard obviously do not. There are several ways to arrange these four species on a tree. We will analyze only two of them, but the principle is exactly the same for a larger number of trees.

Let's say we have built two trees and we want to know which one is better.

For the first tree, it suffices to assume that hooves arose once in a common ancestor of a pig and a cow. That is, it was enough to make one change.

For the second tree, there are two possibilities. Either the common ancestor of all four organisms had hooves and they were lost twice, in the human and lizard lineages, so two changes are needed to explain this model; or the common ancestor did not have hooves, in which case they must have arisen independently in the cow and pig lineages, which again requires two changes.

So we consider the first tree better than the second. Of course, things are not always so simple: sometimes several trees are equally good, and this method cannot say which one is better. But by studying many characters rather than just one, the issue can be resolved.
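The counting of changes described above can be automated. The sketch below uses Fitch's small-parsimony procedure, a standard way of counting the minimum number of changes on a fixed tree; the text does not name the algorithm it has in mind, so this is a swapped-in illustration. The two topologies are also assumptions, since the original figures are not reproduced here: the first groups cow with pig, the second pairs each hoofed animal with a hoofless one.

```python
def fitch(node, states):
    """Return (possible ancestral states, minimum number of changes) for a binary tree.

    node   -- a tip name (str) or a pair (left, right) of subtrees
    states -- dict mapping each tip name to its observed character state
    """
    if isinstance(node, str):
        return {states[node]}, 0
    left_set, left_cost = fitch(node[0], states)
    right_set, right_cost = fitch(node[1], states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change needed here.
        return common, left_cost + right_cost
    # The children disagree: one change must have happened on a branch below.
    return left_set | right_set, left_cost + right_cost + 1

# Character from the text: presence of hooves.
hooves = {"cow": True, "pig": True, "human": False, "lizard": False}

# Assumed topologies (the original pictures are not available).
tree_1 = (("cow", "pig"), ("human", "lizard"))
tree_2 = (("cow", "human"), ("pig", "lizard"))

print(fitch(tree_1, hooves)[1])  # 1
print(fitch(tree_2, hooves)[1])  # 2
```

With many characters, the same function would be run once per character and the costs summed, which is how studying many characters rather than one can separate trees that tie on a single feature.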

Which characters to use, and what does DNA have to do with it?

Previously, trees were built on the basis of the development of organisms, their external and internal structure and, later, biochemical characteristics. As a result, they often turned out to be inaccurate, rough and, most importantly, highly dependent on the opinions and preferences of their creators.

A much more reliable and objective method is now available. If we look at the DNA molecule, we see that it is a polymer chain consisting of four types of monomers, whose sequence carries the genetic information. In other words, DNA can be thought of as text written in a four-letter alphabet (the letters are called A, T, G, and C in bioinformatics). Each position in the DNA can then be treated as a character that takes the value of one of the four letters, or is missing (deleted).

Each time a cell divides, this text is copied, and sometimes errors occur during copying. Accordingly, the more times two texts have been copied independently of each other, the greater the differences between them. However, it does not make sense to compare complete genomes, because genomes constantly undergo mutations that modern tree-building algorithms cannot describe. A chromosome can split into several, or two can merge into one. Viruses can insert their genetic material into the DNA of a cell. Genes can be duplicated or lost. The list of such mutations is quite extensive, so it is common to compare small sections of DNA that are known to have undergone only small mutations, such as single-letter substitutions or small deletions and insertions. Such sections can be, for example, the same gene in different species. Trees are also often built from proteins, but the differences between building from proteins and from DNA are purely technical and are not important for understanding the essence. Trees reflecting the evolution of whole organisms are usually assembled from many such trees built on small sections of DNA, or on individual genes whose evolution is known in advance to coincide with the evolution of the organisms.
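The idea that independently copied texts accumulate differences can be illustrated by counting mismatched positions between aligned sequences. This is a minimal sketch: the sequences below are invented for illustration and are not real gene fragments, and the function name is a hypothetical choice.

```python
def count_differences(seq_a, seq_b):
    """Number of positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)

# Invented, already-aligned sequences (illustration only).
aligned = {
    "cow":    "ATGCCGTA",
    "pig":    "ATGCCGTT",
    "lizard": "ATGACGAT",
}

names = list(aligned)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        print(first, second, count_differences(aligned[first], aligned[second]))
```

In this toy data the cow and pig sequences differ at fewer positions than either does from the lizard sequence, which is the kind of signal tree-building methods work from.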

How are trees built?

There are three principal ways to build a tree. The first is the enumeration of all possible trees that can be built for the given organisms: not just the plausible ones, but all of them. Even for a tree with twenty leaves, however, such enumeration is already too expensive, and it is often necessary to build much larger trees. The second is growing a tree. In this approach, a small, simple tree is built first, and then branches are added to it one at a time so that a good tree results at each step, until the tree is complete. The third is heuristic methods, which build some plausible tree according to a certain algorithm. The previously described method for assessing the quality of a tree is exactly what is needed, for example, to decide which of the built trees is better, or where to put a new branch when growing a tree.
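To see why full enumeration breaks down, it helps to count the candidates. The sketch below uses the standard counting formulas for bifurcating trees with labelled leaves, (2n - 3)!! rooted and (2n - 5)!! unrooted, which are not stated in the text and are brought in here as background; the function names are hypothetical.

```python
def double_factorial(k):
    """Product k * (k - 2) * (k - 4) * ... down to 1; returns 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def rooted_tree_count(n_leaves):
    """Number of distinct rooted bifurcating trees on n labelled leaves (n >= 2)."""
    return double_factorial(2 * n_leaves - 3)

def unrooted_tree_count(n_leaves):
    """Number of distinct unrooted bifurcating trees on n labelled leaves (n >= 3)."""
    return double_factorial(2 * n_leaves - 5)

for n in (4, 10, 20):
    print(n, rooted_tree_count(n), unrooted_tree_count(n))
```

For the four organisms in the earlier example there are only 15 rooted trees to score, but for twenty leaves the count is roughly 8 × 10^21, which is why stepwise growth and heuristic searches are used instead.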

In conclusion, I would like to talk about an interesting application of trees.
