Molecular Homology

Molecular homology is an important concept in modern evolutionary biology, used to test the relationships between modern taxa and to examine the processes driving evolution at the molecular level. It is a rapidly changing field, and one that students who wish to "explore evolution" should surely understand. Explore Evolution will not provide that understanding.

The book focuses too narrowly on peripheral topics, without giving students an appreciation of the techniques and concepts central to molecular biology and molecular systematics. Explore Evolution only considers homologies in DNA sequence, but scientists also examine homologous amino acid sequences in proteins (which can result from several different DNA sequences), and homologous protein folds (which can result from numerous different amino acid sequences). Before it was practical to sequence genomes, scientists also used simpler measures like molecular size and electrical charge to identify similarities and differences between molecules.

Rather than surveying this range of homologies, Explore Evolution focuses on a few minor points: the consistency of the code for translating between nucleic acids and amino acids, the ability to estimate the timing of evolutionary divergence based on molecular divergence, modest and predictable differences between the patterns of homology in different genes, and genes that seem to have no homologs in other species. In each case, the account given is misleading and inaccurate, reflecting creationist misconceptions rather than the best science.

p. 58: "Biologists have long thought that the genetic code is basically the same in all organisms."

Since the genetic code was first identified, biologists knew that small variants were possible, and existed. Such small variants do not undermine the basic consistency of the code.

p. 59: "advocates of universal common descent have always assumed that a change in code was not survivable"

Scientists have known that the code does vary since at least 1968. Such variants do not undermine the evidence for common ancestry, and understanding the mechanism by which such variants evolve strengthens the evidence for a common source of the genetic code.

p. 58: "It's very hard to see how an organism could have survived a transformation from the standard code"

Mechanisms for this transformation are understood and have been studied in the lab.

p. 57: "A 'family history' of organisms based on their anatomy should match the 'family history' based on their molecules"

There are many well-studied evolutionary processes that can cause a molecular phylogeny to differ from a phylogeny based on anatomy. These well-understood processes are unmentioned.

p. 57: "Michael Lynch has noted that creating a clear picture of evolutionary relationships is 'an elusive problem'"

Lynch was not referring to all evolutionary relationships, only to distant evolutionary connections. He explains: "Given the substantial evolutionary time separating the animal phyla, it is not surprising that single-gene analyses yield … discordant results." In such distant relationships, "Statistical noise" can make "inferences based on single genes… very misleading."

p. 58: "Carl Woese…now thinks that biology must abandon what he calls Darwin's 'Doctrine of Common Descent'"

Woese is engaged in a technical dispute which Explore Evolution misrepresents consistently. Woese does not question the common ancestry of all animals and plants, nor does he think it unlikely that all modern life descended from a common population of organisms with extensive gene swapping. This is a different form of common ancestry, not a rejection of common ancestry in all forms.

p. 57: "the rate of mutation varies in response to a number of environmental factors"

The citation offered to justify this claim offers no support. The claim is a staple of creationist writings, but has no basis in modern evolutionary biology.

Parents for ORFans: Ongoing research finds homologues for supposed ORFans

p. 60: "a large number of genes code for proteins whose functions we don't understand… ORFan genes"

This is not how ORFans are defined. Some have known functions, others do not code for proteins. While coding sections of DNA with no known homologues were identified early in whole-genome sequencing, continued sequencing efforts have revealed homologues for ever more such sequences.

p. 61: "conflicting phylogenetic trees or ORFans … undermine … Universal Common Descent"

Modern evolutionary biologists have addressed these issues, and the incomplete and inaccurate account in Explore Evolution fails to address the current state of the science, or current thinking on the shape of the tree of life.

p. 61: "Michael Syvanen… Michael Gordon"

Syvanen and Gordon have explored some possible alternative shapes for the tree of life, but neither scientist has changed the state of scientific opinion, nor does either challenge the basic principles of common ancestry of all animal and plant life, or the likelihood that all life descends from a united population of cells at some point, however many lineages entered that population.

Major Flaws:

A Universal Tree of Life: The scientists cited by Explore Evolution do not dispute that multicellular life shares a common ancestry, nor the likelihood that all modern life descended from a single population of ancient organisms that swapped their genes widely.

Evolving Codes and Novel Genes: Explore Evolution cites minor variations in the universal genetic code to undermine common ancestry. The mechanisms behind such variations are well-understood and in fact strengthen our understanding of common ancestry. ORFans, sections of DNA that have the structure of genes but lack homologues in other species, do not undermine claims of common ancestry. As new genomes are sequenced, ever more homologues are found.

Molecular Clocks: Common ancestry, contrary to the claims in Explore Evolution, is not dependent on molecular clocks. Nonetheless, some potential problems cited by the book have been addressed by working scientists. Others are based on creationist misconceptions about molecular clocks, and simply lack scientific basis.

A Universal Tree of Life

Molecular homology, genes shared due to common ancestry, is a powerful tool for reconstructing the history of life. A small fraction of the research in molecular phylogeny is concerned with tracing all life to a common ancestor, or population of ancestors. Explore Evolution ignores the bulk of research in molecular evolution to focus on this narrow topic.

Even within this overly narrow focus, the treatment is deeply inaccurate. The farther back in time a common ancestor would be, the more opportunities there have been for statistical noise to obscure the evolutionary signal. Explore Evolution emphasizes these more complicated issues, and misrepresents the views of scientists like Michael Lynch, Carl Woese, Michael Syvanen, and Malcolm Gordon who have addressed the shape of the early tree of life.

Whether or not there was one origin of life or several, the best available science indicates that all modern life descended from a single population of organisms. The species in this ancestral population would have shared genes so readily that the entire population can be treated as the ancestor of modern life. Explore Evolution muddles this new field of research, proposing instead a set of entirely separate trees, a vision of life tendered only by creationists, not by any active researchers in the field.

Do Different Genes Mean Different Phylogenetic Trees?

Phylogenetic trees based on single genes (or small numbers of genes) can differ from one another, but Explore Evolution overstates both the extent of the inconsistencies and their implications for phylogenetic reconstruction. Inconsistencies are most common when analyzing phylogenetic events in the very deep past (such as separation of the main animal groups in the pre-Cambrian), and occur for reasons that are well characterized and indeed predicted based on statistical and evolutionary considerations (changes in evolutionary rates, convergent evolution, etc.). In addition, the recent exponential increase in available sequence data has been shown to overcome these artifacts successfully, generating consistent trees with high confidence. Most importantly, the authors' claim that these discrepancies mean that "molecular evidence cannot be reconciled with the theory of Universal Common Descent" (p. 57) is entirely unsupported.

The authors of Explore Evolution reveal a major gap in their understanding of phylogeny and of much of modern biology when they state:

… if Darwin's single Tree of Life is accurate, then we should expect that different types of biological evidence would all point to that same tree. A "family history" of organisms based on their anatomy should match the "family history" based on their molecules (such as DNA and proteins).
Explore Evolution, p. 57

Phylogenetic trees based on a specific gene (gene trees) and those based on several genetic and anatomical traits map the relationships of different entities: genes in the first case, organisms in the second. Inconsistencies in phylogenetic tree reconstructions are a fascinating issue, and research addressing these inconsistencies has led to a better understanding of complex evolutionary processes.

Phylogenetic trees reconstructed from different genes in the same organism can differ. The possible causes of such differences are understood, ranging from methodological issues (such as different parameters being applied to the algorithms used to weigh sequence similarities) to bona fide biological phenomena. The latter are more interesting and significant, and are generally due to the effects of recognized evolutionary processes on the history of individual genes: convergence, in which the same sequence change appears independently in different lineages, either because of similar selective pressures or by chance; changes in evolutionary rates, in which certain organisms evolve faster than others; horizontal gene transfer, in which sequences are transferred from one species to another by mechanisms other than vertical, linear descent; and timing, in which two lineages radiate from a third in relatively close succession, before enough differences have accumulated between them to discern the order of emergence. Thus, even in the best scenarios, absolutely congruent phylogenies from the analysis of individual genes are not expected. The authors of Explore Evolution make it seem as if biologists are surprised and stumped by these inconsistencies:

Evolutionary biologist Michael Lynch has noted that creating a clear picture of evolutionary relationships is "an elusive problem." He also notes that "analyses based on different genes - and even different analyses based on the same genes - [yield] a diversity of phylogenetic trees."
Explore Evolution, p. 57

But in the very next paragraph of the same paper, Lynch makes the main underlying issues clear:

Given the substantial evolutionary time separating the animal phyla, it is not surprising that single-gene analyses yield such discordant results. Under such circumstances, the statistical noise associated with the substitution process leads to a high probability that phylogenetic analyses based on different molecules will yield different topologies (Philippe et al. 1994; Ruvolo 1997), so that inferences based on single genes can potentially be very misleading (leaving aside for now the additional problem of orthology).[emphasis added]
Lynch M. "The Age and Relationships of the Major Animal Phyla." Evolution. 1999; 53:319-325.

In Lynch's paper the phrase "elusive problem" and the issue of multiple phylogenetic trees applied specifically to the "phylogenetic relationships of the major animal phyla", i.e. very distant evolutionary events; the Explore Evolution authors craftily presented it as if Lynch was referring to all "evolutionary relationships".
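Why very distant events are noisier than recent ones can be illustrated with a small calculation. The sketch below assumes the Jukes-Cantor substitution model, which is not named in the text above and is used here only because it is the simplest standard model; the function name and the example divergences are illustrative choices. Under that model, the expected fraction of sites that differ between two sequences levels off near 0.75 as divergence grows, so two ancient splits that happened at different times leave almost the same observable signal.

```python
import math


def expected_difference(d):
    """Expected fraction of differing sites under the Jukes-Cantor model.

    d is the true divergence in substitutions per site (assumed model:
    all four nucleotides equally frequent, all changes equally likely).
    """
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


# The same gap in true divergence (0.1 substitutions per site),
# once between two recent splits and once between two ancient splits.
recent_gap = expected_difference(0.2) - expected_difference(0.1)
ancient_gap = expected_difference(3.1) - expected_difference(3.0)

print(f"recent splits differ by  {recent_gap:.4f} of sites")
print(f"ancient splits differ by {ancient_gap:.4f} of sites")
```

For a single gene of about 500 sites, the recent gap corresponds to roughly 40 expected differences, while the ancient gap corresponds to less than one, which chance variation can easily swamp. That is the sense in which single-gene analyses of deep branchings can be misleading without implying anything about more recent relationships.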

Another example of the issues encountered in phylogenetic reconstruction, and their misrepresentation in Explore Evolution, comes from the following paragraph:

A "family tree" based on anatomy may show one pattern of relationships, while a tree based on DNA or RNA may show quite another. For example, one analysis of the mitochondrial cytochrome b gene produced a "family tree" in which cats and whales wound up in the order Primates. Yet, an anatomical analysis says that cats belong to the order Carnivora, while whales belong to Cetacea – and neither of them are Primates.
Explore Evolution, p. 57

The authors are talking about a review paper by Michael Lee (Lee MSY, 1999 Trends Ecol Evol 14:177-178), in which he refers to data obtained on one of the proteins involved in the respiratory chain in mitochondria, cytochrome b. The figure below shows the tree as presented in Lee's review paper:

Cytochrome b phylogenetic tree: from Lee, (1999) Trends Ecol Evol 14:177-178

The phylogenetic inconsistency here is the misplacement of a single branch, that of tarsiers (a primitive group of primates), as if they had separated from other primates before cats and fin-back whales. Actually, the data in the original publication (see figure below, Andrews et al. 1998 "Accelerated Evolution of Cytochrome b in Simian Primates: Adaptive Evolution in Concert with Other Mitochondrial Proteins?" J Mol Evol. 47:249–257) gives a slightly different picture, namely that the analysis of cytochrome b sequence is statistically incapable of resolving the phylogenetic relationship of most of the species in the tree (the numbers in the figure represent a measure of the statistical confidence in each branch of the tree, and numbers below 30 generally indicate lower confidence; the statistically robust values are underlined). In other words, cytochrome b is simply not a good protein to choose for constructing the evolutionary tree of these species. But why is that?

Cytochrome b phylogenetic tree: from Andrews et al., 1998; adapted to match layout and nomenclature in Lee, 1999 (see previous figure)

Both the Andrews and Lee papers suggested, based on other data, that the phylogenetic incongruence in this tree was caused by cytochrome b and other respiratory chain proteins having evolved much faster in some primate lineages compared to other mammals, possibly following unique selective pressures. As mentioned above, both accelerated and adaptive evolution can cause errors in phylogenetic tree reconstruction, masking or enhancing the similarities of related genes, depending on the circumstances. And indeed, in more recent years the accelerated adaptive evolution of respiratory chain proteins in monkeys and apes (but not tarsiers and lemurs) has been extensively confirmed (see for instance Grossman LI, et al. 2004 "Accelerated evolution of the electron transport chain in anthropoid primates." Trends Genet. 20:578-585). Thus, the inconsistency in the cytochrome b tree, rather than highlighting hopeless phylogenetic confusion as alleged in Explore Evolution, is the result of real biological and evolutionary processes. The existence of this extensive literature offers opportunities for an inquiry-based lesson on molecular evolution and evolutionary processes. Instead of offering that lesson, the supposedly inquiry-based Explore Evolution throws up its hands in confusion at any sign of difficulty.

Although molecular phylogenetic tree inconsistencies are hardly a fundamental theoretical concern for evolutionary biology, if persistent they could still cause practical problems in assessing certain evolutionary relationships. However, a number of new approaches have recently emerged that address these difficulties. These methods include the combination of large sets of sequence information from genomic databases, as well as the use of genetic features, such as large-scale structural changes or the mapping of mobile genetic elements, that are less prone to convergence and selection-related artifacts. For a thorough discussion of the potential of these approaches, see Rokas A and Carroll SB, (2006) "Bushes in the Tree of Life" PLoS Biol 4:e352.

Finally, the authors of Explore Evolution conclude:

Critics point out that the real problem may be that Universal Common Descent is wrong. In other words, maybe the reason the family trees don't agree is that the organisms in question never did share a common ancestor. Even some evolutionary biologists agree. Carl Woese of the University of Illinois, for instance, now thinks that biology must abandon what he calls Darwin's "Doctrine of Common Descent".
Explore Evolution, p. 58

Woese argues that the earliest history of life may show multiple early lineages which swapped genes extensively, making reconstruction of the early tree of life difficult. This is very different from the strictly non-overlapping trees Explore Evolution suggests as an alternative to universal common ancestry. Woese argues that these multiple lineages converged into a single population from which modern life descended, and he would absolutely reject the claim that molecular data cannot discern the pattern of common ancestry linking all primates, or the relationship between primates, carnivores, and whales, or indeed the common ancestry of all multicellular organisms.

Phylogenetic Trees and Molecular Family Histories

The authors do not understand phylogeny, and have a very limited understanding of the biological vocabulary and issues. They propose that if common descent is correct, then:

A "family history" of organisms based on their anatomy should match the "family history" based on their molecules (such as DNA and proteins).
Explore Evolution, p. 57

This is simply wrong.

Genes and organisms are very different things; gene family trees and organism family trees can, and do, differ. Contrary to the book's statements, evolutionary biology does not expect these trees to match up exactly. Rather, the relationship between genes and organisms is an issue in biology that is under active research in a wide range of fields.
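A toy comparison shows what it means for a gene tree and an organism tree to differ, and that the difference can be measured rather than treated as a mystery. The sketch below is only an illustration: the two topologies are simplified, hypothetical stand-ins loosely modeled on the cytochrome b example discussed earlier (tarsier placed outside a group containing cats and whales), and the nested-tuple representation and the `clades` helper are assumptions made for this example.

```python
def clades(tree):
    """Return every clade in a rooted tree as a frozenset of leaf names.

    A tree is a nested tuple; a leaf is a plain string.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    leaves(tree)
    return found


# Hypothetical organism tree: tarsiers group with the other primates.
organism_tree = (("tarsier", ("monkey", "ape")), ("cat", "whale"))
# Hypothetical single-gene tree: the tarsier branch is misplaced.
gene_tree = ("tarsier", (("monkey", "ape"), ("cat", "whale")))

only_in_organism_tree = clades(organism_tree) - clades(gene_tree)
only_in_gene_tree = clades(gene_tree) - clades(organism_tree)

print(sorted(sorted(c) for c in only_in_organism_tree))
print(sorted(sorted(c) for c in only_in_gene_tree))
```

In this toy case the two trees agree on most groupings and disagree only about where one branch attaches, which is the kind of localized conflict that rate changes or limited data in a single gene can produce.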

The Last Universal Common Ancestor

Explore Evolution claims that certain scientists dispute the existence of a single universal ancestor, citing authors who would not actually dispute universal common descent. They just disagree about the form it took, and the nature of the population of organisms from which modern living things evolved.

An Uprooted Tree of Life: From W. Ford Doolittle (2000) "Uprooting the tree of life." Scientific American, 282(2):90-5. Note that distances are not necessarily to scale in this image. This image reflects a view held by some practicing scientists (including Dr. Doolittle, the author of the original article) that there was a period in life's early history when genes swapped so frequently that it is impossible to treat those earlier lineages as truly distinct, nor to trace those lineages back cleanly to a single ancestor. They do not dispute that life has some common ancestor, but they do seek to clarify how we talk about that ancestor.

As discussed in the critique of the Introduction, there is ongoing research into the nature of the Last Universal Common Ancestor. Scientists traditionally envisioned an ancestral population of a single species which branched and gave rise to the modern diversity of life. More recently, researchers have suggested that this ancestral population of bacteria was not composed of a single species, but of multiple species which swapped genes freely.

This is the point Michael Syvanen is making when he is quoted by Explore Evolution:

Do the puzzles of conflicting phylogenetic trees or ORFans … undermine the theory of Universal Common Descent? Most evolutionary biologists say no. …Others are not so sure. Molecular evolutionist Michael Syvanen of the University of California-Davis argues that, "there is no reason to postulate that a LUCA (Last Universal Common Ancestor) ever existed."
Explore Evolution, p. 61

Once again, Explore Evolution misrepresents the views of a scientist by selecting a phrase that sounds as though evolution were on shaky ground. This 'mined' quote suggests, incorrectly, that Syvanen is arguing against Universal Common Descent. Here is what Syvanen actually wrote, in a paper indicating that genes for certain biochemical pathways had been transferred between lineages long after their divergence:

There has been recent discussion that horizontal gene transfer is so frequent that it may never be possible to reconstruct the last common ancestor. However, if biochemical unities could be achieved after speciation events by horizontal gene transfer, then there is no reason to even postulate that a LUCA ever existed. If horizontal gene transfer is as common as I am implying, the modern cell could have evolved in multiple parallel lineages. Earliest life could have been truly polyphyletic.
Michael Syvanen (2002) "On the occurrence of horizontal gene transfer among an arbitrarily chosen group of 26 genes," Journal of Molecular Evolution, 54:258-266

Syvanen is not necessarily disputing Universal Common Descent; he is disputing the existence of a Last Universal Cellular Ancestor [LUCA]. Syvanen is participating in an ongoing debate about the shape of the trunk of the tree of life. As shown in the figure above, some scientists see the evidence of extensive gene flow between ancient bacterial species as a sign that, at the base of the tree of life, lines between lineages were less clear, and that the branches didn't begin separating until a later point. Syvanen explained his views in some more depth in a post at the Panda's Thumb blog, pointing out that our increasingly detailed knowledge of the order in which certain ubiquitous genes evolved places greater and greater constraints on the composition of LUCA, and that "if we accept the existence of this LUCA there are a variety of reasons to believe that the LUCA itself was the product of an evolutionary process that employed horizontal transfer events." The full details of this debate require students to have a grasp of biological details that they will not have until after their high school biology classes, but an inquiry-based textbook might work with students to explain what sorts of research are under way to resolve some of these unresolved issues. Instead, Explore Evolution treats the existence of the discussion as proof that nothing is known, a profoundly unscientific attitude, and an unacceptable approach for a textbook to adopt. The existence of horizontal gene transfer, a major topic in discussions of the early tree of life, is not mentioned in the book's index, glossary, or relevant sections.

Malcolm Gordon

Explore Evolution claims that the common ancestry of life is disputed because:

Biologist Michael Gordon of UCLA argues that the single branching-tree picture of life's history is not accurate, but that life must have had multiple, independent starting points.
p. 61

Malcolm Gordon is an expert in the functional morphology of fish, not the origin of life. His argument in "The Concept of Monophyly: A Speculative Essay." (1999, Biology and Philosophy 14:331–348) is that there were likely to be many different environments where life could have arisen (almost certainly true), and that the mobility of the first organisms would have been limited (also likely to be true), so that it is probable (note that he never says "must," as Explore Evolution claims) that there would have been sufficient time for more than one origin of life event to occur. However, he vastly underestimates the spreading capacity of even slow-growing, non-motile bacteria. Anyone who has had the misfortune to have a sterile solution contaminated with a few bacteria can attest to how rapidly they grow, and in the prebiotic environment there would be no competition for them. So again, unless the origin of life is astoundingly easy, the first living organism would have had the field to itself. The near universality of the genetic code is consistent with this scenario, but universal common descent does not require a single origin of life, nor have evolutionary biologists ever required only a single origin of life.

There is grandeur in this view of life, with its several powers, having been originally breathed by the Creator into a few forms or into one …
Charles Darwin, The Origin of Species, 1859, Final paragraph

As that quotation from Darwin demonstrates, the theory of evolution does not, a priori, demand that life could not have more than one independent origin. However, there are several lines of evidence that compellingly suggest that the origin of life was limited to a very few, most probably one, initial population serving as an "organism." The strongest evidence for this idea is the nearly universal genetic code. This does not argue that the genetic code used by current organisms was found in the first life form; there are several alternative codes that would work. But if there were multiple origins for present-day life, we would expect to see significantly different genetic codes. However, all we see are minor mutational variants of the standard code.

Also, the basic biochemistry of organisms is fairly universal. If life had developed multiple times, we would expect to see at least some organisms with a radically different biochemistry, but we just don't see that.

Furthermore, unless the origin of life was exceptionally easy, it took some time for "living organisms" to develop. Hundreds of thousands to millions of years (even if they were simple self-replicating molecules) were likely required. Once the first living organism had developed, within the space of a few decades its descendants would have monopolized the prebiotic Earth, preventing the development of competitor organisms. It is unlikely, even if the origin of life were astoundingly simple, that a second form of life developed within a few years of the first one to prevent the first's monopoly. Even if a second organism had developed independently a few years after, the first organism with its head start would be much fitter, and would likely out-compete the second.

Evolving Codes and Novel Genes

The genetic code, the translation from each sequence of three nucleotides to an amino acid, is shared widely across the tree of life. Explore Evolution seizes upon some small variations in that translation to claim that common ancestry must be wrong.

Contrary to claims in Explore Evolution, the genetic code is still considered to be universal, in that the known exceptions are very minor variations on the same basic code found in all organisms. Long before the discovery of variant codes, it was known that the genetic code can in fact change, and how it can change, based on laboratory findings in bacteria and yeast mutants. We also have a good understanding of the kind of mutations and selective forces that can allow the genetic code to evolve new variants.

We also have a growing understanding of so-called ORFans, coding DNA sequences which don't seem homologous to genes in any other lineage. In the account of Explore Evolution, these are insuperable challenges to evolution, but this misrepresents the current state of knowledge. The more of these apparently unique sections of coding DNA we study, and the more genomes we sequence, the fewer of these ORFans actually seem unique. Many of what were thought to be ORFans actually have families, and Explore Evolution misinforms students by treating these sequences as unsolvable problems.

The Genetic Code

Explore Evolution wrongly states that biologists originally maintained that the genetic code is absolutely universal (invariant); that this absolute universality was considered evidence for common descent; that this would be a reasonable inference because changing the code would be invariably lethal ("not survivable"); and finally, that the claim of universality fell apart in the 1980s with the discovery of variant genetic codes. Thus, the authors claim, the genetic code is not universal, the inference of common descent is in question, and life must have "multiple separate origins." They cite physicist Hubert Yockey to justify the claim: "Some scientists think this is a possibility, saying that the evidence may point to a polyphyletic view of the history of life." (p. 59)

There are many problems with this argument, which is based on misunderstanding and misrepresentation of the available knowledge and of the scientific record.

First, contrary to the key assertion, scientists have been aware of natural genetic code mutants since at least the 1960s, and the actual molecular mechanism of some of these mutations (such as "suppressors of amber") was elucidated in both bacteria and yeast (Goodman HM, Abelson J, Landy A, Brenner S, Smith JD. (1968) "Amber suppression: a nucleotide change in the anticodon of a tyrosine transfer RNA." Nature 217:1019-24; Capecchi MR, Hughes SH, Wahl GM. (1975) "Yeast super-suppressors are altered tRNAs capable of translating a nonsense codon in vitro." Cell 6:269-77.) Amber suppressor mutations change the read-out of certain codons from STOP to an amino acid by altering the structure of one of the transfer RNAs. This tRNA recognizes the codons in messenger RNAs and allows the addition of the correct amino acid during protein synthesis. These mutants showed how new variant genetic codes can evolve, and what kind of selective pressures can favor such changes (in this case, the need for reversion of point mutations which introduce deleterious STOP codons in critical genes). Therefore, it was recognized fairly early that the genetic code did not need to be absolutely invariant to be fundamentally shared between all organisms ("universal"). Already in 1966, Francis Crick stated in his Croonian lecture:

The best evidence to date [for the universality of the code] is probably the excellent agreement between the code deduced for E. coli and the mutagenic data … derived from tobacco plants or human beings. There is thus little doubt that the genetic code is similar in most organisms. Whether there are any organisms which use a slightly modified version of the code remains to be seen.
Crick, FH (1967) The Croonian Lecture. "The Genetic Code." Proc R Soc Lond B Biol Sci. 167:331-47

Code Variants: from Knight RD, Freeland SJ, Landweber LF. "Rewiring the keyboard: evolvability of the genetic code." Nat Rev Genet. 2001; 2:49-58
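The amber suppressor mechanism described above, a single nucleotide change in the anticodon of a tyrosine transfer RNA, can be sketched in a few lines. This is a simplified illustration under stated assumptions: pairing is treated as strict Watson-Crick (wobble pairing and chemically modified bases are ignored), the anticodons are written as GUA for the wild type and CUA for the suppressor, and the function name is invented for the example.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codon_read_by(anticodon):
    """Codon (5'->3') that pairs with a given anticodon (5'->3').

    Assumes strict Watson-Crick pairing; wobble and modified
    bases are deliberately left out of this sketch.
    """
    return "".join(COMPLEMENT[base] for base in reversed(anticodon))


# Simplified wild-type tyrosine tRNA anticodon: reads a tyrosine codon.
wild_type_codon = codon_read_by("GUA")
# One nucleotide changed (G -> C): the tRNA now reads the amber STOP codon.
suppressor_codon = codon_read_by("CUA")

print(wild_type_codon, suppressor_codon)
```

Because the altered tRNA still carries tyrosine, a ribosome that meets the amber codon can insert an amino acid instead of terminating, which is the sense in which one point mutation changes how a codon is read.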

Second, the small number of organisms with variant genetic codes and the limited extent of the changes (involving a few codons at most) strongly support the view that these represent new variations of the "standard," universal code, as opposed to independently originated codes. Moreover, the known code variants themselves offer in many cases evidence for common descent, being shared by related organisms according to the established phylogenetic hierarchy, as shown in the figure below, from Knight RD, Freeland SJ, Landweber LF. (2001) "Rewiring the keyboard: evolvability of the genetic code." Nat Rev Genet. 2:49-58, which contains a thorough discussion on the phylogenetic distribution and mechanisms of genetic code variation.

In particular, the authors of Explore Evolution mention the specific example of organisms in which 2 of the 3 "stop codons" have been reassigned to encode for amino acids (change "a" in the figure above). They argue:
It's very hard to see how an organism could have survived a transformation from the standard code to this one. Changing to this new code would cause the cell to produce useless strings of extra amino acids when it should have stopped protein production.
Explore Evolution, p. 58

However, the mechanisms that underlie this particular kind of code change in some organisms are known, and they undermine the authors' argument. The studies have been performed mainly in Ciliates, a group of unicellular eukaryotes belonging to the Protozoans, which include Tetrahymena, the organism specifically cited by the authors. Notably, Ciliates have a peculiar genomic organization, with hundreds of very small chromosomes, often containing a single gene, organized in two distinct nuclei; moreover, their genes tend to have unusually short sequences past their termination codons. This is important because it means that mutations that suppress termination (such as that mentioned by the authors) are less likely to generate very long amino acid stretches past the normal protein end, and hence to cause deleterious phenotypes. Consistent with this possibility, Ciliates comprise the majority of organisms with alternative genetic codes containing termination suppressors (Knight RD, Freeland SJ, Landweber LF. (2001) "Rewiring the keyboard: evolvability of the genetic code." Nat Rev Genet. 2:49-58; Lozupone CA, Knight RD, Landweber LF. (2001) "The molecular basis of genetic code change in ciliates." Curr Biol. 11:65-74).

The particular genetic code mentioned by the authors, in which the UAG and UAA codons are used to encode the amino acid glutamine instead of STOP, results from two sets of changes. The first involves a reassignment of the transfer RNA for glutamine to recognize UAG and UAA. Interestingly, this kind of change can occur through intermediates with only partial effects ("wobble") (Schultz DW, Yarus M. (1994) "Transfer RNA mutation and the malleability of the genetic code." J. Mol. Biol. 235(5):1377-80). The second set of changes affects eRF1, one of the proteins involved in recognizing STOP codons (Lozupone CA, Knight RD, Landweber LF. (2001) "The molecular basis of genetic code change in ciliates." Curr Biol. 11:65-74). Because of this, it is not "very hard" at all, and is in fact quite possible, to envision gradual evolution of this new genetic code through intermediates in which the codon interpretation is ambiguous: "hybrid codes," in a sense.
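The effect of this reassignment can be illustrated with a minimal Python sketch. The messenger RNA below is invented for illustration, the function name translate is hypothetical, and the codon tables list only the handful of codons the example needs. The sketch simply shows that reading UAA and UAG as glutamine extends the protein until the next remaining STOP codon (UGA) is reached, so a short tail past the original termination codon yields only a short extension.

```python
# Abbreviated codon tables: only the codons used in this example are listed.
# "*" marks a STOP codon.
STANDARD = {
    "AUG": "M", "GCU": "A", "CAA": "Q", "CAG": "Q",
    "AAA": "K", "UAA": "*", "UAG": "*", "UGA": "*",
}

# Variant described in the text: UAA and UAG encode glutamine (Q); UGA stays STOP.
CILIATE = {**STANDARD, "UAA": "Q", "UAG": "Q"}


def translate(mrna, code):
    """Translate an mRNA string codon by codon, stopping at the first STOP."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = code[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical message: coding region, a UAA codon, a short tail, then UGA.
mrna = "AUGGCUAAA" + "UAA" + "GCUCAA" + "UGA"

standard_protein = translate(mrna, STANDARD)  # terminates at UAA
ciliate_protein = translate(mrna, CILIATE)    # reads through UAA, stops at UGA
print(standard_protein)
print(ciliate_protein)
```

In this toy case the variant code adds only three residues, which is the situation the short post-termination sequences of Ciliate genes make likely.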

To justify the claim that some scientists see variation in the genetic code as evidence against a single tree of life, the authors quote a sentence from a 1992 book by Hubert Yockey, a physicist with an interest in the information theory of biological systems. Interestingly, the quote in question is absent from the latest edition of the book. More importantly, in the current edition, Yockey approvingly quotes Francis Crick's suggestion that all extant life forms descended from a small interbreeding population (i.e., common descent), in which Crick once again anticipates the possibility of organisms with variant codes:

Crick (1981), in one of those marvelous intuitions that have led him to so many discoveries, and without the mathematical argument above, has proposed: “What the code suggests is that life, at some stage, went through at least one bottleneck, a small interbreeding population from which all subsequent life has descended… Nevertheless, one is mildly surprised that several versions of the code did not emerge, and the fact that the mitochondrial codes are slightly different from the rest supports this.
Yockey H, (2005) Information Theory, Evolution, and The Origin of Life. Cambridge University Press, 2005, p.102

Here, the authors of Explore Evolution make the fundamental mistake of conflating (either on purpose or due to a misunderstanding of the underlying issues) two very different questions: common descent, whether extant organisms can trace their ancestry to a single population at some point in the past, and abiogenesis, whether life originated only once. Quite obviously the two issues are distinct. It is entirely possible that life originated more than once, but that early life forms were so promiscuous in sharing genetic material that they constituted, from a genetic standpoint, a single population from which all later organisms evolved. Essentially no scientist at this point objects to the possibility of the latter proposition, although some disagree as to the pattern in which various lineages arose from the original population. These differences may affect the extent to which linear, vertical descent lineages can be unequivocally identified when analyzing the deepest phylogenetic relationships (such as the separation of life into the Domains of Eubacteria, Archaea, and Eukarya), but do not change the view that all organisms ultimately are phylogenetically related.


Explore Evolution claims:

[Molecular biologists] have been surprised to learn that a large number of genes code for proteins whose function we don't understand yet. They call these ORFan genes.
Explore Evolution, p. 60

This is not the definition of ORFans. ORFans are "open reading frames," sections of a chromosome with a start codon followed by a stretch of nucleotide triplets and ended by a stop codon and which do not match a known coding DNA sequence in other species. There is no guarantee that these sections even code for a protein, let alone that they have any function. More importantly, these merely have no currently recognized relatives. (Siew N, Fischer D. (2003) "Analysis of singleton ORFans in fully sequenced microbial genomes." Proteins. 53:241-51) Function is not a consideration in defining ORFans. Some of these proteins with no known relatives do have recognized functions (e.g. bacterial virulence factor staphostatin B (1nycA)).
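The structural part of that definition (a start codon, a run of triplets, and an in-frame stop codon) can be sketched in a few lines of Python. This is an illustrative toy, not the method used by any cited study: the function name find_orfs, the min_codons threshold, and the example sequence are all assumptions, and only the forward strand is scanned. Whether an ORF found this way counts as an ORFan depends on a separate comparison against sequences from other species, which is not shown here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, sequence) for each ATG...STOP stretch.

    Scans the three forward reading frames only. min_codons counts the
    start and stop codons and is an arbitrary illustrative threshold.
    """
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                # Walk forward in frame until a stop codon or the end.
                j = i + 3
                while j + 3 <= len(dna) and dna[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(dna):
                    break  # no in-frame stop remains in this frame
                if (j + 3 - i) // 3 >= min_codons:
                    orfs.append((i, j + 3, dna[i:j + 3]))
                i = j + 3
            else:
                i += 3
    return orfs


# Hypothetical sequence containing one short open reading frame.
example_orfs = find_orfs("CCATGAAATTTTAGGG")
print(example_orfs)
```

Note that nothing in this procedure tests whether the stretch is actually expressed or functional, which is why random sequence is expected to produce some ORFs by chance.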

In contrast, we do have many genes that are in recognizable gene families, but whose functions are not clear from their sequence alone. For example, alpha-beta barrel family proteins have a wide variety of functions, and it is difficult to deduce the function of a member from simple inspection. The incorrect definition given in Explore Evolution artificially inflates the purported number of ORFans.

According to evolutionary theory, new genes arise from old genes by mutation … . New genes should resemble the older "ancestor genes." However, these newly discovered genes do not match any sequence that codes from a known protein.
Explore Evolution, p. 61

Most ORFans have relatives found for them rather rapidly as new genomes are sequenced. With the larger databases available now, old ORFans are finding relatives (e.g. in 2004 hypothetical protein Apc1120 was an ORFan, now several relatives have turned up) and fewer new ORFans are being found. Also, we know that proteins can be generated de novo, so not all proteins must be traced back to older ancestor genes.

Thus, there are two claims here:

  1. There are a substantial number of ORFans that have no similarity to other sequences, and
  2. Common descent assumes that all (or a very high proportion) of current proteins originated with the Last Universal Common Ancestor.

The first claim is deeply misleading and the second is wrong.

Explore Evolution gives the impression that there are many genes with no relation to any other genes (especially by selectively quoting from older papers). In fact, while many putative genes in a newly sequenced organism may initially appear to be unrelated to any then-known gene, relatives are usually found rather rapidly. When H. influenzae was first sequenced, 64% of its open reading frames (ORFs, putative genes) were ORFans; as of 2003, only 5.2% were. When Mycoplasma genitalium was first sequenced, roughly 30% of its predicted genes were ORFans, and now all have homologues in other lineages.

Explore Evolution quotes the brief review N. Siew, D. Fischer. 2003 "Twenty Thousand ORFan microbial protein families for the biologist?" Structure 11:7-9.

If proteins in different organisms have descended from common ancestral proteins by duplication and adaptive variation, why is it that so many today show no similarity to each other? Why is it that we do not find today any of the necessary “intermediate sequences” that must have given rise to these ORFans?
Explore Evolution, p. 62

This citation ignores the following sentences from that paper:

Regardless of their origin, ORFans may be of two types. Some ORFans may correspond to newly evolved (through a yet unknown mechanism) or to unique descendants of ancient proteins, with unique functions and three-dimensional (3D) structures not currently observed in other families. Alternatively, ORFans may correspond to highly diverse members of known protein families, but with functions and/or 3D structures similar to proteins already known.

As well as the prescient observation:

More sensitive computational methods, such as fold recognition or sequence-to-profile comparisons, may succeed in assigning some ORFans to known families, and thus, their roles and functions may be gained.

This is what has turned out to be the case. By ignoring work in this area since 2003 (including papers from Siew and Fischer published after this mini-review, such as Siew N, Fischer D. (2003) Proteins. 53:241-51), Explore Evolution gives a highly distorted picture of our current understanding of ORFans.

ORFans versus Genome Number: The proportion of ORFans in the genome, as compared to the total number of sequenced genes. As we increase the number of genes sequenced, the percent of ORFans falls. As of 2003, only 5% of long ORFans (ORFs that are unlikely to be simple sequencing artefacts) were unaccounted for. Figure 1C from Siew N, Fischer D. (2003) "Analysis of singleton ORFans in fully sequenced microbial genomes." Proteins. 53:241-51.
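Why the ORFan percentage should fall as databases grow can be seen in a toy Python sketch. The gene-family labels and genomes below are invented, not real data, and the function name orfan_fraction is hypothetical; real analyses rely on sequence-similarity searches rather than exact label matching. The sketch only illustrates the bookkeeping: a family counts as an ORFan until some other genome in the comparison set contains a relative.

```python
def orfan_fraction(query, others):
    """Fraction of the query's gene families found in none of the other genomes."""
    seen_elsewhere = set().union(*others) if others else set()
    return len(query - seen_elsewhere) / len(query)


# Hypothetical gene-family labels; not real data.
query = {"f1", "f2", "f3", "f4", "f5"}
database = [
    {"f1", "g1"},
    {"f2", "f3", "g2"},
    {"f4", "g3"},
]

# Recompute the ORFan fraction as each additional genome is "sequenced".
fractions = [orfan_fraction(query, database[:n]) for n in range(len(database) + 1)]
print(fractions)
```

With no comparison genomes every family in the query is an ORFan by definition, and the fraction can only stay level or fall as genomes are added.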

In an inquiry-based class, a teacher might ask the students to suggest reasons why some putative genes appear to be ORFans. Once students generated that list, the teacher could encourage students to generate testable hypotheses and even to test those hypotheses. Instead of guiding students and teachers along that path, Explore Evolution encourages students simply to surrender in the face of the unexplained, a decidedly inquiry-averse approach. Some of the reasons scientists have offered for genes to remain ORFans include:

  1. Some ORFans may be artefacts: Many ORFans are very short, 100-150 codons long. It is likely that many of these represent database or annotation errors. Also, in any genome, one would expect some random ORFs being formed. Fukuchi S and Nishikawa K. ("Estimation of the number of authentic orphan genes in bacterial genomes." DNA Res. 2004 Aug 31;11(4):219-31, 311-313.) closely examined sequences and estimated that about half of all short ORFans are sequencing or other errors.
  2. Some ORFans may have relatives, but we haven't sampled enough genomes yet. As of 2003, when most of the ORFan comparisons were done, something like 60 complete bacterial genomes had been sequenced. Note the diagram above, with the continuing fall of ORFans as more genomes are sequenced. By 2006 the percentage of ORFans had fallen by a further 5% (Marsden RL, et al., "Comprehensive genome analysis of 203 genomes provides structural genomics with new insights into protein family space." Nucleic Acids Res. 2006 34:1066-80). More genomes have been sequenced since then, but there are many, many more bacteria that are not yet sequenced, and that will have genomes quite divergent from the human pathogens that form the majority of current sequences. This will be especially important because a horizontal transfer from a distantly related bacterium that has not been sequenced will look like an ORFan (until that distantly related bacterium is sequenced). A recent paper shows that many E. coli ORFans are the result of horizontal gene transfer from bacteriophages (Daubin and Ochman, 2004; "Bacterial genomes as new gene homes: the genealogy of ORFans in E. coli". Genome Res. (6):1036-42.). Bacteriophages are viruses, which is why they didn't turn up in bacterial database comparisons.
  3. Some ORFans may have relatives, but our tools aren't good enough to detect these relatives yet. Rapidly evolving proteins, especially small proteins, can have their evolutionary history obscured by multiple substitutions during their evolution. More sensitive techniques are needed to find the relatives of these proteins, usually based on structural recognition. For example, using improved fold recognition software and a large database of fold family structures, Siew et al. have found that in Bacillus sp., some related ORFans are members of the alpha/beta hydrolase superfamily, and most likely derive from the haloperoxidases (N. Siew, H. K. Saini and D. Fischer. (2005) "A Putative Novel Alpha/Beta Hydrolase Family in Bacillus." FEBS Letters, 579:3175-82.).

    So most ORFans have been accounted for, and as we study more genomes with better tools we will resolve the status of many more. In an inquiry-based approach, students could recheck Escherichia coli ORFans from 2003, and would find that the vast majority now have resolved relatives. Indeed, if some of the non-artefactual ORFans are due to horizontal transfer from bacteriophages, as recent experiments suggest (Daubin and Ochman, 2004), then they may prove to be a valuable tool in understanding the phylogeny of bacteria, in the same way that families of LINEs, SINEs and pseudogenes have been. Far from being a threat to common descent, the nested hierarchies seen among singleton, lineage-specific and family-specific ORFans are the patterns one would expect from common descent.
  4. Some ORFans may be de novo generated proteins. We fully expect a modest proportion of new genes to be generated de novo during evolution. We even have examples of proteins that are so generated. The most famous of these is the nylonase gene, which allows bacteria to metabolise the artificial polymer nylon. This was produced by a mutation in a piece of non-coding DNA which generated a transcribable protein (Okada H, et al., (1983) "Evolutionary adaptation of plasmid-encoded enzymes for degrading nylon oligomers." Nature. 306(5939):203-6.). The sperm-specific dynein intermediate chain gene (Sdic) was generated by a fusion mutation between two genes (so strictly speaking it falls under the gene duplication rubric), but the coding region of the new Sdic gene is generated from the non-coding intronic regions, so protein homology studies would have a hard time identifying it (Nurminsky DI, et al., (1998) "Selective sweep of a newly evolved sperm-specific gene in Drosophila." Nature. 396(6711):572-5). Formation of new genes poses no problem for evolutionary biology or common descent, as we do not demand that all, or even the vast majority, of genes originate in the Last Universal Common Ancestor. Furthermore, we are quite able to trace common ancestry with some genes being generated de novo, as this does not disturb the trees generated from other genes.

Molecular Clocks

Explore Evolution's arguments against molecular clocks are a bungled mishmash of actual facts, misinterpretations and completely spurious claims. First, the authors raise the issue of calibration of the molecular clock. This is an acknowledged potential problem in using the clock to date certain evolutionary events, especially those in the very deep past. Nevertheless, when appropriate methodology and controls are used, molecular clock dating has been shown to be reliable and consistent.

Contrary to the authors' statement, "environmental factors" such as magnetic field changes and mass extinctions are thought to play a minor role, at best, in molecular clock variation, as opposed to intrinsic differences in generation times, molecular substitution rates, and selective forces across lineages and time. Significantly, the authors' use and misattribution of this "environmental factors" claim appears to have resulted from their reliance on secondary creationist sources, instead of the primary scientific literature they cite.

Regardless of the claims in Explore Evolution, common ancestry is not dependent on any particular molecular clock. At best, this is again dependent on creationist sources. At worst, it is simply irrelevant.

Environmental Influences

The authors of Explore Evolution raise another issue, that of variations of clock rates along lineages due to "environmental factors", which if true would be more problematic because they would be harder to control for. However, while mutation rates in individuals can of course change according to environmental factors (e.g., radiation exposure), for this change to noticeably affect substitution rates in populations and species (which is what the molecular clock measures), the change in mutation rates would have to be very substantial, sustained over many generations, and widespread enough to affect a very large proportion of individuals in a species. Such conditions are not a common occurrence, even on a paleontological scale, and indeed environmental changes are not counted by experts among the major hurdles in molecular clock studies. The authors provide the following citations for their statements:

16 James W. Valentine, David Jablonski, and Douglas H. Erwin, "Fossils, molecules and embryos: new perspectives on the Cambrian explosion," Development 126 (1999):851-859.

17 According to James W. Valentine, David Jablonski, and Douglas H. Erwin, these environmental factors might include the collapse of magnetic fields, and mass extinctions (which may create environmental niches).

p. 62

Nowhere in the paper by Valentine, Jablonski and Erwin do the authors find that magnetic field changes and mass extinctions affect clock rates. In fact, while the idea that inversions of the Earth’s magnetic field could correlate with accelerated evolution by effects on cosmic radiation levels was considered early in the 1960s, it was quickly shown to be very unlikely based on physical and biological principles.

Explore Evolution wrongly attributes this idea to Valentine and his collaborators, and gives no basis for this attribution.

Molecular Clock Rates

Explore Evolution is wrong to present molecular clocks as "evidence for Common Descent" or to claim that such a link constitutes circular reasoning. The authors do not offer any specific example of this usage of the molecular clock by biologists, so it is hard to evaluate in what context it has been made, if at all. Again, however, allegations of this claim being made by unspecified "evolutionists," along with its refutation, are found in several creationist sources.

Explore Evolution claims:

Critics also dispute both the accuracy and the importance of the "molecular clock." They dispute the accuracy because of many known problems with calibrating such clocks. To time something accurately, you must know that your watch runs at a constant rate—that it doesn't speed up or slow down. Unfortunately, say the critics, the rate of mutation varies in response to a number of environmental factors. As a result, even if we knew when species diverged, we couldn't be sure that the molecular clock was "ticking" at a constant rate.
Explore Evolution, p. 59

There are several problems in this paragraph. First, it illustrates a rather grating habit found throughout Explore Evolution: the appropriation of arguments made by evolution scientists as if they were made by "critics," followed by the misrepresentation of such arguments. In this case, it is not anonymous "critics" who have pointed out that the molecular clock can perform unevenly, it is the very same scientists who then improve and continue to use the molecular clock approach.

More specifically, scientists have identified two main potential sources of error in molecular clock studies. The first is unevenness in the "ticking" rate. The finding that different proteins evolve (indeed, must evolve) at different rates was already known in the 1960s, and it was incorporated into the early theoretical formulations of molecular clocks by Jukes, Dickerson, Kimura and others. Differences in clock rates for the same protein between evolutionary lineages became clear with the advent of large-scale gene sequencing in the 1980s (Wu CI, Li WH. 1985 "Evidence for higher rates of nucleotide substitution in rodents than in man." Proc Natl Acad Sci U S A. 82:1741-5. Li WH, Tanimura M. 1987 "The molecular clock runs more slowly in man than in apes and monkeys." Nature. 326:93-6). Both these kinds of differences are generally measurable, and can be accounted for using appropriate calibration methods and adjustments.

The choice and statistical evaluation of calibration points has been more difficult. In order to accurately date events, scientists must set the clock based on events that are recognized as being accurately dated based on independent fossil or (for more recent events) archaeological evidence. Once one or more such events (for instance, the separation of the lineages giving rise to birds and mammals) are identified, genetic differences in a specific set of proteins between relevant species (in this case, birds and mammals such as humans and chickens) can be measured to "set" the clock, which can then be applied to the separation of other lineages in comparable time frames. Calibration points, especially for analyses in the deep past, have been a source of sometimes heated debate among scientists (Graur D, Martin W. 2004 "Reading the entrails of chickens: molecular timescales of evolution and the illusion of precision." Trends Genet. 20:80-6.; Hedges SB, Kumar S. 2004 "Precision of molecular time estimates." Trends Genet. 20:242-7; Glazko GV, Koonin EV, Rogozin IB. 2005 "Molecular dating: ape bones agree with chicken entrails." Trends Genet. 21:89-92). Still, the consensus is that application of the molecular clock with the appropriate controls and cautions can be useful and reliable.
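The calibration logic described above can be made concrete with a minimal Python sketch. Everything numerical here is hypothetical: the observed sequence differences and the 300-million-year calibration date are invented for illustration, and the function names are not from any cited work. The sketch assumes a single constant rate (a "strict" clock) and uses the Jukes-Cantor correction for multiple substitutions at the same site as one simple model choice; as the text notes, real studies adjust for rate differences between genes and lineages and evaluate calibration points statistically.

```python
import math


def jukes_cantor_distance(p):
    """Correct an observed proportion of differing sites (p < 0.75)
    for multiple substitutions at the same site (Jukes-Cantor model)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def calibrate_rate(distance, split_time_myr):
    """Substitutions per site per million years along one lineage.
    The factor of 2 reflects that both lineages change after the split."""
    return distance / (2.0 * split_time_myr)


def estimate_split_time(distance, rate):
    """Invert the calibration: divergence time implied by a distance."""
    return distance / (2.0 * rate)


# Hypothetical inputs, chosen only to illustrate the arithmetic.
calibration_p = 0.30          # observed differences between the calibration pair
calibration_time_myr = 300.0  # independently dated split (assumed)
query_p = 0.12                # observed differences between the pair to be dated

rate = calibrate_rate(jukes_cantor_distance(calibration_p), calibration_time_myr)
query_time_myr = estimate_split_time(jukes_cantor_distance(query_p), rate)
print(round(query_time_myr, 1))
```

Because the estimate scales directly with the calibration date, an error in that date propagates to every time inferred from it, which is why the choice of calibration points has drawn so much debate.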



Knight RD, Freeland SJ, Landweber LF. (2001) "Rewiring the keyboard: evolvability of the genetic code." Nat Rev Genet. 2:49-58.