Exploring the temporal structure of heterochronous sequences using TempEst


By academic.oup.com

Gene sequences are denoted ‘heterochronous’ (or measurably evolving) if they have been obtained from natural populations at evolutionarily distinct points in time. In this context, two sampling times are ‘evolutionarily distinct’ if genetic sequences sampled at those times differ by a measurable amount of nucleotide or amino acid substitution within the sampled population (Drummond et al. 2003a). Such data sets have become increasingly common in a range of biological disciplines, including infectious disease epidemiology, molecular ecology, molecular taxonomy, archaeology, and anthropology (e.g., Willerslev and Cooper 2005; Pybus and Rambaut 2009; Biek et al. 2015). In the past, most heterochronous data sets comprised gene sequences from either RNA viruses or ancient DNA studies of animal populations. Many RNA viruses evolve so rapidly that sequences sampled only weeks or months apart may be evolutionarily distinct, whereas ancient DNA sequences can be recovered from preserved biological material many thousands of years old, such that they are genetically different from those obtained from contemporary animal populations. More recently, the concept of heterochronous data has been extended to slower-evolving micro-organisms, including DNA viruses (e.g., Firth et al. 2010) and bacteria (e.g., Lowder et al. 2009), in part as a result of the increasing availability of whole-genome sequences for these species (Biek et al. 2015). It seems very likely that heterochronous data sets will continue to grow in popularity as sequencing technologies increase in power and decline in cost.

The evolutionary and phylogenetic analysis of gene or genome sequences from different points in time necessitates the use of methods distinct from those typically applied to ‘isochronous’ data sets (i.e., alignments that contain sequences sampled simultaneously, or over a time range whose duration is trivial compared to the evolutionary timescale of the species under investigation). Most importantly, the sampling dates of heterochronous sequences contain information about the rate of sequence evolution and consequently such data sets can be used to directly infer molecular phylogenies on a natural timescale of months, years, or millennia. By contrast, the branches of phylogenies estimated from isochronous data sets represent genetic distance only, and the independent effects of evolutionary rate and divergence time on genetic distances cannot be separated without external information about one or the other. Phylogenies whose branch lengths represent time (‘time trees’ or ‘clock trees’) have advantages over those measured as genetic distance, as the timescale provides a common frame of reference that enables evolutionary change to be directly compared with known historical events. For example, Shapiro et al. (2004) used heterochronous ancient mtDNA sequences sampled across a period of 60,000 years to suggest that the rapid decline in the genetic diversity of North American bison began before, not after, the first evidence of human hunters in the region.
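The confounding of rate and time described above can be made explicit with a short worked equation. This is a sketch under a strict molecular clock, which is an assumption made here for illustration only; the symbols $\mu$ (substitution rate), $t_{ij}$ (time back to the common ancestor of sequences $i$ and $j$), $t_i$ (sampling time of sequence $i$), $t_{\mathrm{root}}$ (time of the root), and $d$ (genetic distance) are introduced for this example and do not appear in the original text.

```latex
% Isochronous sampling: both tips are sampled at the same time, so the
% expected pairwise distance depends only on the product of rate and time.
\[
  d_{ij} = 2\,\mu\,t_{ij}
\]
% Any pair (\mu, t_{ij}) with the same product gives the same distance,
% so neither quantity can be estimated without external information.

% Heterochronous sampling: the expected root-to-tip distance of tip i is
\[
  d_i = \mu\,(t_i - t_{\mathrm{root}})
\]
% and the difference between two tips with known sampling times is
\[
  d_i - d_j = \mu\,(t_i - t_j)
  \quad\Longrightarrow\quad
  \mu = \frac{d_i - d_j}{t_i - t_j}, \qquad t_i \neq t_j .
\]
```

Because the sampling times $t_i$ and $t_j$ are known, the rate can in principle be recovered from the distances themselves, which is why heterochronous data can calibrate a timescale directly.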

Estimating phylogenies (and other evolutionary parameters, such as effective population sizes or speciation rates) on a natural timescale of years requires a ‘molecular clock’ model, which, in essence, is a statistical description of the relationship between observed genetic distances and time. The early development of the ‘molecular clock’ concept is intertwined with historical debates over the applicability of Kimura’s Neutral Theory of Evolution to empirical data (e.g., Gojobori, Moriyama, and Kimura 1990). However, it is important to note that it is not necessary to assume the absence of natural selection in order to infer phylogenies on a natural timescale. A suite of models, generally referred to as relaxed or local molecular clocks, has been developed that allows the rate of evolution to vary (for whatever reason) among the branches of a phylogenetic tree (e.g., Huelsenbeck, Larget, and Swofford 2000; Kishino, Thorne, and Bruno 2001; Drummond et al. 2006; Drummond and Suchard 2010). Time-scaled trees can be estimated using many statistical approaches, including Bayesian inference (e.g., Drummond et al. 2012), maximum likelihood (e.g., Rambaut 2000; Sanderson 2003; Yang 2007), or heuristic methods (e.g., Drummond and Rodrigo 2000).

Before using a molecular clock model to infer a time-scaled tree from heterochronous sequences, it is advisable to confirm that the sequences under investigation contain sufficient ‘temporal signal’ for reliable estimation. In other words, there must be sufficient genetic change between sampling times to reconstruct a statistical relationship between genetic divergence and time. This is particularly important for Bayesian inference approaches such as those implemented in BEAST, because the molecular clock models employed are statistically conditioned on having an evolutionary rate greater than zero, and will usually allow inference to proceed even when the alignments being analysed contain little or no temporal information. In such cases, the software may give the appearance of a statistically well-supported timescale even when none exists. Ideally, in such circumstances, the posterior estimates of the rate of evolution should reflect the prior distribution, but in reality random error and model misspecification may result in misleading conclusions being drawn.
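One common informal way to look for temporal signal is to regress the genetic distance of each tip from the root of a (non-clock) tree against its sampling date. The following is a minimal sketch of that idea, not TempEst's actual implementation: the function name `root_to_tip_regression`, its return keys, and the example numbers are hypothetical, and it assumes the root-to-tip distances have already been computed from a rooted tree. Because tips share ancestry, the data points are not statistically independent, so the fit is best treated as an exploratory diagnostic rather than a formal test.

```python
import numpy as np


def root_to_tip_regression(sampling_dates, root_to_tip_distances):
    """Least-squares fit of root-to-tip distance against sampling date.

    Returns a dict with the slope (a rough rate estimate, in substitutions
    per site per time unit), the x-intercept (a rough date for the root),
    and R^2 (how much of the variance in distance the dates explain).
    """
    dates = np.asarray(sampling_dates, dtype=float)
    dists = np.asarray(root_to_tip_distances, dtype=float)

    if dates.shape != dists.shape or dates.ndim != 1:
        raise ValueError("dates and distances must be 1-D and equal length")
    if dates.size < 3:
        raise ValueError("need at least three dated sequences")
    if dates.max() == dates.min():
        # Isochronous data: sampling dates carry no information about rate.
        raise ValueError("all sampling dates are identical")

    # Ordinary least squares: distance = slope * date + intercept.
    slope, intercept = np.polyfit(dates, dists, 1)

    predicted = slope * dates + intercept
    ss_res = np.sum((dists - predicted) ** 2)
    ss_tot = np.sum((dists - dists.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    # The x-intercept is only meaningful when the slope is positive.
    root_date = -intercept / slope if slope > 0 else float("nan")

    return {
        "rate": float(slope),
        "root_date": float(root_date),
        "r_squared": float(r_squared),
    }


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    dates = [2001.5, 2003.0, 2005.2, 2008.7, 2010.1]
    distances = [0.012, 0.013, 0.016, 0.019, 0.021]
    print(root_to_tip_regression(dates, distances))
```

A clearly positive slope with a reasonably high R² suggests that divergence accumulates with sampling time; a flat or negative slope suggests that the alignment may carry little temporal signal, in which case a clock analysis could return a timescale that mostly reflects the prior.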

Source: https://academic.oup.com/ve/article/doi/10.1093/ve/vew007/1753488/Exploring-the-temporal-structure-of-heterochronous
