ANALYSIS OF SEQUENCE DIVERGENCE AT MITOCHONDRIAL GENES ON FIVE DIFFERENT TAXONOMIC LEVELS. APPLICABILITY OF mtDNA DISTANCE BASE DATA IN GENETICS OF SPECIATION AND PHYLOGENETICS
Y. Ph. Kartavtsev
A. V. Zhirmunsky Institute of Marine Biology, Far Eastern Branch of the Russian Academy of Sciences, Vladivostok 690041, Russia; e-mail: yuri.kartavtsev48@hotmail.com

MAIN GOALS
1. What is the database?
2. Review of literature data on p-distances.
3. Species concept. Speciation modes (SM): population genetic view.

INTRODUCTION (1)
• Mitochondrial DNA (mtDNA) is a circular molecule 16-18 kilobase pairs (kbp) in length. Literature data show that the mtDNA of all fishes has a similar organization (Lee et al., 2001; Kim et al., 2004; Kim et al., 2005; Nagase et al., 2005; Nohara et al., 2005) and differs only slightly among vertebrate animals, including humans (Anderson et al., 1981; Bibb et al., 1981; Wallace, 1992; Kogelnik et al., 2005).
• The complete mitochondrial genome (mitogenome) includes: the control region (CR, or D-loop), where the replication initiation site and promoters are located; the large (16S) and small (12S) rRNA subunit genes; 22 tRNA genes; and 13 polypeptide-coding genes.

INTRODUCTION (2)
• Phylogenetic research usually relies on single-gene sequences from both mtDNA and the nuclear genome. However, complete mitogenomes are being used more and more often; Japanese scientists lead here for aquatic organisms.
• The most popular sequences in phylogenetics are those of the cytochrome b (Cyt-b) and cytochrome oxidase 1 (Co-1) genes, which are used for taxa comparison at the species to family level (Johns, Avise, 1998; Hebert et al., 2004; Kartavtsev, Lee, 2006). Many sequences carrying a phylogenetic signal have been obtained for different taxa at the 16S rRNA gene as well.
• Sequences of separate genes can carry different phylogenetic signal because of differences in substitution rates. This is also true for different sections of genes. In addition, homoplasy may affect comparisons of higher taxa.
• When numerous taxa are available, sequences may lack the information capacity to cover large species diversity, so adequate taxa representation is quite important (Hilish et al., 1996). Nevertheless, for species identification, rare cases excepted, good results are obtained even with short sequences such as Co-1, with 650 bp.

Applicability of Different DNA Types in Phylogenetics and Taxonomy
[Figure: applicability of DNA types across taxonomic levels. Taxonomic levels: Species, Genus, Family, Order, Class, Phylum. DNA types: spacers (ITS-1, 2), mtDNA, nDNA, rDNA. Legend: most statistically substantiated results; statistically significant results.]

1. WHAT IS THE DATABASE?

1.1. USING P-DISTANCES. SUMMARY
• To estimate the actual number of substitutions between sequences X and Y, a mathematical model must be introduced.
• At least 8 major models (Nei, Kumar, 2000; Felsenstein, 2004) and 56 in total are referred to in current sources (Posada, 2005; http://darwin.uvigo.es/software/modeltest.html).
• Among the simplest and best known are the Jukes, Cantor (1968; JC) model and the Kimura (1980) two-parameter (K2P) model. The latter is the default in some packages (e.g. PAUP). These models assume, respectively, equal rates for all kinds of substitutions (JC) and unequal rates for transitions (α) and transversions (β) (K2P).
• Some other models: Equal-input, Tamura, HKY (Hasegawa-Kishino-Yano), Tamura-Nei (TrN), General time reversible (GTR), Unrestricted.
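The relation between the raw p-distance and the two simplest models can be illustrated with a minimal Python sketch. It uses the standard closed-form estimators, d(JC) = -(3/4) ln(1 - 4p/3) and d(K2P) = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences. The function names and the two toy sequences are assumptions made for illustration; they are not taken from the presentation or from any particular package.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_sites(x, y):
    """Return (n, transitions, transversions) for two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to equal length")
    n = ts = tv = 0
    for a, b in zip(x.upper(), y.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        n += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            ts += 1  # transition: purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1  # transversion: purine<->pyrimidine
    if n == 0:
        raise ValueError("no comparable sites")
    return n, ts, tv


def p_distance(x, y):
    """Proportion of differing sites (uncorrected p-distance)."""
    n, ts, tv = count_sites(x, y)
    return (ts + tv) / n


def jc_distance(x, y):
    """Jukes-Cantor distance: all substitution types equally likely."""
    p = p_distance(x, y)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def k2p_distance(x, y):
    """Kimura two-parameter distance: transitions and transversions differ."""
    n, ts, tv = count_sites(x, y)
    P, Q = ts / n, tv / n
    return -0.5 * math.log(1.0 - 2.0 * P - Q) - 0.25 * math.log(1.0 - 2.0 * Q)


# Hypothetical toy alignment: one transition (C->T) in ten sites.
x = "ACGTACGTAC"
y = "ACGTACGTAT"
print(p_distance(x, y), jc_distance(x, y), k2p_distance(x, y))
```

Both corrected distances are slightly larger than the raw p-distance here, because the models account for multiple substitutions at the same site.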

1.2. USING P-DISTANCES. SUMMARY
• In the K2P model the equilibrium frequencies of the 4 nucleotides are 0.25. However, the algorithms proposed for calculating the expected p^ and its variance, here and in the Jukes-Cantor model, are applicable irrespective of frequency deviations (Rzhetsky, Nei, 1995). Thus, both models suit a wider range of conditions in which the real parameters remain unknown.
• To avoid confusion, remember that in Kimura's model the ratio of transitions to transversions is R = α / 2β, whereas many authors and many software packages use a different ratio, k = α / β.
• By our estimates (Kartavtsev, Lee, 2006), K2P is the measure used most often (29% of authors), and many authors use the simple p^ or such measures as HKY, TrN etc. A popular program for choosing an appropriate model is MODELTEST (Posada, Grandal, 1998). Very useful information on model properties and their applicability over a wide range of specific data sets may be found in the literature (Nei, Kumar, 2000; Hall, 2001; Sanderson, Shaffer, 2002; Felsenstein, 2004).

1.3. USING P-DISTANCES. SUMMARY
• Numerical simulations show that when p-distances are small (<20%), all models give similar values (Fig. 1.1).
• Because substitution rates are heterogeneous along sequences and among different parts of genes, an important correction of the p-distance is the gamma correction (e.g. Nei, Kumar, 2000; Felsenstein, 2004).
Fig. 1.1. Estimates of the number of nucleotide substitutions obtained by different distance measures when the actual numbers follow the TrN model (from Nei, Kumar, 2000).
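Both points can be sketched numerically in Python. The sketch compares the raw p-distance with the Jukes-Cantor distance and with its gamma-corrected form, d = (3/4) a [(1 - 4p/3)^(-1/a) - 1], where a is the shape parameter of the gamma distribution of rates among sites. The value a = 1.0 and the list of p values are assumptions chosen only for illustration; in practice a is estimated from the data.

```python
import math


def jc_distance(p):
    """Jukes-Cantor distance from a p-distance (uniform rates among sites)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc_gamma_distance(p, a):
    """Jukes-Cantor distance with gamma-distributed rates among sites.

    a is the gamma shape parameter; small a means strong rate heterogeneity.
    """
    return 0.75 * a * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / a) - 1.0)


# Illustrative p-distances and an assumed shape parameter a = 1.0.
for p in (0.05, 0.10, 0.20, 0.40):
    print(f"p={p:.2f}  JC={jc_distance(p):.4f}  JC+gamma={jc_gamma_distance(p, 1.0):.4f}")
```

At small p the three values stay close together, and the gap widens as p grows, which is the pattern the slide describes. As a becomes very large the gamma-corrected distance approaches the plain Jukes-Cantor distance.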

2. THE REVIEW OF LITERATURE DATA ON P-DISTANCES
2.1. DIVERSITY AT DNA MARKERS WITHIN SPECIES AND IN TAXA OF DIFFERENT RANK. AN ANALYSIS OF EMPIRICAL DATA

RESULTS (1)
Fig. 2.1. Rooted consensus (50%) tree showing phylogenetic interrelationships based on Cyt-b sequence data for the analyzed flatfish species (Pleuronectiformes). Bayesian tree; repetition frequencies (%) for n = 10^6 simulated generations are shown at the nodes. The tree was built with the TrN+I+G model and was rooted with the sequences of three outgroup species (Perciformes). The scales in the bottom left corners indicate relative branch lengths.

RESULTS (2)
Fig. 2.2. Rooted neighbor-joining (NJ) tree showing phylogenetic interrelationships based on sequence diversity at the Co-1 gene for 13 flatfish species (Pleuronectiformes) and two outgroup taxa (Perciformes), 21 sequences in total. Bootstrap support (n=1000) is shown at the nodes. The Kimura two-parameter model is used. The line at the bottom shows the scale for branch length.

Intraspecies diversity
• There are many and variable estimates based on different markers. For instance, in two copepod species the nucleotide diversity (π) at an rRNA gene of mtDNA depended on latitude: the subarctic species Calanus finmarchicus (π=0.37%, SD=0.26) was less variable than the temperate-water species Nannocalanus minor (π=0.50%, SD=0.32) (Bucklin, Wiebe, 1998).
• Focusing on Cyt-b and Co-1 sequence diversity: K2P values at Co-1, for sequences of some 600 bp, were estimated as small for 107 intraspecies groups of different species from five families of Lepidoptera (Arctiidae, Geometridae, Noctuidae, Notodontidae, Sphingidae) (Hebert et al., 2002). The average values range within 0.17-0.36%. Our recalculation gives an average for these groups of K2P = 0.25 ± 0.04%.
• In our database, the averages of a hundred intraspecies p-distances at Cyt-b and Co-1 are M=1.55±0.56% and M=0.55±0.19%, respectively (Kartavtsev, Lee, 2005).
• The most important point I would like to stress here is that stable, geographically restricted intraspecies groups have been detected in many species. They are marked by mtDNA genes, and evidently these are isolated intraspecies phylogroups that have existed for many generations and that correspond to the real local stocks defined by other biological methods. A number of such examples are summarized by Avise & Wolker (1999), and many are also presented in our review (Kartavtsev, Lee, 2006). Examples include the bottlenose dolphin, Tursiops truncatus (Dowlin, Brown, 1993); the Canada goose, Branta canadensis (Van Wagner, Baker, 1990); the fishes Fundulus heteroclitus and Stizostedion vitreum (Gonzales-Willasefior, Powers, 1990; Billington, Strange, 1990); and others (Stepien, Faber, 1998).
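For readers unfamiliar with π, a minimal Python sketch follows. It assumes the common definition of nucleotide diversity as the mean proportion of differing sites over all pairs of sequences in a sample; the cited studies may have used an estimator with additional corrections. The function names and the three toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(x, y):
    """Proportion of sites that differ between two aligned sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to equal length")
    diffs = sum(1 for a, b in zip(x, y) if a != b)
    return diffs / len(x)


def nucleotide_diversity(seqs):
    """Mean pairwise p-distance over all pairs of sequences in the sample."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(x, y) for x, y in pairs) / len(pairs)


# Hypothetical sample of three aligned haplotypes, four sites each.
sample = ["AAAA", "AAAT", "AATT"]
print(nucleotide_diversity(sample))
```

Multiplying the result by 100 gives π in percent, the unit used in the estimates quoted above.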

p-DISTANCES IN GROUPS OF COMPARISON, Flatfish
Fig. 2.4. Results of one-factor ANOVA and mean p-distance values at four levels of differentiation in flatfish species (Pleuronectiformes) for the Cyt-b gene. Groups: 1. Intraspecies, among individuals of the same species; 2. Intragenus, among species of the same genus; 3. Intrafamily, among genera of the same family; 4. Intraorder, among families of the order Pleuronectiformes. Statistically significant variation is shown at the top of the graph. SE: standard error of the mean (from Kartavtsev et al., 2007, Marine Biology).

p-DISTANCES IN GROUPS OF COMPARISON, Review
Fig. 2.6. Categorized plot of the distribution of weighted mean p-distances among four groups of comparison at the Cyt-b and Co-1 genes. Groups: 1. Intra-species, among individuals of the same species; 2. Intra-sibling species; 3. Intra-genus, among species of the same genus; 4. Intra-family, among genera of the same family (from Kartavtsev, Lee, 2006).

p-DISTANCES IN GROUPS OF COMPARISON, Review
Fig. 2.7. Plot of the distribution of weighted mean p-distances among five groups of comparison at the Cyt-b and Co-1 genes. Groups: 1. Intra-species, among individuals of the same species; 2. Intra-sibling species (semispecies + subspecies); 3. Intra-genus, among species of the same genus; 4. Intra-family, among genera of the same family; 5. Intra-order, among families of the same order (from Kartavtsev, 2009, NOVA Publ., NY).
Thus, the available data suggest that phyletic evolution generally prevails in the animal world and, so far, that geographic speciation events (Type 1a) prevail in nature. Do the data presented imply that speciation always follows the Type 1a mode? I guess not. A few examples below support this answer.

GENETIC DISTANCES AMONG SPECIES IN SEPARATE ANIMAL GENERA (After Avise, Aquadro, 1982)
This plot illustrates the idea that different animal groups of the same rank are unequal in structural gene divergence; i.e. the rate of evolution differs at genes, at morphology, or both.

EXAMPLES OF REGULATORY DIVERGENCE AMONG FISH TAXA
Comparison of chars
Table 2.1. COMPARISON OF ISOZYME ACTIVITY IN THREE WHITEFISH FORMS (COREGONIDAE) AND GRAYLING (THYMALLIDAE)
Column headings: LOCUS/FORM; LEVELS OF DIFFERENCES IN ACTIVITY; Ratio, %
Note. Total numbers of loci analyzed: Whitefish, 22; Grayling, 23. "-": activity does not differ significantly; "+": Iterative activity difference; "++": two-fold difference; "+++": three-fold or greater difference.

WHAT IS THE MAIN OUTCOME
• A distance measure alone is not a satisfactory descriptor.
• Data on intraspecies diversity (heterozygosity) at structural genes are necessary.
• Measures of regulatory genome changes are necessary to describe transformative modes of speciation.
• Other descriptors of genomic change are required (e.g. chromosome number, NF, etc.).

3. SPECIES CONCEPT. OLD IDEAS AND NEW DEVELOPMENTS

WHAT IS A SPECIES?
A species is a biological unit that is reproductively isolated from other such units and consists of one to several more or less stable populations of sexually reproducing individuals that occupy a certain area in nature (my definition). In its principal points, this is the definition of the BSC (Biological Species Concept). One of the original BSC definitions reads: “A species is a reproductive community of populations (reproductively isolated from others) that occupies a specific niche in nature” (Mayr, 1982, p. 273). We accept the BSC for further discussion, keeping in mind that it is restricted mainly to bisexual organisms (Mayr, 1963; Timofeev-Resovsky et al., 1977; Templeton, 1998).
• The Linnaean Species
• The Biological Species Concept (BSC) (Mayr, 1942, 1963)
• BSC Modification II (Mayr, 1982)
• The Recognition Species Concept (Paterson, 1978, 1985)
• The Cohesion Species Concept (Templeton, 1989)
• The Evolutionary Species Concept: Simpson's (1961) Evolutionary Species Concept; Wiley's (1978) Evolutionary Species Concept
• The Ecological Species Concept (Van Vallen, 1976)
• The Phylogenetic Species Concept (Crawcraft, 1983)

SCHEMATIC REPRESENTATION OF SPECIES DIVERGENCE AND ORIGIN (After Dobzhansky, 1955)
The keystone of the STE (Synthetic Theory of Evolution) may be represented by Dobzhansky's scheme (Fig. 3.1), in which gene pool separation is the key to speciation. If anyone demonstrates that evolution is possible without genetic change in lineages, then the evolutionary genetic paradigm, and the STE in particular, can be rejected.
Fig. 3.1. Dobzhansky's (1955) scheme of divergence in time. A: single species population. B: initial phase of divergence (subspecies). C: different species.

Main Modes of Speciation (Bush, 1975)
Fig. 3.2. DIAGRAMMATIC REPRESENTATION OF BASIC MODES OF SPECIATION (After Bush, 1975)
Breaks in gene flow can create Reproductive Isolating Barriers (RIB) or Reproductive Isolation Mechanisms (RIM), which in turn lead to the origin of species; different modes of speciation act under different situations in nature (Fig. 3.2). Neither the scheme above nor the paper itself (Bush, 1975) answers many fundamental questions of speciation. For instance, it is unclear which mode is most frequent, and whether gene flow is the sole primary factor that alters gene pools or there are others. In other words, we have to conclude that there is no theory of speciation in the strict scientific sense at all.

SPECIATION MODES (SM): POPULATION GENETIC VIEW
• ABSENCE OF A QUANTITATIVE THEORY OF SPECIATION (QTS)
We mentioned in the preceding section that a speciation theory in evolutionary genetics is absent in the strict scientific sense, which requires that a theory be able to predict the future. Here that means predicting species origin, or at least discriminating among several speciation modes on the basis of quantitative parameters or their empirical estimates. Attempts made in this direction (Avise, Wollenberg, 1997; Templeton, 1998) do not meet these criteria. That is why we attempted to discriminate among speciation modes on the basis of the main population genetic measurements available in the literature, which may be placed within the framework of a genetic speciation concept.
• BASIS FOR THE QTS
As a basis for the set of evolutionary genetic concepts we used the descriptions made by Templeton (1981). As a result, a classification scheme for 7 different modes of speciation was created (Fig. 3.3). This approach leads to a quite simple experimental scheme that permits: (i) arranging further investigation of speciation in different groups of organisms, and (ii) deriving analytical relations for each speciation mode (Fig. 3.4).
• EMPIRICAL QTS TESTING
The scheme was tested for cyprinids (Kartavtsev et al., 2002) and explains well our own earlier data on salmon (Kartavtsev, Mamontov, 1983; Kartavtsev et al., 1983). Certainly, both the testing of the scheme presented and its theoretical background must be developed further.

Fig. 3.3. SPECIATION MODES (SM): POPULATION GENETIC VIEW (After Kartavtsev et al., 2002)
DIVERGENCE SM: D1. ADAPTIVE; D2. CLINAL; D3. HABITAT
Necessary Conditions for Speciation
D1. a) Erection of extrinsic isolating barriers followed by a gene flow break; b) Pleiotropic origin of RIB (Reproductive Isolation Barriers) over a long time.
D2. a) Selection on a cline with isolation by distance; b) Pleiotropic origin of RIB.
D3. a) Selection over multiple habitats with no isolation by distance; b) RIB origin by disruptive selection at genes determining behavior.
Sufficient Conditions for Speciation
D1. Lack of efficient hybridization in the zone of contact. 1. DT > DS; 2. ED = EP; 3. HD = HP; 4. TM- 1 (S)
D2. Lack of efficient hybridization outside the zone of contact. 1. DT > DS; 2. ED EP; 3. HD = HP; 4. TM- 2 (S)
D3. Lack of efficient hybridization inside and outside the zone of contact. 1. DT = DS; 2. ED EP; 3. HD =< HP; 4. TM- 3 (S)
Experimentally measurable features and possible descriptors for the model (theory), (S)
DESCRIPTORS:
D: genetic distance at structural genes (DT: in suggested parent taxa; DS: among conspecific demes; DD: among subspecies or sibling species)
HD: mean heterozygosity in the suggested daughter population
HP: mean heterozygosity in the suggested parent population
EP: divergence in regulatory genes among suggested parent taxa
ED: divergence in regulatory genes among suggested daughter taxa
TM+: test for modification (positive)
TM-: test for modification (negative)
RIB: Reproductive Isolation Barriers

Fig. 3.4. ANALYTICAL DESCRIPTION OF SEVEN TYPES OF SPECIATION MODES
1 (S) {(DT > DS) (ED = EP) (HD = HP) TM-} (D1)
2 (S) {(DT = DS) (ED EP) (HD = HP) TM-} (D2)
3 (S) {(DT = DS) (ED EP) (HD <= HP) TM+} (D3)
4 (S) {(DT > DD) (ED EP) (HD < HP) TM-} (T1)
5 (S) {(DT = DD) (ED = EP) (HD < HP) TM-} (T2)
6 (S) {(DT > DD) (ED EP) (HD > HP) TM-} (T3)
7 (S) {(DT > DS) (ED EP) (HD < HP) TM-} (T4)
Note. Descriptors are explained in the previous figure.

DISTANCE VS TAXA SPLITTING
Does punctuation have an impact on species origin at the molecular level?
• Avise, Ayala, 1976; Kartavtsev et al., 1980; current: No.
• Pegel et al., 2006: Yes.
rs = 0.22, p < 0.05
Fig. 3.6. Plot of p-distance against the number of splittings for Cyt-b sequence data for catfishes and flatfishes. Axes: number of splittings; transformed p-distance.
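The statistic rs reported above is Spearman's rank correlation coefficient. A minimal Python sketch of how such a coefficient is computed is given here: both variables are converted to ranks (tied values share their average rank), and the Pearson correlation of the ranks is taken. The function names and the small data set are hypothetical and do not reproduce the catfish and flatfish data.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: number of splittings and p-distance (%) for six lineages.
splittings = [1, 2, 2, 3, 5, 8]
p_dist = [4.1, 3.9, 6.0, 5.2, 7.5, 7.1]
print(spearman_rs(splittings, p_dist))
```

Because only ranks enter the calculation, the coefficient is unaffected by any monotonic transformation of the p-distances, such as the one applied on the plot axis.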

FEW WORDS ON INSERTION SEQUENCES (IS)
Fig. 3.7. Bivariate plot of the distribution of canonical variable (CV) roots among four groups of comparison for IS sequences obtained from 60 complete mitogenomes of birds and fishes. Groups: 1. Aves-1, aquatic species that feed on fish; 2. Aves-2, land species, mostly plant and grain eaters; 3. Fish-1, species that are abundant as food for birds; 4. Fish-2, species that could not be abundant as food for birds (from Kartavtsev, 2010). CV score variation is statistically significant: R = 0.82, χ² = 104.48, P = 0.0003. Mean classification precision for groups 1 & 2 is 85.1%.

Summary
• Algorithms of nucleotide diversity estimates and other measures of genetic divergence for the two genes Cyt-b (cytochrome b) and Co-1 (cytochrome oxidase 1) are analyzed. Based on the theory and algorithms of distance estimates on DNA sequences, as well as on the observed distance values retrieved from the literature, it is recommended for realistic tree building to use a specific nucleotide substitution model, chosen from at least 56 available in Modeltest 3.7 or other software, depending on the specific set of nucleotide sequences.
• Using a database of p-distances and similar measures gathered from published sources and GenBank (http://www.ncbi.nlm.nih.gov) sequences, genetic divergence of populations (1) and taxa of different rank, such as subspecies, semispecies and/or sibling species (2), species within a genus (3), species from different genera within a family (4), and species from separate families within an order (5), has been compared. Empirical data for 18,192 vertebrate and invertebrate animal species demonstrate that the data series are realistic and interpretable when the p-distance and its various derivatives are used. The focus was on vertebrates, fish species in particular, and on the newest dataset obtained in the framework of FishBOL (http://www.fishbol.org).
• Distance data revealed various and increasing levels of genetic divergence of the sequences of the two genes Cyt-b and Co-1 in the five groups compared. Mean unweighted scores of p-distances (%) for the five groups are: Cyt-b (1) 1.46±0.34, (2) 5.35±0.95, (3) 10.46±0.96, (4) 17.99±1.33, (5) 26.36±3.88; and Co-1 (1) 0.72±0.16, (2) 3.78±1.18, (3) 10.87±0.66, (4) 15.00±0.90, (5) 19.97±0.80. The estimates show good correspondence with former analyses. This testifies to the applicability of the p-distance for most intraspecies and interspecies comparisons of genetic divergence up to the order level for the two genes compared.
• As seen from the numbers above, and from a regression analysis, there is no sign of saturation, which is usually expected from a homoplasy effect. Differences in divergence between the genes themselves at the five hierarchical levels were also found. This conforms to the ample evidence showing different and nonuniform evolution rates of these and other genes and their various regions.
• The results of the analysis of nucleotide as well as allozyme divergence within species and higher taxa of animals are, firstly, in good agreement with previous results and show the stability of a general trend and, secondly, suggest that in animals phyletic evolution is likely to prevail at the molecular level and speciation mainly corresponds to the geographic model (type D1). The prevalence of the D1 speciation mode does not mean that other modes are absent. There are at least seven possible modes of speciation. How we can recognize them formally with operational genetic criteria is a key question for establishing a quantifiable genetic model (theory) of speciation. An approach is suggested that allows a step forward in this direction.

THANKS FOR ATTENTION!

FEW FORMULAE
• MEAN HETEROZYGOSITY (PER LOCUS/INDIVIDUAL)
$H = \frac{1}{L} \sum_{k=1}^{L} h_k$, $h_k = 1 - \sum_{i=1}^{m} p_i^2$,
where $p_i$ is the frequency of the i-th allele, the inner sum runs over the m alleles of locus k, and L is the number of loci.
• p-DISTANCE
$\hat{p} = n_d / n$,
where $n_d$ is the number of nucleotides that differ between sequences X and Y, and n is the total number of nucleotides analysed.
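The two formulae translate directly into code. The following is a minimal Python sketch; the function names, the allele frequencies and the two short sequences are hypothetical values used only to show the arithmetic.

```python
def locus_heterozygosity(freqs):
    """h_k = 1 - sum(p_i^2) over the allele frequencies of one locus."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies at a locus must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


def mean_heterozygosity(loci):
    """H = (1/L) * sum(h_k) over L loci; loci is a list of frequency lists."""
    return sum(locus_heterozygosity(f) for f in loci) / len(loci)


def p_distance(x, y):
    """p^ = n_d / n for two aligned sequences of equal length."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to equal length")
    n_d = sum(1 for a, b in zip(x, y) if a != b)
    return n_d / len(x)


# Two hypothetical loci: one with two equally frequent alleles, one monomorphic.
print(mean_heterozygosity([[0.5, 0.5], [1.0]]))
# Two hypothetical sequences differing at one site out of four.
print(p_distance("ACGT", "ACGA"))
```

A monomorphic locus contributes h = 0, so including it lowers the mean H; this is why the number of loci L must count all loci analysed, not only the polymorphic ones.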