True Age of mt-Eve based on Direct Pedigree Method

Since no one seems to have calculated the true age of mtEve using the directly observed pedigree method (I could not find any paper that does), I am making that effort here in this blog post.

HVR1 Estimation

Based on a recently published paper, High Mitochondrial Mutation Rates Estimated From Deep-Rooting Costa Rican Pedigrees, I decided to estimate the age of mt-Eve. The paper analyzes HVR1 in 19 deep-rooted pedigrees from a population of mixed origin in Costa Rica.

 

Fig. 1 (High Mitochondrial Mutation Rates Estimated From Deep-Rooting Costa Rican Pedigrees)

As you can see, there are at least 7 mutations in 289 transmissions. This means that every transmission has a probability of 7/289 ≈ 0.024221 of carrying a mutation. In other words, 1 new mutation occurs on average every 41 generations (289/7 ≈ 41.3).
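The rate arithmetic above can be checked with a short Python sketch. It only restates the two counts from the paper (7 mutations, 289 transmissions); the variable names are my own.

```python
from fractions import Fraction

# Counts reported for the Costa Rican HVR1 pedigrees (Fig. 1 above).
mutations = 7
transmissions = 289

# Observed probability that one mother-to-child transmission carries a new HVR1 mutation.
p_per_transmission = Fraction(mutations, transmissions)

# Average number of generations per new mutation is the reciprocal of that probability.
generations_per_mutation = 1 / p_per_transmission

print(f"P(mutation per transmission) = {float(p_per_transmission):.6f}")  # prints 0.024221
print(f"generations per mutation = {float(generations_per_mutation):.2f}")  # prints 41.29
```

Using `Fraction` keeps the ratio exact until it is printed, so the only rounding is in the display.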

Now, let us consider how many mutations separate us from mtEve. To look into it, let me use the number of mutations in kits from several FTDNA projects to find the maximum and minimum genetic distance. Kit #50252 from the Cumberland gap-mtDNA project has 14 mutations, while Kit #258240 from the X mtDNA project has only 3. So the HVR1 distance from RSRS (or mtEve) varies from 3 to 14 mutations. There could be even more extreme cases, but let's take the midpoint, (3 + 14)/2 = 8.5 mutations from mtEve for HVR1. This is reasonable because, if you look at any mtDNA project on FTDNA, you will notice an average of at least 8 HVR1 mutations from RSRS.
On average, one mutation occurs every 41 generations, and humans are on average 8.5 mutations away from mtEve. So there should be 41*8.5 = 348.5 generations back to mtEve. If we take 20 years per generation, we get 41*8.5*20 = 6970 years as the age of mtEve.
The age of mtEve using HVR1 alone gives 6970 years.
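The age formula used in this section (generations per mutation × mutations from mtEve × years per generation) can be sketched as a small function. The helper name `mteve_age_years` is hypothetical; the inputs are the values stated above, including the rounding of 289/7 to 41.

```python
def mteve_age_years(generations_per_mutation: float,
                    mutations_from_eve: float,
                    years_per_generation: float = 20) -> float:
    """Hypothetical helper: generations per mutation x mutations from mtEve x years per generation."""
    return generations_per_mutation * mutations_from_eve * years_per_generation


# HVR1 distances from RSRS in the two FTDNA kits cited above.
min_hvr1, max_hvr1 = 3, 14
mean_hvr1 = (min_hvr1 + max_hvr1) / 2  # midpoint used in the post: 8.5

# 289/7 is about 41.3; the post rounds it to 41 generations per mutation.
age = mteve_age_years(41, mean_hvr1)
print(age)  # prints 6970.0
```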

HVR1 & HVR2 Estimation – Parsons Paper

In the paper A high observed substitution rate in the human mitochondrial DNA control region, which covers HVR1 and HVR2, the authors took samples from the Armed Forces DNA Identification Laboratory, Oxford British families, CEPH pedigree cell lines and Old Order Amish pedigree cell lines.
Fig. 2 (A high observed substitution rate in the human mitochondrial DNA control region)
They found 10 mutations in 327 generational events. This means that every transmission has a probability of 10/327 ≈ 0.03058 of carrying a mutation. In other words, 1 new mutation occurs on average every 33 generations (327/10 = 32.7).
Now, let us consider how many mutations separate us from mtEve, again using kits from several FTDNA projects to find the maximum and minimum genetic distance. Based on kits #282059 and #50252 from the Cumberland gap-mtDNA project, the maximum number of HVR1 and HVR2 mutations from RSRS is 22. Based on kits #N48849 and #N23635 from the X mtDNA project, the minimum is 10. So the HVR1 and HVR2 distance from RSRS (or mtEve) varies from 10 to 22 mutations, and we take the midpoint, an average of 16 mutations from mtEve. This is reasonable because, if you look at any mtDNA project on FTDNA, you will notice an average of at least 16 HVR1 and HVR2 mutations from RSRS.
One mutation can occur every 33 generations and humans have 16 mutations on average as the genetic distance from mtEve. So, there should be 33*16 generations from mtEve. If we consider 20 years for 1 generation, we have 33*16*20 = 10560 years as the age for mtEve.
The age of mtEve using HVR1 and HVR2 gives 10560 years.
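The Parsons calculation can be reproduced in a few lines. This sketch also shows how much the rounding of 32.7 generations to 33 affects the result; the variable names are my own and the 16-mutation midpoint is the assumption stated above.

```python
# Parsons et al.: 10 mutations in 327 generational events (Fig. 2 above).
mutations, transmissions = 10, 327
mutations_from_eve = 16      # midpoint of the 10-22 HVR1+HVR2 distances from RSRS
years_per_generation = 20

exact_gens = transmissions / mutations   # 32.7 generations per mutation
rounded_gens = round(exact_gens)         # 33, as used in the post

age_rounded = rounded_gens * mutations_from_eve * years_per_generation
age_exact = exact_gens * mutations_from_eve * years_per_generation

print(f"rounded: {age_rounded} years, unrounded: {age_exact:.0f} years")
# prints rounded: 10560 years, unrounded: 10464 years
```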

HVR1 and HVR2 Estimation – Santos Paper

In the paper Understanding Differences Between Phylogenetic and Pedigree-Derived mtDNA Mutation Rate: A Model Using Families from the Azores Islands (Portugal), the authors detected 11 substitutions in the D-loop in 321 mtDNA transmissions (more precisely, in 973 bp located between positions 16024–16596 of HVR1 and 1–400 of HVR2). This implies that 0.0343 mutations occur (at a detectable level) in the D-loop in each generation (95% CI: 0.014–0.054). The paper then adds that if we employ the same definition of mutation used by other authors (for example Howell et al. 2003), only mutations for which there is evidence that they are germinal should be considered (see table 2 of the paper). This reduces the mutation rate almost by half: six mutations in 321 mtDNA transmissions, that is, 0.0187 mutations/generation for the entire D-loop.
In other words, 1 mutation can occur every 53.5 generations. With humans having 16 mutations from RSRS for HVR1 and HVR2, mtEve should be 53.5*16 generations. If we consider 20 years for 1 generation, we have 53.5*16*20 = 17120 years as the age for mtEve.

The age of mtEve using HVR1 and HVR2 gives 17120 years.
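The Santos paper reports two counts for the same 321 transmissions, and this post uses the germline-only one. A minimal sketch that runs the same arithmetic on both, for comparison, assuming the 16-mutation midpoint and 20-year generations used above:

```python
TRANSMISSIONS = 321
MUTATIONS_FROM_EVE = 16      # HVR1 + HVR2 midpoint used in the post
YEARS_PER_GENERATION = 20

# D-loop substitutions reported by the Santos paper: all detected vs. germline only.
counts = {"all substitutions": 11, "germline only": 6}

ages = {}
for label, mutations in counts.items():
    rate = mutations / TRANSMISSIONS                 # mutations per generation
    gens_per_mutation = TRANSMISSIONS / mutations    # generations per mutation
    ages[label] = gens_per_mutation * MUTATIONS_FROM_EVE * YEARS_PER_GENERATION
    print(f"{label}: {rate:.4f}/generation, 1 per {gens_per_mutation:.1f} generations, "
          f"{ages[label]:.0f} years")
```

The germline-only line reproduces the 17120 years above; the all-substitutions line is shown only to make the effect of the stricter mutation definition visible.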

HVR1 and HVR2 Estimation – Other Papers

Some of the other papers that agree with the above two results include the following:

| Data set | Region analyzed | Mutations / generations | Mutations from Eve | Years before present |
|---|---|---|---|---|
| Howell et al. 1996 | Control | 2/88 | 16 | 14080 |
| Bendall et al. 1996 | HVR1 | 4/360 | 8.5 | 15300 |
| Mumm et al. 1997 | HVR1 | 1/59 | 8.5 | 10030 |
| Parsons et al. 1997c | HVR1, HVR2 | 1/32 | 16 | 10240 |
| Parsons and Holland 1998 | HVR1, HVR2 | 10/306 | 16 | 9792 |
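Each "Years before present" value in the table follows from (generations / mutations) × mutations from Eve × 20 years. The sketch below recomputes every row from the listed counts and flags any disagreement; it uses only the numbers in the table.

```python
YEARS_PER_GENERATION = 20

rows = [
    # (data set, mutations, generations, mutations from Eve, years BP as listed)
    ("Howell et al. 1996", 2, 88, 16, 14080),
    ("Bendall et al. 1996", 4, 360, 8.5, 15300),
    ("Mumm et al. 1997", 1, 59, 8.5, 10030),
    ("Parsons et al. 1997c", 1, 32, 16, 10240),
    ("Parsons and Holland 1998", 10, 306, 16, 9792),
]

computed_ages = [gens / muts * from_eve * YEARS_PER_GENERATION
                 for _, muts, gens, from_eve, _ in rows]

for (name, _, _, _, listed), computed in zip(rows, computed_ages):
    status = "ok" if abs(computed - listed) < 0.5 else "MISMATCH"
    print(f"{name:<26} {computed:>8.0f} (listed {listed}) {status}")
```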

Hence, based on HVR1 and HVR2, the age of mtEve should be approximately within a range of 9000 to 17000 YBP.

HVR1, HVR2 and Coding Region Estimation

Based on the paper The pedigree rate of sequence divergence in the human mitochondrial genome: There is a difference between phylogenetic and pedigree rates, the authors write:

The cumulative coding region data presented here can be combined with those published elsewhere (Howell et al. 1996), to derive a preliminary estimate of the pedigree divergence rate. Excluding the LHON mutations, the rate of newly arising germline mutations in the coding region is as follows: TAS1, 0 mutations/107 transmission events; ENG1, 1 mutation/26 transmission events; USA1, 1 mutation/11 transmission events; NWC1, 1 mutation/9 transmission events; and QLD1, 1 mutation/17 transmission events. Thus, there are 4 coding region mutations/170 transmission events, or ∼0.15 mutations/bp/Myr (99.5% CI 0.02–0.49).

So there are 4 coding region mutations in 170 transmissions, or 4/170 = 0.02353 mutations per generation. For HVR1 and HVR2, we have 10/327 = 0.03058 mutations per generation (based on the Parsons paper). These two events are not mutually exclusive, so the probability of a mutation in either region is P(A or B) = P(A) + P(B) – P(A and B), where P(A and B) is taken as P(A) × P(B), treating the two regions as independent.

P(Any mutation HVR1, HVR2 or CR) = 4/170 + 10/327 – (4/170 * 10/327) = 0.05339 (or 1 mutation every ~18 generations). With 57 mutations from mtEve and 1 mutation every 18 generations, mtEve must be 18*57 = 1026 generations back. Taking 20 years per generation, mtEve must be 1026*20 = 20520 ybp.
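The combined-probability step can be sketched in Python. The product term assumes the two regions mutate independently, which is what the formula above implies; under that assumption the inclusion-exclusion form equals 1 − P(no mutation in either region), and the sketch checks both forms agree.

```python
import math

p_coding = 4 / 170     # coding-region mutations per transmission (quoted Howell et al. data)
p_control = 10 / 327   # HVR1 + HVR2 mutations per transmission (Parsons paper)

# Inclusion-exclusion: P(A or B) = P(A) + P(B) - P(A and B),
# with P(A and B) = P(A) * P(B), i.e. assuming independence.
p_any = p_coding + p_control - p_coding * p_control

# Equivalent complement form: 1 - P(no mutation in either region).
p_any_complement = 1 - (1 - p_coding) * (1 - p_control)

gens_per_mutation = 1 / p_any                     # about 18.7
generations_used = math.floor(gens_per_mutation)  # the post uses ~18

MUTATIONS_FROM_EVE = 57
age = generations_used * MUTATIONS_FROM_EVE * 20

print(f"P(any) = {p_any:.5f}, about 1 mutation every {gens_per_mutation:.1f} generations")
print(age)  # prints 20520
```

Note that 1/0.05339 is about 18.7, so rounding to the nearest whole generation would give 19 rather than 18; the sketch follows the post's value.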

Considering the Santos paper's mutation rate of 6/321 per generation and combining it with the coding region mutation rate, we get:
P(Any mutation HVR1,HVR2 or CR) = 4/170 + 6/321 – (4/170 * 6/321) = 0.04178 (or 1 mutation every ~24 generations). With 57 mutations from mtEve, and considering 20 years for 1 generation, mtEve must be 24*57*20 = 27360 ybp.

Hence, the age of mtEve based on overall mtDNA mutations, including HVR1, HVR2 and the coding region, ranges from 20520 to 27360 years before present.
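The Santos-based combined estimate uses the same inclusion-exclusion step as the post, with the germline-only D-loop rate swapped in. A minimal sketch, assuming independence between regions and the 57-mutation distance stated above:

```python
p_coding = 4 / 170     # coding-region germline mutations per transmission
p_control = 6 / 321    # germline-only D-loop rate from the Santos paper

# Inclusion-exclusion as in the post; the product term assumes independence.
p_any = p_coding + p_control - p_coding * p_control

# 1/p_any is about 23.9; the post rounds it to 24 generations per mutation.
gens_per_mutation = round(1 / p_any)

MUTATIONS_FROM_EVE = 57
YEARS_PER_GENERATION = 20
age = gens_per_mutation * MUTATIONS_FROM_EVE * YEARS_PER_GENERATION

print(f"P(any) = {p_any:.5f}, {gens_per_mutation} generations per mutation, {age} years")
```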

Conclusion

  • HVR1 alone provides an age estimation of 6970 YBP.
  • HVR1 and HVR2 provide an age estimation of 9000 to 17000 YBP.
  • HVR1, HVR2 and Coding Region provide an age estimation of 20520 to 27360 YBP.
Since some mutations in the coding region can be lethal, an estimate based on the coding region can be inaccurate and inflated: harmful mutations may go missing because the mutated person would have died, the lineage would be lost, and fewer mutations would be observed. Hence, the estimate that includes the coding region should be taken with a grain of salt, and the true value should be taken to lie well below its lower end.
Many scientists acknowledge that mtDNA mutation rates observed directly in pedigrees are much higher, on the order of 10 times, than phylogenetic mutation rates, which cannot be observed directly and rest on several assumptions.

The real value of mutation rate in humans has recently been the subject of an intense debate between those advocating the use of a phylogenetic mutation rate (~3 x 10^-6 substitutions per site per generation of 20 yr) calibrated by the divergence between humans and chimpanzees (Jazin et al. 1998) and those studying the mutation process directly on pedigrees giving numbers ~10 times larger (~2.7–3 x 10^-5 substitutions per site per generation; Howell et al. 1996; Parsons et al. 1997; Parsons and Holland 1998).
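The "~10 times larger" figure in the quote can be checked directly from the two quoted rates. This is only a ratio of the quoted numbers, with my own variable names:

```python
PHYLOGENETIC_RATE = 3e-6          # substitutions per site per generation (quoted)
PEDIGREE_RATES = (2.7e-5, 3e-5)   # quoted pedigree range

ratios = [rate / PHYLOGENETIC_RATE for rate in PEDIGREE_RATES]
print([round(r, 1) for r in ratios])  # prints [9.0, 10.0]
```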

Hence, the true mtEve age should be between 6000 – 25000 YBP based on directly observed pedigree method.

Update:

Criticism of the above post from a person with a PhD in Population Genetics. The email below was forwarded to me to point out that I am wrong. I do not know who the author is, as the person who forwarded it to me did not reveal the name. So I am not revealing any names here either. For convenience, I will call the author of the letter below John.

Hello ***,

Felix Chandrakumar really seems to be a troll exploiting the discrepancy between pedigree-based and evolutionary rates. As I understand, you need here, at first, a list of knowingly false statements of Mr Chandrakumar, rather than a discussion with the troll about these issues (e.g. admitting his ability to discuss the subject, his knowledge of basic literature published to date and his being a party in the discussion with the competence at least equal to that of authors of the papers he cites; NB! these authors never stated that evolutionary rates are useless or misleading, he does).

* He never goes into details of argumentation the authors engage. However, at least in Santos et al 2005 paper, many pitfalls of pedigree rates are discussed, among them: poor coverage of control-region-sites-known-to-mutate-evolutionary in all pedigree-based studies; inability to extrapolate what we know about the families where the data was collected, to the past; uncertain state of heteroplasmic mutations frequently found in pedigree studies.

* Mr Chandrakumar does not distinguish between mutation events of several different types. However, HVS mutations observed in pedigree studies are far from uniform and only a portion of them can be directly compared to those employed in evolutionary studies. For instance, in the Costa-Rican study he cites I counted 3 phylogenetically informative mutations in 289 transmissions while he considers all 7 reported. If he reassesses the value of basic events in the mitochondrial molecule, he should supply the claim with appropriate arguments including data, simulations and math behind.

* The set of mutations observed in pedigree studies is a very small subset of all control-region sites known to mutate. Both sets clock and both can be used to estimate ages. What does he do when “revisiting” Eve’s age? Just applies his new rate (with confidence interval) to the age of Eve found by scientists and expressed in HVS mutation events. Which set was used by the scientists? The greater one. One with a mutation rate that cannot be estimated by pedigree studies, unless they lack an adequate funding. Narrowing the mtdna tree to the set of pedigree-estimated sites, we have to deal with a phylogeny with uncertain number of mutation events: for instance, positions such as 16095 or 16311 frequently found to vary in pedigrees, may quietly mutate 100 times in the way from any modern molecule to the Eve’s one, and only 10 of such changes, say, can be reconstructed in the tree. Even if the pedigree rate (10 times faster than one expected by the evolutionists) is the true rate for 16311 and similar sites, which tree we should use for the reconstruction? The tree now in use (one of Mannis van Oven, for instance) assigns the consensus state for such hotspots and doesn’t list the full set of their changes which appears to be unreconstructable. That’s the reason why most authors prefer coding-region estimations over control-region ones, at least for deep clades.

Nevertheless, even this small subset can be employed and useful, but with far more sophisticated techniques.

Regards
***

If you look at this criticism from a respected person with a PhD in Population Genetics, who is from the scientific community, you will notice that the first paragraph alone establishes two important points.

  • Accepts the discrepancy between pedigree-based and evolutionary mutation rates
  • I am not fit to talk about it.

I will leave aside the “I am not fit to talk about it.” part, because that shows his pride, which is not my concern. For some reason, he believes people in other disciplines cannot understand the scientific details of his discipline. As is clear, my supposed incapability is (falsely) established, in a deceitful manner, mainly by pointing out that I disagree with the non-observable evolutionary mutation rates while the authors I cite do not.

Science is based on observation and experimentation. It is an accepted fact that non-observable evolutionary mutation rates (based on the human-chimp split) do not agree with observable direct pedigree mutation rates.

In the second paragraph, the wording “known-to-mutate-evolutionary” itself implies that no weight is given to observation in the direct pedigree method. Instead, it still assumes that the evolutionary mutation rate must be correct, and that one of the reasons for the disparity in mutation rates is the inability to extrapolate what they know about the families where the data was collected. It’s like saying the human/chimp split must be true because I believe in it, and even though the observable evidence says otherwise, that evidence must be wrong in some way.

In the third paragraph, John says only 3 of the mutations are phylogenetically informative, so let us consider only 3 mutations out of 289 transmissions. That gives a probability of 3/289 ≈ 0.01038 (or 1 mutation every 96 generations). With 8.5 mutations from Eve, we have 96*8.5 = 816 generations. Taking 20 years per generation, the age of Eve is only 816*20 = 16320 years, which is still well within my conclusion. I don’t understand what John achieved by saying I took 7 mutations out of 289 instead of 3, even though all 7 are mutations. Either way, 16320 YBP is nowhere near the 200000 years given for Eve using evolutionary rates based on the human/chimp split.
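The recalculation with John's count can be sketched alongside the original one. Both use the same 289 transmissions and the 8.5-mutation HVR1 midpoint; generations per mutation are rounded to whole numbers as in the post (41 and 96).

```python
TRANSMISSIONS = 289          # Costa Rican pedigree transmissions
MUTATIONS_FROM_EVE = 8.5     # HVR1 midpoint used earlier in the post
YEARS_PER_GENERATION = 20

counts = {
    "all reported": 7,
    "phylogenetically informative only": 3,  # John's count
}

ages = {}
for label, mutations in counts.items():
    # Round generations per mutation to a whole number, as the post does.
    gens_per_mutation = round(TRANSMISSIONS / mutations)
    ages[label] = gens_per_mutation * MUTATIONS_FROM_EVE * YEARS_PER_GENERATION
    print(f"{label}: 1 mutation per {gens_per_mutation} generations -> {ages[label]:.0f} years")
```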

In the fourth paragraph, I am amazed that John still insists that evolutionary rates are useful even though they differ greatly from the observable evidence of the direct pedigree method. John then says that I am using only the mutations that can be observed in direct-pedigree experiments, which form a much smaller set, while the scientists use the non-observable evolutionary rates, which cover a greater set. So, what is the true reason behind preferring “coding-region estimations over control-region ones”? As we saw in the blog, the coding region may not give correct values in the direct pedigree method, because mutations in the coding region can be harmful and the person carrying them may not survive, which inflates the age when the coding region is included in the estimate. Hence, by wanting to use coding region estimates, what John is really saying is to use evolutionary mutation rates, because coding region estimates can be made only by assuming the human/chimp split did happen.

I don’t understand how a mutation rate from the observable direct-pedigree method can carry no weight against the non-observable method based on the human/chimp split. I thought scientists would disprove the human/chimp split in evolution using direct-pedigree observations, but instead they all seem to twist the evidence and give lame reasons, like not knowing enough about the families where the data was collected. Shame on them!