03 October 2022
How did FamilyTreeDNA estimate the birth year of a military commander who served with Sir William Wallace in the Battle of Falkirk?
If you are serious about genealogy research using Y-DNA, you might have heard about the new FamilyTreeDNA Discover™ tool. You might have also heard about the first beta release of the TMRCA (Time to Most Recent Common Ancestor) estimates for Big Y. These age estimates are important for genealogists because they put a time perspective on the Tree of Humankind.
Big Y-700 is a powerful tool for genealogy because it connects all human genetic males into a large family tree. If you belong to the same branch of the tree as someone else, you share an ancestor on both your direct paternal lines. That often makes it easy to identify the most recent common ancestor and it can support or disprove genealogies. But sooner or later, you will reach a point where historical records are no longer available. Or, maybe you are just curious to know how far back you can trace your lineage. This is where the age estimates come in and can point you in the right direction!
Feedback about Age Estimates
Discover was released just two months ago, and we have been inundated with positive feedback from users. Some have posted pictures of their ancestors’ tombstones with birth years closely matching their TMRCA estimate. But others have commented that some estimates are different from what they expected based on genealogical or archaeological (ancient DNA) evidence. We took that feedback and went back to work to improve the TMRCA estimates. Thanks to Dr. Paul Maier and the R&D team, we present the second beta release of the TMRCA estimates. This update addresses, in particular, feedback about some estimates being younger than expected. We are happy to announce that the first major update to the TMRCA algorithm is now available in FamilyTreeDNA Discover!
A New Way to Tackle Tree Paradoxes
This update introduces a new way to tackle tree paradoxes. These paradoxes occur when some of the tree stems do not add up to the same length. This can result in time inconsistencies, like a child haplogroup estimated to be born before its parent haplogroup.
Tree paradoxes occur when some of the tree stems do not add up to the same length
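To make the idea of a tree paradox concrete, here is a minimal Python sketch that flags any child haplogroup estimated to be older than its parent. The haplogroup names, the tree, and the ages are all hypothetical values invented for illustration. They are not real Discover data, and this is not the code FamilyTreeDNA uses.

```python
# Hypothetical toy tree: each child haplogroup maps to its parent.
parent_of = {"B": "A", "C": "A", "D": "C"}

# Hypothetical TMRCA estimates in years before present (made-up numbers).
tmrca_ybp = {"A": 5000, "B": 3200, "C": 2800, "D": 3100}


def find_paradoxes(parent_of, tmrca_ybp):
    """Return (child, parent) pairs where the child is estimated older than its parent."""
    return sorted(
        (child, parent)
        for child, parent in parent_of.items()
        if tmrca_ybp[child] > tmrca_ybp[parent]
    )


paradoxes = find_paradoxes(parent_of, tmrca_ybp)
print(paradoxes)  # D (3100 years) is estimated older than its parent C (2800 years)
```

In this toy tree only the pair D and C is flagged, because D's estimate places it 300 years before the haplogroup it descends from.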
Most models for age estimates that we see in the genetic genealogy world are based on a Strict Clock assumption. These models assume that every lineage in the tree accumulated mutations (SNPs) at roughly the same rate. When this is not the case, the resulting inconsistencies can be left unresolved or adjusted using various statistical models.
But the strict clock assumption is not always appropriate for the human Y chromosome. Looking at the Tree of Humankind from both the macro and micro scales, we can observe differences in stem lengths. From there, we see that some lineages have a much longer list of mutations than others. If you have used the Block Tree, you may have seen these differences in stem lengths (“SNP height”) for yourself. Differences in the rate of accumulated mutations across a tree are well known in phylogenetics. These outliers can often be attributed to rapid changes in population size or environmental factors. This is where Relaxed Clock models come in. Relaxed Clock models are aware of the possibility of rate differences in the tree. They resolve inconsistencies by comparing stems along the whole tree and adjusting those that do not fit.
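The difference between the two assumptions can be sketched with a little arithmetic. The Python example below uses hypothetical SNP counts for four testers who all descend from the same ancestor, and an assumed round figure of 80 years per SNP that is chosen only for illustration. It is a crude picture of rate variation, not a real relaxed clock model and not FamilyTreeDNA's algorithm.

```python
from statistics import mean

# Hypothetical SNP counts from one shared ancestor down to four present-day testers.
snps_below_ancestor = {"tester_1": 18, "tester_2": 21, "tester_3": 20, "tester_4": 33}

# Assumption for illustration only: one SNP roughly every 80 years.
YEARS_PER_SNP = 80

# Strict clock: every lineage is assumed to tick at the same rate, so each
# lineage on its own implies a different age for the very same ancestor.
strict_ages = {name: n * YEARS_PER_SNP for name, n in snps_below_ancestor.items()}
spread = max(strict_ages.values()) - min(strict_ages.values())

# Crude rate-variation view: pool the lineages for a single age, then describe
# each lineage by how fast or slow it ran relative to the average.
average_snps = mean(snps_below_ancestor.values())
pooled_age = average_snps * YEARS_PER_SNP
relative_rate = {name: n / average_snps for name, n in snps_below_ancestor.items()}

print(strict_ages)
print(spread, pooled_age)
print({name: round(rate, 2) for name, rate in relative_rate.items()})
```

With these made-up numbers, the four lineages disagree by 1,200 years about when their shared ancestor lived. Treating tester_4 as a fast lineage, rather than as evidence of an older ancestor, is the kind of adjustment a relaxed clock model makes in a far more rigorous way.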
With that in mind, we added a relaxed clock step to our TMRCA pipeline. We’ve seen great improvements for both our modern DNA (known genealogies) and ancient DNA (carbon dating) validation sets. The new beta estimates are already live on the Discover site for you to see for yourself!
Examples of Age Estimates
You may still be a little curious how this works. We have provided two examples of changes from the first TMRCA beta release. One is a famous historical example based on genealogical information. The other is based on ancient DNA and radiocarbon dating. Please note that the TMRCA estimates are based purely on genetic data and self-reported birth years of present-day FamilyTreeDNA customers. They have not been adjusted or calibrated to fit with any other data.
Famous Historical Genealogy Example from Scotland
Sir John Stewart (of Bonkyll) was a Scottish knight and son of Alexander, the 4th High Steward of Scotland. Sir John’s exact birth year is not known, but it has been estimated to be about 1246. He died on July 22, 1298, while serving as a military commander together with Sir William Wallace at the Battle of Falkirk.
John Stewart of Bonkyll Gravestone – s781.org
We know from genealogy and from DNA testing of his immediate relatives that he is the most recent common ancestor (MRCA) of Y-DNA haplogroup R-S781. The old historical date and genealogical precision make S781 a great test case for the TMRCA algorithm. It turns out that a rate shift on the tree makes his TMRCA harder to estimate. Let’s take a look.
Here is the previous Haplogroup Story for R-S781 (accessed August 30, 2022):
“Haplogroup R-S781 represents a man who is estimated to have been born around 550 years ago, plus or minus 150 years. That corresponds to about 1500 CE with a 95% probability he was born between 1331 and 1578 CE.”
Now wait a minute! Sir John died in 1298. But this paragraph suggests he was born at least 33 years after his death, and more likely not until about 1500. Let’s see how this changed with the update:
“Haplogroup R-S781 represents a man who is estimated to have been born around 800 years ago, plus or minus 200 years. That corresponds to about 1250 CE with a 95% probability he was born between 1038 and 1393 CE.”
Following the update, the TMRCA estimate is now well centered around the expected birth year of Sir John Stewart.
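The comparison can be checked with a few lines of Python. The sketch below takes the central years and 95% ranges quoted in the two Haplogroup Stories and the estimated birth year of about 1246, then asks whether that year falls inside each range and how far it sits from the central estimate. The `Estimate` class and its method names are hypothetical helpers written for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    """A TMRCA estimate expressed in calendar years CE."""
    label: str
    central: int
    earliest: int
    latest: int

    def contains(self, year: int) -> bool:
        """True if the year lies inside the 95% range."""
        return self.earliest <= year <= self.latest

    def offset_from(self, year: int) -> int:
        """Years between the central estimate and the given year."""
        return self.central - year


EXPECTED_BIRTH_YEAR = 1246  # approximate birth year given for Sir John Stewart

beta_1 = Estimate("first beta", central=1500, earliest=1331, latest=1578)
beta_2 = Estimate("second beta", central=1250, earliest=1038, latest=1393)

for est in (beta_1, beta_2):
    print(est.label, est.contains(EXPECTED_BIRTH_YEAR), est.offset_from(EXPECTED_BIRTH_YEAR))
```

The first beta range excludes 1246 and its central year is 254 years too late. The second beta range includes 1246 and its central year is only 4 years away.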
An Ancient DNA Example from the Corded Ware Culture in Bohemia
Another interesting example is the ancient DNA sample PNL001 (Plotiště nad Labem 1). The sample is from a man associated with the Corded Ware culture. He died at age 25-30 in present-day Bohemia, Czech Republic. His remains were DNA tested and directly carbon dated to between 2914 and 2879 BCE (about 2900 BCE) with 95% confidence (Papac et al., 2021).
Corded Ware Pottery – Einsamer Schütze, June 28, 2011 https://bit.ly/3xbhaBf
Genetic analysis shows that he belongs to haplogroup R-U106, which suggests a minimum age for the haplogroup TMRCA. If PNL001 is a descendant of U106, then the U106 ancestor must have lived before PNL001.
Here is the previous Haplogroup Story for R-U106 (accessed August 30, 2022):
“Haplogroup R-U106 represents a man who is estimated to have been born around 4,500 years ago, plus or minus 600 years. That corresponds to about 2400 BCE with a 95% probability he was born between 3044 and 1880 BCE.”
The PNL001 carbon date of about 2900 BCE falls within the broad range of the TMRCA estimate for R-U106. But the “most likely” estimate of about 2400 BCE was about 500 years younger than the minimum age implied by the carbon dating, so the estimate was not well centered. Let’s see how this changed with the update:
“Haplogroup R-U106 represents a man who is estimated to have been born around 4,950 years ago, plus or minus 700 years. That corresponds to about 2900 BCE with a 95% probability he was born between 3619 and 2297 BCE.”
We can’t know for sure if the PNL001 radiocarbon date is exactly right or when the U106 man was truly born. But with the new TMRCA algorithm, the estimate is better centered and aligns more closely with external evidence.
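An ancient sample works as a one-sided check: the haplogroup ancestor has to be at least as old as any of his descendants. The Python sketch below applies that check to the R-U106 figures quoted above, writing BCE years as negative numbers and using the rounded date of 2900 BCE for PNL001. The function names are hypothetical, and the missing year zero is ignored for simplicity.

```python
# Signed years: negative means BCE. The missing year 0 is ignored for simplicity.
SAMPLE_YEAR = -2900  # PNL001, rounded radiocarbon date quoted in the article

# Central years and 95% ranges quoted in the two R-U106 Haplogroup Stories.
estimates = {
    "first beta": {"central": -2400, "oldest": -3044, "youngest": -1880},
    "second beta": {"central": -2900, "oldest": -3619, "youngest": -2297},
}


def years_too_young(central, sample_year):
    """Years by which the central estimate falls after the sample (0 if none)."""
    return max(0, central - sample_year)


def range_allows(estimate, sample_year):
    """True if the oldest end of the range is at or before the sample's date."""
    return estimate["oldest"] <= sample_year


shortfall = {
    name: years_too_young(est["central"], SAMPLE_YEAR)
    for name, est in estimates.items()
}
compatible = {name: range_allows(est, SAMPLE_YEAR) for name, est in estimates.items()}

print(shortfall)
print(compatible)
```

Both ranges are compatible with the sample, but the first beta central estimate falls 500 years after PNL001 while the second beta central estimate does not fall after it at all.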
The Future Of FamilyTreeDNA Age Estimates
We are very excited to share our updated (Beta 2) release of our Big Y age estimates!
You can help us improve the estimates by:
- Specifying birth years on your Big Y kits
- Documenting your patrilineal genealogy in your family tree with accurate names and birth years
- Linking Y-DNA matches with whom you share a known most recent common ancestor
This information is used for our validations and to calibrate the tree. Our age estimates will continue to change as new customers test with Big Y-700 and we improve the algorithm.