DNA Results

by Philip Shaddock

I am fortunate that I came into genealogy from an earlier study of fish genetics. The common guppy, Poecilia reticulata, was an early model organism for genetics. Its color patterns are extraordinarily variable, a fact that brought it to the attention of geneticists in the early part of the last century. They discovered that guppies in one stream in Trinidad had a different color pattern from guppies in a neighboring stream, and thereby found a useful model organism for exploring evolutionary change. The males passed their unique genetic visual signal to their sons through genes found on the Y chromosome. Guppies wear their genetic make-up on their skin. Today the fact that the guppy broadcasts its paternity as a unique color pattern is still used in evolutionary and developmental studies. Y-linked inheritance, in which certain traits are passed on exclusively by a father to his sons, is said to have been first discovered in guppies.

Human males and females also pass on signs of their ancestry in visible traits like hair color, the shape of the skull or nose, complexion and height, but these are too crude a tool for scientists. Fortunately we have DNA testing, a much more accurate tool. Males pass on a copy of their Y chromosome exclusively to their sons, and every 144 years or so, on average, a copying error produces a mutation on that Y chromosome. Those mutations are invisible to the naked eye, but DNA testing can make them visible.

The genetic signal of membership in the Shattocke - Parrish - Byars family group is a mutation on the Y chromosome of males. All direct-line male descendants of the Shattockes, Parrishes and Byars have a single-letter mutation in the genetic code of their Y chromosome called Y16884. (The FTDNA testing company uses a different marker called Y16895.) That mutation can be tested for. If you are Y16884 or Y16895 positive, you are a Shattocke and not one of the other seven billion descendants of genetic Adam. Southern U.S. Parrishes and Byars are our genetic cousins because they have inherited this SNP (single nucleotide polymorphism).

There is another mutation, carried by descendants of a single colonial Parrish male, that acts as a second signal of Parrish and Byars paternal ancestry. Sometime in 17th-century Virginia, a Y16884/Y16895-positive male was born with a new SNP mutation, A8033. So if you carry both the Y16884 and A8033 mutations, you are not only a descendant of the common ancestor of the Shattockes, you also have a unique set of markers that identifies you as a Parrish or Byars descendant of that common ancestor.

All you have to do to discover whether you descend from that single immigrant to Virginia is test for the Y16884 and A8033 SNPs. Testing for the Y16884 SNP determines whether you descend from the ancestor you share with the Shattockes. (You can order the test from YSEQ for $18 plus $5 shipping.) Testing for the A8033 SNP will tell you whether you belong to a group of Parrishes who originate in colonial Virginia. (Order the A8033 test for an additional $18.) Consult me if you order these tests.

Later a Parrish male child was born with yet another new mutation, another copying error made when the Y chromosome was duplicated. It is called Y19410, and it uniquely identifies this individual's descendants, which is useful if your genealogical research using historical documents has hit a brick wall. Testing for the Y19410 SNP will identify a male descendant as belonging to a branch of the Virginia Parrishes who descend from John Parrish (ca. 1692), probably the man in whom the Y19410 mutation arose. These simple SNP tests, in combination with other genetic markers I will discuss shortly, have been used to build and validate a family tree for the colonial Virginia Parrishes who share a common ancestor with the Shattockes. The same simple tests can also be used to determine lines of descent from the other branches and sub-branches of the Shattockes. Consult with me and I will advise you.

The guppy's color patterns are a handy visual indication of a guppy's paternal heritage. The DNA test does the same thing for humans: it makes our paternal ancestry visible. The tree at the very top of this page is a visible graphic of Shattocke, Parrish and Byars ancestry, all males who inherited the original Y chromosome of a man who lived about 1360 AD. Think of them as being branded with the Y16884 mutation, which sets them apart from all other male lines.

The male Y chromosome is like a book in which the genetic history of our particular branch of the human family is stored. But it does take some skill to read it.

Paternal Testing

My research focus is the Shattocke surname and the descendants of the Shattocke common ancestor, which include the Parrish and Byars surnames. Since males pass their Y chromosome exclusively to their sons, Y chromosome DNA (YDNA) tests are best for deep surname research. These tests include the YDNA-37 to -111 tests at Family Tree DNA and the Alpha, Beta, Delta and Gamma tests at YSEQ.NET.

You can also test for paternal ancestry using an autosomal test, which looks at the autosomes (the 22 pairs of chromosomes other than the X and Y). But you cannot do deep surname research with autosomal tests such as Ancestry.com's DNA test, 23andMe's test, Living DNA or FTDNA's Family Finder test. Autosomal tests are only useful for maternal or paternal lineages in the last four or five generations. On average you share only about 0.003% of your autosomal DNA with a seventh cousin. Trying to do deep ancestral surname research (i.e. prior to 1800) with autosomal tests is very difficult and limited in scope.

The problem with using autosomal testing for deep ancestral research is that every individual, except perhaps identical twins, has a unique combination of DNA fragments inherited from their mother and father. Even full siblings share only about 50% of their autosomal DNA. My father had an English background mixed with a little Dutch, French Huguenot and Portuguese ancestry on his maternal side. My mother had a mixed French and English background. My siblings would have inherited different amounts of these DNA fragments. My father's English background originated in Devon in the English southwest. My mother's English background originated in Liverpool in the English northwest. So what part of England is in my autosomal DNA? I am a soup of these mixed fragments, so trying to determine my precise paternal origins from an autosomal test is difficult if not impossible. That is why my study of the deep ancestry of the Shattocke and Parrish / Byars surnames is restricted to the Y chromosome. I do use autosomal testing, but only to explore the last three or four generations. I also use it to provisionally rule out family connections in the past two hundred years.

Autosomal testing is useful for sorting out relationships in the last few generations, but it is useless for determining deep ancestry prior to 1800. The exception to this rule is people descended from ancestors who lived for a very long time in a single location. A study of British genetic origins (The Fine-Scale Genetic Structure of the British Population, Nature, vol. 519, pp. 309–314, 19 March 2015) using autosomal testing restricted candidates to participants who were all white British, lived in rural areas and had four grandparents all born within 50 miles (80 km) of each other. Both paternal and maternal ancestors had to be from that location. There is a reason they never sampled people from the former English colonies. English immigrants to other parts of the world, like my English ancestors, married spouses from different parts of England and different parts of the world, and their descendants have a very mixed background. In fact, my half-brother Robert has a very different ancestral background from mine, since his mother was from Scotland. The Family Finder autosomal test he took shows his ancestral origins to be totally different from mine, despite the fact that we share a father. We share 100% of our YDNA (DNA on the male Y chromosome). But we share only about 25% of our autosomal DNA, which is derived from both parents.

The testing company Family Tree DNA, using our autosomal DNA, gave each of us our ancestral origins in a table. 

Region | Philip Shaddock | Robert Shaddock
West and Central Europe | 60% | -
British Isles | 18% | 40%
Scandinavia | 9% | 34%
Southeast Europe | 3% | 6%
East Europe | 11% | -
Iberia | - | 20%

You can see that even though we share a surname and a father, our mothers contributed autosomal DNA that gives us entirely different ancestral origins. My background is weighted to West and Central Europe (60%), and his is weighted to Scandinavia (34%) and southern Europe (Iberia, 20%). He has more than twice as much British Isles DNA (40%) as I do (18%).

But since we both inherited our Y chromosomes from our father, the YDNA test results tell a very different story. Of the 111 markers tested by FTDNA, all are identical between us. Our Shattocke paternal ancestry traces back to a very small area in Somerset, a county in the southwest of England. Even though Robert shares some of the autosomal DNA that I inherited from our father, the fact that he had a different mother from me makes his "ancient origins" completely different from mine when autosomal DNA is used. It is only when the test is restricted to the Y chromosome we inherited from our father, who in turn inherited his Y chromosome from our grandfather and so on up the tree, that we find the path back to the ancient origins of the Shattockes.

Some people try to use the ancestral origins reported by autosomal testing companies like Living DNA to make claims about the geographical origin of a surname. But clearly you will get different "ancient origins" even for different members of the same family. In the eleven generations that English immigrants have lived in the former English colonies, the marriages of Shattocke or Parrish males to women from other parts of England, and indeed from other parts of the world, have created a genetic soup of individuals with the same surname but vastly different autosomal "ancient origins." In fact, the only origins they have in common are found in a small part of the male Y chromosome, passed down from father to son since genetic Adam in Africa 70,000 years ago, give or take 10,000 years.

But this is not the only reason why you should not use autosomal results to determine the geographical and ethnic origin of your surname. On top of this, the ancestral origins these companies report can be inaccurate. Autosomal testing companies like Living DNA claim to be "highly accurate" in determining your ancestral background from your DNA, but even this claim appears to be overstated. Roberta Estes, an expert genetic genealogist, compared the origins established by her own well-documented genealogical research to those derived from her DNA by Living DNA. Read how far off the Living DNA report was: https://dna-explained.com/2017/05/04/livingdna-product-review/

I had Ancestry.com conduct an autosomal test on me as well. Here is what they reported as my "ethnicity estimate:"

Great Britain 33%
Europe West 28%
Ireland/Scotland/Wales 20%
Iberian Peninsula 9%

It is significantly different from my FTDNA report. 

There is one final reason why you should not trust the ancient origins reports from autosomal testing companies: the technology they use to test autosomal DNA is error prone. Read this summary of what one testing company said at a DNA conference when it was asked whether identical triplets should have identical ancient origins results.

They analyze about 700,000 SNPs. 97-98% of those SNPs will have data. Odds that you will get a result exact every time is highly unlikely. Because they are not analyzing every datapoint, they may not be identical. From a non-lab perspective, E. said it runs the sample in replicate many many times. It runs about 20 different replicants and averages the results. Even the exact same sample will have slight variation.

I completely ignore these reports. So why do testing companies publish them? Because they attract customers looking for a quick and easy way to determine their ancestral origins. But that can only be achieved by combining painstaking, detailed genealogical research with deep analysis of the raw DNA data. A computer algorithm that mines autosomal DNA to answer the question of ancient origins has not yet been designed and tested, and the technology is just not at the point where you can get absolutely accurate results.

Contact me before purchasing a DNA test. We know a lot about Shattocke (and Parrish, Byars) genetics because so many of our genetic cousins have tested. If you believe you are a Shattocke, Parrish or Byars descendant, I can recommend the most cost-effective test to check your paper trail or to validate a family legend about your origins.

The Two Types of Y-DNA Tests

We test two different types of mutations on the male Y chromosome. One type is called an STR (Short Tandem Repeat) and the other is called an SNP (Single Nucleotide Polymorphism). An STR is a short segment of DNA repeated several times in a row: gatagatagatagata. In this simple example the genetic code "gata" is repeated four times. When I talk about "repeats" in the explanation below, this is what I mean. An STR mutation changes the number of repeats. An SNP mutation is a change in a single letter of the genetic code of the Y chromosome: for example, a "t" changes to a "c." The practical difference between these two types of mutation is how often they occur. STR mutations are very active: a marker can gain or lose a repeat, and a later mutation can return it to an earlier value (a back mutation). STRs are very useful for studying DNA in the last five hundred years. But because the number of repeats can change back and forth, STRs are not reliable sources of information over very long periods of time. SNP mutations, on the other hand, are much rarer and more stable, which makes them more useful for looking further into the past. They are also useful for more recent generations, but they occur less frequently: once every 144 years on average (Big Y test).
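To make the two kinds of mutation concrete, here is a minimal Python sketch using made-up toy sequences. An STR value is the count of back-to-back copies of a short motif, and an SNP is a single-letter difference between two aligned sequences. The function names are hypothetical illustrations. Real labs read markers with chemistry, not string matching.

```python
import re


def count_tandem_repeats(sequence: str, motif: str) -> int:
    """Return the longest run of back-to-back copies of `motif` in `sequence`."""
    sequence, motif = sequence.lower(), motif.lower()
    # "(?:gata)+" matches one or more consecutive copies of the motif.
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)


def single_letter_changes(reference: str, sample: str) -> list[str]:
    """List single-letter differences as 'position:REF>ALT' (1-based positions).

    Assumes both toy sequences are already aligned and of equal length."""
    return [
        f"{pos}:{ref.upper()}>{alt.upper()}"
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref.lower() != alt.lower()
    ]


print(count_tandem_repeats("gatagatagatagata", "gata"))      # 4 repeats
print(count_tandem_repeats("gatagatagatagatagata", "gata"))  # one repeat gained: 5
print(single_letter_changes("gatagata", "gacagata"))        # ['3:T>C']
```

An STR mutation shows up as the repeat count moving up or down by one. An SNP shows up as one letter changing while the length stays the same.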

STR Testing

Scientists have identified specific locations on the Y chromosome that can be tested to determine paternity. These are called markers. Markers are simply STRs that are known to be useful for determining paternity. Instead of testing all the STRs on the Y chromosome, the testing companies select a relatively small set of markers: well-known locations on the Y chromosome where the genetic history of human males can be read.

FTDNA provides their tests at different levels: 12, 25, 37, 67 and 111 markers. This refers to the number of markers they test. The minimum useful level is 37 markers. The more markers you test, the more accurate the resulting family tree. When I am testing potential matches, I test people at the 37-marker level to determine whether they share a common ancestor with all other Shattockes, Byars and Parrishes. Then I upgrade to a higher-level test. You may wish to go to the highest level right at the start.

SNP Testing

SNP testing uses a form of sequencing called Next Generation Sequencing (NGS). This form of testing produces about 1,000 SNP variants useful for genealogical purposes. I use a couple of dozen of them to determine which branch of the Shattocke family you belong to. You can also extract up to 500 STRs from NGS data, vastly expanding the number of STR markers you can use to determine relationships among people in a Shattocke or Parrish / Byars sub-branch. The price of an NGS test varies between $425 and $750.

An important caveat is that you may not be genetically a Shattocke, Parrish or Byars! That is a risk you take with genetic testing, and you have to be prepared emotionally for that possibility. For perspective, of the 70 Shattucks, Byars and Parrishes who have tested, 9 have come back as belonging to another branch of the human family. That is 13% of the total, which is close to the average of 15% for all west European families. You may be certain of your ancestry in the past several generations, but you may still descend from a much older NPE (non-paternity event). This has proven to be the case for most of the 9 results that showed an NPE had occurred. For this reason it might be best to start with an FTDNA YDNA-37 test. You can then decide whether you want to go deeper into the mystery of your ancestry. Testing 37 markers will also give you an idea of what surname may be lurking in your past. Again, contact me for my advice.

Individual Marker Testing

There is an even less expensive way of determining whether you descend from the Shattocke common ancestor. I have developed the genetic profile of our family to the point where you can test for a specific marker for $15 (STR) or $25 (SNP). If you test positive for that marker, we know that you belong in the tree and perhaps in a specific branch of the tree. If you test negative, you belong to some other, unknown branch of the human family tree. At that point you can decide whether you want to proceed to a more comprehensive test. Contact me.

Analyzing the DNA Results

When the testing company returns the results of a YDNA test, they come as a table full of numerical values. I have developed three different spreadsheets that color-code these results for study. I do not publicly share the spreadsheets, for privacy reasons. You have to join the Shattock Shaddock Parrish Byars Google group to obtain a link to the spreadsheets, or you can contact me directly. I use these spreadsheets to compare the results with those of other Shattockes and of our cousins the Byars and Parrishes, as well as several even more distantly related people who show up as matches in the FTDNA table of matches. I use them to construct a family tree.

Along with some Parrish co-admins, I run an FTDNA project where the results of YDNA tests are posted publicly. This may be all you need, at least in your initial studies.

My principal method for finding how descendants of our common ancestor separated into branches of the family is to compare the results to each other. If two individuals have the same mutation, they are likely to have descended from a common ancestor. The branching of the tree is found by comparing results. A single set of results for one person tells you very little.

This page discusses the analysis of STR results. I have also included a section on analyzing autosomal results. 

Table Comparing Match STRs

When you get your results back from one of the FTDNA YDNA-37 to -111 tests, I transfer this information to a spreadsheet that includes additional information and results from other testing companies.

In the spreadsheet I group people who I think might belong to the same branch of the Shattocke - Parrish - Byars tree. The organizing principle governing the spreadsheet is very simple. If a number of DNA samples from descendants have the same signature marker, then they must have descended from a common ancestor. Each branch of the family tree has its own signature markers. The problem is determining which markers are "signature markers" for each branch. A "signature" marker is simply one whose number of repeats is unique to a branch of the Shattocke family.

In order to identify signature markers, I look at how frequently the STR marker has changed in value, what its most common value among ancestors was, how its most recent ancestral value compares with its current value, and in which branch of the human family the marker is commonly found with the current number of repeats.
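The simplest of these criteria, a repeat count shared by every member of a branch and carried by no one outside it, can be sketched in Python. This is a hypothetical illustration, not the author's spreadsheet logic. It ignores mutation rates and ancestral values. The tester labels and all repeat counts are made up. The marker names are real STR names, and DYR60 = 18 echoes the North Molton example later on this page.

```python
def signature_markers(branch: dict[str, dict[str, int]],
                      others: dict[str, dict[str, int]]) -> dict[str, int]:
    """Return {marker: repeats} for markers where every tester in `branch`
    has the same repeat count and no tester in `others` has that count.

    Both arguments map a tester label to {marker name: repeat count}."""
    haplotypes = list(branch.values())
    # Only consider markers that every branch member has results for.
    common = set(haplotypes[0]).intersection(*haplotypes[1:])
    signatures = {}
    for marker in sorted(common):
        values = {h[marker] for h in haplotypes}
        if len(values) != 1:
            continue  # branch members disagree, so this is not a signature
        (value,) = values
        if all(h.get(marker) != value for h in others.values()):
            signatures[marker] = value
    return signatures


# Hypothetical testers and repeat counts.
north_molton = {
    "tester_1": {"DYS393": 13, "DYS19": 14, "DYR60": 18},
    "tester_2": {"DYS393": 13, "DYS19": 15, "DYR60": 18},
}
other_branches = {
    "tester_3": {"DYS393": 13, "DYS19": 14, "DYR60": 17},
}
print(signature_markers(north_molton, other_branches))  # {'DYR60': 18}
```

DYS393 is excluded because the outsider shares its value. DYS19 is excluded because the branch members disagree.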

Analyzing SNPs

The FTDNA Big Y test is expensive, regularly priced at US $575 but sometimes on sale for $425. But it finds and tests thousands of locations on the male Y chromosome. At each of those locations is one of the four letters of the genetic code: A, C, G or T. The letter reported at each location is compared to a reference sequence, the common value found at that location. Software finds and identifies the SNPs that differ from the common values. When a variant is found that is shared by two people, the SNP is named (like Y16884) and catalogued.

Here is how to think about SNP results. The term SNP (single nucleotide polymorphism) simply refers to a change in an individual letter of the genetic code. A "C" might have changed to a "T." In reports this change is written as C>T: C is the letter found in the reference genome, and T is the letter found in the person being tested. Unlike STRs, SNPs are very stable and very rare.
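The REF>ALT notation is easy to handle mechanically. This minimal Python sketch parses an entry such as Y19716 A>G into its reference and observed letters and flags whether it is a variant. The SnpCall and parse_snp names are hypothetical, not part of any testing company's software.

```python
from typing import NamedTuple

VALID_BASES = set("ACGT")


class SnpCall(NamedTuple):
    name: str
    reference: str  # letter found in the reference genome
    observed: str   # letter found in the man who was tested

    @property
    def is_variant(self) -> bool:
        """True when the tested letter differs from the reference letter."""
        return self.reference != self.observed


def parse_snp(name: str, change: str) -> SnpCall:
    """Parse a report entry written as REF>ALT, e.g. ('Y19716', 'A>G')."""
    parts = [part.strip().upper() for part in change.split(">")]
    if len(parts) != 2 or not set(parts) <= VALID_BASES:
        raise ValueError(f"unexpected change notation: {change!r}")
    reference, observed = parts
    return SnpCall(name, reference, observed)


call = parse_snp("Y19716", "A>G")
print(call.reference, call.observed, call.is_variant)  # A G True
```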

The diagram on the left shows lists of SNPs, with ZZ45 at the very top.

SNP analysis is quite simple. We look at all your SNPs and see which ones you share with other people and which ones differ. At the highest branch in the human tree you only have a few mutant SNPs in common with almost everybody else. As you walk down the tree you find people who share more and more SNPs.
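The comparison described above can be sketched in Python: for each group of two or more testers, find the variant SNPs that every member carries and no one outside the group carries. Each such group is a candidate branch. The tester labels and their SNP sets are hypothetical, built from SNP names mentioned on this page. The brute-force loop checks every possible group, so it only suits a handful of testers.

```python
from itertools import combinations


def exclusive_shared_snps(testers: dict[str, set[str]]) -> dict[frozenset, set[str]]:
    """For every group of two or more testers, return the variant SNPs that all
    members of the group carry and that nobody outside the group carries."""
    names = sorted(testers)
    branches = {}
    for size in range(len(names), 1, -1):
        for group in combinations(names, size):
            inside = set.intersection(*(testers[n] for n in group))
            outside = set().union(*(testers[n] for n in names if n not in group))
            if inside - outside:
                branches[frozenset(group)] = inside - outside
    return branches


# Hypothetical results for four men.
testers = {
    "Shattocke_1": {"Y16884", "Y19716"},
    "Shattocke_2": {"Y16884", "Y19716"},
    "Parrish_1": {"Y16884", "A8033", "Y19410"},
    "Parrish_2": {"Y16884", "A8033"},
}
for group, snps in exclusive_shared_snps(testers).items():
    print(sorted(group), sorted(snps))
```

All four men share Y16884, the two Parrishes alone share A8033, and the two Shattockes alone share Y19716. Y19410 does not appear because only one man carries it.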

What you see at left is a capture from software that shows results of this analysis. It shows the SNPs found from Big Y testing of me and my genetic cousins. (The surnames of the people tested are found at the bottom of the diagram.) 

Now look at the third brown rectangle down from the top of the diagram. See all those numbers, beginning with Y16889? These are the SNPs that define how our family tree branches. A branch of the human family tree is found when the SNPs common to a set of individuals are identified. So the four SNPs (Y16889, Y17162, PH2997 and Y17161) in this brown box are held in common by all Parrishes and Shattockes, plus a very distant relative named Strang. The common ancestor we share with Strang lived about 4000 years ago.

The long list of numbers in the next brown rectangle, beginning with Y16885 and ending with Y17159, contains the SNPs that Shattockes and Parrishes hold in common, but NOT Strang. The length of the list is also a measure of how much time has passed since the common ancestor lived, because SNPs occur roughly every 144 years (Big Y testing; the interval is shorter for other tests).

The diagram tells us that the branching occurred in the 15th century. You can roughly estimate elapsed time by counting the SNPs and multiplying by 144 years, a back-of-the-envelope method based on the average frequency of SNP mutations.
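The arithmetic is simple enough to write down. This Python sketch multiplies a SNP count by the 144-year average quoted above. It also assumes, as an illustration only, that the count runs back from the tested man's birth year. The 4 SNPs and the 1950 birth year are hypothetical, and real mutation timing is random, so the result is only a rough estimate.

```python
YEARS_PER_SNP = 144  # rough average for Big Y results, as used in the text


def years_since_split(snp_count: int, years_per_snp: float = YEARS_PER_SNP) -> float:
    """Back-of-the-envelope estimate: number of SNPs times years per SNP."""
    return snp_count * years_per_snp


def estimated_split_year(snp_count: int, tester_birth_year: int,
                         years_per_snp: float = YEARS_PER_SNP) -> float:
    """Count back from the tested man's birth year (an illustrative assumption)."""
    return tester_birth_year - years_since_split(snp_count, years_per_snp)


print(years_since_split(4))           # 576
print(estimated_split_year(4, 1950))  # 1374
```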

In the case of the Parrishes, SNPs A8033, A8034 and A8035 are held in common by the three Parrishes who had been tested when I captured this screen grab some time ago. Then another split occurs, and two Parrishes (N89266 and 259094) form their own branch with two SNPs unique to them. (If you do the calculation, you will see they split off from the other Parrish around 1800.)

The three Parrishes who have tested hold more SNPs in common with each other than with Shattockes.

Turning to the "Shattocke Variant SNPs" spreadsheet, the basic rule of thumb in SNP analysis is to find two people in the same branch who share one or more mutant SNPs. 

In the example at left, there are two people who have the A>G variant, Y19716. So you can assume they have a common ancestor. That turns out to be true. These two individuals have a common ancestor who lived in North Molton, Devon in the early 16th century. Y19716 is the SNP that defines the North Molton Shattockes. 

Notice in the column to the right that three individuals have the Y268 "T>T" SNP. These individuals belong to completely different branches of the Shattocke - Parrish - Byars family. The problem with NGS results is that they are a bit like Swiss cheese: full of holes. In fact, everybody in the family probably has the Y268 SNP, but it is missing from many people's results.
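The key point is that a missing result is not a negative result. This Python sketch keeps the two apart when summarizing one SNP across testers. The three-way call categories and the summary wording are hypothetical, chosen for illustration, and so are the testers and calls.

```python
from enum import Enum


class Call(Enum):
    POSITIVE = "positive"  # the variant letter was read
    NEGATIVE = "negative"  # the reference letter was read
    NO_CALL = "no call"    # the location was not read at all (a "hole")


def classify_snp(calls: dict[str, Call]) -> str:
    """Summarize one SNP across testers, treating no-calls as missing data."""
    states = set(calls.values())
    if Call.POSITIVE in states and Call.NEGATIVE in states:
        return "splits the group"        # candidate branch-defining SNP
    if Call.POSITIVE in states:
        return "possibly shared by all"  # gaps are missing data, not negatives
    return "not observed"


# Hypothetical calls for Y268 across five testers.
y268 = {
    "tester_1": Call.POSITIVE,
    "tester_2": Call.NO_CALL,
    "tester_3": Call.POSITIVE,
    "tester_4": Call.NO_CALL,
    "tester_5": Call.POSITIVE,
}
print(classify_snp(y268))  # possibly shared by all
```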

The Y1940 and Y1941 SNPs (shown in green) form a sub-branch of the A8033 Parrish branch of the family.

That is all there really is to SNP analysis. The advantage of SNP testing is that SNPs are rare and stable. You only need one to define a branch of a family. And you can use them to estimate the date of the common ancestor of a group of related people. The disadvantage is the long time between mutation events. STRs are more useful in the last 500 years.

I use both STR and SNP data to determine the branching of the Shattocke tree. If you look at the STR spreadsheets, you will see that individuals who share a branch-defining SNP also share signature STRs. For example, the North Molton Shattockes share the DYR60=18 marker. That makes sense, because the mutation that changed the number of repeats for this marker occurred after the formation of the North Molton branch of the family.

Genetic Distance

In FTDNA's results a "genetic distance" value is often given. This is a somewhat misleading term. It doesn't actually measure how distantly related you are to the other person. It merely measures how much your marker values differ. The more your markers differ, the more likely it is that you are distantly related.
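Two simple ways to count differences are sketched below in Python. The first counts how many markers differ. The second sums the repeat differences, so a two-repeat gap counts as 2. FTDNA's own calculation treats some markers specially, and this sketch does not reproduce it. The marker names are real STR names, but the repeat counts are hypothetical.

```python
def markers_differing(a: dict[str, int], b: dict[str, int]) -> int:
    """Count markers (tested by both men) whose repeat counts differ."""
    return sum(1 for m in a.keys() & b.keys() if a[m] != b[m])


def stepwise_distance(a: dict[str, int], b: dict[str, int]) -> int:
    """Sum of absolute repeat differences over markers tested by both men."""
    return sum(abs(a[m] - b[m]) for m in a.keys() & b.keys())


# Hypothetical repeat counts for two testers.
tester_a = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
tester_b = {"DYS393": 13, "DYS390": 22, "DYS19": 14, "DYS391": 10}
print(markers_differing(tester_a, tester_b))  # 2
print(stepwise_distance(tester_a, tester_b))  # 3
```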

The following tables provide age estimates by genetic distance (GD). The first table covers the first 37 markers; the two after it apply to larger sets of markers, which is why their ranges are narrower. For example, when two people have exactly the same values in their first 37 markers (i.e. a genetic distance of zero), they are related within the last 330 years. Remember that these estimates are based on average mutation rates for the markers. Marker mutation is a random process. So the estimates will be accurate only part of the time: most of the time they will be good approximations, and some of the time they will be wildly inaccurate.

GD = 0 : 0 - 330 years
GD = 1 : 30 - 570 years
GD = 2 : 60 - 660 years
GD = 3 : 90 - 840 years
GD = 4 : 150 - 990 years
GD = 5 : 210 - 1140 years
GD = 6 : 270 - 1290 years

GD = 0 : 0 - 270 years
GD = 1 : 0 - 480 years
GD = 2 : 30 - 510 years
GD = 3 : 60 - 630 years
GD = 4 : 120 - 750 years
GD = 5 : 150 - 840 years
GD = 6 : 210 - 960 years
GD = 7 : 240 - 1080 years
GD = 8 : 300 - 1170 years
GD = 9 : 360 - 1290 years

GD = 0 : 0 - 150 years
GD = 1 : 0 - 150 years
GD = 2 : 30 - 330 years
GD = 3 : 30 - 390 years
GD = 4 : 60 - 450 years
GD = 5 : 90 - 540 years
GD = 6 : 120 - 600 years
GD = 7 : 150 - 660 years
GD = 8 : 180 - 720 years
GD = 9 : 210 - 780 years
GD = 10 : 240 - 840 years
GD = 11 : 270 - 900 years

The more markers you have to compare, the more accurate you can make the estimation of when you shared a common ancestor with another person. 

This explanation of DNA testing is simplified. I welcome comments and suggestions that will help me improve it.

Analyzing Autosomal Results

When evaluating DNA results from either an Ancestry.com DNA test or FTDNA Family Finder (both autosomal tests), the following table may be helpful.

After you have looked at the table, come back and read this blog.  You will see why I use YDNA testing and not the autosomal tests by Ancestry.com or the Family Finder autosomal test.

Robert Shaddock, my half-brother, was flagged by the FTDNA matching system as a half-sibling or grandparent/grandchild. The "measure" of our relationship was 2027 centimorgans (cM) of shared DNA. For perspective, a parent/child relationship is roughly 3385 cM. I share 82 cM with a 4th-cousin match, but I have seen other pairs of 4th cousins who share only 41 cM. The minimum value for a distant relationship is considered to be 7 cM. These measures of genetic relationship are not physical measurements of chromosomes. A centimorgan is a unit of genetic distance based on how often recombination breaks up a stretch of chromosome. They are similar to "genetic distance" in YDNA matches. Several variables affect how accurately they reflect the generational distance between two people, and they also tend to vary somewhat from one family to another. So use them only as a rough guide.
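To see how rough that guide is, this Python sketch looks up the relationship whose average shared cM is closest to an observed value. It uses a selection of rows from the ISOGG table on this page, with shortened labels. The nearest-average rule is a deliberately naive assumption for illustration, not how FTDNA or any other company predicts relationships.

```python
# Average shared cM for selected rows of the ISOGG table (labels shortened).
AVERAGE_SHARED_CM = {
    "parent/child": 3400.00,
    "full siblings": 2550.00,
    "half-siblings, grandparent/grandchild, aunt-or-uncle/niece-or-nephew": 1700.00,
    "first cousins, great-grandparent/great-grandchild": 850.00,
    "first cousins once removed, half first cousins": 425.00,
    "second cousins": 212.50,
    "second cousins once removed, half second cousins": 106.25,
    "third cousins": 53.13,
    "third cousins once removed": 26.56,
    "fourth cousins": 13.28,
}


def percent_shared(shared_cm: float) -> float:
    """Convert shared cM to the table's % column (3400 cM corresponds to 50%)."""
    return shared_cm / 3400 * 50


def nearest_average(shared_cm: float) -> str:
    """Relationship whose *average* sharing is closest to the observed value."""
    return min(AVERAGE_SHARED_CM,
               key=lambda rel: abs(AVERAGE_SHARED_CM[rel] - shared_cm))


for cm in (2027, 82, 41):
    print(f"{cm} cM = {percent_shared(cm):.1f}% -> nearest average: {nearest_average(cm)}")
```

Robert's 2027 cM lands nearest the half-sibling average. However, the two documented 4th-cousin values (82 and 41 cM) land nearest second cousins once removed and third cousins, far from the 13.28 cM fourth-cousin average.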

Here is a table provided by ISOGG (International Society of Genetic Genealogy)

Average autosomal DNA shared by pairs of relatives, in percentages and centiMorgans

% shared | cM half-identical (or better) | Relationship | Degree of relationship | Notes
100% (Method I) / 50% (Method II) | 3400.00 | Identical twins (monozygotic twins) | Degree 0 | Fully identical everywhere.[2]
50% | 3400.00 | Parent/child | Degree 1 | Half-identical everywhere
50% (Method I) / 37.5% (Method II) | 2550.00 | Full siblings | Degree 1 | Half-identical on 50%/1700 cM and fully identical on a further 25%/850 cM
25% | 1700.00 | Grandparent/grandchild, aunt-or-uncle/niece-or-nephew, half-siblings | Degree 2 |
25% (Method I) / 23.4375% (Method II) | 1593.75 | Double first cousins | Degree 3 | Half-identical on 21.875%/1487.5 cM and fully identical on a further 1.5625%/106.25 cM
12.5% | 850.00 | Great-grandparent/great-grandchild, first cousins, great-uncle or aunt/great-nephew or niece, half-uncle or aunt/half-nephew or niece | Degree 3 |
6.25% | 425.00 | First cousins once removed, half first cousins, great-great-aunt/uncle, half great-aunt/uncle | Degree 4 |
6.25% | 425.00 | Double second cousins | Degree 5 |
3.125% | 212.50 | Second cousins, first cousins twice removed, half first cousin once removed, half great-great-aunt/uncle | Degree 5 |
1.563% | 106.25 | Second cousins once removed, half second cousins, first cousin three times removed, half first cousin twice removed | Degree 6 |
0.781% | 53.13 | Third cousins, second cousins twice removed | Degree 7 |
0.391% | 26.56 | Third cousins once removed | Degree 8 |
0.195% | 13.28 | Fourth cousins, third cousins twice removed | Degree 9 |
0.0977% | 6.64 | Fourth cousins once removed, third cousins three times removed | Degree 10 |
0.0488% | 3.32 | Fifth cousins | Degree 11 |
0.0244% | 1.66 | Fifth cousins once removed | Degree 12 |
0.0122% | 0.83 | Sixth cousins | Degree 13 |
0.0061% | 0.42 | Sixth cousins once removed | Degree 14 |
0.00305% | 0.21 | Seventh cousins | Degree 15 |
0.001525% | 0.10 | Seventh cousins once removed | Degree 16 |
0.000763% | 0.05 | Eighth cousins | Degree 17 |
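The cousin rows of the table halve in a regular pattern. Nth cousins m times removed share on average 1/2 raised to the power (2n + 1 + m), and half relationships share half of that. This is an editorial reading of the table's figures, not a formula stated by ISOGG. It covers Method I averages only, and actual sharing varies widely around them. The Python sketch reproduces the table values, including the 0.00305% figure for seventh cousins.

```python
TOTAL_CM = 6800  # in the table, 50% shared corresponds to 3400 cM


def expected_cousin_share(degree: int, removed: int = 0, half: bool = False) -> float:
    """Average fraction of autosomal DNA shared (Method I) by nth cousins,
    m times removed. Half relationships share half as much."""
    if degree < 1 or removed < 0:
        raise ValueError("degree must be >= 1 and removed must be >= 0")
    fraction = 0.5 ** (2 * degree + 1 + removed)
    return fraction / 2 if half else fraction


for n in range(1, 8):
    share = expected_cousin_share(n)
    print(f"cousins of degree {n}: {share:.5%} ~ {share * TOTAL_CM:.2f} cM")
```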

Notes to Table

  • There is no variation between families in the parent/child or identical twins shared cM figures; beyond these immediate relationships, recombination results in random variation around the average figures above from one pair of individuals to another.
  • When a grandchild is compared to a grandparent, the shared cM with the other grandparent on the same side is easily inferred. The grandchild gets all 3400cM of, say, his paternal autosomes from his father. If it is seen that 1600cM of this came from the paternal grandfather, then the other 1800cM must have come from the paternal grandmother. The initial estimate of 1700cM shared by grandchild and paternal grandmother can thus be updated to 1800cM when it has been ascertained that grandchild and paternal grandfather share only a below average 1600cM.
  • When the subjects of the comparison descend from identical twin children of their most recent common ancestral couple, then the figures in the above table should be doubled.
  • The expected % shared for a half-relationship will always be exactly half of the expected % shared for the corresponding full relationship.
  • A similar method to that used for full siblings and for double first cousins can be used to compute expected shared percentages for any two subjects of comparison who are doubly related. However, the expected % shared for a double relationship can be slightly less than the sum of the expected % shared for the appropriate single relationships.
    • If Jack is related to both of Jill's parents, then Method I and Method II will give slightly different figures, as double cousins of this type are expected to be fully identical in some regions.
    • If Jill is a more remote descendant of spouses who are both related to Jack, then Jill will clearly have inherited at most one of the two segments in regions where the child of those spouses was fully identical to Jack. This reduces Jack and Jill's expected % shared slightly from the ballpark figure obtained by adding the expected % shared for the two relationships.
    • For example, double second cousins, where the double relationship arises because at least one is related on both the paternal side and the maternal side to the other, are expected to share 3.125% (1/32) on each side, or 6.25% (1/16) in total, using Method I. Using Method II, a small adjustment must be made to allow for regions where they are fully identical (1/1024 or approximately 0.098%), so that they are expected to be half-identical or better on 63/1024 or approximately 6.152%.
    • On the other hand, double second cousins who are children of double first cousins are expected to be half-identical on a quarter of the approximately 23.438% on which their parents are half-identical or better, in other words on approximately 5.859%.
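The Method I and Method II figures in the notes above can be checked with exact fractions. This Python sketch assumes, as the 1/1024 figure in the note implies, that the fully identical share is the product of the two one-sided shares (treating the paternal and maternal sides as independent).

```python
from fractions import Fraction

# Double second cousins related on both the paternal and maternal side.
one_side = Fraction(1, 32)               # 3.125% per side
method_i = one_side + one_side           # 1/16 = 6.25%
fully_identical = one_side * one_side    # 1/1024, identical on both sides at once
method_ii = method_i - fully_identical   # half-identical or better

# Double second cousins who are children of double first cousins:
# a quarter of the parents' 21.875% + 1.5625% = 23.4375%.
double_first_method_ii = Fraction("0.21875") + Fraction("0.015625")
children_share = double_first_method_ii / 4

for label, value in [("Method I", method_i), ("Method II", method_ii),
                     ("children of double first cousins", children_share)]:
    print(f"{label}: {value} = {float(value):.3%}")
```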

Theoretical probabilities

The content of the following two tables is derived from Table 1 in the paper The probability that related individuals share some section of genome identical by descent by Kevin P Donnelly, Statistical Laboratory, Cambridge University, Cambridge, England. (Source: Theoretical Population Biology 1983: 23, 34-63)

How many cousins do we have?

Although there is only a low chance of sharing enough DNA with a specific distant cousin for the relationship to be detected, we have a large number of distant cousins and so many of these more distant cousins will appear in our match lists. The following table from the paper Cryptic distant relatives are common in both isolated and cosmopolitan genetic samples by Henn et al (2012) shows the expected number of cousins at different degrees of relationship and the expected number of detectable cousins along with the expected amount of Identical by descent (IBD) sharing if the relationship is detected.


Mathematician and genetic genealogist Paul Rakow has done his own computer simulations on family sizes and has published the results in an essay on Counting cousins (published online 31 March 2016).

A study by AncestryDNA, based on British birth rates, census data, parliamentary research briefings[3] and other sources for the last 200 years, produced the following statistics on the number of cousins that the average British person would be expected to have.[4]

Relationship | Number of cousins
First cousins | 5
Second cousins | 28
Third cousins | 175
Fourth cousins | 1,570
Fifth cousins | 17,300
Sixth cousins | 174,000

It is not clear if these statistics relate to the whole of the United Kingdom or just England and Wales.

From Debbie Kennett, Administrator of the Devon DNA Project

It is quite common to find that you have a match in a database that is not shared by your parents. This generally happens with the matches on smaller segments where it’s more difficult to predict the relationships and also because at Family Tree DNA and GedMatch we are dealing with unphased data. Phasing is the process of sorting the alleles onto the maternal and paternal chromosomes.


The lack of phasing can produce false positive matches and can make a segment appear longer than it actually is.

The last time I checked about 22% of my matches at FTDNA did not match either of my parents, but these were all the small matches in the fifth to distant cousin range which are generally not worth pursuing. If you match someone at FTDNA and the largest segment is 9 cMs or less it is not declared a match unless the total cM sharing is 20 cMs or more. However, the 20 cM count is often made up of small pseudosegments under 5 cMs which is why a child will often get matches that a parent doesn’t.



It’s best to ignore segments under 5 cMs as they are mostly just noise. It’s also best to work with matches where the longest shared segment is 10 cMs or larger. The vast majority of matches below this threshold will be too far back in time for you to find a genealogical relationship.
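The thresholds in this advice can be written as two small Python filters. The first encodes the small-segment rule as the quote describes it. The quote says nothing about matches whose longest segment is above 9 cM, so treating those as matches is an assumption, not FTDNA's documented algorithm. The second applies the practical advice: drop segments under 5 cM, then keep a match only if its longest remaining segment is at least 10 cM. Both function names and the segment lists are hypothetical.

```python
def declared_under_quoted_rule(segments_cm: list[float]) -> bool:
    """If the longest segment is 9 cM or less, total sharing must reach 20 cM.
    ASSUMPTION: a longer segment is treated as enough on its own."""
    if not segments_cm:
        return False
    if max(segments_cm) <= 9:
        return sum(segments_cm) >= 20
    return True


def worth_pursuing(segments_cm: list[float], noise_floor: float = 5.0,
                   min_longest: float = 10.0) -> bool:
    """Drop segments under the noise floor, then require a long enough segment."""
    meaningful = [s for s in segments_cm if s >= noise_floor]
    return bool(meaningful) and max(meaningful) >= min_longest


pseudo_segments = [4.0, 4.5, 3.0, 4.2, 4.8]  # hypothetical: 20.5 cM of small pieces
solid_match = [12.3, 6.1, 2.0]               # hypothetical
for segments in (pseudo_segments, solid_match):
    print(segments, declared_under_quoted_rule(segments), worth_pursuing(segments))
```

The first list passes the 20 cM total and could still be declared a match, but it is made entirely of sub-5 cM pieces, which is the pattern the quote says is mostly noise.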