Interpreting Y-DNA Results

by Philip Shaddock

Please contact me for the latest spreadsheet of the DNA results of Shattockes, Parrishs and Byars/Byas.

If you do not have the Excel spreadsheet program, you can download a free spreadsheet program here.

I am going to simplify this overview of Y-DNA genetic testing. There are lots of resources on the internet for those interested in a deeper grasp of the subject. See the Contact page of this site for recommended links.

Why Focus on Male Testing?

I focus on the male Y-chromosome as the subject for testing because it is the most effective tool for surname research, specifically the Y37, Y67, or Y111 level tests at Family Tree DNA (FTDNA). If you follow the link, you will get a discount at the Shaddock Shattuck project at FTDNA.

The Family Finder and autosomal tests can be useful aids for the deep paternal ancestry I explore, but they have to be used in combination with male-only YDNA tests, since they can only reach back four or five generations on average, and only very rarely seven or eight. YDNA male testing provides information about paternal ancestry all the way back to genetic Adam.

If you are interested in tracing back your mother's lineage, you should do mtDNA testing with FTDNA. If your interest is in both your parental heritages and only in the last five or six generations, you should do autosomal testing.  

The Two Types of Y-DNA Tests
FTDNA tests two areas of the male Y chromosome. The two types of tests look at two different types of mutations. One type of mutation is called an STR (short tandem repeat) and the other is called an SNP (single nucleotide polymorphism).

The STR mutation occurs when a segment of DNA is repeated: gatagatagatagata. In this simple example the genetic code "gata" is repeated four times. When I talk about "repeats" in the explanation below, this is what I mean. The SNP mutation is a mutation at a single point on the Y chromosome. For example, a letter of the genetic code like "t" changes to a "c."

The practical difference between these two mutations is how often they occur. STR mutations are very active and can change either by adding a repeat or by subtracting a repeat (a back mutation). They are very useful for studying DNA in the last five hundred years. But because the number of repeats changes back and forth, plus or minus, STRs are not reliable sources of information over very long periods of time.

SNP mutations, on the other hand, are much rarer and more stable. They are more useful for looking further into the past. They are also useful for more recent generations, but they occur much less frequently than STR mutations, about once every 144 years on average (Big Y test).
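
For readers who like to see an idea in code, here is a minimal Python sketch of what "counting repeats" means for an STR. It is only an illustration of the concept, not how a testing lab reads DNA; the function name count_repeats is made up for this example.

```python
def count_repeats(sequence: str, motif: str) -> int:
    """Return the longest run of back-to-back copies of `motif` in `sequence`."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    for start in range(len(sequence)):
        run = 0
        pos = start
        # Keep stepping forward one motif at a time while the motif repeats.
        while sequence.startswith(motif, pos):
            run += 1
            pos += len(motif)
        best = max(best, run)
    return best


# The example from the text: "gata" repeated four times.
print(count_repeats("gatagatagatagata", "gata"))

# An STR mutation adds (or removes) one whole repeat.
print(count_repeats("gata" * 5, "gata"))
```

The value reported for an STR marker is this kind of repeat count, which is why a result is written as a marker name and a number.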

STR Testing Results

FTDNA has identified specific locations on the Y chromosome to test to determine paternity. These are called markers. Markers are just STRs that are special because they are known to be useful for determining paternity. Instead of having to test all the STRs on the Y-chromosome, they select a relatively small set of markers. They are well known locations on the Y-chromosome where the genetic history of male humans can be read. 

FTDNA provides their tests at different levels: 12, 25, 37, 67, 111 markers. This refers to the number of markers they test. The minimum useful level is 37 markers. The more markers you test, the more accurate the picture of the pattern of inheritance that emerges. When I am testing potential matches, I test people at the 37 marker level to determine if they are indeed a Shattocke descendant. Then I immediately go to the 111 marker level. For an even more in-depth test I go to a SNP test (called the Big Y test, aka NGS testing). The Big Y test is expensive, but in the long run it is cost effective for a complete examination of your genetic markers. A bonus is that you can extract up to 400 STRs from Big Y test results.

For reliable results, however, the minimum level of testing you should do is 67 markers.

When FTDNA returns the results from Y37, Y67 or Y111 testing, it is in the form of a table. I have developed my own version of that table that provides a lot more information than is found in the FTDNA version. (I do not publicly display the table; it is only available to Shattocke descendants and our genetic cousins through my newsletter.) I use this table to compare the results with other Shattockes, and our cousins the Byars and Parrishs, as well as several even more distantly related people that show up as matches in the FTDNA table of matches. I use the table to construct a family tree.

The family tree is composed of the descendants of the common ancestor of all Shattockes, Parrishs and Byars. He had the Y16884 SNP mutation. He lived in the 15th century. 

My principal method of finding how descendants of our common ancestor separated into branches of the family is to compare the results of my relatives within the last 1000 years. It is in the comparison of the results that the branching of the tree is found. A single set of results for one person tells you very little.

Deciphering the STR Spreadsheet

In the spreadsheet I group people that I think might belong to the same branch of the Shattocke - Parrish - Byars tree. The organizing principle governing the spreadsheet is very simple. If a number of DNA samples of descendants have the same signature marker, then they must have descended from a common ancestor. Each branch of the family tree will have its own signature markers. The problem is determining which markers are "signature markers" for each branch of the family tree. All that is meant by "signature" marker is that the marker has a number of repeats that is unique to a branch of the Shattocke family.

In order to identify signature markers I look at how frequently the STR marker has changed in value, what its most common values among ancestors were, what its most recent ancestral value was compared to now, and in which branch of the human family that marker is commonly found with the current number of repeats.
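
The simplest part of this idea, a value shared by everyone in a branch and by nobody outside it, can be sketched in a few lines of Python. This is only a rough illustration: it ignores the mutation-rate and ancestral-value checks described above, and the branch names and marker values below are hypothetical, not taken from the spreadsheet.

```python
def signature_markers(branches):
    """Find markers whose value is shared within a branch and unique to it.

    `branches` maps a branch name to a list of samples, where each sample
    is a dict of {marker name: number of repeats}.
    """
    result = {}
    for name, members in branches.items():
        if not members:
            result[name] = {}
            continue
        # Every sample that belongs to some other branch.
        outsiders = [
            sample
            for other, samples in branches.items()
            if other != name
            for sample in samples
        ]
        found = {}
        for marker, value in members[0].items():
            shared = all(m.get(marker) == value for m in members)
            unique = all(o.get(marker) != value for o in outsiders)
            if shared and unique:
                found[marker] = value
        result[name] = found
    return result


# Hypothetical toy data: two branches, two tested people in each.
toy_branches = {
    "Branch A": [{"DYS518": 35, "DYS576": 17}, {"DYS518": 35, "DYS576": 18}],
    "Branch B": [{"DYS518": 36, "DYS576": 18}, {"DYS518": 36, "DYS576": 17}],
}
print(signature_markers(toy_branches))
```

In the toy data DYS518 separates the two branches cleanly, while DYS576 varies inside each branch and so cannot serve as a signature.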

Here is a portion of the spreadsheet that I use:

  1. The spreadsheet is divided horizontally into the major branches of the family (such as Somerset/West Virginia Shattocks). 
  2. Genetic markers (STRs) are arranged in columns, beginning with DYS393. These are the markers whose values appear in FTDNA results. 
  3. I have color coded the markers to identify their relative importance.

    Green: Signature Markers: these are markers that have values unique to a branch or sub-branch of the family. They signify a common ancestor.
    Dark Orange: These also signify a common ancestor, but not all members of branch or sub-branch will share the values. 
    Other Colors: These markers differ from those of the other members but are usually unique to an individual.

  4. Legend for the colors listed in 3.

  5. ***** Mutation rate: Red stars rate how fast this marker mutates, with the slowest at 2 stars (**) and the fastest at 5 stars (*****). This is derived from the YFull database. Because tandem repeats can be gained or lost, rapidly moving STR values over a long period of time can yield deceptive results. For example a fast moving marker can go from 14 to 15 repeats, back to 14, then to 13 and back to 14. Fast moving markers over long periods of time (more than 5000 years) are not very useful, but over short periods of time they can help identify recent branches (in the past 500 years) of the family.
  6. The probable STR values for the common ancestor of Shattockes and Parrishs / Byars. The ancestral values.
  7. The ancestral value and the new value. Sometimes a repeat is gained, sometimes lost.
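
The warning about fast moving markers in item 5 can be made concrete with a tiny Python sketch. It uses the sequence of values given in that item (14, 15, 14, 13, 14); the two helper names are invented for this illustration.

```python
def mutation_events(values):
    """Count how many times the repeat count changed along a line of descent."""
    return sum(1 for before, after in zip(values, values[1:]) if before != after)


def observed_difference(values):
    """Difference visible when only the first and last values are compared."""
    return abs(values[-1] - values[0])


# The example from item 5: 14 -> 15 -> 14 -> 13 -> 14.
history = [14, 15, 14, 13, 14]
print(mutation_events(history))      # mutations that actually happened
print(observed_difference(history))  # what a comparison of the two ends shows
```

Four mutations took place, yet the ancestor and the living descendant both show 14 repeats, which is why such markers mislead over long time spans.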

Important Shattocke / Parrish Genetic Markers

I use a combination of SNPs and STRs to define the branching of the Shattocke Tree. Note that what I call "STR markers" below means that not all members in the group have this marker, indicating it is volatile. 

Y16884 / Y17163: The common ancestor of all Shattockes and Parrishs had this SNP mutation. YFull names it Y16884. FTDNA names it Y17163.

  • Southwark London Shattocks

            CDY=35-37: The STR double marker used temporarily to define this branch.

  • West Somerset Shattocks

            CDY=36-38: The STR double marker used temporarily to define this branch.

                                    Y29590: SNP indicating common ancestor between Milverton Shattocks and Virginia Shaddocks

  • Parrishs and Byars / Byas

            A8033: Parrish / Byars SNP    
                            CDY=37-39 STR branch marker
                            DFY387.2: signature marker
                            DYS518=35 signature marker
                            DYS542=16 STR marker
                            DYS720=32 signature marker

                            Y19410: SNP Defines sub-branch of A8033
                                            DYS562=19: STR marker
                                            DYS631=9: STR marker
                                            DFY387.1=30: STR marker

  • Massachusetts Shattucks

            Y19751: SNP defining the group.
                            CDY=36-37 STR marker defining this branch
                            DYS447: signature marker
                            DYS710: signature marker
                            DYS552: signature marker
                            DYR60=16: signature marker
                            DYF399.2 STR marker
                            DYR6=15: STR marker
                            DYS491=13: STR marker
                            DYS518=37: questionable marker
                            DYS719=13 STR marker

                            Y23841: SNP defining descendants of Philip Shattuck 1648-1722, son of William the founder
                                            DYS532=14 STR questionable marker for SC Shaddocks
                                            DYS612=31 STR somewhat questionable
                                            DYS518=37 STR somewhat questionable
                            South Carolina Shaddocks, who are descendants of William Shattuck the founder's son Samuel
                                            DYS464c=16 probably a signature marker
                                            DYS627=30 STR marker
                                            DYS695=34 STR marker

                            YGATAH4=12 temporary marker defining descendants of John Shattuck 1647-1675
                                            Y24059: SNP defines sub-branch
                                                            DYR88.1=18: STR marker

  • North Devon Shattockes

            Y19716: defining SNP

                                    FGC43713: North Molton Shattockes

                                    FGC53716: Yarnscombe Shattockes
                                                          CDY=35-38: STR marker for the branch
                                                          DYS576=18: signature marker
                                                          DYR60=18: signature marker
                                                          DYS452=32: STR marker for the branch
                                                          DYS712=23: STR marker for the branch

                                                          Burrington Shaddocks sub-branch
                                                          DYS452=32: signature marker for Burrington Shaddocks

Big Y Results STRs
I have extended the spreadsheet to include STR results provided by the YFull SNP interpretation service. FTDNA, the Y-DNA testing company, only provides testing for up to 111 STR markers. YFull extracts up to 375 extra STRs from the Big Y BAM file. I have added the relevant extra markers to the rows of people in the spreadsheet who have taken the Big Y test. 

How to View the Spreadsheet File

The spreadsheet I use is color coded. You will have to have Excel installed on your computer to view it properly. If you do not have Excel or a spreadsheet program that will import an Excel file and preserve the color coding, you can download a free spreadsheet program, Apache OpenOffice, here.

Genetic Distance

In FTDNA's results a "genetic distance" value is often given. This is a somewhat misleading term. It doesn't actually measure how distantly related you are to the other person. It merely measures how many of your markers are different. The more markers you have that are different, the more likely that you are more distantly related.
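
As described here, genetic distance is just a count of differing markers. A minimal Python sketch of that counting follows. It is an assumption based on the description above, not FTDNA's published method, which may treat multi-step differences and some multi-copy markers differently. The marker values are hypothetical.

```python
def genetic_distance(person_a, person_b):
    """Count the markers, tested by both people, whose values differ."""
    shared_markers = person_a.keys() & person_b.keys()
    return sum(1 for marker in shared_markers if person_a[marker] != person_b[marker])


# Hypothetical results for three markers.
person_a = {"DYS393": 13, "DYS390": 24, "DYS19": 14}
person_b = {"DYS393": 13, "DYS390": 25, "DYS19": 14}
print(genetic_distance(person_a, person_b))
```

Only markers that both people have tested are compared, so two people tested at different levels are compared on the smaller set.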

The following table provides an age estimation for the first 37 markers in DNA results. For example, when two people have exactly the same values for their first 37 markers (i.e. a genetic distance of zero), they probably share a common ancestor within the last 330 years. Remember that these estimates are based on average mutation rates for the markers. Marker mutation is a random process. So only part of the time will the estimates be accurate; most of the time they will be good approximations, and some of the time they will be wildly inaccurate.

GD = 0 : 0 - 330 years
GD = 1 : 30 - 570 years
GD = 2 : 60 - 660 years
GD = 3 : 90 - 840 years
GD = 4 : 150 - 990 years
GD = 5 : 210 - 1140 years
GD = 6 : 270 - 1290 years

GD = 0 : 0 - 270 years
GD = 1 : 0 - 480 years
GD = 2 : 30 - 510 years
GD = 3 : 60 - 630 years
GD = 4 : 120 - 750 years
GD = 5 : 150 - 840 years
GD = 6 : 210 - 960 years
GD = 7 : 240 - 1080 years
GD = 8 : 300 - 1170 years
GD = 9 : 360 - 1290 years

GD = 0 : 0 - 150 years
GD = 1 : 0 - 150 years
GD = 2 : 30 - 330 years
GD = 3 : 30 - 390 years
GD = 4 : 60 - 450 years
GD = 5 : 90 - 540 years
GD = 6 : 120 - 600 years
GD = 7 : 150 - 660 years
GD = 8 : 180 - 720 years
GD = 9 : 210 - 780 years
GD = 10 : 240 - 840 years
GD = 11 : 270 - 900 years

The more markers you have to compare, the more accurate you can make the estimation of when you shared a common ancestor with another person. 
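
The first table above (the one for 37 markers) can be used as a simple lookup. Here is a small Python sketch of that; the dictionary copies only that first table, and the names AGE_RANGE_37 and age_range are made up for the example.

```python
# Genetic distance -> (earliest, latest) years to a common ancestor,
# copied from the first table above (37 markers).
AGE_RANGE_37 = {
    0: (0, 330),
    1: (30, 570),
    2: (60, 660),
    3: (90, 840),
    4: (150, 990),
    5: (210, 1140),
    6: (270, 1290),
}


def age_range(genetic_distance):
    """Return the estimated range in years, or None if the table has no entry."""
    return AGE_RANGE_37.get(genetic_distance)


print(age_range(0))
print(age_range(3))
print(age_range(7))  # beyond the table
```

Keep in mind that these are ranges built on average mutation rates, so the result is a guide rather than a date.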

This explanation of DNA testing is simplified. I welcome comments and suggestions that will help me improve it.

SNP Testing

SNPs (single nucleotide polymorphisms) are simply the points in the genetic code where a single letter of the code has changed. Unlike STRs they are very stable and very rare.

The diagram on the left shows lists of SNPs, with ZZ45 at the very top, followed by Z36 (also known as S206) and Y16889 and so on.

Here is how SNP analysis works. It really is quite simple. We look at all your SNPs and see which ones you have that are similar or different from other people. At the highest branch in the human tree you only have a few mutant SNPs in common with almost everybody else. As you walk down the tree you find people who share more and more SNPs.

What you see at left is a capture from software that implements this analysis. It shows the SNPs found from Big Y testing of me and my genetic cousins. (The surnames of the people tested are found at the bottom of the diagram.) 

Now look at the third brown rectangle of the diagram. See all those numbers, beginning with Y16889? These are the SNPs that are used to define the branching. Branching of the human family tree occurs when the SNPs that are common to a set of individuals are identified. So the four SNPs (Y16889, Y17162, PH2997 and Y17161) in this brown box are held by all Parrishs and Shattockes in common, plus a very distant relative Strang. The common ancestor we have with Strang goes back about 4000 years. That long list of numbers found in the middle rectangle, beginning with Y16885 and ending with Y17159, are the SNPs that Shattockes and Parrishs hold in common, but NOT Strang. The long list of numbers is also a measure of how much time has passed since the common ancestor lived, because SNPs occur roughly every 160 years.
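
In programming terms this comparison is a set intersection. The Python sketch below shows the idea with a heavily abbreviated toy data set: the SNP names come from the text, but each person's list is cut down to a handful (only the first and last of the long middle list are included), so treat it as an illustration rather than real results.

```python
def shared_snps(samples, names):
    """Return the SNPs carried by every named person."""
    return set.intersection(*(samples[name] for name in names))


# Abbreviated, illustrative SNP lists (not complete test results).
toy_samples = {
    "Shattocke": {"Y16889", "Y17162", "PH2997", "Y17161", "Y16885", "Y17159"},
    "Parrish": {"Y16889", "Y17162", "PH2997", "Y17161", "Y16885", "Y17159", "A8033"},
    "Strang": {"Y16889", "Y17162", "PH2997", "Y17161"},
}

# SNPs held by all three: the older, shared part of the tree.
common_to_all = shared_snps(toy_samples, ["Shattocke", "Parrish", "Strang"])

# SNPs held by Shattocke and Parrish but NOT Strang: the younger branch.
branch_only = shared_snps(toy_samples, ["Shattocke", "Parrish"]) - toy_samples["Strang"]

print(sorted(common_to_all))
print(sorted(branch_only))
```

Each new set of shared SNPs that excludes someone else marks a branching point in the tree.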

That is a very, very long time between Strang and us. It appears that in the 14th century the family began branching. (It may be the case that there were several ancestors but with different names. If that is the case their descendants have not DNA tested. It seems more likely there was a single line of descent.)

The diagram tells us that the branching occurred in the 14th century. This is using the estimate calculated by the YFull analysis, but you can roughly estimate the elapsed time yourself by counting the SNPs and multiplying by 160 years, a rough, back-of-the-envelope method based on the average frequency of SNP mutations. Note that the results show how the family branched only among the people who have tested. There may have been a common ancestor between Shattockes and Parrishs earlier than the 14th century. We just haven't found an individual to test that will allow us to identify that earlier date.
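
The counting rule is simple enough to write as one line of arithmetic. Here it is as a small Python sketch; the function name estimate_years is invented for the example, and the 160 year figure is the rough average used in this section.

```python
def estimate_years(snp_count, years_per_snp=160):
    """Rough elapsed time: number of SNPs times the average years per SNP."""
    return snp_count * years_per_snp


# Four SNPs at roughly 160 years each.
print(estimate_years(4))

# The same count using the 144 year Big Y average mentioned earlier.
print(estimate_years(4, years_per_snp=144))
```

Because the rate is only an average, the answer is an approximation and can be off by generations in either direction.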

In the case of the Parrishs, SNPs A8033, A8034 and A8035 are held in common by the three Parrishs that have been tested. Then another split occurs and two Parrishs (N89266 and 259094) form their own branch with two SNPs unique to them. (If you do the calculation you see they split off from the other Parrish around 1800). 

The three Parrishs who have tested hold more SNPs in common with each other than with Shattockes.

You can perform the same analysis on the Shattockes. It looks like there is a lineage composed of Shattucks and Virginia Shaddocks on the right and two more closely related Shaddock and Shaddick lineages on the left. I think I will call the two lineages on the left the North Molton Shattockes and the Shaddocks and Pomeroy (Shattuck), and those on the right the Somerset Shattockes. The Somerset Shattockes and the Virginia Shattockes appear to have a common ancestor that goes back to 1300 AD. (At the time I am writing this, it is probably the case that we have not yet tested an individual who would give us a more recent common ancestor.) The North Molton Shattockes have more recent branches, with Donald Shaddick (kit 443452) perhaps branching off about 1500. The other two Shattockes, Philip Shaddock (407511) and Mark Shaddick (442039), branch off about 1700. I call the family lineage that Mark and I belong to the Yarnscombe Shattockes. In fact the genealogical information almost exactly confirms the genetic genealogy data. Mark and I share a common ancestor who was born in 1680, Thomas Shattocke.

That is all there really is to SNP analysis. The advantage of SNP testing is that the SNPs are rare and stable. You only need one to define branching in a family. And you can use them to find the date of the common ancestor among a group of related people. The disadvantage is the long time between mutation events. STRs are more useful in the last 500 years. When you combine the SNP analysis with the STR analysis you get this Experimental Shattocke Tree: click here.

The tree shows how I position the SNP information (blue labels) relative to the STR marker information (green labels). The green STR markers are used to define the branching. I show how the people from the spreadsheet are attached to the tree. This tree identifies possible common ancestors among the people tested, and acts as a guide for further genealogical research.

A Note on Autosomal Results (Family Finder and Ancestry DNA)

When evaluating DNA results from either an Ancestry DNA test or FTDNA Family Finder (both autosomal tests) the following table might be helpful.

After you have looked at the table, come back and read this blog. You will see why I use YDNA testing and not the autosomal tests by Ancestry or the Family Finder autosomal test.

I have a half-sibling (now deceased) who was flagged as a half-sibling or grandparent/ grandchild by the FTDNA matching system. The "measure" of our relationship was 2027 centimorgans. For perspective a parent / child relationship is roughly 3385 cM. I have a match to a 4th cousin who is 82 cM, but I have seen other matches between other people that are 4th cousins that are 41 cM apart. The minimum value for a distant relationship is considered to be 7 cM. These measures of genetic relationship are not actual physical measurements of chromosomes. They are similar to "genetic distance" in measuring YDNA matches. Variables can alter how accurately they reflect generational distance between two people. They also tend to vary somewhat from one family to another. So just use them as a rough guide.

Here is a table provided by ISOGG (International Society of Genetic Genealogy)

Average autosomal DNA shared by pairs of relatives, in percentages and centiMorgans

% shared | cM half-identical (or better) | Relationship | Degree of relationship | Notes
100% (Method I) / 50% (Method II) | 3400.00 | Identical twins (monozygotic twins) | Degree 0 | Fully identical everywhere.[2]
50% | 3400.00 | Parent/child | Degree 1 | Half-identical everywhere
50% (Method I) / 37.5% (Method II) | 2550.00 | Full siblings | Degree 1 | Half-identical on 50%/1700cM and fully identical on a further 25%/850cM.
25% | 1700.00 | Grandparent/grandchild, aunt-or-uncle/niece-or-nephew, half-siblings | Degree 2 |
25% (Method I) / 23.4375% (Method II) | 1593.75 | Double first cousins | Degree 3 | Half-identical on 21.875%/1487.5cM and fully identical on a further 1.5625%/106.25cM
12.5% | 850.00 | Great-grandparent/great-grandchild, first cousins, great-uncle or aunt/great-nephew or niece, half-uncle or aunt/half-nephew or niece | Degree 3 |
6.25% | 425.00 | First cousins once removed, half first cousins, great-great-aunt/uncle, half great-aunt/uncle | Degree 4 |
6.25% | 425.00 | Double second cousins | Degree 5 |
3.125% | 212.50 | Second cousins, first cousins twice removed, half first cousin once removed, half great-great-aunt/uncle | Degree 5 |
1.563% | 106.25 | Second cousins once removed, half second cousins, first cousin three times removed, half first cousin twice removed | Degree 6 |
0.781% | 53.13 | Third cousins, second cousins twice removed | Degree 7 |
0.391% | 26.56 | Third cousins once removed | Degree 8 |
0.195% | 13.28 | Fourth cousins, third cousins twice removed | Degree 9 |
0.0977% | 6.64 | Fourth cousins once removed, third cousins three times removed | Degree 10 |
0.0488% | 3.32 | Fifth cousins | Degree 11 |
0.0244% | 1.66 | Fifth cousins once removed | Degree 12 |
0.0122% | 0.83 | Sixth cousins | Degree 13 |
0.0061% | 0.42 | Sixth cousins once removed | Degree 14 |
0.00305% | 0.21 | Seventh cousins | Degree 15 |
0.001525% | 0.10 | Seventh cousins once removed | Degree 16 |
0.000763% | 0.05 | Eighth cousins | Degree 17 |
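
The ordinary (non-double) rows of this table follow a simple halving pattern: each extra degree of relationship halves the expected share. The Python sketch below reproduces those averages from the parent/child figure of 3400 cM. It is my own reading of the pattern in the table, not a formula quoted from ISOGG, and it does not cover identical twins, full siblings or the double-cousin rows.

```python
PARENT_CHILD_CM = 3400.0  # degree 1 figure from the table


def expected_shared(degree):
    """Return (percent shared, cM shared) for an ordinary relationship of this degree."""
    if degree < 1:
        raise ValueError("degree must be 1 or greater")
    percent = 100 * 0.5 ** degree
    centimorgans = PARENT_CHILD_CM * 0.5 ** (degree - 1)
    return percent, centimorgans


# Degree 2 (e.g. half-siblings), degree 4 and degree 9 (fourth cousins).
for degree in (2, 4, 9):
    percent, cm = expected_shared(degree)
    print(degree, percent, round(cm, 2))
```

These are averages only. Real matches scatter widely around them, as the fourth cousin matches of 82 cM and 41 cM mentioned above show.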

Notes to Table

  • There is no variation between families in the parent/child or identical twins shared cM figures; beyond these immediate relationships, recombination results in random variation around the average figures above from one pair of individuals to another.
  • When a grandchild is compared to a grandparent, the shared cM with the other grandparent on the same side is easily inferred. The grandchild gets all 3400cM of, say, his paternal autosomes from his father. If it is seen that 1600cM of this came from the paternal grandfather, then the other 1800cM must have come from the paternal grandmother. The initial estimate of 1700cM shared by grandchild and paternal grandmother can thus be updated to 1800cM when it has been ascertained that grandchild and paternal grandfather share only a below average 1600cM.
  • When the subjects of the comparison descend from identical twin children of their most recent common ancestral couple, then the figures in the above table should be doubled.
  • The expected % shared for a half-relationship will always be exactly half of the expected % shared for the corresponding full relationship.
  • A similar method to that used for full siblings and for double first cousins can be used to compute expected shared percentages for any two subjects of comparison who are doubly related. However, the expected % shared for a double relationship can be slightly less than the sum of the expected % shared for the appropriate single relationships.
    • If Jack is related to both of Jill's parents, then Method I and Method II will give slightly different figures, as double cousins of this type are expected to be fully identical in some regions.
    • If Jill is a more remote descendant of spouses who are both related to Jack, then Jill will clearly have inherited at most one of the two segments in regions where the child of those spouses was fully identical to Jack. This reduces Jack and Jill's expected % shared slightly from the ballpark figure obtained by adding the expected % shared for the two relationships.
    • For example, double second cousins, where the double relationship arises because at least one is related on both the paternal side and the maternal side to the other, are expected to share 3.125% (1/32) on each side, or 6.25% (1/16) in total, using Method I. Using Method II, a small adjustment must be made to allow for regions where they are fully identical (1/1024 or approximately 0.098%), so that they are expected to be half-identical or better on 63/1024 or approximately 6.152%.
    • On the other hand, double second cousins who are children of double first cousins are expected to be half-identical on a quarter of the approximately 23.438% on which their parents are half-identical or better, in other words on approximately 5.859%.

Theoretical probabilities

The content of the following two tables is derived from Table 1 in the paper The probability that related individuals share some section of genome identical by descent by Kevin P Donnelly, Statistical Laboratory, Cambridge University, Cambridge, England. (Source: Theoretical Population Biology 1983: 23, 34-63)

How many cousins do we have?

Although there is only a low chance of sharing enough DNA with a specific distant cousin for the relationship to be detected, we have a large number of distant cousins and so many of these more distant cousins will appear in our match lists. The following table from the paper Cryptic distant relatives are common in both isolated and cosmopolitan genetic samples by Henn et al (2012) shows the expected number of cousins at different degrees of relationship and the expected number of detectable cousins along with the expected amount of Identical by descent (IBD) sharing if the relationship is detected.

[Image: How many cousins]

Mathematician and genetic genealogist Paul Rakow has done his own computer simulations on family sizes and has published the results in an essay on Counting cousins (published online 31 March 2016).

A study by AncestryDNA, based on British birth rates, census data, parliamentary research briefings[3] and other sources for the last 200 years, produced the following statistics on the number of cousins that the average British person would be expected to have.[4]

Relationship | Number of cousins
First cousins | 5
Second cousins | 28
Third cousins | 175
Fourth cousins | 1,570
Fifth cousins | 17,300
Sixth cousins | 174,000

It is not clear if these statistics relate to the whole of the United Kingdom or just England and Wales.

From Debbie Kennett, Administrator of the Devon DNA Project

It is quite common to find that you have a match in a database that is not shared by your parents. This generally happens with the matches on smaller segments where it’s more difficult to predict the relationships and also because at Family Tree DNA and GedMatch we are dealing with unphased data. Phasing is the process of sorting the alleles onto the maternal and paternal chromosomes.

The lack of phasing can produce false positive matches and can make a segment appear longer than it actually is.

The last time I checked about 22% of my matches at FTDNA did not match either of my parents, but these were all the small matches in the fifth to distant cousin range which are generally not worth pursuing. If you match someone at FTDNA and the largest segment is 9 cMs or less it is not declared a match unless the total cM sharing is 20 cMs or more. However, the 20 cM count is often made up of small pseudosegments under 5 cMs which is why a child will often get matches that a parent doesn’t.

It’s best to ignore segments under 5 cMs as they are mostly just noise. It’s also best to work with matches where the longest shared segment is 10 cMs or larger. The vast majority of matches below this threshold will be too far back in time for you to find a genealogical relationship.
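
The two rules of thumb in this last paragraph can be written as a small filter. The Python sketch below is only an illustration of that advice: the function name worth_pursuing is made up, and the segment lengths are hypothetical.

```python
def worth_pursuing(segments_cm, min_segment=5.0, min_longest=10.0):
    """Apply the rules of thumb: drop small segments, then check the longest one.

    `segments_cm` is a list of shared segment lengths in centimorgans.
    """
    # Segments under 5 cM are treated as noise and ignored.
    kept = [length for length in segments_cm if length >= min_segment]
    # A match is worth working on only if its longest segment is 10 cM or more.
    return bool(kept) and max(kept) >= min_longest


print(worth_pursuing([12.5, 4.0, 3.2]))       # one solid segment
print(worth_pursuing([9.0, 4.0, 4.0, 4.0]))   # adds up to 21 cM, but mostly noise
```

The second example is the kind of match described above: the total passes 20 cM, yet it is built mostly from small pseudosegments.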