Interpreting DNA Results

by Philip Shaddock

I am going to simplify this overview of Y-DNA genetic testing. There are lots of resources on the internet for those interested in a deeper grasp of the subject. See the Contact page of this site for recommended links.

Paternal Testing

My research focus is the Shattocke surname and the descendants of the Shattocke common ancestor, which include the Parrish and Byars surnames. Since males pass Y chromosomes exclusively to their sons, Y chromosome DNA tests (YDNA) are best for deep surname research. These tests include the YDNA-37 to YDNA-111 tests at Family Tree DNA and the Alpha, Beta, Delta and Gamma tests at YSEQ.NET.

You can also test for paternal ancestry using an autosomal test, which looks at the other chromosomes. But you cannot do deep paternal ancestry research with autosomal tests such as Ancestry.com's DNA test, 23andMe's test or FTDNA's Family Finder test. Autosomal tests are only useful for maternal or paternal lineages in the last four or five generations. On average you share only about 0.00305% of your DNA with a seventh cousin. Trying to do surname research with autosomal tests is difficult and limited in scope.

Contact me before purchasing a test. We know a lot about Shattocke (and Parrish, Byars) genetics because so many of our genetic cousins have tested.  If you contact me, I can recommend the most cost effective test. 

The Two Types of Y-DNA Tests

We test two different types of mutations on the male Y chromosome. One type of mutation is called an STR (Short Tandem Repeat) and the other is called an SNP (Single Nucleotide Polymorphism). The STR mutation occurs when a segment of DNA is repeated: gatagatagatagata. In this simple example the genetic code "gata" is repeated four times. When I talk about "repeats" in the explanation below, this is what I mean. The SNP mutation is a mutation of a single letter in the genetic code of the Y chromosome. For example, a letter of the genetic code like "t" changes to a "c." The practical difference between these two mutations is how often they mutate. STR mutations are very active and can either change by adding a repeat or subtracting a repeat (a back mutation). They are very useful for studying DNA in the last five hundred years. But because the number of repeats change back and forth, plus or minus, STRs are not reliable sources of information over very long periods of time. SNP mutations on the other hand are much more rare and stable. They are more useful for looking further into the past. They are also useful for more recent generations but occur more infrequently, once every 144 years on average (Big Y test). 
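To make the idea of "repeats" concrete, here is a minimal Python sketch that counts how many times a motif such as "gata" repeats back to back in a stretch of genetic code. The function name is my own invention for illustration, and real testing labs work from laboratory measurements, not from a simple text search like this one.

```python
def count_tandem_repeats(sequence: str, motif: str) -> int:
    """Return the longest run of back-to-back copies of `motif` in `sequence`."""
    sequence = sequence.lower()
    motif = motif.lower()
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    # Try every starting position and count consecutive copies of the motif.
    for start in range(len(sequence)):
        run = 0
        while sequence.startswith(motif, start + run * len(motif)):
            run += 1
        best = max(best, run)
    return best


# The example from the text: "gata" repeated four times.
print(count_tandem_repeats("gatagatagatagata", "gata"))  # 4
```

If a later generation gained one repeat, the same function would report 5 for that marker, and that change in the count is what an STR test picks up.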

STR Testing

Scientists have identified specific locations on the Y chromosome to test to determine paternity. These are called markers. Markers are just STRs that are special because they are known to be useful for determining paternity. Instead of having to test all the STRs on the Y-chromosome, they select a relatively small set of markers. They are well known locations on the Y-chromosome where the genetic history of male humans can be read. 

FTDNA provides their tests at different levels: 12, 25, 37, 67 and 111 markers. This refers to the number of markers they test. The minimum useful level is 37 markers. The more markers you test, the more accurate the resulting family tree. When I am testing potential matches, I test people at the 37 marker level to determine if they share a common ancestor with all other Shattockes, Byars and Parrishes. Then I upgrade to a higher level test. You may wish to go to the highest level right at the start.

SNP Testing

SNP testing uses a form of sequencing called Next Generation Sequencing (NGS). This form of testing produces about 1,000 SNP markers. A couple of dozen of these are used to determine what branch of the Shattocke family you belong to. You can also extract up to 400 STRs from NGS data, vastly expanding the number of STR markers you can use to determine relationships among people in a Shattocke or Parrish / Byars sub-branch. The price of an NGS test varies between $575 and $750. I would recommend the Full Genomes Y Elite NGS test ($645) as it provides better coverage and more accurate results than the FTDNA Big Y test.

An important caveat is that you may not be genetically a Shattocke, Parrish or Byars! That is a risk you take in genetic testing. You have to be prepared emotionally for that possibility. For perspective, of the 68 Shattucks, Byars and Parrishs who have tested, 9 have come back as belonging to another branch of the human family. That is 13% of the total, which is close to the average of 15% for all west European families. You may be certain of your ancestry in the past several generations, but it is possible you are descended from a much more ancient NPE (non-parental event). This has proven to be the case for most of the 9 results that have come back showing an NPE had occurred. For this reason it might be best to start testing with an FTDNA YDNA-37 test. You can decide at that point if you want to proceed deeper into the mystery of your ancestry. And testing with 37 markers will give you an idea of what surname may be lurking in your past. Again, contact me for my advice. 

Individual Marker Testing

There is an even less expensive way of determining if you are descended from the Shattocke common ancestor. I have developed the genetic profile of our family to the point where you can test for the existence of a specific marker for $15 (STR) or $25 (SNP). If you test positive for that marker, we know that you belong in the tree and perhaps in a specific branch of the tree. If you test negative, you belong to some other unknown branch of the human family tree. At that point you can decide if you want to proceed to a more comprehensive test. Contact me.

Analyzing the DNA Results

When the testing company returns the results from a YDNA test, it is in the form of a table full of numerical values. I have developed three different spreadsheets that color code these results for study.  I do not publicly share the spreadsheets for privacy reasons. You have to join the mailing list to obtain a link to the results or you can contact me directly.  I use this table to compare the results with other Shattockes, and our cousins the Byars and Parrishs, as well as several even more distantly related people that show up as matches in the FTDNA table of matches. I use the table to construct a family tree. 

My principal method for finding how descendants of our common ancestor separated into branches of the family is to compare the results to each other. If two individuals have the same mutation, they are likely to have descended from a common ancestor. It is in the comparison of the results that the branching of the tree is found. A single set of results for one person tells you very little.

Spreadsheets

I have created three different spreadsheets containing YDNA results from Shattockes, Parrishs and Byars who have done the various tests I outlined above.

Table Comparing Match STRs

When you get your results back from one of the FTDNA YDNA-37 to YDNA-111 tests, I transfer this information to a spreadsheet that also includes additional information and results from other testing companies.

In the spreadsheet I group people that I think might belong to the same branch of the Shattocke - Parrish - Byars tree. The organizing principle governing the spreadsheet is very simple. If a number of DNA samples of descendants have the same signature marker, then they must have descended from a common ancestor. Each branch of the family tree will have its own signature markers. The problem is determining which markers are "signature markers" for each branch of the family tree. All that is meant by "signature" marker is that the marker has a number of repeats that is unique to a branch of the Shattocke family.

In order to identify signature markers I look at how frequently the STR marker has changed in value, what its most common values among ancestors were, what its most recent ancestral value was compared to now, and in which branch of the human family that marker is commonly found with the current number of repeats.

Here is a portion of the spreadsheet that I use:


  1. The spreadsheet is divided horizontally into the major branches of the family (such as Y19716 North Molton Shattockes). "Y19716" is the SNP defining this branch of the family.
  2. Genetic markers (STRs) are arranged in columns, beginning with DYS393. These are the markers whose values appear in FTDNA results. In this case Shaddick, whose kit number is 443452 (row 9), has 13 repeats for marker DYS393.
  3. Where you see a value that is different from the average value for the marker, it is color coded. In this case the Shaddock and Shaddick in rows 16 and 17 have 14 repeats for the DYS393 marker. They are probably descended from a more recent common ancestor than the rest of the people in this branch of the family.  I have color coded the markers to identify their relative importance. 

    Green: Signature Markers: these are markers that have values unique to a branch or sub-branch of the family. They signify a common ancestor.
    Dark Orange: These also signify a common ancestor, but not all members of branch or sub-branch will share the values. 
    Other Colors: These markers differ from those of the other members but are usually unique to an individual.

  4. Legend for the colors listed in 3.

  5. ***** Mutation rate: Red stars rate how fast this marker mutates, with the slowest at 2 stars (**) and the fastest at 5 stars (*****). This is derived from the YFull database. Because tandem repeats can be gained or lost, rapidly moving STR values over a long period of time can yield deceptive results. For example a fast moving marker can go from 14 to 15 repeats, back to 14, then to 13 and back to 14. Fast moving markers over long periods of time (more than 5000 years) are not very useful, but over short periods of time they can help identify recent branches (in the past 500 years) of the family.
  6. The DNA testing service YSEQ.NET offers tests that are less expensive than the same tests offered by FTDNA. The letters above the STR name indicate which group of tests include the marker. In this case the YSEQ "Alpha" test includes the DYS393 marker.
  7. x>y Ancestral Value > Current Value. In some cases you will see repeats in this format. For example 13>14 would mean that the legacy value for the marker was 13, but sometime in the more recent past the marker gained a repeat. This is useful when you are trying to determine when a mutation occurred.
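The color coding described in point 3 can be sketched in a few lines of Python. This is only an illustration of the idea, not the spreadsheet itself: the kit labels and repeat counts are made up, the function names are my own, and I use the most common value for each marker as a stand-in for the "average" value.

```python
from collections import Counter

# Hypothetical kits and repeat counts, invented for illustration only.
results = {
    "kit_A": {"DYS393": 13, "DYS390": 24},
    "kit_B": {"DYS393": 13, "DYS390": 24},
    "kit_C": {"DYS393": 13, "DYS390": 24},
    "kit_D": {"DYS393": 14, "DYS390": 24},
    "kit_E": {"DYS393": 14, "DYS390": 25},
}


def modal_values(results):
    """Most common repeat count for each marker across all kits."""
    per_marker = {}
    for markers in results.values():
        for name, repeats in markers.items():
            per_marker.setdefault(name, Counter())[repeats] += 1
    return {name: counts.most_common(1)[0][0] for name, counts in per_marker.items()}


def differing_markers(results):
    """For each kit, the markers whose value differs from the most common value."""
    modal = modal_values(results)
    return {
        kit: {name: value for name, value in markers.items() if value != modal[name]}
        for kit, markers in results.items()
    }


def shared_off_modal(results):
    """Off-modal values held by more than one kit: candidate signature markers."""
    groups = {}
    for kit, diffs in differing_markers(results).items():
        for name, value in diffs.items():
            groups.setdefault((name, value), []).append(kit)
    return {key: kits for key, kits in groups.items() if len(kits) > 1}


print(shared_off_modal(results))
```

In this made-up data, kit_D and kit_E share DYS393=14, so that value would be a candidate for green (signature) coding, while kit_E's lone DYS390=25 would fall under "other colors" as a value unique to one individual.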

YFull STR Comparisons

This spreadsheet includes over 400 STR markers derived from an NGS test like Big Y, including the 111 markers of the previous test.  I have tried to give each branch of the family with signature markers its own colors. Use it in the same way as the previous spreadsheet. 

Shattocke SNP Variants

This spreadsheet shows those SNPs returned from an NGS test, like the FTDNA Big Y test. A person's NGS results are compared to a reference set of SNPs and those SNPs that vary from the reference set are plugged into the spreadsheet. Notice that I have color coded those SNPs that are shared by all Shattockes, Parrishs and Byars, and those SNPs that are found only in people who belong to a branch of the family. For example, we know we all descend from an ancient ancestor with the U152 SNP because every Shattocke, Parrish or Byars has this mutation. But there is a sub-branch of the Parrish branch that shares the Y19410 SNP, not found in anybody outside their branch.

Here is how to analyze SNP results. The term SNP (single nucleotide polymorphism) simply refers to a change in an individual letter of the genetic code. A "C" might have changed to a "T." In the spreadsheet this is indicated this way: C>T. C is what is found in the reference genome, T is the value found in the person being tested. Unlike STRs, SNPs are very stable and very rare.

The diagram on the left shows lists of SNPs, with ZZ45 at the very top.

SNP analysis is quite simple. We look at all your SNPs and see which ones you have that are similar or different from other people. At the highest branch in the human tree you only have a few mutant SNPs in common with almost everybody else. As you walk down the tree you find people who share more and more SNPs.

What you see at left is a capture from software that implements this analysis. It shows the SNPs found from Big Y testing of me and my genetic cousins. (The surnames of the people tested are found at the bottom of the diagram.) 

Now look at the third brown rectangle of the diagram. See all those numbers, beginning with Y16889? These are the SNPs that are used to define the branching. Branching of the human family tree occurs when the SNPs that are common to a set of individuals are identified. So the four SNPs (Y16889, Y17162, PH2997 and Y17161) in this brown box are held by all Parrishs and Shattockes in common, plus a very distant relative, Strang. The common ancestor we have with Strang goes back about 4000 years. That long list of numbers found in the middle rectangle, beginning with Y16885 and ending with Y17159, are the SNPs that Shattockes and Parrishs hold in common, but NOT Strang. The long list of numbers is also a measure of how much time has passed since the common ancestor lived, because SNPs occur roughly every 144 years (Big Y testing, shorter for other tests).

The diagram tells us that the branching occurred in the 15th century. You can roughly count time duration by counting the SNPs and multiplying by about 160 years, a rough, back-of-the-envelope figure for how often SNP mutations occur (the Big Y average quoted above is closer to 144 years).
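The counting method can be written out as a tiny Python sketch. The function name is mine, and the two multipliers are simply the figures quoted in this article; the result is a rough average, not a date.

```python
# Average years per SNP quoted in this article.
YEARS_PER_SNP_BIG_Y = 144  # Big Y average
YEARS_PER_SNP_ROUGH = 160  # rougher multiplier


def years_since_split(snp_count, years_per_snp=YEARS_PER_SNP_BIG_Y):
    """Rough elapsed time implied by a count of SNPs along one line of descent."""
    if snp_count < 0:
        raise ValueError("snp_count must not be negative")
    return snp_count * years_per_snp


# Two SNPs on a branch, with each multiplier.
print(years_since_split(2))                       # 288
print(years_since_split(2, YEARS_PER_SNP_ROUGH))  # 320
```

Because mutation is a random process, two SNPs could in practice have taken much less or much more time than either figure suggests.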

In the case of the Parrishs, SNPs A8033, A8034 and A8035 are held in common by the three Parrishs that have been tested. Then another split occurs and two Parrishs (N89266 and 259094) form their own branch with two SNPs unique to them. (If you do the calculation you see they split off from the other Parrish around 1800). 

The three Parrishs who have tested hold more SNPs in common with each other than with Shattockes.

Turning to the "Shattocke Variant SNPs" spreadsheet, the basic rule of thumb in SNP analysis is to find two people in the same branch who share one or more mutant SNPs. 

In the example at left, there are two people who have the A>G variant, Y19716. So you can assume they have a common ancestor. That turns out to be true. These two individuals have a common ancestor who lived in North Molton, Devon in the early 16th century. Y19716 is the SNP that defines the North Molton Shattockes. 

Notice in the column to the right that three individuals have the Y268 "T>T" SNP. These individuals belong to completely different branches of the Shattocke - Parrish - Byars family. The problem with NGS results is that they are a bit like Swiss cheese. In actual fact everybody in the family probably has the Y268 SNP, but many are missing the SNP in their results.

In the case of the Y1940 and Y1941 SNPs (shown in green) they form a sub-branch of the A8033 Parrish branch of the family. 

That is all there really is to SNP analysis. The advantage of SNP testing is that the SNPs are rare and stable. You only need one to define branching in a family. And you can use them to find the date of the common ancestor among a group of related people. The disadvantage is the long time between mutation events. STRs are more useful in the last 500 years. 
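The rule of thumb for SNP analysis, together with the "Swiss cheese" caveat, can be sketched in Python as follows. The kits and their calls are made up for illustration and the function names are my own. The one design point worth noting is that a missing result (a no-call) is kept separate from a negative result.

```python
# Hypothetical kits. True = variant found, False = tested negative,
# None = no-call (a hole in the NGS coverage at that position).
calls = {
    "kit_1": {"Y19716": True, "Y268": True},
    "kit_2": {"Y19716": True, "Y268": None},
    "kit_3": {"Y19716": False, "Y268": True},
}


def carriers(calls, snp):
    """Kits in which the variant was actually found."""
    return sorted(kit for kit, result in calls.items() if result.get(snp) is True)


def no_calls(calls, snp):
    """Kits where the position was not read, so nothing can be concluded."""
    return sorted(kit for kit, result in calls.items() if result.get(snp) is None)


print(carriers(calls, "Y19716"))  # ['kit_1', 'kit_2']
print(no_calls(calls, "Y268"))    # ['kit_2']
```

In this made-up data kit_1 and kit_2 share a variant and so would be grouped in one branch, while kit_2's missing Y268 result says nothing either way about whether kit_2 carries that SNP.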

I use both STR and SNP data to determine the branching of the Shattocke tree. If you look at the STR spreadsheets, you will see that individuals with the Y19410 mutation also share signature STRs. For example the North Molton Shattockes share the DYR60=18 marker. That makes sense because the mutation that changed the number of repeats for this marker occurred after the formation of the North Molton branch of the family.

Genetic Distance

In FTDNA's results a "genetic distance" value is often given. This is a somewhat misleading term. It doesn't actually measure how distantly related you are to the other person. It merely measures how many of your markers are different. The more markers you have that are different, the more likely that you are more distantly related.
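As a simple illustration of the idea, the Python sketch below counts how many markers differ between two sets of results. The kits and repeat counts are invented, and the function name is my own. It follows the plain description given here; a testing company's actual scoring may treat multi-step differences on a marker differently.

```python
def count_differing_markers(kit_a, kit_b):
    """Count the markers, among those both kits tested, whose values differ."""
    shared = kit_a.keys() & kit_b.keys()
    return sum(1 for marker in shared if kit_a[marker] != kit_b[marker])


# Hypothetical results for two kits on four markers.
kit_a = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
kit_b = {"DYS393": 14, "DYS390": 24, "DYS19": 14, "DYS391": 10}

print(count_differing_markers(kit_a, kit_b))  # 2
```

Only markers present in both results are compared, which is why two people should be compared at the same test level.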

The following tables provide age estimates for the 37, 67 and 111 marker tests. For example, when two people have exactly the same values in their first 37 markers (i.e. a genetic distance of zero), they are related within the last 330 years. Remember that these estimates are based on average mutation rates for the markers. Marker mutation is a random process. So only part of the time will the estimates be accurate; most of the time they will be good approximations, and some of the time they will be wildly inaccurate.

Y-37
GD = 0 : 0 - 330 years
GD = 1 : 30 - 570 years
GD = 2 : 60 - 660 years
GD = 3 : 90 - 840 years
GD = 4 : 150 - 990 years
GD = 5 : 210 - 1140 years
GD = 6 : 270 - 1290 years

Y-67
GD = 0 : 0 - 270 years
GD = 1 : 0 - 480 years
GD = 2 : 30 - 510 years
GD = 3 : 60 - 630 years
GD = 4 : 120 - 750 years
GD = 5 : 150 - 840 years
GD = 6 : 210 - 960 years
GD = 7 : 240 - 1080 years
GD = 8 : 300 - 1170 years
GD = 9 : 360 - 1290 years

Y-111
GD = 0 : 0 - 150 years
GD = 1 : 0 - 150 years
GD = 2 : 30 - 330 years
GD = 3 : 30 - 390 years
GD = 4 : 60 - 450 years
GD = 5 : 90 - 540 years
GD = 6 : 120 - 600 years
GD = 7 : 150 - 660 years
GD = 8 : 180 - 720 years
GD = 9 : 210 - 780 years
GD = 10 : 240 - 840 years
GD = 11 : 270 - 900 years
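For anyone who prefers to look these figures up in a script, the three tables above can be held in a small Python dictionary. The values are copied from the tables as given; the function name is my own, and combinations not listed above simply return nothing.

```python
# (minimum years, maximum years) keyed by test level and genetic distance,
# copied from the Y-37, Y-67 and Y-111 tables in this article.
AGE_RANGES = {
    37: {0: (0, 330), 1: (30, 570), 2: (60, 660), 3: (90, 840),
         4: (150, 990), 5: (210, 1140), 6: (270, 1290)},
    67: {0: (0, 270), 1: (0, 480), 2: (30, 510), 3: (60, 630),
         4: (120, 750), 5: (150, 840), 6: (210, 960), 7: (240, 1080),
         8: (300, 1170), 9: (360, 1290)},
    111: {0: (0, 150), 1: (0, 150), 2: (30, 330), 3: (30, 390),
          4: (60, 450), 5: (90, 540), 6: (120, 600), 7: (150, 660),
          8: (180, 720), 9: (210, 780), 10: (240, 840), 11: (270, 900)},
}


def estimated_age_range(markers_tested, genetic_distance):
    """Return (min_years, max_years) from the tables, or None if not listed."""
    return AGE_RANGES.get(markers_tested, {}).get(genetic_distance)


print(estimated_age_range(37, 0))   # (0, 330)
print(estimated_age_range(111, 4))  # (60, 450)
```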

The more markers you have to compare, the more accurate you can make the estimation of when you shared a common ancestor with another person. 

This explanation of DNA testing is simplified. I welcome comments and suggestions that will help me improve it.

Analyzing Autosomal Results

When evaluating DNA results from either an Ancestry.com DNA test or FTDNA Family Finder (both autosomal tests) the following table might be helpful.

After you have looked at the table, come back and read this blog.  You will see why I use YDNA testing and not the autosomal tests by Ancestry.com or the Family Finder autosomal test.

I have a half-sibling (now deceased) who was flagged as a half-sibling or grandparent/ grandchild by the FTDNA matching system. The "measure" of our relationship was 2027 centimorgans. For perspective a parent / child relationship is roughly 3385 cM. I have a match to a 4th cousin who is 82 cM, but I have seen other matches between other people that are 4th cousins that are 41 cM apart. The minimum value for a distant relationship is considered to be 7 cM. These measures of genetic relationship are not actual physical measurements of chromosomes. They are similar to "genetic distance" in measuring YDNA matches. Variables can alter how accurately they reflect generational distance between two people. They also tend to vary somewhat from one family to another. So just use them as a rough guide.

Here is a table provided by ISOGG (International Society of Genetic Genealogy)

Average autosomal DNA shared by pairs of relatives, in percentages and centiMorgans

% shared | cM half-identical (or better) | Relationship | Degree of relationship | Notes
100% (Method I) / 50% (Method II) | 3400.00 | Identical twins (monozygotic twins) | Degree 0 | Fully identical everywhere.
50% | 3400.00 | Parent/child | Degree 1 | Half-identical everywhere
50% (Method I) / 37.5% (Method II) | 2550.00 | Full siblings | Degree 1 | Half-identical on 50%/1700cM and fully identical on a further 25%/850cM.
25% | 1700.00 | Grandparent/grandchild, aunt-or-uncle/niece-or-nephew, half-siblings | Degree 2 |
25% (Method I) / 23.4375% (Method II) | 1593.75 | Double first cousins | Degree 3 | Half-identical on 21.875%/1487.5cM and fully identical on a further 1.5625%/106.25cM
12.5% | 850.00 | Great-grandparent/great-grandchild, first cousins, great-uncle or aunt/great-nephew or niece, half-uncle or aunt/half-nephew or niece | Degree 3 |
6.25% | 425.00 | First cousins once removed, half first cousins, great-great-aunt/uncle, half great-aunt/uncle | Degree 4 |
6.25% | 425.00 | Double second cousins | Degree 5 |
3.125% | 212.50 | Second cousins, first cousins twice removed, half first cousin once removed, half great-great-aunt/uncle | Degree 5 |
1.563% | 106.25 | Second cousins once removed, half second cousins, first cousin three times removed, half first cousin twice removed | Degree 6 |
0.781% | 53.13 | Third cousins, second cousins twice removed | Degree 7 |
0.391% | 26.56 | Third cousins once removed | Degree 8 |
0.195% | 13.28 | Fourth cousins, third cousins twice removed | Degree 9 |
0.0977% | 6.64 | Fourth cousins once removed, third cousins three times removed | Degree 10 |
0.0488% | 3.32 | Fifth cousins | Degree 11 |
0.0244% | 1.66 | Fifth cousins once removed | Degree 12 |
0.0122% | 0.83 | Sixth cousins | Degree 13 |
0.0061% | 0.42 | Sixth cousins once removed | Degree 14 |
0.00305% | 0.21 | Seventh cousins | Degree 15 |
0.001525% | 0.10 | Seventh cousins once removed | Degree 16 |
0.000763% | 0.05 | Eighth cousins | Degree 17 |
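The single-relationship rows of the table above follow a simple pattern: the expected share halves with each degree of relationship. The Python sketch below reproduces that pattern. The function name is my own, and the 6800 cM total is my inference from the table (3400 cM at 50%), not a figure stated in it. The rows for identical twins, full siblings and double cousins do not follow this simple rule.

```python
# Inferred from the table: 3400 cM corresponds to 50%, so 100% is 6800 cM.
TOTAL_CM = 6800.0


def expected_sharing(degree):
    """Average (% shared, cM shared) for a single relationship of a given degree."""
    if degree < 1:
        raise ValueError("degree must be 1 or more")
    fraction = 0.5 ** degree
    return fraction * 100, fraction * TOTAL_CM


for degree in (1, 3, 7, 15):
    percent, cm = expected_sharing(degree)
    print(f"Degree {degree}: {percent:.5f}% shared, {cm:.2f} cM")
```

These are averages only. As the notes to the table explain, recombination causes random variation around them from one pair of relatives to another.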

Notes to Table

  • There is no variation between families in the parent/child or identical twins shared cM figures; beyond these immediate relationships, recombination results in random variation around the average figures above from one pair of individuals to another.
  • When a grandchild is compared to a grandparent, the shared cM with the other grandparent on the same side is easily inferred. The grandchild gets all 3400cM of, say, his paternal autosomes from his father. If it is seen that 1600cM of this came from the paternal grandfather, then the other 1800cM must have come from the paternal grandmother. The initial estimate of 1700cM shared by grandchild and paternal grandmother can thus be updated to 1800cM when it has been ascertained that grandchild and paternal grandfather share only a below average 1600cM.
  • When the subjects of the comparison descend from identical twin children of their most recent common ancestral couple, then the figures in the above table should be doubled.
  • The expected % shared for a half-relationship will always be exactly half of the expected % shared for the corresponding full relationship.
  • A similar method to that used for full siblings and for double first cousins can be used to compute expected shared percentages for any two subjects of comparison who are doubly related. However, the expected % shared for a double relationship can be slightly less than the sum of the expected % shared for the appropriate single relationships.
    • If Jack is related to both of Jill's parents, then Method I and Method II will give slightly different figures, as double cousins of this type are expected to be fully identical in some regions.
    • If Jill is a more remote descendant of spouses who are both related to Jack, then Jill will clearly have inherited at most one of the two segments in regions where the child of those spouses was fully identical to Jack. This reduces Jack and Jill's expected % shared slightly from the ballpark figure obtained by adding the expected % shared for the two relationships.
    • For example, double second cousins, where the double relationship arises because at least one is related on both the paternal side and the maternal side to the other, are expected to share 3.125% (1/32) on each side, or 6.25% (1/16) in total, using Method I. Using Method II, a small adjustment must be made to allow for regions where they are fully identical (1/1024 or approximately 0.098%), so that they are expected to be half-identical or better on 63/1024 or approximately 6.152%.
    • On the other hand, double second cousins who are children of double first cousins are expected to be half-identical on a quarter of the approximately 23.438% on which their parents are half-identical or better, in other words on approximately 5.859%.
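The arithmetic in the double second cousin notes above can be checked with exact fractions. This is just a worked restatement of the figures already given in the notes; the variable names are my own.

```python
from fractions import Fraction

# Double second cousins related on both the paternal and maternal side.
second_cousin_share = Fraction(1, 32)           # 3.125% on each side
method_i = 2 * second_cousin_share              # 1/16, or 6.25%
fully_identical = Fraction(1, 1024)             # about 0.098%
method_ii = method_i - fully_identical          # 63/1024, about 6.152%

# Double second cousins who are children of double first cousins:
# a quarter of the 23.4375% (15/64) on which their parents are
# half-identical or better.
children_of_double_first = Fraction(1, 4) * Fraction(15, 64)

print(method_i, f"{float(method_i) * 100:.3f}%")
print(method_ii, f"{float(method_ii) * 100:.3f}%")
print(children_of_double_first, f"{float(children_of_double_first) * 100:.3f}%")
```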

Theoretical probabilities

The content of the following two tables is derived from Table 1 in the paper The probability that related individuals share some section of genome identical by descent by Kevin P Donnelly, Statistical Laboratory, Cambridge University, Cambridge, England. (Source: Theoretical Population Biology 1983: 23, 34-63)



How many cousins do we have?

Although there is only a low chance of sharing enough DNA with a specific distant cousin for the relationship to be detected, we have a large number of distant cousins and so many of these more distant cousins will appear in our match lists. The following table from the paper Cryptic distant relatives are common in both isolated and cosmopolitan genetic samples by Henn et al (2012) shows the expected number of cousins at different degrees of relationship and the expected number of detectable cousins along with the expected amount of Identical by descent (IBD) sharing if the relationship is detected.

How many cousins.jpg

Mathematician and genetic genealogist Paul Rakow has done his own computer simulations on family sizes and has published the results in an essay on Counting cousins (published online 31 March 2016).

A study by AncestryDNA, based on British birth rates, census data, parliamentary research briefings and other sources for the last 200 years, produced the following statistics on the number of cousins that the average British person would be expected to have.

Relationship | Number of cousins
First cousins | 5
Second cousins | 28
Third cousins | 175
Fourth cousins | 1,570
Fifth cousins | 17,300
Sixth cousins | 174,000

It is not clear if these statistics relate to the whole of the United Kingdom or just England and Wales.

From Debbie Kennett, Administrator of the Devon DNA Project

It is quite common to find that you have a match in a database that is not shared by your parents. This generally happens with the matches on smaller segments where it’s more difficult to predict the relationships and also because at Family Tree DNA and GedMatch we are dealing with unphased data. Phasing is the process of sorting the alleles onto the maternal and paternal chromosomes:

http://isogg.org/wiki/Phasing


The lack of phasing can produce false positive matches and can make a segment appear longer than it actually is.

The last time I checked about 22% of my matches at FTDNA did not match either of my parents, but these were all the small matches in the fifth to distant cousin range which are generally not worth pursuing. If you match someone at FTDNA and the largest segment is 9 cMs or less it is not declared a match unless the total cM sharing is 20 cMs or more. However, the 20 cM count is often made up of small pseudosegments under 5 cMs which is why a child will often get matches that a parent doesn’t.

 

http://isogg.org/wiki/Autosomal_DNA_match_thresholds


It’s best to ignore segments under 5 cMs as they are mostly just noise. It’s also best to work with matches where the longest shared segment is 10 cMs or larger. The vast majority of matches below this threshold will be too far back in time for you to find a genealogical relationship.
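The thresholds mentioned above can be expressed as a short Python sketch. This is a simplified reading of the rule as quoted, not FTDNA's actual matching code, and the function names are my own. In particular, treating any largest segment over 9 cM as a declared match is my assumption, since the quoted rule only covers the case of 9 cM or less.

```python
def is_declared_match(longest_segment_cm, total_shared_cm):
    """Simplified reading of the quoted rule for when a match is declared."""
    if longest_segment_cm > 9:
        # Assumption: a largest segment over 9 cM is enough on its own.
        return True
    # Largest segment is 9 cM or less: need at least 20 cM in total.
    return total_shared_cm >= 20


def worth_pursuing(longest_segment_cm):
    """The advice above: work with matches whose longest segment is 10 cM or more."""
    return longest_segment_cm >= 10


print(is_declared_match(8, 25))   # True: declared, thanks to the 20 cM total
print(worth_pursuing(8))          # False: longest segment is under 10 cM
```

The two checks answer different questions. A match can be declared by the testing company and still be too distant to be worth chasing in the paper records.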