Create the SNP Variant Comparison Spreadsheet

by Philip Shaddock

This page describes a method I use to extract the named and unnamed SNP variants from the tables provided by FTDNA on the Big Y results page of accounts. While individuals can download the SNP lists from their own account, they have no access to other people's accounts. So the SNP Variant Comparison Spreadsheet is only useful for project administrators and those who can acquire the lists from other members of their family groups.

The variants are found under two tabs on the Big Y results page, the Named Variants tab and the Unnamed Variants tab.

You can see an example of the finished SNP Variant Comparison Spreadsheet by downloading my own project's version. (I have removed project member names to protect their privacy.)

Getting to the final spreadsheet has a few hurdles. 

One is capturing the list. FTDNA only presents a limited set of the variants, you have to scroll down to capture successive lists of numbers. It would have been helpful if FTDNA had provided these lists as a .csv spreadsheet file. It is a tedious exercise, and you have to combine the two named and unnamed variant lists. Then you have to strip all the extra information from the lists. 

A1207 Yes (+) Yes G T High
A2442 Yes (+) Yes C C High
A2470 Yes (+) Yes T T High
A2475 Yes (+) Yes T T High



The second problem is that the list of named and unnamed variants varies from one member of the group to another.

Philip       Ron
A1207     A1207
A2442     A2441
A2470     A2442
A2475     A2476

We want the lists to align the matching variants and leave spaces for the non-matching variants.

Philip       Ron
A1207     A1207
A2442     A2442

This makes discovering the SNPs they share in common much easier.

Fortunately there is an Excel marco that aligns multiple columns very quickly and easy. I found it here: Jerry Beucaire's macro handled the alignment of 34 members of my project with ease. But there is a limit to how much data this macro can process. It stopped processing on lists that had 29,000 entries. Since the variant lists in the Big Y results table are only about 900 entries long, this was not a problem. However I did find that the macro slowed down the more results I added to the spreadsheet. I found that if I exited from the spreadsheet and re-loaded it, the macro execution speed increased again.

The order of the columns is important. If your haplogroup is sub-divided into branches, organize them into sub-groups and enter them group by group. This will make comparisons between members of subgroups a lot easier. This will greatly aid the analysis process I describe in the How to Analyze Big Y Results.

The result is a spreadsheet that lists all the variants found in my haplogroup. I have replaced the names of the actual members with "Column A, Column B ..." to protect their privacy.

For those of you who have no experience with Excel macros, I have created a template spreadsheet with the sorting macro embedded in it. Download it here:

Make sure you paste the columns of SNP lists into the worksheet unformatted (right click on the cell, choose paste options, text only). And begin the first column at the first cell, under column A, row 1. Begin the second member's list of SNPs under Column B, row 1 and so on. The macro will not work properly otherwise. And delete all the columns except the namef and unnamed SNP column. Leave no gaps between columns.

To run the macro use the Alt-F8 key combination and click "Run." Magically the columns will be aligned.

The result is a spreadsheet that has all the people you are comparing in vertically aligned rows. To convert the lists from vertical to horizontal format, open a new spreadsheet, name it SNP Variant Comparison Spreadsheet. Capture the vertical columns of SNPs from the sorting spreadsheet, then right click in a cell of the new spreadsheet and chose the Transpose option.

The lists of SNPs will be transposed from a vertical to a horizontal format. Save the sorting spreadsheet for future additions.