The Use of DNA in Genealogy by Karen Chorney
At our PGS-CA meeting in January 2007, Marvin Blaski and I gave a 2-hour briefing on the use of DNA in genealogical research. Marvin documented his part of the briefing in the January 2007 bulletin in an article entitled “Two Blaszkowski/Blaski Families”. What follows is a summary of my part of this briefing. My part was based on portions that I had given before elsewhere, pieces of other people’s briefings, and my reading of various articles, websites and books. My interest in this topic began in 2001 when I read Bryan Sykes’s book, The Seven Daughters of Eve and had my DNA analyzed. More about that later.
What Exactly is DNA?
The nucleus of a cell contains chromosomes. Each chromosome in the nucleus of each of our cells is composed of a double strand of coiled DNA – the familiar double helix. Each human cell nucleus (except for the sperm and egg) contains 23 pairs of chromosomes – one of the pair from the father, one from the mother.
The egg and sperm each contain 23 chromosomes, not pairs. Hereditary information is transmitted in the chromosomes. The “human genome” refers to the complete set of DNA in all of the chromosomes. If uncoiled, the DNA in all of our chromosomes from a single cell would stretch to 6 feet. Within a cell’s cytoplasm, outside of the nucleus, are some structures called mitochondria. These mitochondria, which perform functions related to the energy for the cells, also contain some DNA. This DNA is useful for genealogical purposes and will be discussed later.
So what exactly is DNA? DNA (deoxyribonucleic acid) is composed of four building block molecules called bases. These four bases are thymine (T), adenine (A), guanine (G), cytosine (C). Thymine contains 5 carbon atoms, 6 hydrogen atoms, 2 nitrogen atoms and 2 oxygen atoms. The other bases are also composed of just a few atoms of these four elements. DNA bases pair up with each other, A always with T and C always with G, to form units called base pairs, held together by hydrogen bonds. Each base is also attached to a sugar molecule and a phosphate molecule. Together, a base, sugar, and phosphate are called a nucleotide. Nucleotides are arranged in two long strands that form a spiral – the double helix. The structure of the double helix is somewhat like a ladder, with the base pairs forming the ladder’s rungs and the sugar and phosphate molecules forming the vertical sidepieces of the ladder.
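The pairing rule described above (A always with T, C always with G) can be illustrated with a few lines of Python. This is only a sketch for readers who like to see rules written out; the function name is made up for this illustration.

```python
# Base-pairing rule from the text: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complementary_strand(strand: str) -> str:
    """Return, position by position, the base that pairs with each base of `strand`."""
    return "".join(PAIRS[base] for base in strand.upper())

# Each rung of the "ladder" is one base from the input and its partner in the output.
print(complementary_strand("GATA"))
```

Applying the function twice returns the original strand, which is another way of saying that each strand of the double helix fully determines the other.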
The Human Genome Project identified the sequences of DNA bases of all of our chromosomes and determined that humans have about 3.2 billion bases, and more than 99.9 percent of those bases are the same in all people.
What Exactly are Genes?
The DNA sequence determines the information available for building and maintaining an organism, similar to the way in which letters of the alphabet appear in a certain order to form words and sentences. Consider A, T, C and G to be the letters of a 4-letter alphabet. Groups of letters form words. Words that are strung together and have meaning are sentences. Genes are like sentences. Random strings of words that do not have meaning are called non-coding DNA or “junk DNA”. So along the long stretches of DNA in our chromosomes, there are many, many random words and then, once in a while, there will be a meaningful sentence – a gene. Genes carry codes for making proteins and are the fundamental units of heredity. Scientists are still working at identifying our genes. The current estimate is that there may be 20,000-40,000 of them. So about 95% of our DNA does NOT contain genes but is non-coding DNA. Most of the 0.1% of DNA differences between humans (excluding identical twins) occur in the non-coding DNA. It is the non-coding DNA that is used in DNA testing for genealogical purposes. The non-coding DNA records our ancestral history – as will be seen in the discussion which follows.
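To get a feel for the scale of that 0.1%, the two figures quoted above (about 3.2 billion bases, more than 99.9 percent shared) can be multiplied out. The short Python sketch below does only that arithmetic; since the text says "more than 99.9 percent", the result is best read as a rough upper figure.

```python
GENOME_BASES = 3_200_000_000  # about 3.2 billion bases (figure from the text)
SHARED_FRACTION = 0.999       # "more than 99.9 percent" shared (figure from the text)

def differing_bases(total_bases: int, shared_fraction: float) -> int:
    """Approximate number of bases that can differ between two people."""
    return round(total_bases * (1 - shared_fraction))

print(differing_bases(GENOME_BASES, SHARED_FRACTION))
```

So even a 0.1% difference leaves on the order of three million positions where two people may differ, which is what makes DNA testing between individuals possible at all.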
The Types of DNA Used in Genealogical DNA Tests
As mentioned above, we humans have 23 pairs of chromosomes. The first 22 pairs of chromosomes are called “autosomal” chromosomes. The 23rd pair contains the sex chromosomes, X and Y. The mother provides an X chromosome; the father (the sperm cell) can provide either an X or a Y. If the sperm provides an X, the child will be a female and have two X chromosomes in the 23rd pair. If the sperm cell provides a Y, the child will be a male and have one X chromosome and one Y chromosome in its 23rd pair. So, a male’s Y chromosome came from his father’s father’s father’s…father. The Y chromosome’s main function (and perhaps only function) is to determine the sex of the child. [If the Y had other functions, then females would lack some characteristics.] Current studies indicate that the Y chromosome has 78 genes within its approximately 58 million base pairs and the X chromosome has more than 1000 genes within its approximately 155 million base pairs.
Because the Y chromosome follows the father’s father’s father’s ancestral line, i.e., the surname lineage, and because mutations occur in this chromosome in a favorable timeframe, Y-chromosome testing is used for genealogical purposes. Y-chromosome tests will be explained below. As mentioned above, mitochondria are structures in the cytoplasm of a cell and mitochondria contain DNA (referred to as mtDNA). The mtDNA has been sequenced and contains 16,569 base pairs. This sequence is called the Cambridge Reference Sequence and all mitochondrial tests performed are measured against the Cambridge Reference Sequence for mutations. Each cell contains hundreds and even thousands of mitochondria, all from the original egg cell. Sperm have no mitochondria after entering the egg cell. Therefore, all mitochondria are inherited from your mother’s mother’s mother’s…mother. This fact, along with its DNA mutation rate, makes mtDNA testing useful for genealogical purposes. To a much lesser extent, the DNA within the 22 pairs of autosomal chromosomes is also used for testing for genealogical purposes. If you picture your pedigree chart, the boxes running along the top (your father’s male line) correspond to Y-chromosome lineage. Similarly, if you look at the boxes running along the bottom of your pedigree chart, these correspond to your mitochondrial DNA (mtDNA) heritage through your mother’s maternal line.
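For readers who number their pedigree charts, the two lines just described can be written down compactly. The sketch below assumes the common Ahnentafel numbering convention, which this article does not otherwise use: you are number 1, and the father of person n is 2n while the mother is 2n + 1. The function names are made up for this illustration.

```python
def paternal_line(generations: int) -> list[int]:
    """Ahnentafel numbers along the father's father's ... line: 1, 2, 4, 8, ..."""
    return [2 ** g for g in range(generations + 1)]

def maternal_line(generations: int) -> list[int]:
    """Ahnentafel numbers along the mother's mother's ... line: 1, 3, 7, 15, ..."""
    return [2 ** (g + 1) - 1 for g in range(generations + 1)]

# Top edge of the chart (Y-chromosome line; applies only if person 1 is male).
print(paternal_line(4))
# Bottom edge of the chart (mtDNA line; applies to everyone).
print(maternal_line(4))
```

Every other box on the chart lies between these two edges, which is why Y-chromosome and mtDNA tests say nothing directly about most of your ancestors.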
A certain number of genetic definitions will be required to understand Y-chromosome and mtDNA testing. A marker is used to flag a particular sequence of nucleotides (bases) on a DNA molecule along the Y-chromosome. Each marker is designated by a number (known as DYS#), according to international conventions. The terms marker and locus are often used interchangeably. The marker is what is tested; the locus is where the marker is located on the chromosome. The change in a marker from generation to generation is what is used to identify families. These changes are also known as mutations or polymorphisms. These changes are often small, and on the Y-chromosome it is estimated that they occur about once every 500 generations per marker. In other words, any given marker has a .002 chance of mutating with each generation. Once a mutation on the Y-chromosome has occurred, it is passed down the male line and thus can help identify families and family branches.
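The .002 figure can be turned into a rough expectation for a whole test. The Python sketch below assumes, purely for illustration, that every marker mutates independently and at the same .002 rate per generation; real markers differ from one another, so treat the output as an order-of-magnitude guide only.

```python
MUTATION_RATE = 0.002  # chance per marker per generation (figure from the text)

def expected_mutations(markers: int, generations: int,
                       rate: float = MUTATION_RATE) -> float:
    """Average number of mutations along one male line."""
    return markers * generations * rate

def prob_at_least_one_mutation(markers: int, generations: int,
                               rate: float = MUTATION_RATE) -> float:
    """Chance that at least one tested marker changed along one male line."""
    return 1 - (1 - rate) ** (markers * generations)

# Example: a 12-marker test followed down 10 generations of one line.
print(expected_mutations(12, 10))
print(prob_at_least_one_mutation(12, 10))
```

Under these assumptions, a 12-marker haplotype has roughly a one-in-five chance of showing any change at all over 10 generations, which is why close male-line relatives usually match exactly or nearly so.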
There are three types of polymorphisms of interest to us: indels, single nucleotide polymorphisms (SNPs), and short tandem repeats (STRs – also known as microsatellites). Polymorphisms also occur in mitochondrial DNA, as will be discussed later. An indel inserts or deletes single nucleotides or whole sections of DNA from a sequence. Indels are uncommon in coding regions but common in non-coding regions. But an example of an indel in a coding region is the most common mutation causing cystic fibrosis. This indel is the deletion of three bases from a gene. [5,000 hereditary diseases have been identified thus far.] Single nucleotide polymorphisms (SNPs) occur when a nucleotide at a particular location is changed to another nucleotide, e.g., A is changed to G. Since SNPs are so rare, they are often called Unique Event Polymorphisms. They are useful for exploring deep ancestry, as will be seen in mtDNA testing. Short Tandem Repeats (STRs) are the polymorphisms of interest in Y-chromosome testing. A Short Tandem Repeat is a short pattern (often two to five bases in length) repeated a number of times in a row. For instance, GATAGATAGATA is three repeats of the 4-base word GATA. The differences in the number of repeats at a particular marker on the Y-chromosome are used in Y-chromosome testing to distinguish one individual and lineage from another. Some STR markers have about one mutation in every 15 to 50 generations, which makes them useful for genealogical purposes. For, if mutations rarely occurred, then most males would have the same Y-chromosome. If mutations occurred frequently, males would have such differences in the Y-chromosome that it would be impossible to find relationships. The STR markers’ rate of mutations is “just right” and the timeframe is also just right with respect to the adoption of surnames in Europe (1100AD-1500AD).
As mentioned above, each marker has a number assigned to it (e.g., DYS#391, DYS#439, GATA H4). The number of markers selected for Y-chromosome testing varies, based on the customer’s choice (and cost), and usually ranges from 12 through 25 and 37 up to 67. An allele is one of the versions of a marker that can exist. For example, marker DYSxxx can have 12, 13 or 14 repeats of “GATA”. A haplotype is a series of markers and their values for a test. To see if or how two males are related through their Y-chromosome tests, you compare their haplotypes. Genetic distance defines how closely related two haplotypes are. (The most popular reason for having a Y-chromosome test done is to determine whether you are related to someone with the same surname.)
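Counting repeats, as in the GATAGATAGATA example above, is simple enough to sketch in Python. This is an illustration of the idea only, not how a testing laboratory measures repeats; the function name and the second test string are made up.

```python
def count_tandem_repeats(sequence: str, motif: str) -> int:
    """Return the longest run of back-to-back copies of `motif` in `sequence`."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    for start in range(len(sequence)):
        run = 0
        pos = start
        # Step forward one motif at a time while the copies keep coming.
        while sequence.startswith(motif, pos):
            run += 1
            pos += len(motif)
        best = max(best, run)
    return best

print(count_tandem_repeats("GATAGATAGATA", "GATA"))   # the example from the text
print(count_tandem_repeats("CCGATAGATATT", "GATA"))   # a made-up sequence
```

The number this function returns is what is reported as the value of a marker: two men whose counts differ at a marker carry different alleles there.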
|DYS Marker||Haplotype #1||Haplotype #2||Difference|
In this example, there is a genetic distance of 2 between Person 1 with haplotype 1 and Person 2 with haplotype 2. Person 1 has one more repetition of, say, GATA at marker DYS#19 than does Person 2. Person 2 has one more repetition at marker DYS#439 than does Person 1. If two people are closely related in their Y-chromosome tests, we can talk about their most recent common ancestor (MRCA). The MRCA is a statistical result based on frequency of mutation. Thus probabilistic statements can be made, such as “there is a 90% probability that the MRCA is 40 generations or less”. FamilyTreeDNA is a company that does testing and also has excellent tutorials on its website. Further understanding of the probability and statistics associated with testing can be found at this page: http://www.familytreedna.com/faq2.html. Also, the chart below is a quick glance of results that can be expected with various levels of testing (i.e., with various numbers of markers tested):
All companies that provide Y-chromosome testing will provide you with an excellent explanation of the results and what they mean.
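The genetic distance comparison described above can be sketched in Python. The repeat counts below are hypothetical values chosen only to reproduce the situation in the example (Person 1 one repeat higher at DYS#19, Person 2 one repeat higher at DYS#439); summing the differences marker by marker is one simple counting convention, and testing companies may treat certain markers differently.

```python
def genetic_distance(haplotype_1: dict[str, int], haplotype_2: dict[str, int]) -> int:
    """Sum of repeat-count differences over the markers both haplotypes share."""
    shared_markers = haplotype_1.keys() & haplotype_2.keys()
    return sum(abs(haplotype_1[m] - haplotype_2[m]) for m in shared_markers)

# Hypothetical repeat counts, for illustration only.
person_1 = {"DYS19": 15, "DYS391": 10, "DYS439": 11}
person_2 = {"DYS19": 14, "DYS391": 10, "DYS439": 12}

print(genetic_distance(person_1, person_2))
```

Only markers tested for both people are compared, so a 12-marker result can be set against a 37-marker result without any special handling.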
Genealogical Questions that may be answered by a Y-Chromosome Test
Here, in summary form, are the reasons one would want to have a Y-chromosome test done. See also pages 42-43 of the excellent book Trace Your Roots with DNA (see Bibliography below).
- Finding connections among males with the same surname
- Proving that families with the same surname are connected
- Ruling out others with the same surname as being related
- Name change/variation/alias: finding proof of surname modifications
- Rare surname: Proving that everyone with surname “X” is related
- Uncertain paternity: uncovering connections hidden by adoption, illegitimacy, etc.
- Famous roots: discovering if family tales of famous relatives are true
- Verification of traditional genealogical research
- Crossing the pond: trying to discover geographic origins in another country
The article “Sorting Relationships among Families with the Same Surname: An Irish-American DNA Study” (see Bibliography) provides a great example of the last item in the list above. The journal containing this article is available at the Los Angeles Regional Family History Center, as is the book Trace Your Roots with DNA.
Marvin Blaski’s Y-chromosome testing (mentioned in the January 2007 Bulletin, “Two Blaszkowski/Blaski Families”) covered numbers 1, 2 and 4 above. Another excellent illustration of the use of Y-chromosome testing is at web page: http://www.ourfamilyorigins.com/mayflowerfullersdnaproject.htm. This study is connecting people with the surname Fuller to Edward and Samuel Fuller who came over on the Mayflower.
The Y-Haplogroup identifies a person’s major population group and provides information about the ancient origin of the male line. In a loose sense a Y-haplogroup can be viewed as a cluster of similar haplotypes. Single Nucleotide Polymorphisms are used in Y-haplogroup testing, as they mutate more slowly than STRs. Finding one’s haplogroup usually requires a different Y-chromosome test than the one used for haplotypes, with one major exception: men of European descent. About 90% of men of European descent can determine their haplogroup from their haplotype DNA test in the following fashion. If you have DYS426=12 and DYS392=11, then you are probably a member of haplogroup R1a1, which includes 5% of men of European origin. If you have DYS426=12 and do not have DYS392=11, then you probably belong to haplogroup R1b, as do about 67% of men of European descent. If you have DYS426=11 and DYS455=8, then you are probably a member of haplogroup I1a, which includes about 20% of European men. The following website contains great maps of both the Y-haplogroup and mtDNA haplogroup migrations: http://www.scs.uiuc.edu/~mcdonald/WorldHaplogroupsMaps.pdf.
Similarly, the following website provides great verbal descriptions of the Y-haplogroups: http://www.roperld.com/YbiallelicHaplogroups.htm.
For example, the description of the R1a haplogroup is:
- Haplogroup R1a is believed to have originated in the Eurasian Steppes north of the Black and Caspian Seas. This lineage is believed to have originated in a population of the Kurgan culture, known for the domestication of the horse (approximately 3000 B.C.E.). These people were also believed to be the first speakers of the Indo-European language group. This lineage is currently found in central and western Asia, India, and in Slavic populations of Eastern Europe.
Bryan Sykes, in his new book Saxons, Vikings, and Celts, notes that 21 Y-haplogroups, or paternal “clans”, have been identified worldwide, only eight of which occur in Europe.
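The rule of thumb given earlier in this section for men of European descent (based on markers DYS426, DYS392 and DYS455) can be written as a small decision procedure. The Python sketch below follows the three rules exactly as stated; returning no answer when a needed marker is missing is a choice made for this illustration, and the result is only a "probably", as the text says.

```python
from typing import Optional

def predict_haplogroup(haplotype: dict[str, int]) -> Optional[str]:
    """Apply the three rules of thumb from the text; None means no rule applies."""
    dys426 = haplotype.get("DYS426")
    if dys426 == 12:
        dys392 = haplotype.get("DYS392")
        if dys392 is None:
            return None  # cannot tell R1a1 from R1b without DYS392
        return "R1a1" if dys392 == 11 else "R1b"
    if dys426 == 11 and haplotype.get("DYS455") == 8:
        return "I1a"
    return None

# Hypothetical marker values, for illustration only.
print(predict_haplogroup({"DYS426": 12, "DYS392": 13}))
print(predict_haplogroup({"DYS426": 11, "DYS455": 8}))
```

A haplotype that matches none of the rules simply returns no prediction, which corresponds to the men for whom a separate haplogroup test is needed.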
Mitochondrial DNA (mtDNA) Testing
As previously mentioned, mtDNA testing is used to investigate your mother’s mother’s mother’s…mother’s line. There are two regions of interest in the mtDNA: the coding region, which is involved in protein processing and mutates very slowly, and the Hypervariable Region (HVR), which mutates faster than the coding region. The mutations here do not cause harm. They are of the SNP type (Single nucleotide polymorphism) and occur more slowly than the STR mutations of the Y-chromosome. The HVR is used in mtDNA testing. The Cambridge Reference Sequence (CRS) by coincidence is the most common mtDNA sequence in Europe and is called the H haplogroup. Bryan Sykes published The Seven Daughters of Eve in 2001 and started Oxford Ancestors to offer mtDNA testing to the public (and also Y-chromosome testing). According to Sykes, ninety-five percent of all living people of European ancestry are descended from seven women in Europe who lived between 45,000 and 10,000 years ago. Mitochondrial DNA testing looks at anywhere from 340 to 1143 bases in the HVR (actually two regions in the HVR – HVR1 and HVR2). The original Oxford Ancestors testing looked at 400 bases in the HVR1. Most people have only a few differences (mutations) compared to the CRS. Many companies now offer mtDNA testing. Some companies list your results with your actual sequence; others just point out where you differ from the CRS. Testing results in a determination of your female Haplotype and Haplogroup. The Seven Daughters of Eve who predominate in European ancestry have been given names by Bryan Sykes that correspond to the letter of their Haplotype. Thus we have Helena (H haplotype), Jasmine (J), Ursula (U), Tara (T), Xenia (X), Katrine (K) and Velda (V).
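The idea of reporting only the places where a sample differs from the reference can be sketched in Python. The two short sequences below are made up for illustration and are not part of the real Cambridge Reference Sequence; the sketch also handles only single-base substitutions, not insertions or deletions. Each difference is written as the position followed by the base found in the sample.

```python
def differences_from_reference(reference: str, sample: str,
                               first_position: int) -> list[str]:
    """List substitutions as '<position><sample base>', e.g. '16224C'."""
    if len(reference) != len(sample):
        raise ValueError("this sketch only compares sequences of equal length")
    return [
        f"{first_position + offset}{sample_base}"
        for offset, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]

# Made-up 7-base fragments, imagined to start at position 16221.
toy_reference = "ACCTTGA"
toy_sample = "ACCCTGA"
print(differences_from_reference(toy_reference, toy_sample, 16221))
```

A sample identical to the reference produces an empty list, which is how a result "identical to the CRS" would appear in this notation.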
Bryan Sykes was able to provide information about each Haplotype based on the age of the haplotype (number of mutations away from mitochondrial Eve), the archeological record, and other scientific factors. Helena’s clan represents 41% of people of European descent. She had her origin in the area between France and Spain about 20,000 years ago and her descendants spread across Europe and into England. Jasmine’s clan represents 12% of people of European descent. She originated in Syria about 10,000 years ago and her descendants spread to Anatolia, Greece, the Balkans, the Danube, Spain and Portugal. Ursula’s clan represents 11% of people of European descent. She originated in northern Greece about 45,000 years ago and her descendants spread to England, Spain and across Europe. Tara’s clan represents 10% of people of European descent. She originated in Tuscany about 17,000 years ago and her descendants spread to France, Ireland and northern Europe. Xenia’s clan represents 7% of people of European descent. She originated in the Caucasus about 25,000 years ago and her descendants spread to all of Europe and to North America. Katrine’s clan represents 10% of people of European descent. She originated in Venice about 15,000 years ago and her descendants spread to the Rhine Valley and to northern Europe. Velda’s clan represents 4% of people of European descent. She originated in Spain about 17,000 years ago and her descendants spread to France, Britain, Norway, Arctic Russia and Lapland. Recently Bryan Sykes added Ulrike to the European mix. She represents 2% of Europeans, came from Ukraine, and her descendants went into Scandinavia and the Balkan states.
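As a quick check, the clan percentages listed above can be added up to confirm that the original seven clans account for the 95% figure quoted earlier. The Python sketch below uses only the numbers given in this article.

```python
# Share of people of European descent in each clan (percentages from the text).
CLAN_SHARES = {
    "Helena": 41,
    "Jasmine": 12,
    "Ursula": 11,
    "Tara": 10,
    "Xenia": 7,
    "Katrine": 10,
    "Velda": 4,
}
ULRIKE_SHARE = 2  # the later addition mentioned in the text

seven_clans_total = sum(CLAN_SHARES.values())
total_with_ulrike = seven_clans_total + ULRIKE_SHARE

print(seven_clans_total)
print(total_with_ulrike)
```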
Shortly after reading The Seven Daughters of Eve in 2001, I had my mtDNA analyzed by Oxford Ancestors and I persuaded members of my book club to do so also. My mtDNA is from the clan of Katrine. I have three differences with the Cambridge Reference Sequence. They are noted as: 16224C, 16311C, 16362C. The 5,000-year-old Iceman found in the Alps a few years ago had the Katrine haplotype: 16224C, 16311C. I also had the mtDNA of my father’s sister’s daughter analyzed. Her mtDNA turned out to be identical to the CRS. That is, she was descended from Helena. Thus I, too, am descended from Helena (as well as from Katrine). Members of my book club turned out as: three Helenas, one Xenia, one Ursula and one Uma. Uma is not your typical European clan. Uma’s clan has been found mainly in Western Eurasia, particularly in Turkey, Armenia, Syria, and Iraq. Our club member who is from the clan of Uma is Jewish on her mother’s side and genealogically traces this maternal line to France. This was an interesting surprise and counterpoint to another of our members who thought she would not be among the seven clans because she was Armenian. Her mtDNA put her in the clan of Helena!
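Comparing two mtDNA results reported as differences from the CRS amounts to comparing two short lists. The Python sketch below uses the two results quoted above (mine and the Iceman's); the function name is made up for this illustration.

```python
def compare_results(result_a: set[str], result_b: set[str]):
    """Return (shared, only in a, only in b) for two sets of CRS differences."""
    return result_a & result_b, result_a - result_b, result_b - result_a

author_result = {"16224C", "16311C", "16362C"}  # from the text
iceman_result = {"16224C", "16311C"}            # from the text

shared, only_author, only_iceman = compare_results(author_result, iceman_result)
print(sorted(shared))
print(sorted(only_author))
print(sorted(only_iceman))
```

The two results share 16224C and 16311C, and mine carries one further difference, 16362C.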
Other than the fun of knowing what your mtDNA haplotype is, what are the reasons a genealogist would want to have an mtDNA test performed?
- Because mtDNA survives well, it is often used in testing involved with the dead. For example, the identities of the unknown Titanic baby and of the unknown Vietnam War soldier were discovered with mtDNA tests.
- MtDNA testing can identify a correct female ancestor if there is doubt. Comparing mtDNA test results from female line descendants of the ancestor with those of female line descendants of the mother or a maternal aunt in the proposed ancestral family could verify or disprove a relationship. This type of testing usually involves both HVR1 and HVR2.
- MtDNA testing can be used to assign daughters of different wives to their correct mothers.
- MtDNA testing can be used to prove that a female child was adopted.
Testing for Geographic Origins
While Y-chromosome testing and mtDNA testing are the two most prevalent types of tests performed today for genealogists, there is another type of DNA testing used less frequently.
It is called the BioGeographic Ancestry test. It uses autosomal DNA markers and is performed by the company DNAPrint Genomics. This test can answer the question: what percentage are you of the following four groupings: Indo-European, East Asian, Sub-Saharan African, and Native American? Because of the nature of autosomal DNA (a discussion beyond the scope of this article), this test is good for looking into recent ancestral lines. If your family legend has you descended from a Native American great-great-great-grandmother, this test may not be able to detect that far back. (But if that Native American great-great-great-grandmother was on your mtDNA ancestral line, your mtDNA test would verify it.)
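One reason a distant ancestor is hard to detect with autosomal DNA can be shown with simple arithmetic. The sketch below rests on an assumption not spelled out in this article: on average, each generation back halves the share of autosomal DNA that comes from any one ancestor. The actual share inherited from a particular ancestor varies from person to person, so this is an average only.

```python
from fractions import Fraction

def expected_autosomal_share(generations_back: int) -> Fraction:
    """Average share of autosomal DNA from one ancestor that many generations back."""
    return Fraction(1, 2 ** generations_back)

# A great-great-great-grandmother is 5 generations back (a parent is 1).
share = expected_autosomal_share(5)
print(share, float(share))
```

On this reckoning a single great-great-great-grandmother contributes on average 1/32 of your autosomal DNA, or about 3%, which is a small amount for a percentage-based test to pick up.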
If you are African-American, AfricanAncestry.com has a DNA database (for mtDNA and Y-chromosome) of 25,000 indigenous Africans representing 300 ethnic groups and 30 countries. Thus you can be placed into a tribe in a particular location in Africa.
The National Geographic Genographic Project is currently testing DNA in remote parts of the world to gain greater insight into haplogroup origins and migrations. The results of this project will be very interesting and are anxiously awaited by many genealogists. In the meantime, there are several web sites that show what is currently known about the migrations of the haplogroups. These are delineated in the Bibliography below.
How Much Does It Cost?
The briefing I gave in January 2007 included an elaborate chart of testing companies, products and prices. I am sure it is out of date by now. Prices are coming down, companies are consolidating, special deals are being offered, etc. So I will not include that old chart here. The most popular and respected companies that I am aware of are listed in the Bibliography, along with their web sites. They are: Oxford Ancestors, FamilyTreeDNA, Sorenson Molecular Genealogy Foundation, Relative Genetics, DNAPrint Genomics, African Ancestry, National Geographic Genographic Project. They provide both Y-chromosome testing and mtDNA testing. You will find “Notes on Privacy” on these sites. They know that you are concerned and they want to allay your fears about your DNA falling into the wrong hands. Regarding price, I would say most companies are pretty competitive with one another. A 12-marker Y-chromosome test is in the $100-$150 range. An HVR1 mtDNA test can be had for about $125.
Other Types of DNA Tests
The tests discussed above are those of interest to genealogists. There are other DNA tests you have probably heard of that are used by law enforcement and courts. Forensic tests are used to distinguish one person from another. There are 13 FBI “CODIS” markers. These markers are on the autosomal chromosomes in the non-coding regions. Each marker has a large number of alleles. The probability that any two people will match perfectly for all thirteen markers is one in 2,830,000,000,000,000. Paternity testing usually uses the same 13 markers, with DNA taken from the mother, child, and possible father. Genetic testing for diseases looks at the coding region of autosomal chromosomes.
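The very large number quoted above arises because the odds at each marker multiply together. The Python sketch below assumes, for illustration only, that the 13 markers are independent and contribute equally; it works backwards from the one-in-2,830,000,000,000,000 figure to the average odds per marker. Real per-marker odds differ from marker to marker and from population to population.

```python
import math

COMBINED_ODDS = 2_830_000_000_000_000  # "one in ..." figure from the text
MARKER_COUNT = 13                      # number of CODIS markers, from the text

def combined_match_odds(per_marker_odds: list[float]) -> float:
    """Multiply 'one in N' odds across independent markers."""
    return math.prod(per_marker_odds)

def average_per_marker_odds(combined_odds: float, marker_count: int) -> float:
    """The single 'one in N' value that, repeated for every marker, gives the combined odds."""
    return combined_odds ** (1 / marker_count)

average = average_per_marker_odds(COMBINED_ODDS, MARKER_COUNT)
print(average)
print(combined_match_odds([average] * MARKER_COUNT))
```

Under these assumptions, a match at any single marker is not rare at all (roughly one chance in fifteen); it is the requirement to match at all thirteen at once that makes a coincidental match so unlikely.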
There are many good DNA glossaries available on the Internet, so I have not included one in this article. Most of these glossaries are 5-10 pages long. One of my favorites is the one in the book that I consider the bible for genetic genealogists, Trace Your Roots with DNA (see the Bibliography below). Other good ones are at the FamilyTreeDNA website (see below) and at the web site for the Human Genome Project (also see below).
Bibliography - Useful Web Sites:
- African Ancestry performs Y-chromosome and mtDNA tests. If your ancestry is African, they will provide you with tribe and location information from their large database.
- AncestrybyDNA (aka DNAPrint Genomics, Inc.) will determine your genetic heritage among the four anthropological groups: Native American, East Asian, Sub-Saharan African, and European. This service will predict your Indo-European heritage among the following groups: Northern European, Southeastern European, Middle Eastern, and South Asian.
- http://www.baucum.org/GENDNAL/listPages.php Many articles listed about various DNA studies.
- Blair surname DNA project and some good tutorials.
- Cyndi's List - points to lots of good articles and web sites.
- Family Tree DNA A leader in Y-chromosome testing and other tests. Concentrates on Surname Projects. Excellent tutorials. Y-chromosome and mtDNA migration maps.
- http://www.doegenomes.org/ Genome programs of the U.S. Department of Energy Office of Science - founder of the Human Genome Project and leader in systems biology research.
- Genome glossary of the Human Genome Project.
- National Geographic's Genographic Project - a landmark study of the human journey.
- Oxford Ancestors World’s leading provider of DNA-based services for use in personal ancestry research. Bryan Sykes - Seven Daughters of Eve and Adam's Curse. Y-chromosome and mtDNA migration maps.
- Relative Genetics is a testing company (both Y-chromosome and mtDNA). It sponsors surname projects. Used by Sorenson Molecular Genealogy Foundation for its coupon offer.
- The Sorenson Molecular Genealogy Foundation (SMGF) - a non-profit organization committed to developing the world's foremost database of correlated genetic and genealogical information, and making this information freely available to the public. They have collected over 60,000 DNA samples, together with four-generation pedigree charts, from volunteers in more than 100 countries.
- Genetics Module from the University of Alberta. An online course in classical genetics, Mendelian genetics, etc.
- GeoGene University of London Y-chromosome and mitochondrial DNA testing.
- World Families Network of Geographical, Regional and Ethnic Projects - e.g. Mapping the Icelandic genome, Johnson Co. Illinois.
Bibliography – Books, Articles:
- Megan Smolenyak Smolenyak and Ann Turner, Trace Your Roots with DNA (United States: Rodale, 2004)
- Bryan Sykes, The Seven Daughters of Eve (New York: W.W. Norton and Company, 2001)
- Bryan Sykes, Adam’s Curse (New York: W.W. Norton and Company, 2004)
- Donn Devine, “Sorting Relationships among Families with the Same Surname: An Irish-American DNA Study”, National Genealogical Society Quarterly Volume 93, Number 4 (December 2005): 283-293.
- Tony N. Frudakis, “Powerful but Requiring Caution: Genetic Tests of Ancestral Origins”, National Genealogical Society Quarterly Volume 93, Number 4 (December 2005): 260-268.
- Ugo A. Perego, Ann Turner, Jayne E. Ekins, and Scott R. Woodward, “The Science of Molecular Genealogy”, National Genealogical Society Quarterly Volume 93, Number 4 (December 2005): 245-259.
- Thomas H. Shawker, “Genetic Genealogy: Issues and Considerations”, National Genealogical Society Quarterly Volume 93, Number 4 (December 2005): 294-304.