From Spit to Discoveries: How we analyze your DNA

There’s a long process between participants collecting their saliva and researchers finding new genetic indicators of disease risk. It begins in the lab: DNA is extracted from each sample and then “sequenced,” meaning the genetic code is read. Not long ago, this was a painstaking process, but now it’s largely done by (expensive!) machines.

Once the genetic code is read, statistical geneticists can get to work. Human DNA is made up of four chemical "letters" (A, C, G, T), also known as nucleotides. Because we inherit one copy of our DNA from each parent, at almost every location we have two copies. For example, the output from the sequencing machine for one individual might look something like AC, GT, GG, CC, CT, and so on, with one pair of letters per location.

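Looking at just one location across six participants, we might find something like AC, AC, AA, AC, CC, CC. Now suppose that the first three participants have the disorder being studied (they are "cases") while the next three do not (they are "controls"). We could summarize the data in a small table that counts how often each letter appears in each group (every participant contributes two letters):

Cases (AC, AC, AA): A = 4, C = 2
Controls (AC, CC, CC): A = 1, C = 5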
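
As a small illustration (the article itself contains no code), the letter counts in the summary table can be tallied directly from the six example genotypes. This sketch assumes each genotype is stored as a two-letter string and that case status is kept in a separate list of flags; both representations, and the helper name letter_counts, are hypothetical choices made for illustration.

```python
from collections import Counter

# Genotypes at one location for the six example participants, in the
# order given in the text (a hypothetical two-letter-string representation).
genotypes = ["AC", "AC", "AA", "AC", "CC", "CC"]
# The first three participants are cases, the last three are controls.
is_case = [True, True, True, False, False, False]

def letter_counts(calls):
    """Count letters across genotypes; each genotype contributes two letters."""
    counts = Counter()
    for call in calls:
        counts.update(call)  # e.g. "AC" adds one A and one C
    return counts

cases = letter_counts(g for g, case in zip(genotypes, is_case) if case)
controls = letter_counts(g for g, case in zip(genotypes, is_case) if not case)

print("Cases:", dict(cases))        # Cases: {'A': 4, 'C': 2}
print("Controls:", dict(controls))  # Controls: {'A': 1, 'C': 5}
```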

It looks like A is more common in cases, and C is more common in controls. If that’s not just due to chance, it would suggest that this location in the DNA is associated with the disease.
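
The article doesn't say which statistical test researchers would use to check whether a difference like this could be due to chance. One standard choice for a small 2×2 table of counts is Fisher's exact test; the sketch below implements its two-sided version with only Python's standard library. The function name fisher_exact_two_sided is hypothetical, and treating each letter (rather than each person) as one observation is an assumption made for this example.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table
                  A   C
        cases     a   b
        controls  c   d
    Returns the probability, with all row and column totals held fixed, of a
    table at least as unlikely as the observed one if letter and case status
    were unrelated."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    # Hypergeometric weight of the table whose top-left cell is x. All tables
    # share the denominator comb(n, col1), so exact integers can be compared.
    def weight(x):
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / comb(n, col1)

# Letter counts from the six-participant example:
# cases 4 A / 2 C, controls 1 A / 5 C.
p = fisher_exact_two_sided(4, 2, 1, 5)
odds_ratio = (4 * 5) / (2 * 1)
print(f"odds ratio = {odds_ratio:.1f}, p = {p:.3f}")  # odds ratio = 10.0, p = 0.242
```

Here the p-value is about 0.24: even though the odds ratio is 10, a split at least this lopsided would arise by chance about a quarter of the time if the letter had nothing to do with the disease. Libraries such as SciPy provide the same test as scipy.stats.fisher_exact, and large studies typically use other methods, such as regression models, that can also adjust for factors like age.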

Now consider doing this across millions of locations throughout the genome, which contains roughly 20,000 genes. When so many locations are tested, some will look associated with the disease purely by chance, so researchers apply strict statistical criteria before calling an association real; that's one of the reasons why studies need thousands of participants (not just six!). They also need to adjust statistically for factors such as age. Even so, an apparent association at a single location won't, on its own, convince researchers that the location is important. Scientists will look for associations at other locations in or near the same gene, and will be more convinced if the change in letters alters the shape or amount of the protein that the gene encodes.
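
A rough sketch of why the criteria must be strict and the samples large, under assumptions that are ours rather than the article's: it uses a Pearson chi-square test (reasonable for large counts, unreliable for tiny ones) and the commonly used genome-wide threshold of about 5 × 10⁻⁸, which corresponds to a Bonferroni correction of a 5% error rate over roughly one million independent tests. The helper name allelic_chi2_p is hypothetical, and the 100-fold larger study is invented for illustration: it simply keeps the same letter proportions as the six-participant example.

```python
from math import erfc, sqrt

def allelic_chi2_p(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom, no continuity correction)
    for the 2x2 letter-count table [[a, b], [c, d]] (rows: cases, controls;
    columns: A, C). Returns (statistic, p-value)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(chi-square > stat) = erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))

# Bonferroni-style threshold: a 5% overall false-positive rate spread over
# roughly one million independent tests gives about 5e-08.
alpha, n_tests = 0.05, 1_000_000
threshold = alpha / n_tests

p_values = {}
for scale in (1, 100):  # hypothetical: same letter proportions, `scale` times the participants
    counts = [scale * k for k in (4, 2, 1, 5)]
    stat, p = allelic_chi2_p(*counts)
    p_values[scale] = p
    verdict = "passes" if p < threshold else "does not pass"
    print(f"{6 * scale} participants: chi2 = {stat:.2f}, p = {p:.2g} ({verdict})")
```

The six-participant table falls far short of the threshold, while the same proportions in 600 participants clear it easily. Real genetic effects are usually much smaller than this toy example's, which is why real studies need thousands of participants rather than hundreds.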

While this isn't the only way to uncover disease-related genes (the role of BRCA1, for example, was found by looking at patterns of inheritance within families), this approach, known as the genome-wide association study (GWAS), has become the backbone of modern human genetics research. Once a location is strongly suspected to be involved in the disease, it can be studied further in other types of analyses, for example to test whether the gene involved might be a suitable drug target for new treatments.
