The Wall Street Journal points out that China has collected 54 million DNA records, and the U.S. FBI holds 13 million. These are government agencies. 23andMe, a private firm, holds over 2 million and provides both ancestry and health-related analysis. National Geographic and Ancestry.com both offer ancestry analysis to answer the question “what are your roots?” (I’m 4% Neanderthal according to this analysis.) A number of medical institutions are also starting to collect DNA data to identify disease risks and to target medications or treatments. For a given collector of DNA, it is unclear how many “snips” (SNPs, or single-nucleotide polymorphisms) they sample. Few will sequence the entire 3 billion base pairs that constitute a full human genome.
There are a number of possible privacy concerns with biometric and genetic tracking.
However, there are also a number of interesting big data evaluations that could be pursued with access to these data, and even more if they can be cross-linked to other individual-specific data. Iceland has undertaken a comprehensive evaluation of its citizens. It has many generations of genealogical data tracing the roots of many, if not all, residents back to their arrival, as well as a fairly isolated gene pool. It also has universal health care, and a vested interest in taking full advantage of the information to minimize costs and maximize treatment effectiveness. No doubt a number of insights will emerge from this work.
So here is the real question I have for our broader technology community: What questions should be pursued with access to these massive databases? What additional information besides the gene snips would be needed? And of course, what social implications might result from this research?