Tonight I was at the Sensored Meetup #10: Data! APIs! Standards! Besides some great thought-provoking talks, there were some great discussions afterward that got me thinking more clearly about some ideas that have been bubbling in my brain.
Scott McNealy Was Right - Privacy: ‘Get Over It’
I really thought McNealy was wrong when he said, way back in 1999, that consumer privacy issues are a “red herring”: “You have zero privacy anyway.” But today, in conversation with one of the participants, Antoine Lizee, about how we can make people’s medical data available to medical researchers, I realized that McNealy was correct.
We both felt that if medical sensor and other data from millions of people could be made available in some open source form to researchers, huge breakthroughs in medical science would quickly emerge just from modern data mining, machine learning and statistical processing. Of course the issue of privacy came up almost immediately. But we know, from his experience and from recent news, that even anonymized DNA sequences can be traced back to an individual’s identity. So even with anonymization, it’s almost impossible to completely protect an individual’s identity in light of modern big data techniques.
I wondered, “Actually, what are people’s real fears about their medical data getting out?” Is it any different than five or ten years ago, when people said they would NEVER use a credit card on the Internet?
Dilbert by Scott Adams January 11, 1996
How long would it take, and what would it take, to have a similar change in mass mentality so that folks would not mind that their medical data might be used “on the Internet”?
Big Data Processing of Medical Sensors: A Solution to Rising Health Care Costs
I have long believed that if the data collected from medical sensors and the burgeoning world of the Quantified Self could be aggregated and made available to researchers (and not just “medical researchers”), we would enter a new golden era of medical breakthroughs and real cures for major illnesses.
With sample sizes in the MILLIONS instead of the 10 to 100 people in most modern medical studies, patterns of health and illness will practically just appear from statistical processing of the billions of data samples. That alone would make it worthwhile for us to do it, for the government (or insurance companies) to fund it, and for individuals to feel there is value in allowing their data to be aggregated, even if it means that their data may leak out.
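To put a rough number on why sample size matters so much, here is a minimal sketch using only the Python standard library. It assumes a simple two-group comparison with a two-sided z-test; the function name, the 5% significance level, the 80% power target and the standard deviation of 10 are all illustrative assumptions of mine, not figures from any real study.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_difference(n_per_group, sigma, alpha=0.05, power=0.80):
    """Smallest true difference in group means that a two-sided,
    two-sample z-test would detect with the given power.

    Assumes both groups have n_per_group people and the same known
    standard deviation sigma (a textbook simplification).
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return (z_alpha + z_power) * sigma * sqrt(2.0 / n_per_group)


if __name__ == "__main__":
    sigma = 10.0  # made-up spread of some sensor reading, for illustration only
    for n in (10, 100, 1_000_000):
        d = minimum_detectable_difference(n, sigma)
        print(f"n = {n:>9,} per group -> detectable difference ~ {d:.3f}")
```

The detectable difference shrinks with the square root of the sample size, so under these assumptions going from 10 people per group to a million makes effects roughly 300 times smaller visible.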
And using Big Data techniques similar to those used today to sell more stuff on the Internet (like we did at my last company, Runa), we could map some of those discovered patterns back to the real-time processing of individuals’ sensor data, to let them know if their personal real-time data stream indicates they are about to have a heart attack or something.
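As a toy illustration of what checking a personal real-time data stream could look like, here is a minimal sketch that flags readings far outside a person's own recent baseline. The function name, the window size and the threshold are hypothetical choices of mine, and a rolling z-score is only a stand-in for whatever patterns real research would discover; it is not a medical method.

```python
from collections import deque
from statistics import mean, stdev


def flag_anomalies(readings, window=5, threshold=3.0):
    """Return the indexes of readings that deviate from the mean of the
    previous `window` readings by more than `threshold` standard deviations.
    """
    recent = deque(maxlen=window)  # sliding window of the latest readings
    flagged = []
    for i, value in enumerate(readings):
        if len(recent) == window:
            mu = mean(recent)
            sd = stdev(recent)
            # Skip perfectly flat windows to avoid dividing the world by zero
            if sd > 0 and abs(value - mu) > threshold * sd:
                flagged.append(i)
        recent.append(value)
    return flagged


if __name__ == "__main__":
    # Made-up heart-rate-like numbers, for illustration only
    stream = [60, 61, 59, 60, 62, 61, 60, 120, 61, 60]
    print(flag_anomalies(stream))
```

In a real system the interesting part would be the patterns learned from the aggregated data of millions of people; the per-person check itself can stay this simple.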
Just Do It
My conclusion was that we need to break the logjam and start some projects that demonstrate how powerful it will be to do open Big Data medical research using aggregated data. One way would be to get companies with silos of Quantified Self and similar data to make it available (with permission from the individuals) to open medical research. I’m sure there are other short-term ways the community can come up with to show that this kind of research can have huge positive results.
Legal Protections are More Viable than Technical Protections
Publicly auditable mechanisms should be created to make the data as anonymized as possible, but as mentioned earlier, medical data is inherently personally identifiable, especially if drawn from multiple sources and linked with other publicly available personal info (Facebook and the like).
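To make the linking risk concrete, here is a minimal sketch of how records with the names stripped out can still be matched against a public list. Everything in it is an assumption for illustration: the field names, the choice of zip code, birth year and sex as the linking fields, and the records themselves, which are entirely made up.

```python
from collections import Counter

# Hypothetical fields that are not names but can still narrow down a person
QUASI_IDENTIFIERS = ("zip_code", "birth_year", "sex")

# Entirely made-up example records
ANONYMIZED = [
    {"record_id": "r1", "zip_code": "94110", "birth_year": 1975, "sex": "F"},
    {"record_id": "r2", "zip_code": "94110", "birth_year": 1975, "sex": "F"},
    {"record_id": "r3", "zip_code": "94301", "birth_year": 1982, "sex": "M"},
]
PUBLIC = [
    {"name": "Alice Example", "zip_code": "94110", "birth_year": 1975, "sex": "F"},
    {"name": "Beth Example", "zip_code": "94110", "birth_year": 1975, "sex": "F"},
    {"name": "Carl Example", "zip_code": "94301", "birth_year": 1982, "sex": "M"},
]


def quasi_key(record, fields=QUASI_IDENTIFIERS):
    """Tuple of the linking fields for one record."""
    return tuple(record[f] for f in fields)


def k_anonymity(records, fields=QUASI_IDENTIFIERS):
    """Size of the smallest group of records sharing the same linking fields.
    A value of 1 means at least one record is unique on those fields."""
    counts = Counter(quasi_key(r, fields) for r in records)
    return min(counts.values()) if counts else 0


def link(anonymized, public, fields=QUASI_IDENTIFIERS):
    """Map record_id -> name wherever exactly one public record matches."""
    names_by_key = {}
    for person in public:
        names_by_key.setdefault(quasi_key(person, fields), []).append(person["name"])
    matches = {}
    for record in anonymized:
        names = names_by_key.get(quasi_key(record, fields), [])
        if len(names) == 1:
            matches[record["record_id"]] = names[0]
    return matches


if __name__ == "__main__":
    print("k-anonymity:", k_anonymity(ANONYMIZED))
    print("re-identified:", link(ANONYMIZED, PUBLIC))
```

In this toy data the two records that share the same linking fields stay ambiguous, while the one unique record is tied to a single name. A publicly auditable mechanism could report a number like the k-anonymity value above, though that alone would not rule out linking through other fields.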
There can be huge benefits of allowing at least some explicit linkage of personal data to the person. The primary one would be to allow the processor of the data to notify the individual if they found patterns that would indicate a medical problem or would predict a high probability of a future medical problem.
So we need the aggregators and users of this huge pool of data to be responsible and we need to make sure individuals don’t have to worry about discrimination or other negative impacts of their medical info leaking out.
This is much more a legal issue than a technical one. There already is the Genetic Information Nondiscrimination Act of 2008, along with related laws, which protects Americans from discrimination based on their genetic information in both health insurance (Title I) and employment (Title II). Just as the laws and bank policies that limit the risk of using your credit card on the Internet make people much more comfortable doing so, we need appropriate laws and corporate policies to allow people to feel comfortable sharing their medical and personal sensor data as well.
So if we could implement as many technical and legal protections as possible, combined with the huge individual and collective win of using machine learning and statistical processing on the huge corpus of personal and medical data that is already being collected by individuals, we could come up with major new cures and solutions to age-old health problems and solve the core US economic problem (health care costs) in one very low-cost way.