"We now have the ability to gather huge amounts of data. This ability seems to carry with it certain cultural assumptions — that everything that can be measured should be measured; that data is a transparent and reliable lens that allows us to filter out emotionalism and ideology; that data will help us do remarkable things — like foretell the future."
Brooks goes on to show how data can be used to demonstrate that "our intuitive view of reality is wrong," for example, with regard to basketball foul shooting, the effectiveness of political campaign spending, and the relationship between the use of the word "I" and self-confidence.
However, Brooks nowhere mentions the vast amounts of data being accumulated by the life sciences.
I am privileged to work as an outside consultant to Compugen, a company that is revolutionizing the manner in which new drugs and diagnostics are discovered. Over the past decade, Compugen has built an infrastructure of proprietary scientific understandings, predictive platforms, algorithms, and machine learning systems for the in silico (by computer) prediction and selection of product candidates.
In a recent press release (http://cgen.com/press-releases/189-compugen-announces-development-of-nexgen-infrastructure-platform-for-analysis-of-next-generation-sequencing-data), Compugen described a new infrastructure platform that it had created for the analysis of massive quantities of next generation sequencing data to enhance the company's infrastructure for predictive drug and drug target discovery:
"The high demand for low-cost sequencing has provided strong incentives to develop 'next generation sequencing' technologies. These technologies rely on various methods to parallelize the sequencing process, thus producing millions of sequences simultaneously, with exceptional speed, yield and specificity. Such high-throughput sequencing technologies, known as parallel deep sequencing technologies, are lowering the cost and increasing the output of DNA and RNA sequencing (RNA-Seq) by orders of magnitude beyond what is possible with standard dye-terminator methods. Furthermore, these technologies in theory should facilitate various types of analysis, such as transcriptome analyses, de-novo assembly of genomes, identification of single nucleotide DNA mutations, and more. However, the massive amount of raw data generated in this form also imposes huge analytical challenges to obtain meaningful and accurate information."
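To give a concrete sense of the analytical challenge the press release describes, here is a minimal Python sketch of my own (purely illustrative; it reflects nothing about Compugen's actual platform). It streams reads from FASTQ, the standard text format produced by next-generation sequencers, so that memory use stays constant no matter how many millions of reads the file contains — the kind of design such massive data volumes force on any analysis pipeline:

```python
from io import StringIO

# Hypothetical sample data: FASTQ stores each read as four lines --
# an @-prefixed id, the sequence, a '+' separator, and quality scores.
sample = StringIO(
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nGGGTTTAA\n+\nIIIIHHHH\n"
)

def stream_reads(handle):
    """Yield (read_id, sequence, quality) one read at a time,
    never holding more than one read in memory."""
    while True:
        header = handle.readline().strip()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()               # '+' separator line, ignored
        qual = handle.readline().strip()
        yield header[1:], seq, qual

# A trivial "analysis": count reads and total bases sequenced.
n_reads = 0
n_bases = 0
for read_id, seq, qual in stream_reads(sample):
    n_reads += 1
    n_bases += len(seq)

print(n_reads, n_bases)  # prints: 2 16
```

Real pipelines layer alignment, variant calling, and statistical filtering on top of this kind of streaming pass, which is where the "huge analytical challenges" the press release mentions actually arise.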
In short, the life sciences are creating oceans of data, and the primary challenge is no longer collection, but rather analysis.
New insights regarding foul shooting percentages? Quite honestly, I don't particularly care. Of far greater significance is our proximity to deciphering billions of years of evolution and the complexity of life, resulting in new medicines and diagnostics for unmet medical needs.
Now if we can only prevent ourselves from destroying the planet as this revolution in life sciences, premised upon unprecedented data collection and analytical capabilities, unfolds.
[As noted in prior blog entries, I am a Compugen shareholder; this blog entry is not a recommendation to buy or sell Compugen shares; and in September 2009 I began work as a part-time external consultant to Compugen. The opinions expressed herein are mine and are based on publicly available information. This blog entry has not been authorized, approved or reviewed prior to posting by Compugen.]