- Author: Brian Wood
The article below by Ryan McBride from FierceBiotech IT covers an original piece from Guy Cavet of Kaggle in Genetic Engineering & Biotechnology News (GEN).
We at AIS agree that bioinformatics is transforming the nature of genetic and pharmaceutical research — which is why we launched AIS ClearCompute to empower bioinformatics scientists to do their work in an affordable, secure, pay-as-you-go manner.
Data is necessary but not sufficient for scientific breakthroughs; it’s the analysis of said data that will yield patterns from gobbledygook and drive progress forward.
Emphasis in red added by me.
Brian Wood, VP Marketing
Why Big Data lacks the punch pharma needs
Pharma outfits have gobbled up Big Data sources on cancer genomes and transactional records, but intelligent use of data sets brings much more value to drugmakers, Guy Cavet writes for Genetic Engineering & Biotechnology News (GEN). Cavet, vice president of life sciences for Kaggle, covered how biopharma outfits have started to scratch the surface, analyzing and harnessing large amounts of data to discover, develop and market therapies.
Kaggle, where Cavet works, organizes online competitions in which researchers use data sets to come up with the best solutions to problems and compete for prizes. His organization is among a growing number of groups that are transforming the way biopharma companies tackle scientific and business conundrums with large data sets. But Big Data alone isn’t enough. Technological advances have drastically sped up the aggregation of scientific data, but this has not necessarily changed the slow and unpredictable process of bringing new therapies to patients.
“With drug development costs rising and approvals declining, new approaches are sorely needed,” Cavet writes. “It’s too simplistic to see ‘big data’ as a knight in shining armor, but the intelligent use of rich data, regardless of size, has the potential to help dramatically with problems from basic research to commercial operations.”
Cavet noted the following ways life sciences outfits have succeeded with intelligent use of data:
- Analytics. Banking the data is step one. The next step involves slicing and dicing the data to derive some value from it. While not mentioned in Cavet’s article, Mount Sinai School of Medicine and the software company Cloudera have been working together to develop tools for analysis of large and complex sources of patient data for research and discovery. One of the big ideas from the collaborators is to harness and analyze multiple sources of Big Data to guide and improve patient treatment.
- Competition/crowdsourcing. Scientists can be very competitive and most seem to love a challenge. Kaggle organized a competition sponsored by German drugmaker Boehringer Ingelheim that led to a better way of predicting small molecule safety. As Cavet notes, Netflix proved in 2006 that allowing outsiders to compete with each other resulted in an improved system for recommending movies to its customers. Kaggle is doing the same for science.
- Privacy. Whenever sensitive data are involved, security becomes a major concern. Cavet writes that there are ways for drugmakers to collaborate with outside groups on data-driven research without losing complete control of their data. In the case of Boehringer Ingelheim’s Kaggle competition, the pharma group was able to share data sets with the contestants without revealing the structures and activity profiles of the small molecules.
Big Data Won’t Save Pharma, But Smart Data Might
Data analytics has the potential to do much more if applied across the pharmaceutical enterprise
Intelligent use of large-scale data has become fundamental to other industries: finance, insurance—even sports. But despite its importance in areas of research, data analytics has the potential to do much more if applied across the pharmaceutical enterprise.
In the past 15 years, biology has been transformed by the availability of large-scale genetic and genomic data. The first ten years of work on the human genome yielded one draft genome. The last ten years have yielded over ten thousand. Advances in technology have enabled high-throughput gene expression profiling, cancer genome analysis, and other disciplines to change the way biology is studied. Cheminformatics allows companies like Numerate to screen millions of compounds for activity by purely computational prediction. However, there are much greater opportunities for data-driven transformation across the broader pharmaceutical enterprise.
These opportunities arise, in part, because of the broad trend toward data being tracked and recorded in new and far-reaching ways. Importantly, many of these are outside the pharma industry. Medical records are collected electronically on an unprecedented scale, driven in part by federal “meaningful use” programs. These records reveal how diseases manifest and how treatments are used in the real world. Social media also contains vast amounts of information on real patient experiences with both diseases and treatments. And in the sales and marketing of drugs, data on program effectiveness is collected in real-time by reps, and companies like Aktana are interpreting it to understand where physicians perceive value.
Getting data is only the first step. The true value arises from analytics that generate actionable insights. In many cases, this means predictive modeling: developing algorithms that reveal what drives an outcome of interest (such as response to therapy or drug choice) and allowing that outcome to be predicted in the future. The data scientists who can carry out this type of analysis are multidisciplinary experts with skills from statistics, computer science, biology, chemistry and other fields, and they are highly sought after.
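To make the idea of predictive modeling concrete, here is a minimal sketch in Python. It is illustrative only: the two features (a centered biomarker level and a centered, scaled age), the eight synthetic patients, and the choice of logistic regression are all assumptions made for this example, not anything described in Cavet's article. The fitted weights hint at which input drives the outcome, and the model can then score a new patient.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic regression with plain batch gradient descent."""
    n_features = len(rows[0])
    n = len(rows)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to the score
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict_proba(x, weights, bias):
    """Predicted probability of response for one patient."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Hypothetical, synthetic data: [biomarker (centered), age (centered, scaled)].
# By construction, response (1) tracks the biomarker and not age.
patients = [
    [-0.4, 0.3], [-0.3, -0.3], [-0.2, 0.3], [-0.1, -0.3],
    [0.4, 0.3], [0.3, -0.3], [0.2, 0.3], [0.1, -0.3],
]
responded = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = fit_logistic(patients, responded)
print("weights (biomarker, age):", weights)
print("P(response) for a new patient:", predict_proba([0.3, 0.0], weights, bias))
```

In this toy data set the biomarker weight dominates the age weight, which is the "what drives the outcome" half of the exercise. Real projects would use far more data, held-out validation, and an established library rather than hand-written gradient descent.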
The conventional ways to engage data analysts involve building internal teams of scientists or buying time from consultants. However, data analytics is also particularly well-suited to crowdsourcing, which opens up a problem for many people to address. It’s inevitable that most of the world’s experts in any domain are outside any single pharma company. Even with strong internal teams, as Bill Joy of Sun Microsystems insightfully noted, “Most of the smartest people work for someone else.” Crowdsourcing allows those people to be tapped in a highly flexible and cost-effective manner. A team of experts can coalesce around a problem, working on it only as long as necessary, and then move on.
Performance of the leading models over the course of a competition run by Boehringer Ingelheim to predict small molecule safety. Rate of improvement is typically rapid in the early part of a competition and then slows as competitors reach the limit of performance that the data will allow.
In 2006, Netflix used crowdsourcing to improve their ability to suggest movies to their customers. Rather than just inviting people to work in isolation, they set up an online competition in which people submitted entries in real-time and vied to come up with the best solution. This is a particularly effective approach to predictive modeling analytics. Seeing their rivals above them on a leaderboard drives people to continuously generate better results. In the Netflix competition, the company’s internal method was surpassed within six days, and the eventual winner was more than 10% better.
The competition approach is equally applicable to pharmaceutical industry problems. For example, Boehringer Ingelheim sponsored a contest to develop methods to predict small molecule safety that resulted in a 25% improvement over an industry standard approach. In the Heritage Health Prize competition, methods are being developed to predict which patients will require hospitalization, and for how long, over the next twelve months. Other competitions have been used to predict patient outcomes, sales patterns, and clinical outcomes. In each case, the results were better than any methods that had previously existed.
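The leaderboard mechanics behind figures like "10% better" or "a 25% improvement" can be sketched in a few lines of Python. This is a hedged illustration, not a description of how Kaggle or Netflix actually score entries: the team names, the predictions, and the use of root-mean-square error (RMSE) as the metric are assumptions made for the example. Each submission is scored against hold-out values that competitors never see, then ranked and compared with the sponsor's baseline.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between predictions and true values."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(actual))

def leaderboard(submissions, holdout, baseline_name):
    """Rank submissions by error (lowest first) and report the
    percentage improvement of each one over the named baseline."""
    scores = {name: rmse(preds, holdout) for name, preds in submissions.items()}
    baseline = scores[baseline_name]
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return [
        (name, score, 100.0 * (baseline - score) / baseline)
        for name, score in ranked
    ]

# Hypothetical hold-out values and submissions, invented for illustration.
holdout = [3.0, 1.0, 4.0, 2.0]
submissions = {
    "internal_baseline": [3.5, 1.5, 3.5, 2.5],
    "team_a": [3.4, 1.4, 3.6, 2.4],
    "team_b": [3.6, 1.6, 3.4, 2.6],
}

board = leaderboard(submissions, holdout, "internal_baseline")
for name, score, improvement in board:
    print(f"{name}: error={score:.3f}, improvement={improvement:+.1f}%")
```

Keeping the hold-out values private is what lets a sponsor compare entries fairly: competitors see only their score and rank, not the answers.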
The full potential of data analytics requires accessing and using data in creative ways. For example, after a drug launches, information about the drug is rapidly generated in the outside world through patient and physician experiences. This information is currently largely untapped. It is entered into electronic medical records, tweeted, posted on Facebook, and entered into community sites such as Patients Like Me. This data is often unstructured and very noisy, but companies such as Israeli startup Treato are beginning to systematically organize it. Despite the complexities of working with data like this, skilled data scientists can extract meaningful patterns about drug-drug interactions, what drives patients to start and stop medications, or which patients will not adhere to their prescriptions, to name a few.
Predictive models even have the potential to tackle some of the most critical decisions in drug development, such as whether a clinical trial will be successful or whether a licensing deal will eventually lead to a drug. Billions of dollars rest on these decisions, but it is rare that all available relevant data is systematically employed to predict the probability of success. Of course, no algorithm can make such predictions with perfect accuracy, and no computation can replace a clinical trial. However, for an organization deciding between multiple costly development programs, having any improvement in ability to predict results is immensely valuable.
Putting data beyond the company firewall for outside experts to use may not be a natural step for organizations that are accustomed to carefully protecting their sensitive information. However, with the appropriate steps, the confidentiality and privacy of pharmaceutical and medical data can be carefully preserved. For example, when Boehringer Ingelheim sponsored a competition to predict small molecule activity, neither the structures of the molecules nor the specifics of the activity were revealed. In a competition to identify patients with type 2 diabetes using electronic medical records, the data was carefully de-identified to meet HIPAA standards. Privacy and confidentiality concerns can also be addressed by restricting access to trained and trusted individuals.
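As a rough illustration of what preparing records for outside analysts can involve, the Python sketch below drops direct identifiers, replaces the patient ID with a keyed pseudonym, and coarsens age into bands. Everything here is an assumption made for the example: the field names, the secret key, and the banding rule are hypothetical, and this sketch alone should not be read as meeting HIPAA standards or as the procedure used in any competition mentioned above.

```python
import hashlib
import hmac

# Hypothetical set of fields treated as direct identifiers in this sketch.
DIRECT_IDENTIFIERS = {"name", "phone", "email"}

def pseudonymize(value, secret_key):
    """Replace an identifier with a keyed hash so records can still be
    linked to each other without exposing the original value."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

def deidentify(record, secret_key):
    """Return a copy of the record with identifiers removed or coarsened."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["patient_id"] = pseudonymize(str(record["patient_id"]), secret_key)
    age = cleaned.pop("age")
    if age >= 90:
        cleaned["age_band"] = "90+"  # group the oldest patients together
    else:
        low = (age // 10) * 10
        cleaned["age_band"] = f"{low}-{low + 9}"
    return cleaned

# Hypothetical record and key, invented for illustration.
secret_key = b"keep-this-key-inside-the-firewall"
record = {
    "patient_id": 12345,
    "name": "Jane Doe",
    "phone": "555-0100",
    "email": "jane@example.com",
    "age": 57,
    "diagnosis": "type 2 diabetes",
}

shared = deidentify(record, secret_key)
print(shared)
```

The key stays with the data owner, so outside analysts cannot reverse the pseudonyms, while the owner can still map results back to the original records.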
With drug development costs rising and approvals declining, new approaches are sorely needed. It’s too simplistic to see “big data” as a knight in shining armor, but the intelligent use of rich data, regardless of size, has the potential to help dramatically with problems from basic research to commercial operations.