Nov 04, 2015 | Big Data: The Promises and Problematics of Prediction


By virtue of big data, we are being offered a dizzying array of predictive possibilities unimaginable a generation ago. If a crime has occurred in such and such a place, it is probable that others will be committed in the same area (predictive policing). If a student presents with a given profile, it is likely that she will run into trouble within a year at university (educational data analytics). If an infant displays a particular genetic disposition, it is likely that he will become antisocial. In a world where correlation is cast as causation, a core political and philosophical task is to understand what it means to put our faith in the prophets of big data. In this talk, from the Council for Big Data, Ethics and Society, Geoffrey Bowker and Jacob Metcalf will explore with us the landscape of prediction in big data.

Geoffrey Bowker, Professor of Informatics, University of California, Irvine

Jacob Metcalf, Researcher, Data & Society Research Institute

November 4, 2015 | 4:00-6:00 PM | Physical Sciences Building 305

Jan 22, 2014 | Science and Justice in an Age of Big Data: Biomedical Privacy & Genomic Openness

Peter Yu, David Haussler and Jenny Reardon Discuss the Meeting of Biomedical Privacy and Genomic Openness

On January 22, 2014, the Science & Justice Working Group is hosting the first in a series of ongoing conversations about the unresolved issues raised by the recent push to expand efforts to collect and aggregate biological samples and data. Jenny Reardon (Science & Justice Research Center Director and Associate Professor of Sociology) will facilitate a conversation between Peter Yu (incoming President of the American Society of Clinical Oncology (ASCO) and Director of Cancer Research at the Palo Alto Medical Foundation) and David Haussler (Director of the UCSC Center for Biomolecular Sciences and Engineering). Peter Yu is a renowned medical oncologist and hematologist who has pioneered the advance of health information technology and its use to improve medical care. ASCO is the world’s leading professional organization representing physicians who care for people with cancer, and has played a lead role in erasing the stigma around cancer through developing and sharing knowledge that promotes cancer prevention and treatment. In March of this year ASCO announced CancerLinQ, a major effort to collect data on hundreds of thousands of cancer patients to further advance cancer research and treatments. David Haussler is a pioneer in the field of bioinformatics whose group assembled and posted the first working draft of the human genome on the Internet, and is now innovating computer algorithms that will enable the use of genomic data in the transformation of cancer care. In June of this year, Haussler and his colleagues announced a “Global Alliance” to foster the sharing of genomic and clinical data that CancerLinQ and other similar efforts require. Yu and Haussler will discuss the challenges and opportunities raised by efforts to harness big data approaches to biomedical research.

As both Yu and Haussler are keenly aware, aggregating patient tissues and data raises entangled ethical and technical concerns. Finding the proper balance between personal privacy, medical and scientific autonomy, and equitable public benefits is at the heart of multiple recent controversies, including the sequencing and subsequent publication of Henrietta Lacks’ genome and neonatal blood biobanking.  These episodes make clear that the ability of informatic technologies to broaden and deepen the analysis of personal data raises issues that go to the heart of democratic governance. As a society, we have long associated personal control over our own bodies and privacy with full citizenship. Yet we also highly value transparency and knowledge sharing and view both as critical aspects of an open society, and as necessary components of scientific progress.  Today, as aggregated biomedical data become both more useful and more risky, we confront a difficult conflict between the value of privacy and the value of openness.

In an age of widespread social media usage, it is an increasingly familiar task to balance these values in our daily lives. Yet Science & Justice Research Center Director Jenny Reardon recently experienced this tension between privacy and openness in a surprising new way when she had an appointment with a physician at UC San Francisco (UCSF) and was asked to sign a UCSF Terms of Service Form. That form told her that she “understood” that UCSF could use her tissues and/or medical data in research and that she had no property rights in these tissues/data. Despite being an expert in biomedicine, ethics and society, she found that she did not know what she was being told she “understood” in order that she might receive the medical services of UCSF. Reardon reflected on this experience in an article entitled “Should patients understand that they are research subjects?” that appeared on March 2, 2013 as the cover story for the San Francisco Chronicle’s Sunday Magazine Insight. This article circulated widely and led Yu to contact Reardon, establishing an ongoing conversation about the future of medical privacy, trust, and informatics.

At the heart of the problem is a confusing mix of U.S. case law that denies ownership over one’s bodily tissues once they have left one’s body, medical privacy standards that require providers and researchers to inform you that they may use the tissues for research without directly requesting permission, and the speed at which medical advances are occurring. Given these conditions, it is more difficult than ever to know what one is agreeing to when one signs ubiquitous Terms of Service and informed consent forms.

The San Francisco Chronicle editorial board published an editorial along with Reardon’s article that suggested that the US Department of Health and Human Services revise its standards for medical consent. The editors proposed that the HHS standards foster full disclosure and clear communication with patients that more fully address questions of who will own and benefit from the collection and distribution of tissues and data. They also published a response from UCSF’s Elizabeth A. Boyd, associate vice chancellor for ethics and compliance, and Daniel Dohan, associate professor of health policy and social medicine. Boyd and Dohan argued that the success of personalized medicine rests on relationships of trust between physicians, researchers and patients. They noted that UCSF supports revising consent standards, and cited the recent creation of EngageUC, an initiative on the part of UC physicians and faculty to develop new comprehensive guidelines.

Within a few weeks of this discussion in the San Francisco Chronicle, The New York Times published an Op-Ed by Rebecca Skloot (author of the bestselling book The Immortal Life of Henrietta Lacks) that called for the development of international standards to protect the privacy of genetic data. This call followed in the wake of the sequencing and publication of the HeLa cell line genome without the consent of the family of Henrietta Lacks, the African American woman whose cells were used to make the cell lines (again, without her consent). The New York Times followed with an article a few weeks later that discussed the concerns of Senator John D. Rockefeller IV, Chair of the Senate Commerce, Science and Transportation Committee, about patient privacy and the lack of transparency on who has access to patient health data. More recently the NIH’s chief Francis Collins personally worked with Lacks’ family to develop a protocol for accessing the HeLa genome data that aimed to balance researchers’ needs with the family’s desire for privacy.

This discomfort bubbling up on the national stage has led to calls to change the Common Rule (the set of laws that govern federally-funded biomedical research in the U.S.), which currently allows for collection of tissues and data from patients as long as anonymity is maintained. It has become clear that anonymity in an age of openness is at best an uncertain policy instrument.  In addition to its technical limits, anonymity does not address the underlying concerns about who will be served by the mining of genomic and health data, and how concerns about privacy, property and justice can be addressed while fostering the creation of new knowledge needed to advance medical care.  Both Yu and Haussler are leading efforts that seek to do a better job fostering innovative research while attending to these fundamental ethical and policy issues, knowing that we need to do both if we are to advance cancer research and care.

The conversation between Peter Yu and David Haussler, facilitated by Jenny Reardon, will be the first of several dialogues planned by the Science & Justice Research Center that aim to help clarify these issues at stake in the evolving relationship between openness and privacy in the biomedical sciences.

Wednesday, January 22, 2014 | 4:00-6:00 PM | Engineering 2, Room 599

"Science & Justice in the age of Big Data: A Conversation between Peter Yu and David Haussler"
SJWG Rapporteur Report
22 January 2014
Rapporteur Report by Lizzy Hare
This event was the first in a series of events on justice in an era of big data, one of the
Center’s themes for the year. The working group meeting was a conversation between Peter Yu
(incoming President of the American Society of Clinical Oncology and Director of Cancer
Research at the Palo Alto Medical Foundation) and David Haussler (Director of the UCSC
Center for Biomolecular Science and Engineering) about genome data and the future of cancer
research. Julie Harris (Assistant Adjunct Professor at UCSF, Institute on Health and Aging;
Staff Scientist at Kaiser Permanente Division of Research; and Associate Director of the Center
for Translational Genomics and Ethics) provided commentary. Science & Justice Center
Director Jenny Reardon moderated the conversation and introduced the panelists.

Reardon’s introduction provided an overview of some of the concerns that the working
group hopes to pursue with this series. Genome research is seen as powerful, and cancer
research can now study the genomic changes that occur during the development of cancer. The
techniques that were developed in Haussler’s lab to understand the human genome are now being
used to think about cancer and evolution. This kind of genomic research would benefit greatly
from additional data that could be collected from cancer patients, but doing so raises ethical,
epistemological, and infrastructural questions. As a society, we have yet to figure out what to do
with big data. At present we mostly collect data of unknown significance, but there is no clear
precedent for who governs it, how to store it, or how to make sense of it. Who pays to store it?
Who gets to work with it and try to make sense of it? As the first of many working group
meetings to discuss this issue, the goal for this meeting was to outline which of these questions
need further discussion.

Peter Yu spoke first, from the perspective of a doctor practicing clinical
oncology. He described the efforts of the American Society of Clinical Oncology (ASCO) to
accelerate learning and analysis through computerized health care. They envision a rapid
learning system model, which would allow clinicians to generate new data and better models
while treating patients, by incorporating data from clinical practice instead of just clinical trials.
Such a system would require that patients and doctors be willing to share data, and it presents
problems for managing data, such as standardizing it and safeguarding it in a centralized
repository. The organization is still in the process of funding these efforts, but they are trying to
address these ethical and epistemological questions before they arise.

David Haussler followed up by first thanking Yu for his organization’s efforts, which he
sees as a tremendous boon to cancer care. Speaking from the point of view of a data scientist,
Haussler argued that big data is absolutely necessary for cancer research. Most mutations are
insignificant and very few are meaningful, so in order to establish a clear understanding of the
drivers of cancer, there needs to be a large number of genomes available to work with. These
numbers are unobtainable in the current system of clinical trials and academic research, but they
would be accessible if information could be captured from clinical practice. Haussler is hoping
that under Yu’s guidance the ASCO will be able to incorporate data collection into medical
practice. Yu agreed, saying that he believed the “holy grail” for cancer research would be a
healthcare system that engages people in research without sacrificing their rights.

Julie Harris provided her comments at this point, reminding us that the strong division
between research and clinical practice was established as a response to the Belmont Report. At
the time the ability to distinguish between the two was useful, but times have changed. Big data
has brought new challenges, but a lack of community involvement in research continues to be a
concern. With Kaiser, she has been involved in a project to build a biorepository that members
can volunteer to donate samples to. The samples are linked to clinical records and environmental
databases so that they may be used to research gene-environment interactions. This project has
been successful so far. Harris attributes at least part of the success to a community advisory
panel that brings together diverse representatives of the public. According to Harris, many of the
participants don’t fully understand the program but trust Kaiser to use the information in a way
that may benefit them someday.

Trust was a primary concern during the question and answer session. Some audience
members were concerned that storage for big data is not secure and that the information could
be accessed by governments or individuals with malicious intent. This is especially problematic
when the information could easily be re-identified using phenotypic information. Yu mentioned
that there had been a study on establishing and maintaining trust around research samples, and
that most people were more concerned with what the information was used for than who was
using it. This is troublesome for two reasons: first, it is difficult to anticipate what the
information might be used for in the future, and second, it is not always clear who the “who” is that
might eventually use the data, as institutions are increasingly amorphous. Researchers often try
to maintain trust by assuring donors that the samples will be used for good, but the notion of
good is itself abstract and a part of the question of justice that the working group will continue to
explore at future events.