Conference hosted by the Science & Justice Working Group
Sponsored by the UCSC Office of Research and the UCSC Cancer Genomics Hub
With a human genome sequenced and a map of variable sites in that genome created, governments and many other public and private actors now seek to make genomic data relevant to health, medicine and society. To do so, however, they must navigate the conjunction of two different approaches to data. Within the biomedical domain there are important, well-articulated infrastructures and commitments, arising out of concerns about individual rights, patient privacy and the doctor-patient relationship, that limit access to biomedical data. This stands in stark contrast to the culture of open access forged by those who worked on the Human Genome Project, which has remained a central commitment of ongoing human genome research. Thus, architects of the genomic revolution face complex and competing technical and ethical challenges that arise from the meeting of domains with substantially different ethos. Additionally, the rise of social media has led to a broad and contested discussion about the proper relationship between persons and data, and about who profits through access to it.
The workshop will map out the challenges of building and controlling genomic data architectures that are responsive to these conditions. Rather than suggesting that either openness or privacy is the answer, the workshop will ask which kinds of openness and privacy might be possible and adequate, and in which contexts. Further, who has the authority to decide? Who can or should authorize the flow of data, and what forms of consent are required? What kinds of data flows should be allowed (e.g., ones that lead back to persons)? Finally, the workshop will consider where and how data should be accessed. Is “the cloud” a viable option? What other options exist to manage the deluge of data, and what ethical and material challenges do they present?
Hosted by Jenny Reardon, Associate Professor of Sociology, UCSC
Co-hosted by Bob Zimmerman, Program Director, UCSC Cancer Genomics Hub
David Winickoff, Assoc. Professor of Bioethics and Society, UC Berkeley
Malia Fullerton, Assoc. Professor in the Department of Bioethics & Humanities, University of Washington School of Medicine
Mike Kellen, Director of Technology and Software Development, Sage Bionetworks
Are You My Data?
SJWG Rapporteur Report
8 May 2012
Opening Remarks by Jenny Reardon
After acknowledging supporters of the meeting and thanking the speakers, Reardon
opened with a discussion of the new relations between data and bodies. We often
imagine that data makes bodies frictionless and movable: that by transforming tissue or
other material into information we can overcome the restraints on the movement of
physical material. Reardon suggested that this belief in the frictionless nature of data is
misplaced. Rather than resting on the generality of that belief, she pointed to an
unasked question: what particularities of bodies do we lose when we turn
bodies into data, and which particularities do we want to keep?
We still live in a world where embodiment matters to governance, where bodies are
what matters most. The apparatuses built to protect data privacy are ultimately about
protecting ‘body’ privacy. It is the data attached to bodies that is most sensitive,
largely because the history of bioethics has been directed toward protecting bodies. This
history hitches governance to place, which causes conflict when data is imagined to be
placeless and able to move without friction. Now we find that much effort goes into
studying data rather than highly constrained bodies.
Data isn’t bounded by obvious physical limits, and this ultimately ups the ante for
governance. We have millennia of thinking about ourselves as bodies, and only a
few years of thinking about ourselves as collections of data. Once the body is rendered
as 0’s and 1’s, should that data be able to go wherever the Internet goes, or does it
need to be made bounded and emplaced like the body is in order to make sense of it
under our current modes of governance?
This conflict is especially potent in biomedicine and bioinformatics. Early networkers are
much more likely to think of data as something freely moving wherever we wish.
Biomedical researchers tend to feel that the data is embodied somehow and expect it to
receive the same care granted to the bodies from which it originates. This leaves us
with several questions. How are we going to recontextualize data? Who has the power to
decide these things? What are the spaces for making decisions, and whom would we turn
to in order to ask them?
Panel 1: Collision of Privacy and Openness
Panel: David Winickoff (UC Berkeley, Environmental Sciences Policy and
Management), Bob Zimmerman (UCSC Cancer Genomics Hub)
Discussant: Greg Biggers (Genomera)
Reardon asked the panel to consider what happens when the culture and
infrastructures of medical privacy collide with the practices of openness found in genomics.
Winickoff recalled a conversation he had with Jamie Heywood, co-founder of PatientsLikeMe,
a website that collects and shares patients’ experiences and outcomes with drug
regimens in a social networking-like format. Heywood told Winickoff that "Bioethicists
have killed more people in the past year than adverse drug effects" because of the
medical privacy policies that slow down research and prevent sharing of information.
His brother had ALS, and in order to address the slow pace of ALS research Heywood
set out to create an online network where ALS sufferers could upload as much personal data as
they were comfortable with, and then connect with pharmaceutical companies and other
patients. Heywood strongly believes that there is a need to work around a biomedical
discovery system that limits connections between patients and researchers every step
of the way. When Winickoff asked whether physicians are important gatekeepers of medical
records, Heywood laughed off the question.
Winickoff questioned why people are now talking about the privacy and openness of
medical records with so much more intensity. The dominant way to protect data has
traditionally been anonymization and de-identification, which are impossible to promise with any
certainty, at least given the current state of informatics. The options appear to be either
coping with non-anonymity or hardening our systems for privacy. Typically our culture
frames science and technology as moving society forward, as if they were separate
entities with one acting on the other. But the co-production or co-evolution model shows that
categories cross between the two, producing a hybridity of science and society. The very
categories of de-identification and anonymization have technical definitions that must be
articulated to an understanding of what a person’s identity is and what exactly anonymity
protects. Commonly, a human subject is defined as someone identifiable; once
de-identified, a person is no longer a subject and thus falls into a different legal category. While these
categories are socially and legally produced in the first place, technology makes them
fuzzier than expected.
New models of identity and privacy have been generated by patient advocacy. Genome-wide
association studies (GWAS) have largely disappointed those who expected quick therapies. Now the idea seems to
be to make ever larger data sets in order to make analysis more powerful. There are now many
efforts to use crowdsourcing to make databases bigger, but that requires moving more
'private' data across borders between informatic systems. This is co-emerging with
patient advocacy that leverages the control patients have over their own data and
contributes to the discovery of disease. Some efforts have decided to flout privacy
explicitly, rejecting the apparatus of research subject protection in favor of speed, scaling
up, and networking. This puts pressure on traditional models of thinking about subjects
and privacy, ultimately shifting control over data in new ways. For instance, the Personal
Genome Project claims that the risks to medical privacy are overblown and has
recruited subjects who disclose everything. This is not just an experiment in new
ways of making scientific knowledge; it is also an experiment in making the private public.
Now there is a question about these complex emergences. We care about protecting
subjects and privacy on some level, but when there is this complex interpenetration with
new tech, how do we hold the line on cherished ideals and maintain flexibility in the face
of change? In this new moment of co-evolution, we have choices to make in terms of
what we want to hold tight and what we want to let go—we could define things rigidly or
become extremely flexible. However, it is hard to be rigid when the basis of cherished
ideals is undercut.
A second issue is the political economy of data. We should see data and network
architecture as a field of power, where resources are distributed in particular ways. We
imagine privacy as shielding, openness as flatland. Openness and privacy allocate
power as they are embodied in policies. So it's not a question of 'openness or not'.
Things are open in certain ways and closed in others, and we need to track the continuum between open and closed.
Furthermore, openness is as much a function of property as it is of privacy. Both privacy
and property are about control and access. To what extent do researchers have
property in data? Their data are protected under law and constrained by confidentiality agreements,
but property rights are nonetheless allocated in this domain. Another form of romanticism surrounds the
tainting of the moral economy of science. Despite common narratives of science being
founded on radical openness and common access to knowledge, there has always
been secrecy in science, and secrecy creates incentives for discovery. It is
still going on now, but now we have architectures of sharing that far outstrip our ability
or desire to share. Since the mid-1990s, patient groups have assembled their own data. By
restricting access to these databases, they create higher value and incentivize research by
pharmaceutical companies. This should complicate our moral intuitions about openness,
because closedness might be a tool that we as individuals can leverage on our own behalf.
Although there is a lot of pushback against their regulatory role, we should recognize
the importance of mediating institutions. Currently, patients can choose not to
participate, or else participate completely on the powerful organizations’ terms. So the
pressing question is how to make it possible to do research on patients’ terms. If
we think of consent as happening at a single point in time and authorizing all future research,
then it is necessary to have someone paying attention to what is done with the data and
accountable for it.
Zimmerman started by arguing that sharing is fundamental to evidence-based medicine.
Sharing data is essential to create feedback loops to evaluate the long term
effectiveness of experimental therapies. To judge the effectiveness and safety of
therapies, we need more than a snapshot to understand a disease; it is necessary
to look at longer-term processes of disease. For this, research efforts cannot look only at
individual patients; they need to see people in clusters by disease or organ.
The recently launched Cancer Genomics Hub is hosting data from three large sources.
It then analyzes how cleanly and clearly tumor types are differentiated on a genetic
level. This kind of research produces new ways of differentiating tumor types. Looking at
the genomic basis of the disease, you see very clearly delineated types and different
successes with therapies. Among the early discoveries is that there are surprisingly
varied forms of breast cancer.
In order to do this kind of comparative work, we need to share data to cobble together
large enough data pools. This requires figuring out how to move beyond the culture of
shielded privacy that is enshrined in medicine and create a culture of sharing and
participation that has openness and trust. It is obvious at this point that legislation alone
is not enough. We can legislate protections, but there are much larger problems with
cultures and values, which are almost as challenging to understand and intervene in as
disease processes themselves. Genomics looks promising for analyses based on facts
and statistical analysis. But how do we develop that culture of trust? Data sharing of
mutations is necessary for treating disease processes, but to really get robust therapies
we will need more than single points of data when the patient comes in for treatment.
If we look at the history of science, there has been much secrecy, but there are also deep
roots of sharing, such as the founding of journals. The commons cannot protect itself; it
must be maintained, cultured, and protected from abuse. When we start to think
about patients in the future we should be guided by William Gibson’s insight that “The
future is already here, it is just not evenly distributed.” This is evidenced by the fact that
several prominent scientists have publicly shared their own personal ‘omics’
profiles, but these intensive efforts are not currently available outside of their well-funded
labs (see http://med.stanford.edu/ism/2012/march/snyder.html and http://
One particularly tricky area to navigate is the difference between a donor and a partner.
Two places where we see this being worked out are Stand Up to Cancer and the US Office for Human
Research Protections’ proposed changes to the Common Rule (the core set of human subjects
protections that guides federally funded medical research). Most scientists don't have the time
or access to legislative analysts to deal with this, but however hard they find it, patients find it
harder still.
The overly ambitious goal of public health has been to reduce disease,
disability, and untimely death. That's what genomics is really about. As we start
to open the box of medical genomics data, we need access to patient data
to fulfill this goal and really understand how environment influences gene expression. There
is an emerging awareness of cancer as not just a genetic disease, but a failure of the body's
regulatory systems to deal with mutations and their consequences. We need social
collaboration networks alongside raw-data collaboration networks and interdisciplinary work.
Zimmerman offered three closing questions:
1) As we try to aggregate data, how do we protect individuals? Which fears are real
and which are bogeymen?
2) How can we acquire and analyze enough data to improve public health and human
capital? Researchers want and need an open data culture, and we are seeing early
moments of self-monitoring and sharing in the ‘self-quants.’
3) We need better means for sharing and consent. How do we build these effectively?
What are the best ways of building a culture of sharing and trust? If we can harness
science to build better understanding of disease, we can have much better lives.
Biggers spends most of his time as an entrepreneur, yet many of these ideas are always
on his mind. There is a new tide of health research that is ultimately asking questions
about equity as it affects patients, access, and public health. At the same time, there is a
collision of privacy and openness. When it comes to collisions, there is always an
investigator who has to figure out fault: who is the collider and who is the collidee?
Thus it is important to get into the locus of control of these conflicts. "Complex
emergences" is a good phrase for how this tide is moving medicine from policy-driven to
patient-driven research. The last 15-20 years of health rhetoric have been about
protecting privacy, but shouldn't we also be helping people express rights to property?
Anxiety over sharing and openness is largely about the arrival of new technologies for
sharing. At the core of the problem is the locus of control, that is, whose will is being expressed,
which sets up a conflict between an act of protection and an act of expression. Trust is a
much bigger issue than an action that takes place at the time of consent. The Kaiser case
study shows trust as the operationalization of a concept, not a single event. We should not
back ourselves into the corner of "how do we talk people out of their data?" Does the power of
data make us feel compelled to grab at the new data? Much of the world's current difficulty
with opacity concerns replication, a core value of scientific practice.
Openness allows many more people to take part in replicating results: bench to bedside,
to bench, to bedside, to bench, to carside, to pharmacy, to bench, etc.
Ted Goldstein: Where is the sense of disgust that people are being prevented from
BZ: The pace of sharing is very frustrating. The fact that only 20% of reimbursable
procedures have gone through double-blind studies should be scary. Doctors as they are
now trained cannot deal with people who want real information. And reimbursement
structures cannot support it.
DW: In a regime of total exchange, where everything goes in, there are
other actors who can take advantage of my data in ways that I never could. The
commons doesn't actually benefit people equally, people have differential capabilities to
extract value from commons.
GB: This problem is less about control of information than it is about control of value.
The idea of the commons is rooted in real estate, so we need to be careful. Tissue is corporeal, and the
commons is often extra-corporeal. Have we gone too far toward the individual?
We are seeing the emergence of a new kind of engagement and collaboration that dissolves some of the
boundaries between researchers, collaborators, subjects, and participants.
Second Panel: Creating and Sustaining Trust
Panel: Malia Fullerton (University of Washington School of Medicine) and Mike Kellen (Sage Bionetworks)
Discussant: Warren Sack (UCSC)
Jacob Metcalf introduced the second panel and asked the panel to consider what
practices and infrastructures are necessary to create and sustain trust over time in data-intensive biomedicine.
Fullerton argued that facilitating respectful, ongoing engagement in the research process will
be important moving forward. There is a widespread assumption that to be ethical in
science is to share widely and be open. But this must be tempered by the knowledge
that we are sharing things that belong to people who are largely absent from daily lab
In this regard, bioethics has perhaps overemphasized the concepts of beneficence and
non-maleficence, neglecting other aspects of the process of producing medically useful
knowledge, such as how we convey courtesy and respect as research moves forward.
There have been significant consequences to the preoccupation with de-identification
as a proxy for ethical treatment of biomedical research subjects. With contemporary
medical research scandals we are seeing a common thread of patients reacting with the
feeling of, "I was participating in something and now things have changed and I wasn't
aware of the changes." Key examples of this are the Havasupai genetic research case
in Arizona, the discussions swirling around the HeLa cell line, and the Texas biobank that
misled the parents of infants whose blood samples were banked. Thus we are seeing a
pushback against the idea that de-identification is a solution to all problems.
Fullerton suggested that rather than simply removing ‘identifiers’ and moving forward
with the research, it is necessary either to gather written consent on an ongoing basis or to
obtain consent that explicitly clears the samples for other kinds of research. She cited a bioethics study she
co-authored (Ludman et al., 2010, http://www.ncbi.nlm.nih.gov/pubmed/20831417) that
demonstrated that patients tend to strongly prefer being re-consented when
their de-identified samples might be used for other forms of research. The less onerous
alternatives to affirmative re-consent, opting-out and notification-only, were largely
considered inadequate. The authors concluded that the best practices for re-consenting
the use of biomedical data treated the participants as stakeholders in the research,
including methods to keep the participants informed, ways of providing access to
information on how samples were being used on an individual and study-wide basis,
transparent and accountable oversight processes, and opportunities to
Fullerton also cited a study by Kaye et al. (2012, http://www.ncbi.nlm.nih.gov/pubmed/22473380)
that identified the major challenge of consent in bioinformatics as making 'visible' the
research participants whose DNA and health information are essential for
meaningful progress. Their research suggests that research participants do care about
how their data are used and wish to be kept informed (which is different from control).
This will require sustained investment in keeping in touch with patients.
Kellen introduced SAGE Bionetworks as a response to the concern that research was
being blocked by lack of access. SAGE’s founder, Steve Friend, was a researcher at
Merck and found that he could not get innovations out fast enough despite all the
investments and power of the pharmaceutical industry. Working off the principle that we
will all be patients some day, he wanted to accelerate the pace of the medical discovery
process and generate more innovations. Thus SAGE has sought to pilot new ways of
doing research using the values of openness and transparency.
Kellen asked: whom is privacy for? Is it for the patients' benefit? When privacy
becomes a technical question driven by paperwork, we can lose sight of the fact that
privacy is at its root a matter of people being concerned with dignity and respect.
Experience indicates that respect and dignity matter more than privacy for people with
chronic disease. In many cases, privacy and consent procedures are built around the
physicians’ needs, particularly their desire to keep research proprietary, not the interests
of the patients.
SAGE operates on the assumption that speed in research systems is improved by
openness. John Wilbanks, one of their directors, has focused on the problem of
portable consent. Often data can only be used in one study due to the legal status of the
consent forms, and thus the full utility of the data generated is not realized. Researchers are
usually only trying to answer one question and do not build the future utility of their data into
the experimental design and ethics procedures. Portable consent would be a key part of
any system in which participating in one trial results in data being shared with other
trials and/or placed in the public domain. Widespread portable consent will require
consistent legal language that can be dropped into informed consent forms.
SAGE is aiming at treating portable-consented biomedical data like an open-source
software system. With the Synapse project they are attempting to track the history of
who has done what with which data. Synapse is modeled on GitHub (https://github.com/),
a central tool in open source coding that tracks versioning. Fast-paced
biomedical research cannot wait for work to be distributed as papers, which are too
coarse-grained and take too long to write, review, and publish. Instead, Synapse aims to get
down to the level of individual steps and small analyses. This helps establish trust with
patients because they can see that their data are being used well and transparently. It is easy
to wonder why patients won't share, yet they have more to lose, so scientists need to
share more so that they, too, are at risk.
Kellen argued that we need a system of reward for researchers that does not encourage
keeping data proprietary and secret. The challenge of medical discovery should be like
Tour de France, with stage victories and not just a single race. Every step along the way
should be designed to build on each success. This is the model of the Breast Cancer
Predictive Modeling challenge, which puts all the data in the public domain and asks who can
make the best software for predicting the disease process. Each attempt is transparently
available to other designers.
SAGE’s forthcoming Bridge Project (http://sagebridge.org/about) is aimed at creating
tools to keep patients actively involved by engaging the patient community to provide
researchers with an agenda. Patients are also participants who self-report data. This
changes incentives for people who do the research: this model is not about who ‘wins’
a research race, but enabling others to see what the techniques are and use them in
Playing off the pun and logo used for the symposium, Sack claimed that the Personal
Genome Project is like the ‘Snort’ character in the classic children’s book Are You My
Mother? It reunites us with our family. That is one story we could tell about this field, it is
a set of tools for personal discovery. Sack suggested that in these discussions, the arts
could play the role of identifying the many positions of actors in personal genomics.
There is a wider diversity of people involved than we typically discuss: there are
funders, regulators, advocates, patients, researchers, families, undertakers, etc. Part of
the effort to generate respectful engagement could include the ‘game-i-fication’ of
research: the informatic technologies should be joined with narratives that make the
research meaningful to people’s lives. There must be engagement with research
subjects in a capacity as something more than a source for data. The challenge at hand
is how people find a common cause and become a public. There are always many
different publics. Personal genomics stories have been personal stories about celebrity
genomes, such as Steven Pinker writing about his participation in the PGP. In the next
stage, there need to be stories about the public good in order to create a public. The
arts should have a role in this.
Concluding Discussion: Future Directions
Bob Zimmerman pointed to the multidimensionality of these issues and argued that tackling
them will require that we make a habit of locking everyone in a room together
for discussion. We will eventually be doing studies that need far better patient data
about a variety of things, like lifestyle, drug compliance, and environmental exposures. Yet by
the time data get to researchers, they are very far downstream of aggressive de-identification.
Only by looking at 500 tumors in each organ category are researchers actually able to make progress.
David Winickoff asked if we really need to get rid of privacy. It's used as a placeholder
for 'bioethics' but does not capture the rich possibilities for relationships between
people, researchers, and data. Can there be a shared kind of control? Privacy remains
important because the people who are comfortable with medicine and research want to
get rid of privacy protections. People who feel threatened shouldn’t be subjected to all
this terrible freedom. He pointed to the example of Iceland: people with mental illness
Jenny Reardon asked how we can get past an us-versus-them framework of ethics. A defining
characteristic of these conflicts is mistrust of large institutions. How do we re-narrate
the story to create common cause? How can we tell this story in a way that gets past big actors
and little actors?
Ted Goldstein argued that bioethicists have focused on certain stories as policy
motivators, such as HIV and HIPAA. Bioethicists have reasoned from case studies
rather than large scale quantification. But there is little evidence of harm coming from
privacy breaches. If we don't actively correct racial bias in our genetic knowledge we will
further cement the bias in our medical system. We need to actively engineer the society
we want rather than just protect against possible harms. There is a responsibility to
share data with people we share genes with; everyone must be willing to give up a little
bit of risk in order to help others.
Greg Biggers reminded us that all medicine is experimental. There are fundamental
epistemic issues at play within the development of data-intensive biomedicine: how do
we know what we think we know? We need to receive feedback from everyone.
Malia Fullerton pointed to the widespread public narrative about personal genomics.
People have had conversations with their families who didn’t want them to participate.
Participation is getting negotiated on a family-by-family basis, and regulatory structures are
unable to deal with these relationships. She also warned against sloppily sliding back and
forth between the categories of patient and research participant. These categories are experienced
differently depending on whether the research benefits you, particularly given the difference
between medical care and experiment.
Mike Kellen said we need a better sense of what incentives drive researcher behavior.
At some point health data will leak, and then what happens, given the inherent power
differentials between researchers and subjects? How do you align ethical behavior with
Warren Sack noted that there is a big PR problem, in the sense that it isn't clear how a big
data-mining project leads to helping my Uncle Joe. People trust a group where they
have a particular role to play. What is the public to which one belongs that you have
Reardon closed by suggesting three primary topical areas for future discussion:
Privacy and Property
She suggested that we need to rethink genomics beyond cures and re-articulate it to
broader meanings. How do we create space for that discussion, about what it means
now and not just 30 years from now? Do people feel empowered as citizens in the world
that we are creating? We are not going to cure cancer tomorrow, and so need to
address how I live my mortal life now with respect. We especially need scientists to take
these questions as integral to the scientific endeavor, not just side projects that come
after their research is done.