Human Genomic Data: Considerations for Next-Generation Computational Services

HUMAN GENOMIC DATA
Considerations for Next-Generation Computational Services
CASC
31 March 2015
Matthew Trunnell
Broad Institute
1000 Genomes
A Deep Catalog of Human Genetic Variation
The goal of the 1000 Genomes Project is to find most genetic variants that have frequencies of at least 1% in the populations studied.
http://www.1000genomes.org/
The Cancer Genome Atlas (TCGA) is a comprehensive and coordinated effort to accelerate our understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing.
http://cancergenome.nih.gov/
Human Microbiome Project (HMP) [has] the mission of generating resources enabling comprehensive characterization of the human microbiota and analysis of its role in human health and disease.
http://nihroadmap.nih.gov/hmp/
Cost to sequence a genome
[Figure: cost to sequence a genome, on a logarithmic axis from $1,000,000,000 down to $10, annotated with a decrease of ~100,000x]
Network storage capacity: 21 petabytes
Computational resource: 8000 cores
Considerations for genomic data
•  Regulatory issues
•  Ethical issues
•  Technical issues
The regulatory landscape is complex
•  General data privacy laws
   –  e.g., EU Data Protection Directive
•  Protection of personal health information
   –  e.g., HIPAA
•  Protection of human subjects in research
   –  e.g., the “Common Rule” (45 CFR 46, subpart A)
None of these directly addresses genomic data.
Regulatory Issues
Ethical Issues
Technical Issues
Human genomic data is special
•  Your genome encodes a lot of information about you.
   –  Physical attributes
   –  Risks of disease
   –  Ancestry
   –  …
•  Your genome contains information about your parents.
•  Your genome contains information about your offspring.
Genomic data is not generally PHI
Divorced from all of the HIPAA identifiers
and other patient information, genomic
data is considered to be de-identifiable.
Genomic data is not de-identifiable
Ethical protection of human subjects
•  1947 Nuremberg Code
•  1975 Declaration of Helsinki
•  1979 Belmont Report
Key principles governing research with human subjects
•  Respect for persons
•  Beneficence
•  Justice
Informed consent
•  The consent process allows research subjects to place restrictions on how their data may be used.
•  Data use may be limited to
   –  Specific diseases
   –  Non-commercial use
   –  Special populations
Use of genomic data is context-dependent
•  Permissions: business rules
   –  “Since the trial is still ongoing, I don’t want anyone to see it.”
•  Consent: ethical rules
   –  “The donor wants her data used only for non-profit cancer research.”
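The distinction between permissions and consent can be sketched as two separate checks applied to every data request. This is a minimal illustration under stated assumptions, not a description of any existing system: the class names, fields, and the `may_access` function are all hypothetical, and real consent terms are richer than the two restrictions modeled here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Consent:
    """Restrictions chosen by the donor (hypothetical model)."""
    diseases: frozenset = frozenset()   # empty set means any disease area
    non_commercial_only: bool = False


@dataclass(frozen=True)
class Request:
    """A proposed use of the data (hypothetical model)."""
    user: str
    disease: str
    commercial: bool


@dataclass
class Dataset:
    name: str
    consent: Consent
    embargoed: bool = False                       # business rule, e.g. trial ongoing
    allowed_users: set = field(default_factory=set)


def may_access(dataset: Dataset, request: Request):
    """Return (allowed, reason); permissions are checked first, then consent."""
    # Permission: a business rule set by the data owner.
    if dataset.embargoed and request.user not in dataset.allowed_users:
        return False, "permission: dataset is embargoed"
    # Consent: ethical rules set by the donor.
    if dataset.consent.non_commercial_only and request.commercial:
        return False, "consent: non-commercial use only"
    if dataset.consent.diseases and request.disease not in dataset.consent.diseases:
        return False, "consent: disease area not covered"
    return True, "ok"


tumor = Dataset("tumor-cohort", Consent(frozenset({"cancer"}), non_commercial_only=True))
print(may_access(tumor, Request("alice", "cancer", commercial=False)))
print(may_access(tumor, Request("acme", "cancer", commercial=True)))
```

Keeping the two rule sets separate means an embargo can be lifted without touching the donor's restrictions, which stay attached to the data for as long as it exists.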
Tracking consent is important
Maintaining genomic privacy may be important
Understanding one genome requires many genomes.
Approaches to maintaining data privacy
Homomorphic encryption
Compute directly on encrypted data:
operation(n) == decrypt(operation’(encrypt(n)))
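To make the identity above concrete, here is a toy sketch using the Paillier cryptosystem, which is additively homomorphic. The choice of Paillier is an assumption made for illustration; the slides do not name a scheme. The primes are tiny and hard-coded, so this offers no security, and all function names are hypothetical.

```python
import math
import secrets


def keygen(p=1789, q=1861):
    """Toy key generation with tiny fixed primes (illustration only, insecure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                                # modular inverse, Python 3.8+
    return n, lam, mu


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public key n, with generator g = n + 1."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, lam, mu, c):
    """Recover the plaintext from ciphertext c using the private values lam and mu."""
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """operation': multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


n, lam, mu = keygen()
c_sum = add_encrypted(n, encrypt(n, 12), encrypt(n, 30))
print(decrypt(n, lam, mu, c_sum))   # 42, computed without decrypting either input
```

Here `operation` is addition of two plaintexts and `operation’` is multiplication of their ciphertexts: the party doing the arithmetic never sees 12 or 30.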
Secure multi-party computation
Perform a joint computation across
distributed private data without sharing
those data.
http://www.humangenomeprivacy.org/2015/about.html
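Secure multi-party computation can be illustrated with additive secret sharing, one of its simplest building blocks. The sketch below assumes a hypothetical scenario in which several sites each hold a private count (for example, carriers of a variant) and want only the total. It runs all parties in one process for clarity; the modulus and function names are assumptions, and a real protocol would also need secure channels and protection against dishonest parties.

```python
import secrets

MODULUS = 2**61 - 1   # assumed working modulus; must exceed any possible total


def share(value, n_parties):
    """Split value into n_parties random shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values):
    """Simulate the protocol: only shares and partial sums are ever exchanged."""
    n = len(private_values)
    # Each party i splits its private value and sends share j to party j.
    all_shares = [share(v, n) for v in private_values]
    # Each party j adds up the shares it received and publishes that partial sum.
    partials = [sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # The published partial sums reveal the total but not the individual inputs.
    return sum(partials) % MODULUS


print(secure_sum([17, 5, 20]))   # 42
```

Any single share is a uniformly random number, so a party that sees fewer than all of another site's shares learns nothing about that site's count.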
Files must give way to APIs
At large scale, the file/folder model for managing data
on computers becomes ineffective as a human
interface, and eventually a hindrance to programmatic
access. The solution: object storage + metadata.
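The idea of object storage plus metadata can be sketched in a few lines. This is an in-memory toy under stated assumptions, not any particular product's API: the `ObjectStore` class, its methods, and the metadata keys are all hypothetical. Objects are addressed by a content hash rather than a path, and are located by querying metadata.

```python
import hashlib


class ObjectStore:
    """Toy in-memory object store: opaque IDs plus searchable metadata."""

    def __init__(self):
        self._blobs = {}   # object ID -> bytes
        self._meta = {}    # object ID -> metadata dict

    def put(self, data: bytes, **metadata) -> str:
        """Store data under its SHA-256 digest and attach the given metadata."""
        object_id = hashlib.sha256(data).hexdigest()
        self._blobs[object_id] = data
        self._meta[object_id] = dict(metadata)
        return object_id

    def get(self, object_id: str) -> bytes:
        """Fetch an object by ID; no directory hierarchy is involved."""
        return self._blobs[object_id]

    def find(self, **criteria):
        """Return IDs of all objects whose metadata matches every criterion."""
        return [
            oid for oid, meta in self._meta.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        ]


store = ObjectStore()
store.put(b"ACGT...", sample="S1", assay="WGS", consent="cancer-research")
store.put(b"TTGA...", sample="S2", assay="WES", consent="general-research")
print(len(store.find(assay="WGS")))   # 1
```

Because callers ask for data by what it is rather than where it lives, attributes such as consent terms can travel with each object and be enforced at the API layer.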
Standards are needed for genomic data
“The mission of the Global Alliance for Genomics
and Health is to accelerate progress in human
health by helping to establish a common framework
of harmonized approaches to enable effective and
responsible sharing of genomic and clinical data,
and by catalyzing data sharing projects that drive
and demonstrate the value of data sharing.”
Considerations for computing services
•  Provide training in human subjects protection for IT and research personnel
•  Maintain information security controls commensurate with HIPAA
•  Invest in data engineering to develop new capabilities
•  Develop approaches to access control and auditing that address data use restrictions
•  Create data services that transcend filesystems
THANK YOU