J. Hesse - Computing & Communications

Three Critical Ideas for UC Health
Sciences Cyber Infrastructure
Joe Hesse - [email protected]
Director of Innovation, UCSF Memory and Aging Center
Technical Lead, UCSF Neuroscience Knowledge Network
HPC Cluster Administrator, UCSF Institute for Human Genetics
3/23/2015
Driving “Cyber” Needs for Health Science
In discovering causes and developing treatments for
disease, and in promoting health, encouraging prevention, and
delivering care, we fundamentally need:
1.  To reason, compute, and discover as health professionals,
clinical researchers, and social and basic scientists, in any
combination of roles, at any time.
2.  To harness agile, cost-effective, and easy-to-use computational
infrastructure throughout the full lifecycle of our research and
clinical activities.
3.  To interact with colleagues through ubiquitous and collaborative
“data science” environments characterized by rich data and
method annotations, secure and audited sharing, and
transformational communication methods.
Three Critical Ideas for UC Health Sciences Cyber Infrastructure - Joe Hesse
3/24/15
Idea One: Develop and Deliver Regulatory Compliant
Service Layers for Research Cyber Infrastructure.
A Fundamental Problem:
•  We practically force clinical researchers to abandon their clinical
role (and access to their patients’ data) before they use most
research computational infrastructure.
•  Asking medical professionals to really “de-identify” data to meet
compliance standards (with associated legal liabilities) is
impractical, and it results in a lot of “don’t ask, don’t tell” behavior.
A Key Opportunity:
•  Current technology trends (e.g. agile dev-ops, platforms as a
service, software-defined everything) and maturing open
source tools (often with enterprise options) make it practical to
develop and deliver complex security and monitoring to research
infrastructure at commodity prices using existing university IT
capabilities.
Delivering Regulatory Compliance through Mgmt & Orchestration
HPC Pilot Project FY 2015/16
(funding decision pending)
Designing a new unified management
and orchestration layer and providing
a single portal for access to three
distinct high performance computing
clusters (each currently serving
distinct user communities).
Opportunity to intentionally design
security, monitoring and auditing
layers with regulatory compliance as
a target.
Many benefits, but reducing user and
administration costs by standardizing
the most complex aspects of the
environments is a key driver.
High Performance Computing: Simple Schematic of Layers.
Common tools and service layers to support distinct HPC workloads and hardware
Idea Two
Create Continuum of Research Cyber Infrastructure to
Support the Complete User Investigatory Experience
Fundamental Barriers / Problems:
•  Most research computational infrastructure is organized around
technology stacks that aim to meet coherent subsets of user needs
but fall short of addressing the complete analytic and discovery
lifecycle needs of complex investigations.
•  Users often find it difficult to use idiosyncratically developed
technology solutions; find it impossible to navigate between these
infrastructural silos; are usually unable to apply their funding in an
agile manner across support organizations; and frequently lack
knowledge about the most appropriate tools.
Key Opportunity:
•  Extend common, regulatory-compliant management and orchestration
layers to the full continuum of research cyber technologies.
Vision of Continuum of Research Cyber Infrastructure
Common portal, access, and billing tools streamline the user experience. For example, using a
secure reporting station to 1) query the EMR or a research database for correlative
variables to 2) drive an exploratory neuroimaging analysis on a large virtual
workstation that will 3) become a pipelined HPC or GPU cluster analytic applied
retrospectively to thousands of image studies is not only possible but commonplace.
Idea Three
Use Novel Collaboration and Data Science Environments
to Bridge the Huge Data, Method, and Knowledge Divides
Fundamental Problems / Barriers:
•  With the increasing size, complexity, and privacy / ethical concerns of our
health sciences / biomedical data, siloed research infrastructure stacks
become like true islands without any chance of meaningful integration
for users. Data portability is extremely difficult and frequently
insecure.
•  Reproducibility of results and cleanly annotated analytic provenance of
derived data products remain elusive.
•  Enormous startup costs to simply adopt methods and tools from other
labs or collaborators.
Key Opportunity:
•  Support and prioritize development of novel ubiquitous data
environments designed for research and collaborative science.
Collaboration and Data Workspace Environments
KBase (www.kbase.us)
is the first large-scale
bioinformatics system
that enables users to
upload their own data,
analyze it (along with
collaborator and public
data), build increasingly
realistic models, and
share and publish their
workflows and
conclusions.
KBase aims to provide
a knowledgebase: an
integrated environment
where knowledge and
insights are created and
multiplied.
Collaboration and Data Workspace Environments
KNECT (inspired by
KBase) is a prototype
knowledge network
environment for
precision medicine.
KNECT aims to provide
a common data
workspace with:
•  richly typed and
annotated data
objects,
•  tightly integrated
scalable data science
cluster and service
technologies (e.g.
Spark, Docker),
•  clinically compliant
security and auditing
frameworks.
Vision for Complete Data Science Environments
•  Collaboration spaces / knowledge networks / open science environments layer on top of
a unified continuum of technologies to connect investigators and investigations.
•  Ubiquitous hyper-converged data environments underneath the full technology stack
enable complete data portability and scientific agility.
Summary of Ideas for Cyber Infrastructure
Health Science Priorities
1) Build regulatory compliance at the foundation of
research infrastructure.
2) Emphasize a unified user experience across the
continuum of research computational tools needed for
translational health science discovery and delivery.
3) Prioritize support for novel collaboration and data
environments that connect investigators and
investigations.
Acknowledgements and Appreciation
Funding Sources and Key Supporters
Dr. Keith Yamamoto
UCSF Vice Chancellor for Research
Dr. Bruce Miller
Director, UCSF Memory and Aging Center
Tau Consortium
www.tauconsortium.com
Dr. Neil Risch
Director, UCSF Institute for Human Genetics
Colleagues and Inspiration
Dr. Kate Rankin
UCSF Memory and Aging
Brad Dispensa
UCSF Institute for Human Genetics
Dr. Adam Arkin
LBNL, UC Berkeley, KBase
Michael Schaffer
Dir. of Tech. UCSF Memory and Aging
Joe Bengfort
UCSF Chief Information Officer
Contact Info
Joe Hesse – [email protected]
Office: 415-502-0590
Mobile: 415-819-1054
UCSF Sandler Neurosciences Center