BioScience on the TeraGrid
Daniel S. Katz
[email protected]
Director of Science, TeraGrid GIG
Senior Fellow, Computation Institute, University of Chicago & Argonne National Laboratory
Affiliate Faculty, Center for Computation & Technology, LSU
Adjunct Associate Professor, Electrical and Computer Engineering, LSU

What is the TeraGrid
• World’s largest distributed cyberinfrastructure for open scientific research, supported by the US NSF
• Integrated high-performance computers (>2 PF HPC and >27,000 HTC CPUs), data resources (>2 PB disk, >60 PB tape, data collections), visualization, experimental facilities (VMs, GPUs, FPGAs), and network at 11 Resource Provider sites
• Allocated to US researchers and their collaborators through a national peer-review process
• DEEP: provide powerful computational resources to enable research that can’t otherwise be accomplished
• WIDE: grow the community of computational science and make the resources easily accessible
• OPEN: connect with new resources and institutions
• Integration: single portal, sign-on, help desk, allocations process, advanced user support, EOT, and campus champions
http://www.teragrid.org/

Governance
• 11 Resource Providers (RPs) funded under separate agreements with NSF
  – Different start and end dates
  – Different goals
  – Different agreements
  – Different funding models
• 1 coordinating body: the Grid Infrastructure Group (GIG)
  – University of Chicago/Argonne National Laboratory
  – Subcontracts to all RPs and six other universities
  – 7-8 Area Directors
  – Working groups with members from many RPs
• TeraGrid Forum with Chair

Who Uses TeraGrid
[Charts: TeraGrid users, 2009 and 2008]

How TeraGrid Is Used
Use modality: community size (rough estimate, number of users; 2006 data)
• Batch Computing on Individual Resources: 850
• Exploratory and Application Porting: 650
• Workflow, Ensemble, and Parameter Sweep: 250
• Science Gateway Access: 500
• Remote Interactive Steering and Visualization: 35
• Tightly-Coupled Distributed Computation: 10

How One Uses TeraGrid
[Diagram: users reach the Resource Providers (RP 1, RP 2, RP 3, offering compute, visualization, and data services) through the User Portal, Science Gateways, the command line, and POPS (for now), all layered on shared TeraGrid infrastructure (accounting, network, authorization, …)]

User Portal: portal.teragrid.org
http://portal.teragrid.org/

Science Gateways
• A natural extension of the Internet and Web 2.0
• The idea resonates with scientists
  – Researchers can imagine scientific capabilities provided through a familiar interface
• Mostly a web portal, or a web or client-server program
• Designed by communities; provide interfaces understood by those communities
  – Also provide access to greater capabilities (back end)
  – Without users needing to understand the details of those capabilities
  – Scientists know they can undertake more complex analyses, and that’s all they want to focus on
  – TeraGrid provides tools to help developers
• Seamless access doesn’t come for free
  – Hinges on very capable developers
Nancy Wilkins-Diehr

TeraGrid -> XD Future
• Current RP agreements end in March 2011
  – Except track 2 centers (current and future)
• TeraGrid XD (eXtreme Digital) starts in April 2011
  – Era of potential interoperation with OSG and others
  – New types of science applications?
• Current TG GIG continues through July 2011
  – Allows four months of overlap in coordination
  – Probable overlap between GIG and XD members
• Blue Waters (track 1) production in 2011

Grid Enabled Neurosurgical Imaging Using Simulation (GENIUS)
Model large-scale, patient-specific cerebral blood flow on a clinically relevant time scale
• Provide simulation support within the operating theatre for neuroradiologists
• Provide new information to surgeons for patient management and therapy:
  1. Diagnosis and risk assessment
  2. Predictive simulation in therapy
• Provide patient-specific information to help plan embolisation of arterio-venous malformations, coiling of aneurysms, etc.
Clinical workflow:
• Book computing resources in advance or use preemption
• Shift imaging data around quickly over high-bandwidth, low-latency dedicated links
• Interactive simulations and real-time visualization for immediate feedback
Peter Coveney, University College London

OLSGW Gadgets
• OLSGW integrates bioinformatics applications
  – BLAST, InterProScan, CLUSTALW, MUSCLE, PSIPRED, ACCPRO, VSL2
• 454 pyrosequencing service under development
• Four OLSGW gadgets have been published in the iGoogle gadget directory; search for “TeraGrid Life Science”.
Wenjun Wu, Thomas Uram, Michael Papka, ANL

Multiscale Simulation of Arterial Tree
[Figure: arterioles/venules (50 microns) with activated platelets]
• Platelet diameter is 2-4 µm
• Normal platelet concentration in blood is 300,000/mm³
• Functions: activation, and adhesion to injured walls and to other platelets
Need to combine multi-scale models:
• 1D (arteries)
• 3D Navier-Stokes (organs, arterial junctions, etc.)
• Dissipative Particle Dynamics (capillaries, venules, arterioles, blood cells, etc.)
• Molecular Dynamics (blood cells, platelets, molecular adhesion, etc.)
NIH/NSF-IMAG project: George Em Karniadakis, Brown

Expressed Sequence Tag (EST) Pipeline
• ESTs are a collection of random cDNA sequences, sequenced from a cDNA library or sequencing devices
  – Typical inputs are on the order of a million sequences
  – Newer 454 devices produce higher volumes and are relatively easy to obtain and operate
  – Stored in FASTA format
• ESTs are clustered and assembled to form contigs
• Contigs are then used to identify potential unknown genes by BLASTing against a known protein database
• Goal: use TeraGrid for backend computing, with existing software, and a gateway frontend
Pipeline stages:
• RepeatMasker (cleaning sequences): serial execution on split input, e.g., 1000 jobs for 2 million sequences
• PaCE (clustering): 1 MPI job with a runtime of several hours; run time grows exponentially with input data; scales well
• CAP3 (assembly): serial runs on the clusters generated by PaCE (clusters can be combined); varied sizes with varied resource requirements (run times from ms to days)
• BLAST (identification): serial; takes CAP3 results; the number of jobs is controlled by adjusting the number of sequences per job
Initial results: a run that took 5 days on a local cluster was done in 2 days; more optimization is underway
A. Kulshrestha, S. L. Pallickara, K. N. Muthuram, C. Kong, Q. Dong, M. Pierce, H. Tang, IU

Multiscale Computer Simulation of the Immature HIV-1 Virion
[Diagram: iterative cycle linking experimental structures (Wright, Schooler, Ding, Kieffer, Fillmore, Sundquist, Jensen, EMBO J., 26, 2218, 2007), coarse-grained (CG) model development, CG simulation, CG model refinement, and atomic-level simulation (key CG interactions; new CG interactions from MD)]
An iterative modeling approach combining experimental imaging (cryo-electron tomography), coarse-grained (CG) simulation, and atomic-level molecular dynamics (MD)
G. A. Voth, U. of Chicago

CIPRES Portal: A New Science Gateway for Systematics
• Systematics: the study of the diversification of life and the relationships among living things through time
• CIPRES: a flexible web application that can be sustained by the community at minimal cost, even beyond the funding period of the project
• Tools include parallel versions of MrBayes, RAxML, and GARLI
• User requirements include:
  – Access to most or all native command-line options
  – Ability to add new tools quickly
  – Personal user space for storing results
  – Use of TeraGrid resources to provide results quickly
• Cited in at least 35 publications, including Nature, PNAS, and Cell
  – Examples: New Family Tree for Arthropoda, Genome Sequence of a Transitional Eukaryote, Co-evolution of Beetles and Flowering Plants
• Used routinely in at least 5 undergraduate classes
• Usage: 77% US (including 17 EPSCoR states), 23% from 33 other countries
Mark Miller, SDSC

Patient-Specific HIV Drug Therapy
HIV-1 protease is a common target for HIV drug therapy
• Enzyme of HIV responsible for protein maturation
• Target for anti-retroviral inhibitors
• Example of structure-assisted drug design
• 9 FDA-approved inhibitors of HIV-1 protease
So what’s the problem?
• Emergence of drug-resistant mutations in protease
• Mutations render drugs ineffective
• Drug-resistant mutants have emerged for all FDA-approved inhibitors
• Too many mutations to be interpreted by a clinician
Solution: build a Binding Affinity Calculator (BAC)
• Provide tools that allow simulations to be used in a clinical context, including a lightweight client
  – User only needs to specify the enzyme, mutations relative to wildtype, and drug
• Other options can be specified but start with default values
• Requires a large number of simulations to be constructed and run automatically (across distributed HPC resources)
  – To investigate generalisation
  – Automation is critical for clinical use
• A turnaround time of around a week is required
• Trade-off between accuracy and time-to-solution
Initial results (ensemble MD calculations for lopinavir vs. wildtype and five mutants) appear promising, with excellent relative ranking of binding free energies
Peter Coveney, University College London

Scripting Protein Structure Prediction
int nSim = 1000;
int maxRounds = 3;
Protein pSet[ ] <ext; exec="Protein.map">;
float startTemp[ ] = [ 100.0, 200.0 ];
float delT[ ] = [ 1.0, 1.5, 2.0, 5.0, 10.0 ];

foreach p, pn in pSet {
  foreach t in startTemp {
    foreach d in delT {
      ItFix(p, nSim, maxRounds, t, d);
    }
  }
}

ItFix(Protein p, int nSim, int maxRounds, float t, float d) {
  // one round shown; the 300K total below counts maxRounds rounds per ItFix() call
  foreach sim in [1:nSim] {
    (structure[sim], log[sim]) = predict(p, t, d);
  }
  result = analyze(structure);
}

[Diagram: 1000 predict() calls … analyze()]
10 proteins x 1000 simulations x 3 rounds x 2 temps x 5 delta-T’s = 300K application runs
T. Sosnick, K. Freed, G. Hocky, J. DeBartolo, A. Adhikari, J. Xu, W. Wilde, U. Chicago
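The 300K figure on the protein-structure slide comes from the nested foreach loops. There are 10 × 2 × 5 = 100 ItFix() calls, and each performs 3 rounds of 1000 predict() calls. The Python sketch below enumerates the same cross-product and checks the count. It makes two assumptions. First, the protein names are hypothetical placeholders for whatever Protein.map actually yields. Second, it treats each ItFix() call as running maxRounds rounds; that is inferred from the slide's arithmetic, since the script body shows only one round.

```python
from itertools import product

# Hypothetical stand-ins for the Swift script's inputs; the real protein
# list comes from the Protein.map external mapper, which is not shown.
proteins = [f"protein_{i}" for i in range(10)]  # 10 proteins (per the slide)
start_temps = [100.0, 200.0]                     # startTemp[]
del_ts = [1.0, 1.5, 2.0, 5.0, 10.0]              # delT[]
N_SIM = 1000                                     # nSim
MAX_ROUNDS = 3                                   # maxRounds


def enumerate_runs():
    """Yield one (protein, t, d, round, sim) tuple per predict() invocation."""
    for p, t, d in product(proteins, start_temps, del_ts):
        # Assumption: each ItFix(p, nSim, maxRounds, t, d) call runs
        # MAX_ROUNDS rounds, each with N_SIM predict() calls.
        for rnd in range(MAX_ROUNDS):
            for sim in range(N_SIM):
                yield (p, t, d, rnd, sim)


def count_runs() -> int:
    """Closed-form count: proteins x sims x rounds x temps x delta-Ts."""
    return len(proteins) * N_SIM * MAX_ROUNDS * len(start_temps) * len(del_ts)


if __name__ == "__main__":
    total = sum(1 for _ in enumerate_runs())
    print(total)  # 300000
    assert total == count_runs()
```

Swift's foreach runs these iterations concurrently. The sketch only counts them, but each yielded tuple corresponds to one independent application run that a workflow engine could dispatch.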
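The EST pipeline slide describes RepeatMasker running as serial jobs on split input (1000 jobs for 2 million sequences, i.e. 2000 sequences per job). It also says the number of BLAST jobs is controlled by the number of sequences per job. The Python sketch below shows that splitting step and assumes plain FASTA input. The function names and the 60-column line wrap are illustrative choices, not part of the IU pipeline, and job submission to TeraGrid is not shown.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without the leading '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Parse FASTA lines into (header, sequence) records."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_records(records: Iterable[Record], per_job: int) -> List[List[Record]]:
    """Group records into batches of at most per_job sequences (one batch per serial job)."""
    if per_job < 1:
        raise ValueError("per_job must be >= 1")
    batches, current = [], []
    for rec in records:
        current.append(rec)
        if len(current) == per_job:
            batches.append(current)
            current = []
    if current:  # last, possibly smaller, batch
        batches.append(current)
    return batches


def to_fasta(batch: List[Record], width: int = 60) -> str:
    """Serialize one batch back to FASTA, wrapping sequence lines at `width` columns."""
    out = []
    for header, seq in batch:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

With per_job=2000, an input of 2 million sequences yields the 1000 jobs quoted on the slide. Raising or lowering per_job is the knob the slide mentions for tuning the number of BLAST jobs.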