Accelerating Big Data Processing with Hadoop, Spark and Memcached

Accelerating Big Data Processing with Hadoop, Spark and Memcached
Talk at HPC Advisory Council Switzerland Conference (Mar '15)
by Dhabaleswar K. (DK) Panda, The Ohio State University
E-mail: [email protected]
http://www.cse.ohio-state.edu/~panda

Introduction to Big Data Applications and Analytics
•  Big Data has become one of the most important elements of business analytics
•  Provides groundbreaking opportunities for enterprise information management and decision making
•  The amount of data is exploding; companies are capturing and digitizing more information than ever
•  The rate of information growth appears to be exceeding Moore's Law
•  Commonly accepted 3V's of Big Data: Volume, Velocity, Variety
   (Michael Stonebraker: Big Data Means at Least Three Different Things, http://www.nist.gov/itl/ssd/is/upload/NIST-stonebraker.pdf)
•  5V's of Big Data: 3V + Value, Veracity

Data Management and Processing on Modern Clusters
•  Substantial impact on designing and utilizing modern data management and processing systems in multiple tiers
   –  Front-end data accessing and serving (online): Memcached + DB (e.g., MySQL), HBase
   –  Back-end data analytics (offline): HDFS, MapReduce, Spark
[Figure: two-tier deployment. A front-end tier where Internet traffic reaches web servers backed by Memcached + DB (MySQL) and NoSQL DB (HBase) for data accessing and serving, and a back-end tier where data analytics apps/jobs run on MapReduce, Spark, and HDFS.]
Overview of Apache Hadoop Architecture
•  Open-source implementation of Google MapReduce, GFS, and BigTable for Big Data analytics
•  Hadoop Common utilities (RPC, etc.), HDFS, MapReduce, YARN
•  http://hadoop.apache.org
•  Hadoop 1.x stack: MapReduce (cluster resource management & data processing) over the Hadoop Distributed File System (HDFS) and Hadoop Common/Core (RPC, ..)
•  Hadoop 2.x stack: MapReduce and other models (data processing) over YARN (cluster resource management & job scheduling), HDFS, and Hadoop Common/Core (RPC, ..)

Spark Architecture Overview
•  An in-memory data-processing framework
   –  Iterative machine learning jobs
   –  Interactive data analytics
   –  Scala-based implementation
   –  Standalone, YARN, Mesos deployment modes
•  Scalable and communication intensive
   –  Wide dependencies between Resilient Distributed Datasets (RDDs)
   –  MapReduce-like shuffle operations to repartition RDDs
   –  Sockets-based communication
   (a minimal shuffle-triggering job is sketched after the architecture figure below)
[Figure: Spark standalone deployment. A Driver with its SparkContext coordinates, through a Master (with Zookeeper), a set of Workers that read from and write to HDFS.]
http://spark.apache.org
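The shuffle operations called out above are the communication-intensive step that the RDMA-based Spark design discussed later targets. As a point of reference, the hedged sketch below shows a minimal Spark job in Java whose reduceByKey stage repartitions an RDD and therefore exercises the shuffle path; the master URL and HDFS paths are placeholders, not values from the talk.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class ShuffleSketch {
    public static void main(String[] args) {
        // Placeholder master URL; this would point at the standalone Master in the figure above.
        SparkConf conf = new SparkConf().setAppName("ShuffleSketch")
                                        .setMaster("spark://master-host:7077");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> lines = sc.textFile("hdfs:///data/input");   // placeholder HDFS path

        // Count how many times each distinct line occurs. The reduceByKey step
        // repartitions the RDD by key and triggers the MapReduce-like shuffle.
        JavaPairRDD<String, Integer> pairs  = lines.mapToPair(l -> new Tuple2<>(l, 1));
        JavaPairRDD<String, Integer> counts = pairs.reduceByKey((a, b) -> a + b);

        counts.saveAsTextFile("hdfs:///data/output");                // placeholder HDFS path
        sc.stop();
    }
}
```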
Memcached Architecture
[Figure: web frontend servers (Memcached clients) connect over the Internet and high-performance networks to a pool of Memcached servers (each with main memory, CPUs, SSD, and HDD), which in turn sit in front of the database servers.]
•  Three-layer architecture of Web 2.0
   –  Web servers, Memcached servers, database servers
•  Distributed caching layer
   –  Allows spare memory from multiple nodes to be aggregated
   –  General purpose
•  Typically used to cache database queries and results of API calls (a minimal client sketch follows)
•  Scalable model, but typical usage is very network intensive
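To make the caching layer concrete, the hedged sketch below shows the usual cache-aside pattern in Java. The talk's own stack pairs Memcached with the C libMemcached client; the spymemcached Java client used here is only one possible client, chosen to keep all examples in this document in one language, and the host, port, key, and expiry values are placeholders.

```java
import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class CacheAsideSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder address: one of the Memcached servers sitting between the
        // web frontend tier and the database tier in the figure above.
        MemcachedClient cache = new MemcachedClient(new InetSocketAddress("memcached-host", 11211));

        String key = "user:42:profile";                 // hypothetical cache key
        Object value = cache.get(key);                  // 1) try the caching layer first
        if (value == null) {
            value = queryDatabase(key);                 // 2) miss: fall back to the database tier
            cache.set(key, 300, value);                 // 3) populate the cache (300 s expiry)
        }
        System.out.println(value);
        cache.shutdown();
    }

    // Stand-in for a MySQL query against the database servers shown in the figure.
    private static Object queryDatabase(String key) {
        return "row-for-" + key;
    }
}
```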
Presentation Outline
•  Challenges for Accelerating Big Data Processing
•  The High-Performance Big Data (HiBD) Project
•  RDMA-based designs for Apache Hadoop and Spark
   –  Case studies with HDFS, MapReduce, and Spark
   –  Sample performance numbers for the RDMA-Hadoop 2.x 0.9.6 release
•  RDMA-based designs for Memcached and HBase
   –  RDMA-based Memcached with SSD-assisted hybrid memory
   –  RDMA-based HBase
•  Challenges in Designing Benchmarks for Big Data Processing
   –  OSU HiBD Benchmarks
•  Conclusion and Q&A

Drivers for Modern HPC Clusters
•  High End Computing (HEC) is growing dramatically
   –  High Performance Computing
   –  Big Data Computing
•  Technology advancement
   –  Multi-core/many-core technologies and accelerators
   –  Remote Direct Memory Access (RDMA)-enabled networking (InfiniBand and RoCE)
   –  Solid State Drives (SSDs) and Non-Volatile Random-Access Memory (NVRAM)
   –  Accelerators (NVIDIA GPGPUs and Intel Xeon Phi)
[Figure: representative systems – Tianhe-2, Titan, Stampede, Tianhe-1A]

Interconnects and Protocols in the OpenFabrics Stack
[Figure: protocol and interconnect options from application/middleware down to hardware: sockets over 1/10/40 GigE with the kernel TCP/IP stack, 10/40 GigE with TCP offload (TOE), IPoIB over InfiniBand, user-space RSockets and SDP over InfiniBand, iWARP over Ethernet, RoCE, and native InfiniBand verbs with RDMA.]
Wide Adoption of RDMA Technology
•  Message Passing Interface (MPI) for HPC
•  Parallel file systems
   –  Lustre
   –  GPFS
•  Delivering excellent performance:
   –  < 1.0 microsecond latency
   –  100 Gbps bandwidth
   –  5-10% CPU utilization
•  Delivering excellent scalability

Challenges in Designing Communication and I/O Libraries for Big Data Systems
[Figure: Big Data middleware (HDFS, MapReduce, HBase, Spark and Memcached) and benchmarks sit on programming models (sockets) over a communication and I/O library (point-to-point communication, threaded models and synchronization, virtualization, I/O and file systems, QoS, fault-tolerance), running on commodity computing system architectures (multi- and many-core architectures and accelerators), storage technologies (HDD and SSD), and networking technologies (InfiniBand, 1/10/40 GigE and intelligent NICs). Open questions: upper-level changes? other protocols beyond sockets and RDMA?]
Can Big Data Processing Systems be Designed with High-Performance Networks and Protocols?
[Figure: current design, application over sockets over a 1/10 GigE network; our (OSU) approach, application over a verbs interface over 10 GigE or InfiniBand.]
•  Sockets are not designed for high performance
   –  Stream semantics are often a mismatch for upper layers
   –  Zero-copy is not available for non-blocking sockets

Overview of the HiBD Project and Releases
•  RDMA for Apache Hadoop 2.x (RDMA-Hadoop-2.x)
•  RDMA for Apache Hadoop 1.x (RDMA-Hadoop)
•  RDMA for Memcached (RDMA-Memcached)
•  OSU HiBD-Benchmarks (OHB)
•  http://hibd.cse.ohio-state.edu
•  Users base: 95 organizations from 18 countries
•  More than 2,900 downloads
•  RDMA for Apache HBase and Spark will be available in the near future

RDMA for Apache Hadoop 2.x Distribution
•  High-performance design of Hadoop over RDMA-enabled interconnects
   –  High-performance design with native InfiniBand and RoCE support at the verbs level for the HDFS, MapReduce, and RPC components
   –  Enhanced HDFS with in-memory and heterogeneous storage
   –  High-performance design of MapReduce over Lustre
   –  Easily configurable for different running modes (HHH, HHH-M, HHH-L, and MapReduce over Lustre) and different protocols (native InfiniBand, RoCE, and IPoIB)
•  Current release: 0.9.6
   –  Based on Apache Hadoop 2.6.0
   –  Compliant with Apache Hadoop 2.6.0 APIs and applications
   –  Tested with
      •  Mellanox InfiniBand adapters (DDR, QDR and FDR)
      •  RoCE support with Mellanox adapters
      •  Various multi-core platforms
      •  Different file systems with disks and SSDs, and Lustre
   –  http://hibd.cse.ohio-state.edu

RDMA for Memcached Distribution
•  High-performance design of Memcached over RDMA-enabled interconnects
   –  High-performance design with native InfiniBand and RoCE support at the verbs level for the Memcached and libMemcached components
   –  Easily configurable for native InfiniBand, RoCE, and the traditional sockets-based support (Ethernet and InfiniBand with IPoIB)
   –  High-performance design of SSD-assisted hybrid memory
•  Current release: 0.9.3
   –  Based on Memcached 1.4.22 and libMemcached 1.0.18
   –  Compliant with libMemcached APIs and applications
   –  Tested with
      •  Mellanox InfiniBand adapters (DDR, QDR and FDR)
      •  RoCE support with Mellanox adapters
      •  Various multi-core platforms
      •  SSD
   –  http://hibd.cse.ohio-state.edu
OSU HiBD Micro-Benchmark (OHB) Suite – Memcached
•  Released in OHB 0.7.1 (ohb_memlat)
•  Evaluates the performance of stand-alone Memcached
•  Three different micro-benchmarks
   –  SET micro-benchmark: memcached set operations
   –  GET micro-benchmark: memcached get operations
   –  MIX micro-benchmark: a mix of memcached set/get operations (read:write ratio of 90:10)
•  Calculates the average latency of Memcached operations
•  Can measure throughput in transactions per second

Acceleration Case Studies and In-Depth Performance Evaluation
•  RDMA-based designs and performance evaluation
   –  HDFS
   –  MapReduce
   –  Spark

Design Overview of HDFS with RDMA
[Figure: applications call HDFS, which can use either the Java socket interface over 1/10 GigE or IPoIB networks, or the OSU design, a Java Native Interface (JNI) layer over verbs on RDMA-capable networks (IB, 10GE/iWARP, RoCE ..).]
•  Design features
   –  RDMA-based HDFS write
   –  RDMA-based HDFS replication
   –  Parallel replication support
   –  On-demand connection setup
   –  InfiniBand/RoCE support
•  Enables high-performance RDMA communication while supporting the traditional socket interface
•  The JNI layer bridges the Java-based HDFS with a communication library written in native code
   (the HDFS client API that these designs accelerate is sketched below)
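The RDMA-based write and replication paths sit underneath the ordinary HDFS client API, so application code does not change. The sketch below is a minimal client-side write and read using the standard Hadoop 2.x API; the file path is a placeholder and nothing RDMA-specific appears at this level.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();       // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/benchmarks/sample.txt"); // placeholder path

        // Write path: with an RDMA-enhanced HDFS, the DataNode transfers behind
        // create()/write() use verbs instead of Java sockets; the client code is unchanged.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeBytes("hello hdfs\n");
        }

        // Read the file back.
        try (FSDataInputStream in = fs.open(file)) {
            byte[] buf = new byte[64];
            int n = in.read(buf);
            System.out.println(new String(buf, 0, n));
        }
        fs.close();
    }
}
```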
Communication Times in HDFS
[Figure: HDFS communication time for 2-10 GB files on 10GigE, IPoIB (QDR), and OSU-IB (QDR); OSU-IB reduces communication time by about 30%.]
•  Cluster with HDD DataNodes
   –  30% improvement in communication time over IPoIB (QDR)
   –  56% improvement in communication time over 10GigE
•  Similar improvements are obtained for SSD DataNodes
N. S. Islam, M. W. Rahman, J. Jose, R. Rajachandrasekar, H. Wang, H. Subramoni, C. Murthy and D. K. Panda, High Performance RDMA-Based Design of HDFS over InfiniBand, Supercomputing (SC), Nov 2012
N. Islam, X. Lu, W. Rahman, and D. K. Panda, SOR-HDFS: A SEDA-based Approach to Maximize Overlapping in RDMA-Enhanced HDFS, HPDC '14, June 2014

Enhanced HDFS with In-memory and Heterogeneous Storage (Triple-H)
[Figure: applications run over Triple-H, whose data placement policies (eviction/promotion, hybrid replication) manage heterogeneous storage: RAM disk, SSD, HDD, and Lustre.]
•  Design features
   –  Three modes
      •  Default (HHH)
      •  In-memory (HHH-M)
      •  Lustre-integrated (HHH-L)
   –  Policies to efficiently utilize the heterogeneous storage devices
      •  RAM, SSD, HDD, Lustre
   –  Eviction/promotion based on data usage pattern
   –  Hybrid replication
   –  Lustre-integrated mode:
      •  Lustre-based fault-tolerance
N. Islam, X. Lu, M. W. Rahman, D. Shankar, and D. K. Panda, Triple-H: A Hybrid Approach to Accelerate HDFS on HPC Clusters with Heterogeneous Storage Architecture, CCGrid '15, May 2015

Enhanced HDFS with In-memory and Heterogeneous Storage – Three Modes
•  HHH (default): heterogeneous storage devices with hybrid replication schemes
   –  I/O operations over RAM disk, SSD, and HDD
   –  Hybrid replication (in-memory and persistent storage)
   –  Better fault-tolerance as well as performance
•  HHH-M: high-performance in-memory I/O operations
   –  Memory replication (in-memory only, with lazy persistence)
   –  As much performance benefit as possible
•  HHH-L: Lustre-integrated
   –  Takes advantage of the Lustre available in HPC clusters
   –  Lustre-based fault-tolerance (no HDFS replication)
   –  Reduced local storage space usage

Performance Improvement on TACC Stampede (HHH)
[Figure: TestDFSIO read/write throughput and RandomWriter execution time, IPoIB (FDR) vs. OSU-IB (FDR).]
•  For 160GB TestDFSIO in 32 nodes
   –  Write throughput: 7x improvement over IPoIB (FDR)
   –  Read throughput: 2x improvement over IPoIB (FDR)
•  For 120GB RandomWriter in 32 nodes
   –  3x improvement over IPoIB (QDR)

Performance Improvement on SDSC Gordon (HHH-L)
[Figure: Sort execution time for 20-60 GB data sizes with HDFS-IPoIB (QDR), Lustre-IPoIB (QDR), and OSU-IB (QDR).]
Storage space for 60GB Sort: HDFS-IPoIB (QDR) 360 GB, Lustre-IPoIB (QDR) 120 GB, OSU-IB (QDR) 240 GB
•  For 60GB Sort in 8 nodes
   –  24% improvement over default HDFS
   –  54% improvement over Lustre
   –  33% storage space saving compared to default HDFS

Evaluation with PUMA and CloudBurst (HHH-L/HHH)
[Figure: PUMA SequenceCount and Grep execution times on OSU RI; CloudBurst on TACC Stampede, HDFS-IPoIB (FDR) 60.24 s vs. OSU-IB (FDR) 48.3 s.]
•  PUMA on OSU RI
   –  SequenceCount with HHH-L: 17% benefit over Lustre, 8% over HDFS
   –  Grep with HHH: 29.5% benefit over Lustre, 13.2% over HDFS
•  CloudBurst on TACC Stampede
   –  With HHH: 19% improvement over HDFS

Design Overview of MapReduce with RDMA
[Figure: applications call MapReduce (JobTracker plus TaskTrackers running map and reduce), which can use either the Java socket interface over 1/10 GigE or IPoIB networks, or the OSU design, a JNI layer over verbs on RDMA-capable networks (IB, 10GE/iWARP, RoCE ..).]
•  Design features
   –  RDMA-based shuffle
   –  Prefetching and caching of map output
   –  Efficient shuffle algorithms
   –  In-memory merge
   –  On-demand shuffle adjustment
   –  Advanced overlapping
      •  map, shuffle, and merge
      •  shuffle, merge, and reduce
   –  On-demand connection setup
   –  InfiniBand/RoCE support
•  Enables high-performance RDMA communication while supporting the traditional socket interface
•  The JNI layer bridges the Java-based MapReduce with a communication library written in native code
   (a minimal MapReduce job whose shuffle these designs accelerate is sketched below)
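As with HDFS, the RDMA-based shuffle replaces the transport underneath the framework, so user jobs are written against the standard Hadoop 2.x MapReduce API. The hedged sketch below is a conventional word-count job whose map-output fetch during the shuffle is exactly the traffic the designs above accelerate; input and output paths are placeholders.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountSketch {

    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            for (String tok : value.toString().split("\\s+")) {
                if (tok.isEmpty()) continue;
                word.set(tok);
                ctx.write(word, ONE);   // map output: fetched by reducers during the shuffle
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "wordcount-sketch");
        job.setJarByClass(WordCountSketch.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/data/in"));    // placeholder paths
        FileOutputFormat.setOutputPath(job, new Path("/data/out"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```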
Advanced Overlapping among Different Phases
[Figure: default, enhanced-overlapping, and advanced-overlapping MapReduce architectures.]
•  A hybrid approach to achieve the maximum possible overlapping in MapReduce across all phases compared to other approaches
   –  Efficient shuffle algorithms
   –  Dynamic and efficient switching
   –  On-demand shuffle adjustment
M. W. Rahman, X. Lu, N. S. Islam, and D. K. Panda, HOMR: A Hybrid Approach to Exploit Maximum Overlapping in MapReduce over High Performance Interconnects, ICS, June 2014

Performance Evaluation of Sort and TeraSort
[Figure: job execution time for Sort on the OSU cluster (IPoIB (QDR), UDA-IB (QDR), OSU-IB (QDR)) and TeraSort on TACC Stampede (IPoIB (FDR), UDA-IB (FDR), OSU-IB (FDR)) at cluster sizes of 16, 32, and 64 nodes.]
•  For 240GB Sort in 64 nodes (512 cores)
   –  40% improvement over IPoIB (QDR) with HDD used for HDFS
•  For 320GB TeraSort in 64 nodes (1K cores)
   –  38% improvement over IPoIB (FDR) with HDD used for HDFS

Evaluations using the PUMA Workload
[Figure: normalized execution time for AdjList (30GB), SelfJoin (80GB), SeqCount (30GB), WordCount (30GB), and InvertIndex (30GB) on 10GigE, IPoIB (QDR), and OSU-IB (QDR).]
•  50% improvement in SelfJoin over IPoIB (QDR) for an 80 GB data size
•  49% improvement in SequenceCount over IPoIB (QDR) for a 30 GB data size

Optimize Hadoop YARN MapReduce over Parallel File Systems
[Figure: lean compute nodes (App Master, map and reduce tasks, Lustre client) connected to a dedicated Lustre setup of metadata servers and object storage servers.]
•  HPC cluster deployment
   –  Hybrid topological solution of a Beowulf architecture with separate I/O nodes
   –  Lean compute nodes with a light OS, more memory space, and small local storage
   –  Sub-cluster of dedicated I/O nodes with parallel file systems such as Lustre
•  MapReduce over Lustre: two configurations (a hedged configuration sketch follows)
   –  Local disk is used as the intermediate data directory
   –  Lustre is used as the intermediate data directory
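Where the intermediate (shuffle) data lands is a Hadoop/YARN configuration matter. As one possible illustration, not the talk's actual setup, the sketch below switches the node-local directories used for intermediate data between a local disk and a Lustre mount. The property key shown is a standard Hadoop 2.x/YARN key, but the exact keys the RDMA-Hadoop package expects should be checked against its documentation, and both paths are placeholders; in practice these values live in yarn-site.xml on each node rather than being set programmatically.

```java
import org.apache.hadoop.conf.Configuration;

public class IntermediateDirSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Mode 1: intermediate (shuffle) data on node-local disk.
        conf.set("yarn.nodemanager.local-dirs", "/local/scratch/yarn");
        System.out.println("local-disk mode: " + conf.get("yarn.nodemanager.local-dirs"));

        // Mode 2 (from the slide above): point the same directories at a
        // per-node directory on the shared Lustre file system.
        conf.set("yarn.nodemanager.local-dirs", "/mnt/lustre/yarn-local/node01");
        System.out.println("lustre mode:     " + conf.get("yarn.nodemanager.local-dirs"));
    }
}
```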
Design Overview of Shuffle Strategies for MapReduce over Lustre
[Figure: map tasks write to an intermediate data directory on Lustre; reduce tasks obtain map output either via Lustre reads or via RDMA, then perform in-memory merge/sort and reduce.]
•  Design features
   –  Two shuffle approaches
      •  Lustre-read-based shuffle
      •  RDMA-based shuffle
   –  A hybrid shuffle algorithm that takes benefit from both shuffle approaches
   –  Dynamically adapts to the better shuffle approach for each shuffle request, based on profiling values for each Lustre read operation
   –  In-memory merge and overlapping of different phases are kept similar to the RDMA-enhanced MapReduce design
M. W. Rahman, X. Lu, N. S. Islam, R. Rajachandrasekar, and D. K. Panda, High Performance Design of YARN MapReduce on Modern HPC Clusters with Lustre and RDMA, IPDPS, May 2015
Performance Improvement of MapReduce over Lustre on TACC Stampede
•  Local disk is used as the intermediate data directory
[Figure: Sort job execution time, IPoIB (FDR) vs. OSU-IB (FDR), for 300-500 GB on 64 nodes and for 20 GB to 640 GB as the cluster scales from 4 to 128 nodes.]
•  For 500GB Sort in 64 nodes
   –  44% improvement over IPoIB (FDR)
•  For 640GB Sort in 128 nodes
   –  48% improvement over IPoIB (FDR)
M. W. Rahman, X. Lu, N. S. Islam, R. Rajachandrasekar, and D. K. Panda, MapReduce over Lustre: Can RDMA-based Approach Benefit?, Euro-Par, August 2014

Case Study – Performance Improvement of MapReduce over Lustre on SDSC Gordon
•  Lustre is used as the intermediate data directory
[Figure: job execution time for Sort (40-80 GB) and TeraSort (40-120 GB) with IPoIB (QDR), OSU-Lustre-Read (QDR), OSU-RDMA-IB (QDR), and OSU-Hybrid-IB (QDR).]
•  For 80GB Sort in 8 nodes
   –  25% improvement over IPoIB (QDR)
•  For 120GB TeraSort in 16 nodes
   –  34% improvement over IPoIB (QDR)

Design Overview of Spark with RDMA
•  Design Features Spark Applications
(Scala/Java/Python)
Task
Task
Task
Task
Spark
(Scala/Java)
BlockManager
Java NIO
Shuffle
Server
(default)
Netty
Shuffle
Server
(optional)
BlockFetcherIterator
RDMA
Shuffle
Server
(plug-in)
Java NIO
Shuffle
Fetcher
(default)
Netty
Shuffle
Fetcher
(optional)
RDMA
Shuffle
Fetcher
(plug-in)
Java Socket
RDMA-based Shuffle Engine
(Java/JNI)
1/10 Gig Ethernet/IPoIB (QDR/FDR)
Network
Native InfiniBand
(QDR/FDR)
•  Design features
   –  RDMA-based shuffle
   –  SEDA-based plug-ins
   –  Dynamic connection management and sharing
   –  Non-blocking and out-of-order data transfer
   –  Off-JVM-heap buffer management
   –  InfiniBand/RoCE support
•  Enables high-performance RDMA communication while supporting the traditional socket interface
•  The JNI layer bridges the Scala-based Spark with a communication library written in native code
   (a hedged configuration sketch for selecting a pluggable shuffle implementation follows)
X. Lu, M. W. Rahman, N. Islam, D. Shankar, and D. K. Panda, Accelerating Spark with RDMA for Big Data Processing: Early Experiences, Int'l Symposium on High Performance Interconnects (HotI '14), August 2014
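In Spark 1.x the shuffle implementation is selectable through configuration, which is the general mechanism by which a plug-in such as an RDMA shuffle engine can be slotted in without changing application code. The sketch below uses the standard spark.shuffle.manager key; the RDMA manager class name is purely hypothetical and is not taken from the talk or from the HiBD release.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ShuffleManagerSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("ShuffleManagerSketch")
            // Built-in Spark 1.x values for this key are "hash" and "sort".
            // A pluggable implementation is named by its class; the class below
            // is a hypothetical placeholder for an RDMA-based shuffle manager.
            .set("spark.shuffle.manager", "org.example.rdma.RdmaShuffleManager");

        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... build RDDs and run shuffle-heavy stages as usual ...
        sc.stop();
    }
}
```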
Preliminary Results of the Spark-RDMA Design – GroupBy
[Figure: GroupBy time vs. data size for 10GigE, IPoIB, and RDMA on a 4-node HDD cluster (GroupBy with 32 cores) and an 8-node HDD cluster (GroupBy with 64 cores).]
•  Cluster with 4 HDD nodes, single disk per node, 32 concurrent tasks
   –  18% improvement over IPoIB (QDR) for a 10GB data size
•  Cluster with 8 HDD nodes, single disk per node, 64 concurrent tasks
   –  20% improvement over IPoIB (QDR) for a 20GB data size

Performance Benefits – TestDFSIO on TACC Stampede (HHH)
[Figure: TestDFSIO total throughput and execution time, IPoIB (FDR) vs. OSU-IB (FDR), 80-120 GB; cluster with 32 nodes with HDD, 128 maps in total.]
•  Throughput: 6-7.8x improvement over IPoIB for 80-120 GB file sizes
•  Latency: 2.5-3x improvement over IPoIB for 80-120 GB file sizes

Performance Benefits – RandomWriter & TeraGen on TACC Stampede (HHH)
[Figure: execution times, IPoIB (FDR) vs. OSU-IB (FDR), 80-120 GB; cluster with 32 nodes, 128 maps in total.]
•  RandomWriter: 3-4x improvement over IPoIB for 80-120 GB file sizes
•  TeraGen: 4-5x improvement over IPoIB for 80-120 GB file sizes

Performance Benefits – Sort & TeraSort on TACC Stampede (HHH)
[Figure: execution times, IPoIB (FDR) vs. OSU-IB (FDR); 32 nodes with 128 maps and 57 reduces for Sort, and 128 maps and 64 reduces for TeraSort.]
•  Sort with a single HDD per node: 40-52% improvement over IPoIB for 80-120 GB data
•  TeraSort with a single HDD per node: 42-44% improvement over IPoIB for 80-120 GB data

Performance Benefits – TestDFSIO on TACC Stampede and SDSC Gordon (HHH-M)
[Figure: TestDFSIO write throughput for 60-100 GB; TACC Stampede (32 nodes with HDD, 128 maps) and SDSC Gordon (16 nodes with SSD, 64 maps).]
•  TestDFSIO write on TACC Stampede: 28x improvement over IPoIB for 60-100 GB file sizes
•  TestDFSIO write on SDSC Gordon: 6x improvement over IPoIB for 60-100 GB file sizes

Performance Benefits – TestDFSIO and Sort on SDSC Gordon (HHH-L)
[Figure: TestDFSIO throughput and Sort execution time for HDFS, Lustre, and OSU-IB on a 16-node cluster.]
•  TestDFSIO for an 80GB data size
   –  Write: 9x improvement over HDFS
   –  Read: 29% improvement over Lustre
•  Sort (40-80 GB)
   –  Up to 28% improvement over HDFS
   –  Up to 50% improvement over Lustre

Memcached-RDMA Design
[Figure: sockets and RDMA clients negotiate with a master thread, which hands them to sockets or verbs worker threads; all worker threads operate on the shared data (memory slabs, items).]
•  Server and client perform a negotiation protocol
   –  The master thread assigns clients to the appropriate worker thread
•  Once a client is assigned a verbs worker thread, it can communicate directly and is "bound" to that thread
•  All other Memcached data structures are shared among RDMA and sockets worker threads
•  Native IB-verbs-level design and evaluation with
   –  Server: Memcached (http://memcached.org)
   –  Client: libmemcached (http://libmemcached.org)
   –  Different networks and protocols: 10GigE, IPoIB, native IB (RC, UD)

Memcached Performance (FDR Interconnect)
[Figure: Memcached GET latency vs. message size and throughput vs. number of clients, IPoIB (FDR) vs. OSU-IB (FDR), on TACC Stampede (Intel Sandy Bridge cluster, IB FDR).]
•  Memcached GET latency
   –  4 bytes: OSU-IB 2.84 us; IPoIB 75.53 us
   –  2K bytes: OSU-IB 4.49 us; IPoIB 123.42 us
•  Memcached throughput (4 bytes)
   –  4080 clients: OSU-IB 556 Kops/sec, IPoIB 233 Kops/sec
   –  Nearly 2x improvement in throughput

Micro-benchmark Evaluation for OLDP Workloads
[Figure: query latency and throughput for 64-400 clients, Memcached-IPoIB (32Gbps) vs. Memcached-RDMA (32Gbps).]
•  Illustration with a read-cache-read access pattern using a modified mysqlslap load-testing tool
•  Memcached-RDMA can
   –  improve query latency by up to 66% over IPoIB (32Gbps)
   –  improve throughput by up to 69% over IPoIB (32Gbps)
D. Shankar, X. Lu, J. Jose, M. W. Rahman, N. Islam, and D. K. Panda, Can RDMA Benefit On-Line Data Processing Workloads with Memcached and MySQL, ISPASS '15

Performance Benefits on SDSC Gordon – OHB Latency & Throughput Micro-Benchmarks
[Figure: average latency vs. message size and throughput vs. number of clients for IPoIB (32Gbps), RDMA-Mem (32Gbps), and RDMA-Hybrid (32Gbps).]
•  ohb_memlat and ohb_memthr latency and throughput micro-benchmarks
•  Memcached-RDMA can
   –  improve query latency by up to 70% over IPoIB (32Gbps)
   –  improve throughput by up to 2x over IPoIB (32Gbps)
   –  incur no overhead in hybrid mode when all data fits in memory

Performance Benefits on OSU-RI-SSD – OHB Micro-benchmark for Hybrid Memcached
[Figure: latency with penalty vs. message size for RDMA-Mem (32Gbps) and RDMA-Hybrid (32Gbps); success rate vs. spill factor for RDMA-Mem-Uniform and RDMA-Hybrid.]
•  ohb_memhybrid with a uniform access pattern, single client and single server with 64MB
•  Success rate of in-memory vs. hybrid SSD-memory for different spill factors
   –  100% success rate for the hybrid design, while that of pure in-memory degrades
•  Average latency with penalty for in-memory vs. hybrid SSD-assisted mode for a spill factor of 1.5
   –  Up to 53% improvement over in-memory, with a server miss penalty as low as 1.5 ms

HBase-RDMA Design Overview
[Figure: applications call HBase, which can use either the Java socket interface over 1/10 GigE or IPoIB networks, or the OSU-IB design, a JNI layer over IB verbs on RDMA-capable networks (IB, 10GE/iWARP, RoCE ..).]
•  The JNI layer bridges the Java-based HBase with a communication library written in native code
•  Enables high-performance RDMA communication while supporting the traditional socket interface

HBase – YCSB Read-Write Workload
[Figure: read and write latency for 8-128 clients on IPoIB (QDR), 10GigE, and OSU-IB (QDR).]
•  HBase Get latency (Yahoo! Cloud Serving Benchmark)
   –  64 clients: 2.0 ms; 128 clients: 3.5 ms
   –  42% improvement over IPoIB for 128 clients
•  HBase Put latency
   –  64 clients: 1.9 ms; 128 clients: 3.5 ms
   –  40% improvement over IPoIB for 128 clients
   (a minimal Get/Put sketch follows)
J. Huang, X. Ouyang, J. Jose, M. W. Rahman, H. Wang, M. Luo, H. Subramoni, C. Murthy, and D. K. Panda, High-Performance Design of HBase with RDMA over InfiniBand, IPDPS '12
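For reference, the Get and Put operations that YCSB measures go through the ordinary HBase client API, with the RDMA design sitting underneath it. The hedged sketch below uses the 0.9x-era client classes that were current at the time of the talk; the table, column family, qualifier, and row key are placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseGetPutSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();   // reads hbase-site.xml
        HTable table = new HTable(conf, "usertable");        // placeholder table name

        // Put: the equivalent of a YCSB write.
        Put put = new Put(Bytes.toBytes("user1000"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("field0"), Bytes.toBytes("value0"));
        table.put(put);

        // Get: the equivalent of a YCSB read.
        Get get = new Get(Bytes.toBytes("user1000"));
        Result result = table.get(get);
        byte[] value = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("field0"));
        System.out.println(Bytes.toString(value));

        table.close();
    }
}
```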
Designing Communication and I/O Libraries for Big Data Systems: Solved a Few Initial Challenges
[Figure: the same layered diagram as before, Big Data middleware and benchmarks over programming models (sockets) and the communication and I/O library, on commodity compute, storage, and networking technologies, with the initial challenges addressed.]
Are the Current Benchmarks Sufficient for Big Data Management and Processing?
•  The current benchmarks provide some performance behavior
•  However, they do not provide any information to the designer/developer on:
   –  What is happening at the lower layer?
   –  Where the benefits are coming from?
   –  Which design is leading to benefits or bottlenecks?
   –  Which component in the design needs to be changed, and what will its impact be?
   –  Can performance gain/loss at the lower layer be correlated to the performance gain/loss observed at the upper layer?
OSU MPI Micro-Benchmarks (OMB) Suite
•  A comprehensive suite of benchmarks to
   –  Compare the performance of different MPI libraries on various networks and systems
   –  Validate low-level functionalities
   –  Provide insights into the underlying MPI-level designs
•  Started with basic send-recv (MPI-1) micro-benchmarks for latency, bandwidth, and bidirectional bandwidth
•  Extended later to
   –  MPI-2 one-sided
   –  Collectives
   –  GPU-aware data movement
   –  OpenSHMEM (point-to-point and collectives)
   –  UPC
•  Has become an industry standard
•  Extensively used for the design/development of MPI libraries, performance comparison of MPI libraries, and even in procurement of large-scale systems
•  Available from http://mvapich.cse.ohio-state.edu/benchmarks
•  Available in an integrated manner with the MVAPICH2 stack

Challenges in Benchmarking of RDMA-based Designs
[Figure: the current benchmarks sit at the applications level; the open question is how they correlate with what happens in the communication and I/O library below, for which no benchmarks exist.]

Iterative Process – Requires Deeper Investigation and Design for Benchmarking Next-Generation Big Data Systems and Applications
[Figure: the same layered stack, now with applications-level benchmarks at the top and micro-benchmarks targeting the communication and I/O library at the bottom.]

OSU HiBD Micro-Benchmark (OHB) Suite – HDFS
•  Evaluates the performance of standalone HDFS
•  Five different benchmarks
   –  Sequential Write Latency (SWL)
   –  Sequential or Random Read Latency (SRL or RRL)
   –  Sequential Write Throughput (SWT)
   –  Sequential Read Throughput (SRT)
   –  Sequential Read-Write Throughput (SRWT)
•  Each benchmark exposes tunable parameters such as file name, file size, HDFS parameters, number of readers/writers, random vs. sequential read, and seek interval (the exact set varies per benchmark)
N. S. Islam, X. Lu, M. W. Rahman, J. Jose, and D. K. Panda, A Micro-benchmark Suite for Evaluating HDFS Operations on Modern Clusters, Int'l Workshop on Big Data Benchmarking (WBDB '12), December 2012

OSU HiBD Micro-Benchmark (OHB) Suite – RPC
•  Two different micro-benchmarks to evaluate the performance of standalone Hadoop RPC
   –  Latency: single server, single client
   –  Throughput: single server, multiple clients
•  A simple script framework for job launching and resource monitoring
•  Calculates statistics such as min, max, and average
•  Covers network configuration, tunable parameters, data type, and CPU utilization
•  lat_client/lat_server and thr_client/thr_server expose parameters such as network address, port, data type, minimum and maximum message size, number of iterations, number of clients, handlers, and verbose mode
X. Lu, M. W. Rahman, N. Islam, and D. K. Panda, A Micro-Benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks, Int'l Workshop on Big Data Benchmarking (WBDB '13), July 2013
(a minimal sketch of this measurement pattern follows)
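The OHB benchmarks above boil down to timing a single operation in isolation, discarding warm-up iterations, and reporting averaged latency or derived throughput. As a hedged illustration of that pattern, and not the OHB code itself, the sketch below times repeated HDFS sequential writes from a single client; the iteration counts, payload size, and path are arbitrary placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteLatencySketch {
    public static void main(String[] args) throws Exception {
        final int warmup = 5, iterations = 20;
        final byte[] payload = new byte[64 * 1024];            // 64 KB per write (placeholder)

        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/benchmarks/ohb-style-write");   // placeholder path

        long totalNanos = 0;
        for (int i = 0; i < warmup + iterations; i++) {
            long start = System.nanoTime();
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.write(payload);
                out.hsync();                                    // flush the write to the DataNode pipeline
            }
            long elapsed = System.nanoTime() - start;
            if (i >= warmup) totalNanos += elapsed;             // discard warm-up iterations
        }

        double avgMs = totalNanos / 1e6 / iterations;
        double mbps = (payload.length / (1024.0 * 1024.0)) / (avgMs / 1000.0);
        System.out.printf("avg write latency: %.3f ms, throughput: %.2f MB/s%n", avgMs, mbps);
        fs.close();
    }
}
```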
OSU HiBD Micro-Benchmark (OHB) Suite – MapReduce
•  Evaluates the performance of stand-alone MapReduce
•  Does not require or involve HDFS or any other distributed file system
•  Considers various factors that influence the data shuffling phase
   –  Underlying network configuration, number of map and reduce tasks, intermediate shuffle data pattern, shuffle data size, etc.
•  Three different micro-benchmarks based on intermediate shuffle data patterns
   –  MR-AVG micro-benchmark: intermediate data is evenly distributed among reduce tasks
   –  MR-RAND micro-benchmark: intermediate data is pseudo-randomly distributed among reduce tasks
   –  MR-SKEW micro-benchmark: intermediate data is unevenly distributed among reduce tasks
D. Shankar, X. Lu, M. W. Rahman, N. Islam, and D. K. Panda, A Micro-Benchmark Suite for Evaluating Hadoop MapReduce on High-Performance Networks, BPOE-5, 2014

Future Plans of the OSU High-Performance Big Data Project
•  Upcoming releases of RDMA-enhanced packages will support
   –  Spark
   –  HBase
   –  Plugin-based designs
•  Upcoming releases of the OSU HiBD Micro-Benchmarks (OHB) will support
   –  HDFS
   –  MapReduce
   –  RPC
•  Exploration of other components (threading models, QoS, virtualization, accelerators, etc.)
•  Advanced designs with upper-level changes and optimizations

Concluding Remarks
•  Presented an overview of Big Data processing middleware
•  Discussed challenges in accelerating Big Data middleware
•  Presented initial designs to take advantage of InfiniBand/RDMA for Hadoop, Spark, Memcached, and HBase
•  Presented challenges in designing benchmarks
•  Results are promising
•  Many other open issues need to be solved
•  Will enable the Big Data processing community to take advantage of modern HPC technologies to carry out their analytics in a fast and scalable manner

Personnel Acknowledgments
Current students: A. Awan (Ph.D.), A. Bhat (M.S.), S. Chakraborthy (Ph.D.), C.-H. Chu (Ph.D.), N. Islam (Ph.D.), M. Li (Ph.D.), M. Rahman (Ph.D.), D. Shankar (Ph.D.), A. Venkatesh (Ph.D.), J. Zhang (Ph.D.)
Current senior research associates: K. Hamidouche, X. Lu
Current post-doc: H. Subramoni
Current programmers: J. Perkins, J. Lin
Current research specialist: M. Arnold
Past students: P. Balaji (Ph.D.), W. Huang (Ph.D.), M. Luo (Ph.D.), G. Santhanaraman (Ph.D.), D. Buntinas (Ph.D.), W. Jiang (M.S.), A. Mamidala (Ph.D.), A. Singh (Ph.D.), S. Bhagvat (M.S.), J. Jose (Ph.D.), G. Marsh (M.S.), J. Sridhar (M.S.), L. Chai (Ph.D.), S. Kini (M.S.), V. Meshram (M.S.), S. Sur (Ph.D.), B. Chandrasekharan (M.S.), M. Koop (Ph.D.), S. Naravula (Ph.D.), H. Subramoni (Ph.D.), N. Dandapanthula (M.S.), R. Kumar (M.S.), R. Noronha (Ph.D.), K. Vaidyanathan (Ph.D.), V. Dhanraj (M.S.), S. Krishnamoorthy (M.S.), X. Ouyang (Ph.D.), A. Vishnu (Ph.D.), T. Gangadharappa (M.S.), K. Kandalla (Ph.D.), S. Pai (M.S.), J. Wu (Ph.D.), K. Gopalakrishnan (M.S.), P. Lai (M.S.), S. Potluri (Ph.D.), W. Yu (Ph.D.), J. Liu (Ph.D.), R. Rajachandrasekar (Ph.D.)
Past post-docs: H. Wang, X. Besseron, H.-W. Jin, M. Luo, E. Mancini, S. Marcarelli, J. Vienne
Past research scientist: S. Sur
Past programmer: D. Bureddy

Thank You!
[email protected]
Network-Based Computing Laboratory: http://nowlab.cse.ohio-state.edu/
The MVAPICH2/MVAPICH2-X Project: http://mvapich.cse.ohio-state.edu/
The High-Performance Big Data Project: http://hibd.cse.ohio-state.edu/

Call For Participation
International Workshop on High-Performance Big Data Computing (HPBDC 2015)
In conjunction with the International Conference on Distributed Computing Systems (ICDCS 2015)
Hilton Downtown, Columbus, Ohio, USA, Monday, June 29th, 2015
http://web.cse.ohio-state.edu/~luxi/hpbdc2015

Multiple Positions Available in My Group
•  Looking for bright and enthusiastic personnel to join as
   –  Post-doctoral researchers
   –  PhD students
   –  Hadoop/Big Data programmer/software engineer
   –  MPI programmer/software engineer
•  If interested, please contact me at this conference and/or send an e-mail to [email protected]