SuperVessel: Enabling Spark as a Service with OpenStack and Docker Guan Cheng (G.C.) Chen IBM Research - China weibo.com/parallellabs SuperVessel Cloud • Public cloud built on the POWER7/POWER8 servers with OpenStack • It provides free access for students, researchers, developers across the world, and helps grow OpenPOWER ecosystem (used in 30+ universiNes now) • It provides advanced technology services such as Spark as a Service, Docker Services, CogniNve CompuNng Service, IoT Service, Accelerator as a Service (FPGA and GPU) 4/18/15 IBM Research -‐ China 2 Spark as a Service Try it at: www.ptopenlab.com 3 steps to launch a Spark cluster, easy! Step 1: Login 4/18/15 Step 2: Create IBM Research -‐ China Step 3: Ready! 3 Why OpenStack? • Most popular IaaS soXware • Supports Docker* • Heat can orchestra docker containers easily – good for provision a Spark cluster Picture source: h-p://www.qyjohn.net/?p=3801 4/18/15 IBM Research -‐ China 4 Why Docker? • Less resource consumpNon than KVM – We can provision more Spark clusters! • Boot faster than KVM – Users like fast provision! • Incrementally build, revert and reuse your container – We love Git and AUFS! • However, Docker is not officially supported in OpenStack yet – Nova Docker is an external component of OpenStack • Port Docker to POWER Architecture (ppc64 and ppc64le) – Ubuntu 15.04 includes Docker for POWER8 ppc64le 4/18/15 Picture source: h-p://www.zdnet.com/ar@cle/what-‐is-‐docker-‐and-‐why-‐is-‐it-‐so-‐ darn-‐popular/ IBM Research -‐ China 5 Why Spark? • Fast • Unified • Ecosystem PorNng to POWER – Bugfix submieed to the community – Spark 1.3 works smoothly! J 4/18/15 IBM Research -‐ China 6 Why not Sahara? Sahara is a component for Hadoop/Spark as a Service in OpenStack • We started from OpenStack Icehouse … • DockerizaNon – Beeer service deployment and isolaNon for Big Data dashboard server • CustomizaNon – WaiNng for Sahara’s improvements is somehow *SLOW* • Docker, user analyNcs, Spark 1.4, Spark IDE, scheduling, data visualizaNon etc. 4/18/15 IBM Research -‐ China 7 Architecture Design Spark Master Billing&Auth NameNode Glance Spark Driver Docker Image Big Data Dashboard Container 1 3 2 1 Heat Nova Spark Worker DataNode Nova Docker Container 2 2 Keystone Neutron Cinder/ Manila Spark Worker DataNode Container 3 Spark Cluster 4/18/15 IBM Research -‐ China 8 Dockerize everything! • We use containers to – run applica<ons – run other containers – run OpenStack python daemons – run OpenStack services inside OpenStack (with a special trunk link in neutron/OVS) – run OpenStack inside OpenStack • Good for mulN-‐site expansion 4/18/15 IBM Research -‐ China 9 Heat template design Heat is a component for orchestration in OpenStack • Parameters – Cinder/Malina/Neutron uuid – Size of Cinder/Malina resources • Resources – Master/slave node – Neutron – Cinder/Manila • Need to modify nova-‐docker to mount the cinder/ manila resources when booNng the docker container 4/18/15 IBM Research -‐ China 10 Spark Docker Image • Built from Ubuntu 14.04.1 • All Spark nodes use the same image, with different iniNalizaNon scripts by using cloudinit • IniNalizaNon scripts will – Sync /etc/hosts across all nodes – Set HDFS and Spark configuraNons accordingly – Format HDFS and launch HDFS – Launch Spark 4/18/15 IBM Research -‐ China 11 Big Data Dashboard Development • 2 Developers (frond end + backend) – Online in 2 months – Separates Heat related stuff and Dashboard – Reskul API (for billing and authenNcaNon etc) – Dockerize the big data dashboard server – Separates development and producNon environment 4/18/15 IBM Research -‐ China 12 Where should I put the data? Shared File System for Cloud and Spark as a Service Cloud Infrastructure Service Big Data Service Select Big data compuNng framework (Mapreduce, SPARK • Select cluster size • Select data folder size • HEAT template for big data cluster OpenStack controller Servers Servers GPFS FPO GPFS FPO KeyStone Horizon Nova HEAT Glance Cinder Manila Neutron User A User A Docker Docker Docker (Symphon (Symphon (Symphon y) y)y) KVM/ Docker POWER7/POWER8 (Web app) GPFS FPO • • • 4/18/15 Folder A User B Docker Docker Docker (SPARK) (Symphon (Symphon y)y) Folder B POWER7/POWER8 HEAT will orchestrate docker instances, subnet and data folder based on user’s request Manila provides the NFS service using GPFS as backend, and the folder will be mounted via nova-‐docker (with -‐v support) Folder created by Manila could be accessed by the KVM/docker instances created for big data and other purpose IBM Research -‐ China 13 SuperVessel Services Roadmap Super Marketplace (Online) SuperVessel Cloud Service 1. VM and container service 2. Storage service 3. Network service 4. Accelerator as service (Online) SuperVessel Big Data and HPC Service 1. Big Data: MapReduce (Symphony), SPARK 2. Performance tuning service OpenPOWER Enablement Service 1. X-‐to-‐P migraNon: AutoPort tool 2. OpenPOWER new system test service (Online) Super Class Service (Preparing) Super Project Team Service 1. On-‐line video courses 2. Teacher course management 3. User contribuNon management 1. Project management service 2. DevOps automaNon 5. Image service Docker SuperVessel Cloud Infrastructure Storage 4/18/15 IBM POWER servers OpenPOWER server IBM Research -‐ China FPGA/GPU 14 Summary • Spark + OpenStack + Docker works very well on OpenPOWER servers • Dockerized services made DevOps easier • Docker issues – Zombie process – Can’t dynamically aeach volume – Commit, aeach operaNons is not supported on Nova-‐docker • Monitoring everything (API, resources etc) – key for operaNng a cloud • TODO: Spark 1.4, Spark IDE, SWIFT, Data VisualizaNon 4/18/15 IBM Research -‐ China 15 Join Us! SuperVessel WeChat group www.ptopenlab.com QQ group: SuperVessel @冠诚 [email protected] 4/18/15 IBM Research -‐ China 16
© Copyright 2024