SuperVessel: Enabling Spark as a Service with

SuperVessel: Enabling Spark as a
Service with OpenStack and Docker
Guan Cheng (G.C.) Chen
IBM Research - China
weibo.com/parallellabs
SuperVessel Cloud
•  Public cloud built on the POWER7/POWER8 servers with OpenStack •  It provides free access for students, researchers, developers across the world, and helps grow OpenPOWER ecosystem (used in 30+ universiNes now) •  It provides advanced technology services such as Spark as a Service, Docker Services, CogniNve CompuNng Service, IoT Service, Accelerator as a Service (FPGA and GPU) 4/18/15
IBM Research -­‐ China
2  Spark as a Service
Try it at: www.ptopenlab.com
3 steps to launch a Spark cluster, easy!
Step 1: Login
4/18/15
Step 2: Create
IBM Research -­‐ China
Step 3: Ready!
3  Why OpenStack?
•  Most popular IaaS soXware •  Supports Docker* •  Heat can orchestra docker containers easily – good for provision a Spark cluster Picture source: h-p://www.qyjohn.net/?p=3801
4/18/15
IBM Research -­‐ China
4  Why Docker?
•  Less resource consumpNon than KVM –  We can provision more Spark clusters! •  Boot faster than KVM –  Users like fast provision! •  Incrementally build, revert and reuse your container –  We love Git and AUFS! •  However, Docker is not officially supported in OpenStack yet –  Nova Docker is an external component of OpenStack •  Port Docker to POWER Architecture (ppc64 and ppc64le) –  Ubuntu 15.04 includes Docker for POWER8 ppc64le
4/18/15
Picture source: h-p://www.zdnet.com/ar@cle/what-­‐is-­‐docker-­‐and-­‐why-­‐is-­‐it-­‐so-­‐
darn-­‐popular/
IBM Research -­‐ China
5  Why Spark?
•  Fast •  Unified •  Ecosystem PorNng to POWER –  Bugfix submieed to the community –  Spark 1.3 works smoothly! J 4/18/15
IBM Research -­‐ China
6  Why not Sahara?
Sahara is a component for Hadoop/Spark as a Service in OpenStack
•  We started from OpenStack Icehouse … •  DockerizaNon –  Beeer service deployment and isolaNon for Big Data dashboard server •  CustomizaNon –  WaiNng for Sahara’s improvements is somehow *SLOW* •  Docker, user analyNcs, Spark 1.4, Spark IDE, scheduling, data visualizaNon etc. 4/18/15
IBM Research -­‐ China
7  Architecture Design
Spark Master
Billing&Auth
NameNode
Glance
Spark Driver
Docker
Image
Big Data
Dashboard
Container 1
3
2
1
Heat
Nova
Spark Worker
DataNode
Nova
Docker
Container 2
2
Keystone
Neutron
Cinder/
Manila
Spark Worker
DataNode
Container 3
Spark Cluster
4/18/15
IBM Research -­‐ China
8  Dockerize everything!
•  We use containers to –  run applica<ons –  run other containers –  run OpenStack python daemons –  run OpenStack services inside OpenStack (with a special trunk link in neutron/OVS) –  run OpenStack inside OpenStack •  Good for mulN-­‐site expansion 4/18/15
IBM Research -­‐ China
9  Heat template design
Heat is a component for orchestration in OpenStack
•  Parameters –  Cinder/Malina/Neutron uuid –  Size of Cinder/Malina resources •  Resources –  Master/slave node –  Neutron –  Cinder/Manila •  Need to modify nova-­‐docker to mount the cinder/
manila resources when booNng the docker container 4/18/15
IBM Research -­‐ China
10  Spark Docker Image
•  Built from Ubuntu 14.04.1 •  All Spark nodes use the same image, with different iniNalizaNon scripts by using cloudinit •  IniNalizaNon scripts will –  Sync /etc/hosts across all nodes –  Set HDFS and Spark configuraNons accordingly –  Format HDFS and launch HDFS –  Launch Spark
4/18/15
IBM Research -­‐ China
11  Big Data Dashboard Development
•  2 Developers (frond end + backend) –  Online in 2 months –  Separates Heat related stuff and Dashboard –  Reskul API (for billing and authenNcaNon etc) –  Dockerize the big data dashboard server –  Separates development and producNon environment
4/18/15
IBM Research -­‐ China
12  Where should I put the data?
Shared File System for Cloud and Spark as a Service
Cloud  Infrastructure  Service  Big  Data  Service  Select Big data compuNng framework (Mapreduce, SPARK •  Select cluster size •  Select data folder size
• 
HEAT template for big data cluster
OpenStack controller
Servers
Servers
GPFS FPO
GPFS FPO
KeyStone
Horizon
Nova
HEAT
Glance
Cinder
Manila
Neutron
User A
User A
Docker
Docker
Docker
(Symphon
(Symphon
(Symphon
y)
y)y)
KVM/
Docker
POWER7/POWER8
(Web app)
GPFS FPO
• 
• 
• 
4/18/15
Folder A
User B
Docker
Docker
Docker
(SPARK)
(Symphon
(Symphon
y)y)
Folder B
POWER7/POWER8
HEAT will orchestrate docker instances, subnet and data folder based on user’s request Manila provides the NFS service using GPFS as backend, and the folder will be mounted via nova-­‐docker (with -­‐v support) Folder created by Manila could be accessed by the KVM/docker instances created for big data and other purpose IBM Research -­‐ China
13  SuperVessel Services Roadmap
Super Marketplace
(Online)
SuperVessel Cloud Service
1.  VM and container service 2.  Storage service 3.  Network service 4.  Accelerator as service (Online)
SuperVessel Big Data and HPC Service
1.  Big Data: MapReduce (Symphony), SPARK 2.  Performance tuning service
OpenPOWER Enablement Service
1.  X-­‐to-­‐P migraNon: AutoPort tool 2.  OpenPOWER new system test service
(Online)
Super Class Service
(Preparing)
Super Project Team Service
1.  On-­‐line video courses 2.  Teacher course management 3.  User contribuNon management 1.  Project management service 2.  DevOps automaNon
5.  Image service Docker
SuperVessel Cloud Infrastructure Storage
4/18/15
IBM POWER servers
OpenPOWER server
IBM Research -­‐ China
FPGA/GPU
14  Summary
•  Spark + OpenStack + Docker works very well on OpenPOWER servers •  Dockerized services made DevOps easier •  Docker issues –  Zombie process –  Can’t dynamically aeach volume –  Commit, aeach operaNons is not supported on Nova-­‐docker •  Monitoring everything (API, resources etc) –  key for operaNng a cloud •  TODO: Spark 1.4, Spark IDE, SWIFT, Data VisualizaNon 4/18/15
IBM Research -­‐ China
15  Join Us!
SuperVessel WeChat group
www.ptopenlab.com
QQ group: SuperVessel
@冠诚  [email protected]  4/18/15
IBM Research -­‐ China
16