Cloud Computing Open source cloud infrastructures Keke Chen

Outline
 Project 3
 Eucalyptus
 OpenStack
Project 3: using AWS
 Tasks (work from nimbus17)
 Create an AWS account and set up the environment
 Try basic EC2 commands
 Start a Hadoop cluster on EC2, using the hadoopEC2 tool
 Read the code of hadoopEC2 to understand how to interact with EC2 in shell scripts
Starting a Hadoop cluster on EC2
 Read
 http://wiki.apache.org/hadoop/AmazonEC2
 Setup
 Check src/contrib/ec2/bin/hadoop-ec2-env.sh
 You don’t need to change anything there
 Set your own environment variables in .profile, .login, or .bashrc
 AWS_ACCOUNT_ID,
AWS_ACCESS_KEY_ID,
AWS_SECRET_ACCESS_KEY
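Before launching anything, it can help to confirm that the three variables above are actually visible to your shell. The following is a minimal sketch; the function name `missing_aws_vars` is hypothetical, and only the variable names come from the slide.

```python
import os

# Variable names taken from the setup slide above.
REQUIRED_VARS = ("AWS_ACCOUNT_ID", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_vars(environ=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_vars()
    if missing:
        print("Not set: " + ", ".join(missing))
    else:
        print("All AWS variables are set.")
```

Run it from a fresh login shell so that it reflects what .profile, .login, or .bashrc really exports.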
Starting Hadoop on EC2
 Copy $HADOOP_HOME/src/contrib/ec2 to your own directory
 % bin/hadoop-ec2 launch-cluster your-cluster-name number-of-slaves
 % bin/hadoop-ec2 login your-cluster-name
 Test your cluster
 /usr/local/hadoop-*
 hadoop fsck /
 Diagnose problems (understand the Hadoop setup)
 http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
Read the source of the EC2 tool
 Check the script hadoop-ec2 and learn how to
 Automatically launch instances
 Pass initialization scripts to instances
 Change Hadoop configuration
Use Boto
 Implement some functions with the Boto library and Python
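As a starting point, launching instances and passing them an initialization script might look like the sketch below. It assumes the boto 2 library (`boto.ec2.connect_to_region` and `EC2Connection.run_instances`) and credentials in the environment. The helper names, the AMI ID, and the key name are hypothetical placeholders, not values from the hadoop-ec2 scripts.

```python
def build_launch_request(image_id, count, key_name, init_script,
                         instance_type="m1.small"):
    """Collect keyword arguments for boto's EC2Connection.run_instances()."""
    return {
        "image_id": image_id,
        "min_count": count,
        "max_count": count,
        "key_name": key_name,
        "instance_type": instance_type,
        "user_data": init_script,  # delivered to the instance as EC2 user data
    }


def launch(region, request):
    """Send the request to EC2. Needs boto 2 installed and AWS credentials set."""
    import boto.ec2  # imported lazily so the helper above works without boto

    conn = boto.ec2.connect_to_region(region)
    reservation = conn.run_instances(**request)
    return [instance.id for instance in reservation.instances]


# Hypothetical values for illustration only.
request = build_launch_request(
    image_id="ami-00000000",
    count=2,
    key_name="my-keypair",
    init_script="#!/bin/sh\necho 'instance booted' > /tmp/boot.log\n",
)
# launch("us-east-1", request)  # uncomment to start real (billable) instances
```

Separating the request from the call makes it easy to inspect what would be sent before any instance is started.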
Most popular open-source AWS equivalents
 Eucalyptus
 Started by UCSB researchers
 OpenStack
 Started by NASA
Eucalyptus
 Compatible with AWS APIs (mainly EC2 and S3)
 Thus, the Boto library can be used, too
 A good example for understanding how AWS works
 Paper “The Eucalyptus Open-source Cloud-computing System”
 How VM instances are managed
 How to provide a virtual network (like elastic IP)
 How to provide data storage (like S3)
 The description is very brief, but still instructive
System Design
Data center
CLC: cloud controller
CC: cluster controller
Walrus: storage controller similar to S3
NC: node controller
Components: Node Controller
 Make queries to discover physical resources
 # of cores
 Size of memory
 Available disk space
 State of VM instances
 Propagate the information to Cluster Controller
 DescribeResource
 DescribeInstances
 Run/terminate instances
 CLC → CC → NC → hypervisor (Xen)
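The resource discovery listed on this slide can be approximated on any Linux host with the Python standard library. This is only a sketch of the idea: the function name `describe_resource` echoes the slide's DescribeResource but is not the Eucalyptus implementation, and the dictionary layout is an assumption.

```python
import os
import shutil


def describe_resource(path="/"):
    """Report cores, physical memory, and free disk space for this host."""
    try:
        # Available on Linux and most Unix systems; not on every platform.
        memory = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        memory = None  # unknown on this platform
    return {
        "cores": os.cpu_count(),
        "memory_bytes": memory,
        "disk_free_bytes": shutil.disk_usage(path).free,
    }


report = describe_resource()
print(report)
```

A real Node Controller would also ask the hypervisor for the state of its VM instances, which is omitted here.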
Node controller
 Start an instance
 Copy instance image from Walrus or local cache
 Create endpoint in the virtual network overlay
 Instruct hypervisor to boot the instance
 Stop an instance
 Instruct hypervisor to terminate the VM
 Tear down the virtual network endpoint
 Clean up the files associated with the instance
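The start and stop sequences above mirror each other, which a toy model makes easy to see. In this sketch the hypervisor, the network overlay, and Walrus are all replaced by in-memory stand-ins; the class and method names are hypothetical.

```python
class NodeController:
    """Toy model of the NC start/stop sequence; no real hypervisor is involved."""

    def __init__(self, fetch_image):
        self.fetch_image = fetch_image  # stand-in for a download from Walrus
        self.image_cache = {}           # image_id -> image bytes
        self.files = {}                 # instance_id -> private copy of the image
        self.endpoints = set()          # instances with a network endpoint
        self.running = {}               # instance_id -> image_id

    def start_instance(self, instance_id, image_id):
        # 1. Copy the image from the local cache, falling back to Walrus.
        if image_id not in self.image_cache:
            self.image_cache[image_id] = self.fetch_image(image_id)
        self.files[instance_id] = bytes(self.image_cache[image_id])
        # 2. Create the endpoint in the virtual network overlay.
        self.endpoints.add(instance_id)
        # 3. Ask the hypervisor to boot the instance (modelled as a dict entry).
        self.running[instance_id] = image_id

    def stop_instance(self, instance_id):
        # Reverse order: terminate the VM, remove the endpoint, delete the files.
        self.running.pop(instance_id)
        self.endpoints.discard(instance_id)
        self.files.pop(instance_id)


downloads = []


def fake_walrus(image_id):
    downloads.append(image_id)
    return b"image-bytes"


nc = NodeController(fake_walrus)
nc.start_instance("i-1", "emi-1")
nc.start_instance("i-2", "emi-1")  # second start is served from the cache
nc.stop_instance("i-1")
```

After this run only one download has happened, and only `i-2` still holds files and a network endpoint.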
Cluster Controller
 Gather/report information of NCs
 Through the interface provided by NCs
 Report the summary to CLC
 Schedule incoming instance “run”
requests to specific NCs
 Control the virtual network overlay
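The slide does not say which placement policy the Cluster Controller uses. The sketch below assumes a simple first-fit policy over free cores, purely for illustration; the function names and the data layout are hypothetical.

```python
def schedule(nodes, cores_needed):
    """Return the first node with enough free cores, or None if none fits.

    `nodes` maps a node name to its free-core count, as an NC might report it.
    """
    for name, free in nodes.items():
        if free >= cores_needed:
            nodes[name] = free - cores_needed  # record the allocation
            return name
    return None


def summarize(nodes):
    """Aggregate per-node state into a summary a CC could report to the CLC."""
    return {"nodes": len(nodes), "free_cores": sum(nodes.values())}


nodes = {"nc1": 2, "nc2": 8}
chosen = schedule(nodes, 4)    # nc1 is too small, so nc2 is picked
summary = summarize(nodes)
rejected = schedule(nodes, 16)  # no node can host this request
```

Other policies (for example, picking the least-loaded node) would only change the loop inside `schedule`.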
Virtual network overlay
 VM instance interconnectivity (between different nodes/networks)
 Not well covered by Xen
 Connectivity, isolation and performance
 At least one VM in each set must be exposed externally
 Map the public IP to that instance
 Restricted communication
 VMs in the same set can talk to each other
 VMs from different sets should be isolated
 Performance
Virtual network overlay
•Each VM has a private IP; one VM in the set also has a public IP
•A VLAN tag defines the subnet, to isolate sets of VMs
•The Cluster Controller serves as the router between VM subnets
- CC uses Linux iptables to control traffic
- iptables Network Address Translation (NAT) maps the public IP to the private IP
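To make the NAT mapping concrete, the sketch below generates standard iptables commands for a one-to-one public/private mapping and for dropping traffic between two VM subnets. The exact rules Eucalyptus installs are not shown in these slides, so treat the rule set, the function names, and the example addresses as assumptions.

```python
def nat_rules(public_ip, private_ip):
    """Build iptables commands that map one public IP to one private IP."""
    return [
        # Inbound: rewrite the destination from the public to the private address.
        f"iptables -t nat -A PREROUTING -d {public_ip} "
        f"-j DNAT --to-destination {private_ip}",
        # Outbound: rewrite the source so traffic appears to come from the public address.
        f"iptables -t nat -A POSTROUTING -s {private_ip} "
        f"-j SNAT --to-source {public_ip}",
    ]


def isolation_rule(subnet_a, subnet_b):
    """Build a rule that drops forwarded traffic from one VM subnet to another."""
    return f"iptables -A FORWARD -s {subnet_a} -d {subnet_b} -j DROP"


rules = nat_rules("203.0.113.10", "10.0.1.5")
rules.append(isolation_rule("10.0.1.0/24", "10.0.2.0/24"))
for rule in rules:
    print(rule)
```

The script only prints the commands; running them requires root on the machine that routes between the subnets.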
Storage Controller (Walrus)
 Provide SOAP/REST interfaces
 Compatible with S3 – you can use S3 tools
 Use Walrus to stream data in/out of the
cloud
 Store VM images (same as AMI)
 Root file system, kernel image, ramdisk
image
 No locking for object writes
 Conflicting writes: the later write overwrites the earlier one
 Provides the same tool Amazon uses
 Generate AMI
 Maintains a cache of images
 Authentication is applied when NC
accesses images
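The "no locking" behaviour for object writes amounts to last-writer-wins. A minimal in-memory sketch (the class name and keys are hypothetical, and nothing here talks to Walrus or S3):

```python
class ObjectStore:
    """In-memory bucket with no write locking: each put replaces the object."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data  # no lock and no version check

    def get(self, key):
        return self._objects[key]


store = ObjectStore()
store.put("images/rootfs", b"written by client A")
store.put("images/rootfs", b"written by client B")  # arrives later
latest = store.get("images/rootfs")  # client B's data; A's write is gone
```

Clients that need to avoid losing updates have to coordinate among themselves, for example by writing to distinct keys.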
Cloud Controller
 A collection of web services
 Resource services
 Data services
 Interface services
Cloud Controller: resource
services
 Receive user requests
 Interact with CCs to allocate/deallocate resources
 System Resource State (SRS) is
maintained by querying CCs
 CCs will collect information from NCs
 Follows a “transactional” operation
 Reservation, VM creation → commit
 Or errors → rollback
 Realizing SLAs
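The reserve/create/commit-or-rollback pattern can be sketched as follows. This is a simplified model under stated assumptions: the `Cluster` class, the slot-based accounting, and the `create_vm` callback are hypothetical and not taken from the Eucalyptus code.

```python
class Cluster:
    """Stand-in for a CC that tracks free instance slots."""

    def __init__(self, free_slots):
        self.free_slots = free_slots

    def reserve(self, n):
        if n > self.free_slots:
            raise RuntimeError("not enough capacity")
        self.free_slots -= n

    def release(self, n):
        self.free_slots += n


def run_instances(cluster, n, create_vm):
    """Reserve capacity, create the VMs, and undo the reservation on any error."""
    cluster.reserve(n)                          # reservation
    try:
        vms = [create_vm(i) for i in range(n)]  # VM creation
    except Exception:
        cluster.release(n)                      # rollback
        raise
    return vms                                  # commit: the reservation is kept


def failing_create(index):
    raise ValueError("boot failed")


cluster = Cluster(free_slots=4)
vms = run_instances(cluster, 2, lambda i: f"vm-{i}")
slots_after_commit = cluster.free_slots

try:
    run_instances(cluster, 2, failing_create)
except ValueError:
    pass
slots_after_rollback = cluster.free_slots
```

A fuller rollback would also terminate any VMs that were created before the failure; that step is left out to keep the sketch short.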
Cloud Controller: data services
 Handles the creation, modification,
interrogation, and storage of stateful
system and user data
 There is a system database…
 Users can query the services
 Discover resource info (images, clusters)
 Manipulate abstract parameters (keypairs, security groups, network definitions)
 Recall some of the AWS interfaces…
Cloud Controller: interface
services
 User-visible interfaces
 Programmatic interfaces (SOAP/REST)
 Web interface
 Handling authentication
 Provide system management tools
OpenStack
 Originated at NASA, with Rackspace
 Driven by an open community process
 Multiple hypervisors: Xen, KVM, ESXi,
Hyper-V
 First release: Oct 2010
Components
 Nova – compute (equivalent to EC2)
 Swift – object storage (S3)
 Image service (AMI)
 Networking (virtual network)
 Block storage (Elastic Block Store)
 Identity
 Dashboard (AWS web console)
-- mostly implemented in Python
Fastest Growing Global Open Source Community
 Companies: 231
 Countries: 121
 Individual members: 10,149
 Total contributors: 1,036
 Average monthly contributors: 238
 Code contributions: 70,137
As of July 2013
Global Community
[Figure: countries with members]
Developer Growth
[Figure: contributors per month (source: ohloh)]
1 Million+ Lines of Code
[Figure: lines of code (source: ohloh)]
Ecosystem Growth
[Figure: participating companies (axis 0 to 250) by milestone: Launch, Austin, Bexar, Cactus, Diablo, Essex, 2-year anniversary, Grizzly]