Private Cloud at Wipro
Cloud computing based on Condor
© 2009 Wipro Ltd - Confidential
Agenda
1 Background
2 Wipro Private Cloud
3 System architecture
4 Use of Condor
Background
Need:
• Share physical infrastructure between multiple projects and CoEs (Centers of Excellence) to reduce server sprawl and the number of physical labs
• Provide an environment for evaluating new technologies, developing solutions and enabling collaboration between multiple labs
• Centralize infrastructure procurement and management
• Reduce infrastructure cost of CoEs by enabling multiple development environments
Solution:
• Set up a private cloud for virtual compute and application infrastructure
• Build a self-service portal for on-demand provisioning to reduce process overheads
• Support multiple types of virtualization software
• Reuse existing physical infrastructure, procure minimal new infrastructure
Wipro Private Cloud
Wipro Private Cloud
[Diagram: Wipro Private Cloud overview]
Users: Wipro Users, SaaS User, Developers, Cloud Admin
Access: Intranet
Hosted environments: SaaS App, SaaS Mgmt, Virtual Lab, SaaS Enablers (on Managed Networks)
Resources offered:
• Virtual Machines
• Shared Storage
• Virtual Appliances
• Application Services
Wipro Cloud Portal / Web Services API Layer, Cloud OA&M Portal
Wipro Cloud Core
• Automated Provisioning
• Multi-tenancy & Isolation
• Cloud Accounting & Auditing
• Performance & Fault Monitoring
• Automated Network & Security
Physical Resource Pool
• Servers, Storage, Network
Cloud Services catalogue
Service Element: Service Feature
• Compute Servers
  – Virtual desktop: equivalent to 1.2GHz, 512MB RAM, 10GB HDD, 25Mbps N/w
  – Low End Server: equivalent to 2x1.2GHz, 2GB RAM, 20GB HDD, 25Mbps N/w
  – High End Server: equivalent to 4x1.2GHz, 4GB RAM, 40GB HDD, 25Mbps N/w
• OS types
  – Linux (CentOS, RHEL) and Windows XP/Server on Intel x86, x86_64 architecture
• Storage
  – iSCSI (RAID 5), NFS and CIFS
  – Data persistence across power-off, suspend & resume of VMs
• Public images/appliances
  – Ready-to-use public images: RHEL 5, Windows XP, LAMP (CentOS 5.2, Apache, Axis, Tomcat, MySQL, PHP, Python)
  – Preconfigured software load balancer, firewall appliances
• Network
  – Isolation between CoEs' resources
  – IPSec, SSL based VPN
  – Public and private IP addresses with NAT support
• Private images
  – Can upload VMware Server, VMware ESX and Xen virtual machine image formats
• Reports
  – Reporting on CPU, storage and memory usage back to user
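The compute tiers in the catalogue can be read as a small data structure. The sketch below is only an illustration of that reading: the sizes are copied from the catalogue, while the class name, dictionary keys and the `total_demand` helper are hypothetical and not part of the Wipro portal.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ComputeTier:
    """One compute offering; sizes are the catalogue's 'equivalent to' figures."""
    name: str
    cpus: int          # number of 1.2 GHz-equivalent cores
    ram_mb: int
    disk_gb: int
    network_mbps: int


# Values copied from the catalogue; the keys are hypothetical identifiers.
CATALOGUE = {
    "virtual_desktop": ComputeTier("Virtual desktop", 1, 512, 10, 25),
    "low_end_server": ComputeTier("Low End Server", 2, 2048, 20, 25),
    "high_end_server": ComputeTier("High End Server", 4, 4096, 40, 25),
}


def total_demand(order: Mapping[str, int]) -> ComputeTier:
    """Sum the resources needed for an order such as {"low_end_server": 3}."""
    cpus = ram_mb = disk_gb = network_mbps = 0
    for key, count in order.items():
        tier = CATALOGUE[key]
        cpus += tier.cpus * count
        ram_mb += tier.ram_mb * count
        disk_gb += tier.disk_gb * count
        network_mbps += tier.network_mbps * count
    return ComputeTier("total", cpus, ram_mb, disk_gb, network_mbps)
```

A request for two virtual desktops and one high end server, for example, can be sized with `total_demand({"virtual_desktop": 2, "high_end_server": 1})` before deciding which physical machine could host it.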
Levels of Service
• L1 – Virtual servers on demand
  – Virtual servers, desktops, storage
  – Migration assistance
  – Self-service portal
• L2 – Application infrastructure on demand
  – Appliances of standard software
  – Managed backup, proactive monitoring and help-desk
  – Itemized billing and charge-back
• L3 – Business service infrastructure on demand
  – Scalable business services
  – Multi-tenant application infrastructure (content management, identity management, database, load balancer, firewall, ...)
System Architecture
Private Cloud – in Action
[Diagram: architecture and service layers of the cloud service]
Layers:
• Service Layer: load balancer service (LB active, LB passive)
• App Layer: Inst 1, Inst 2, ... Inst n
• Virtual Machine Layer: VM 1, VM 2, ... VM n
• Bare-metal Layer
Design:
• Virtual machine design: standardize, automate, agile, caching, appliances
• Bare-metal design: standardize, automate, re-provisioning
• Service design: design, test, package, deploy
Functions:
• Provisioning: resource mgmt, workload mgmt, auto recovery, task & process automation, configuration & change mgmt
• Monitoring: performance, availability, alarms, billing
• Cloud Management: SLAs, policies, rules, priorities, packaging, custom agents, shared services, billing parameters
• Management: Service Governor, policy enforcement, incident mgmt, optimizer, contention
• Network Control
• Operations
Access: Customer (OA&M access), Business Users, Developers, via the OA&M Portal & Web Service Gateway
System Components
[Diagram: system components]
Components: Web Service Gateway, Customer Portal, Charge-back, Service Governor, Metrics Monitor, Grid Scheduler, VM Caching, Workflow Manager, Cloud State, VM Repo, Identity Management
Plug-ins:
• N/W Plug-in: N/W provisioning
• Storage Plug-in: Storage provisioning
• Bare-metal Plug-in: Bare-metal provisioning
• VM Plug-in: VM provisioning
• Nagios plug-in: N/W (Nagios) monitoring, alerts
Legend: Developed in Wipro, In Development, 3rd Party components
Deployment Example
[Diagram: deployment example]
Edge: Router, firewall, VPN Server, IPS, IDS, NAT
Isolated network per project (virtual machines and virtual storage):
• Project X: 192.168.5.0/24
• Project Y: 192.168.6.0/24
• Project Z: 192.168.7.0/24
Cloud Backbone: 10.201.72.0/24
Cloud Mgmt: 192.168.3.0/24 (Mgmt Server, HA Pair)
Cloud physical systems, Switch Fabric
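The address plan in the deployment example can be sanity-checked with Python's standard `ipaddress` module. This is a minimal sketch that only verifies the subnets shown on the slide do not overlap and finds which network an address belongs to; it says nothing about how isolation is actually enforced on the switch fabric, and the function names are hypothetical.

```python
import ipaddress
from itertools import combinations

# Subnets as shown in the deployment example.
NETWORKS = {
    "Project X": ipaddress.ip_network("192.168.5.0/24"),
    "Project Y": ipaddress.ip_network("192.168.6.0/24"),
    "Project Z": ipaddress.ip_network("192.168.7.0/24"),
    "Cloud Mgmt": ipaddress.ip_network("192.168.3.0/24"),
    "Cloud Backbone": ipaddress.ip_network("10.201.72.0/24"),
}


def overlapping_pairs(networks):
    """Return the name pairs whose address ranges overlap (empty if none do)."""
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


def owner_of(address, networks=NETWORKS):
    """Return the name of the network containing `address`, or None."""
    ip = ipaddress.ip_address(address)
    for name, net in networks.items():
        if ip in net:
            return name
    return None
```

Running `overlapping_pairs(NETWORKS)` before adding a new project subnet is a cheap way to catch an address clash early.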
Use of Condor
Why Condor?
• Trusty old features
  – Flexibility: ClassAd mechanism, configurations and policies
  – Web Services API
  – High availability
  – Resource utilization of jobs
• Newer features we like
  – VM Universe
  – Partitionable Slots
  – Lease management
  – Integration with Amazon EC2 (public cloud)
• Proven in large scale deployments
• Condor-users and condor-admin support
• Open source
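Of the newer features listed above, partitionable slots are the easiest to picture in code: a machine starts as one slot holding all of its resources, and each matched job carves off a dynamic slot sized to its request. The sketch below is a toy model of that idea in plain Python, not Condor code; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionableSlot:
    """Toy model of a partitionable slot: one pool of CPUs and memory per machine."""
    cpus: int
    memory_mb: int
    dynamic: dict = field(default_factory=dict)  # job id -> (cpus, memory_mb)

    def carve(self, job_id, cpus, memory_mb):
        """Split off a dynamic slot for a job; return False if it does not fit."""
        if job_id in self.dynamic or cpus > self.cpus or memory_mb > self.memory_mb:
            return False
        self.cpus -= cpus
        self.memory_mb -= memory_mb
        self.dynamic[job_id] = (cpus, memory_mb)
        return True

    def release(self, job_id):
        """Return a finished job's resources to the partitionable slot."""
        cpus, memory_mb = self.dynamic.pop(job_id)
        self.cpus += cpus
        self.memory_mb += memory_mb
```

In this model an 8-core, 8 GB machine can host one 4-core VM and one 2-core VM at the same time, and a second 4-core request is refused until the first VM is released.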
How are we using Condor?
• Mostly standard configuration
• A few custom ClassAd attributes in jobs and machines
• Schedd and Collector configured in HA mode
• Condor spool for VM persistence
• Virtual machine provision requests handled by Condor
  – VM job to physical machine match-making, file transfer
• Partitionable slots for dynamic partitioning of physical machine resources
• Customized condor_vm_* files for configuring and starting VMs
  – VLAN control, swap disk and additional storage creation, ...
• Lease management for limiting the number of running instances of a licensed image
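A provisioning request of the kind described above might end up as a VM universe submit description. The sketch below builds such a description as text for a VMware image; it is an assumption about what a request could look like, not the description Wipro actually generates. The function name, job name and image path are hypothetical, and `request_cpus` here is assumed only to size the dynamic slot, not to set the guest's virtual CPU count.

```python
def vm_submit_description(name, vmware_dir, memory_mb, cpus, networking=True):
    """Build the text of a VM universe submit description for a VMware image."""
    lines = [
        "universe = vm",
        f"executable = {name}",        # in the vm universe this is only a job label
        "vm_type = vmware",
        f"vm_memory = {memory_mb}",    # MB of RAM for the guest
        f"vmware_dir = {vmware_dir}",  # directory holding the VM's image files
        "vmware_should_transfer_files = true",
        f"vm_networking = {'true' if networking else 'false'}",
        f"request_cpus = {cpus}",      # assumed: sizes the slot carved for the job
        f"request_memory = {memory_mb}",
        f"log = {name}.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical low end server request; the path is a placeholder.
    print(vm_submit_description("lamp-01", "/images/lamp", 2048, 2), end="")
```

The resulting text would be written to a file and handed to `condor_submit`; custom ClassAd attributes, if needed, would be added as extra lines.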
Observations, Workarounds, Wish list
Working with Condor:
  – With advanced Condor skills, a lot can be achieved without modifying Condor code
Workarounds:
  – Passing number of virtual CPUs to VMware
  – Patch to pass proxy username and password to gSOAP for EC2 integration
  – Patch to get VM resource usage details on ESX
  – Special configuration to handle 2 hour delay in detecting a few execute node failures (Thanks Todd!)
Feature wish list:
  – Remote IWD (initial working directory) support for VM universe, to avoid any file transfer
  – Live migration of VM jobs
Thank You
[email protected]
[email protected]