Cloud Native Open Source at Netflix May 2013 Adrian Cockcroft

Cloud Native Open Source at
Netflix
May 2013
Adrian Cockcroft
@adrianco #netflixcloud @NetflixOSS
http://www.linkedin.com/in/adriancockcroft
Cloud Native
NetflixOSS – Cloud Native On-Ramp
Netflix Open Source Cloud Prize
We are Engineers
We solve hard problems
We build amazing and complex things
We fix things when they break
We strive for perfection
Perfect code
Perfect hardware
Perfectly operated
But perfection takes too long…
So we compromise
Time to market vs. Quality
Utopia remains out of reach
Where time to market wins big
Web services
Agile infrastructure - cloud
Continuous deployment
How Soon?
Code features in days instead of months
Hardware in minutes instead of weeks
Incident response in seconds instead of hours
Tipping the Balance
Utopia
Dystopia
A new engineering challenge
Construct a highly agile and highly
available service from ephemeral and
often broken components
Netflix Streaming
A Cloud Native Application
Netflix Member Web Site Home Page
Personalization Driven – How Does It Work?
How Netflix Streaming Works
Consumer
Electronics
User Data
Web Site or
Discovery API
AWS Cloud
Services
Personalization
CDN Edge
Locations
DRM
Customer Device
(PC, PS3, TV…)
Streaming API
QoS Logging
OpenConnect
CDN Boxes
CDN
Management and
Steering
Content Encoding
Real Web Server Dependencies Flow
(Netflix Home page business transaction as seen by AppDynamics)
Each icon is
three to a few
hundred
instances
across three
AWS zones
Start Here
Three Personalization movie group
choosers (for US, Canada and Latam)
Cassandra
memcached
Web service
S3 bucket
Cloud Native Architecture
Clients
Things
Autoscaled Micro
Services
JVM
JVM
JVM
Autoscaled Micro
Services
JVM
JVM
Memcached
Distributed Quorum
NoSQL Datastores
Cassandra
Cassandra
Cassandra
Zone A
Zone B
Zone C
Non-Native Cloud Architecture
Agile Mobile
Mammals
iOS/Android
Cloudy
Buffer
App Servers
Datacenter
Dinosaurs
MySQL
Legacy Apps
New Anti-Fragile Patterns
Micro-services and Chaos engines
Highly available systems composed
from ephemeral components
Open Source is the default
Cloud Native
Master copies of data are cloud resident
Everything is dynamically provisioned
All services are ephemeral
How to get to Cloud Native
Freedom and Responsibility for Developers
Decentralize and Automate Ops Activities
Integrate DevOps into the Business Organization
Netflix BusDevOps Organization
Chief Product
Officer
VP Product
Management
VP UI
Engineering
VP Discovery
Engineering
VP Platform
Directors
Product
Directors
Development
Directors
Development
Directors
Platform
Code, independently updated
continuous delivery
Developers +
DevOps
Developers +
DevOps
Developers +
DevOps
Denormalized, independently
updated and scaled data
UI Data
Sources
Discovery
Data Sources
Platform
Data Sources
AWS
AWS
AWS
Cloud, independently updated
and scaled infrastructure
Four Transitions
• Management: Integrated Roles in a Single Organization
– Business, Development, Operations -> BusDevOps
• Developers: Denormalized Data – NoSQL
– Decentralized, scalable, available, polyglot
• Responsibility from Ops to Dev: Continuous Delivery
– Decentralized small daily production updates
• Responsibility from Ops to Dev: Agile Infrastructure - Cloud
– Hardware in minutes, provisioned directly by developers
Decentralized Deployment
Asgard
http://techblog.netflix.com/2012/06/asgard-web-based-cloud-management-and.html
Ephemeral Instances
• Largest services are autoscaled
• Average lifetime of an instance is 36 hours
Autoscale Up
Autoscale Down
P
u
s
h
Leveraging Public Scale
1,000 Instances
Public
Startups
100,000 Instances
Grey
Area
Netflix
Private
Google
How big is Public?
AWS Maximum Possible Instance Count 3.7 Million
Growth >10x in Three Years, >2x Per Annum
AWS upper bound estimate based on the number of public IP Addresses
Every provisioned instance gets a public IP by default
Managing Multi-Region Availability
AWS
Route53
DynECT
DNS
UltraDNS
Denominator
Regional Load Balancers
Regional Load Balancers
Zone A
Zone B
Zone C
Zone A
Zone B
Zone C
Cassandra Replicas
Cassandra Replicas
Cassandra Replicas
Cassandra Replicas
Cassandra Replicas
Cassandra Replicas
Denominator – manage traffic via multiple DNS providers
Cloud Native Big Data
Netflix Dataoven
From cloud
Services
~100 Billion
Events/day
From C*
Terabytes of
Dimension
data
RDS
Ursula
Metadata
Aegisthus
Data Pipelines
Data Warehouse
Over 2 Petabytes
Gateways
Hadoop Clusters – AWS EMR
1300 nodes
800 nodes
Multiple 150 nodes Nightly
Tools
A Cloud Native Open Source Platform
Inspiration
Netflix Platform Evolution
2009-2010
Bleeding Edge
Innovation
2011-2012
Common
Pattern
2013-2014
Shared
Pattern
Netflix ended up several years ahead of the
industry, but it’s becoming commoditized now
Establish our
solutions as Best
Practices / Standards
Hire, Retain and
Engage Top
Engineers
Goals
Build up Netflix
Technology Brand
Benefit from a
shared ecosystem
How does it all fit together?
Example Application – RSS Reader
Boosting the @NetflixOSS Ecosystem
Judges
Aino Corry
Program Chair for Qcon/GOTO
Werner Vogels
CTO Amazon
Simon Wardley
Strategist
Joe Weinman
SVP Telx, Author “Cloudonomics”
Martin Fowler
Chief Scientist Thoughtworks
Yury Izrailevsky
VP Cloud Netflix
Github
Registration
Opened
March 13
Github
Apache
Licensed
Contributions
Github
Close Entries
September 15
AWS
Re:Invent
Six Judges
Award
Ceremony
Dinner
November
Winners
$10K cash
$5K AWS
Netflix
Engineering
Ten Prize
Categories
Trophy
AWS
Re:Invent
Tickets
Entrants
Conforms to
Rules
Nominations
Categories
Working
Code
Community
Traction
Functionality and scale now, portability coming
Moving from parts to a platform in 2013
Netflix is fostering a cloud native ecosystem
Rapid Evolution - Low MTBIAMSH
(Mean Time Between Idea And Making Stuff Happen)
Takeaway
NetflixOSS makes it easier for everyone to become Cloud Native
Open Source is not just the default, it’s a strategic weapon
@adrianco #netflixcloud @NetflixOSS