Cloud Native Open Source at Netflix May 2013 Adrian Cockcroft @adrianco #netflixcloud @NetflixOSS http://www.linkedin.com/in/adriancockcroft Cloud Native NetflixOSS – Cloud Native On-Ramp Netflix Open Source Cloud Prize We are Engineers We solve hard problems We build amazing and complex things We fix things when they break We strive for perfection Perfect code Perfect hardware Perfectly operated But perfection takes too long… So we compromise Time to market vs. Quality Utopia remains out of reach Where time to market wins big Web services Agile infrastructure - cloud Continuous deployment How Soon? Code features in days instead of months Hardware in minutes instead of weeks Incident response in seconds instead of hours Tipping the Balance Utopia Dystopia A new engineering challenge Construct a highly agile and highly available service from ephemeral and often broken components Netflix Streaming A Cloud Native Application Netflix Member Web Site Home Page Personalization Driven – How Does It Work? How Netflix Streaming Works Consumer Electronics User Data Web Site or Discovery API AWS Cloud Services Personalization CDN Edge Locations DRM Customer Device (PC, PS3, TV…) Streaming API QoS Logging OpenConnect CDN Boxes CDN Management and Steering Content Encoding Real Web Server Dependencies Flow (Netflix Home page business transaction as seen by AppDynamics) Each icon is three to a few hundred instances across three AWS zones Start Here Three Personalization movie group choosers (for US, Canada and Latam) Cassandra memcached Web service S3 bucket Cloud Native Architecture Clients Things Autoscaled Micro Services JVM JVM JVM Autoscaled Micro Services JVM JVM Memcached Distributed Quorum NoSQL Datastores Cassandra Cassandra Cassandra Zone A Zone B Zone C Non-Native Cloud Architecture Agile Mobile Mammals iOS/Android Cloudy Buffer App Servers Datacenter Dinosaurs MySQL Legacy Apps New Anti-Fragile Patterns Micro-services and Chaos engines Highly available systems composed from ephemeral components Open Source is the default Cloud Native Master copies of data are cloud resident Everything is dynamically provisioned All services are ephemeral How to get to Cloud Native Freedom and Responsibility for Developers Decentralize and Automate Ops Activities Integrate DevOps into the Business Organization Netflix BusDevOps Organization Chief Product Officer VP Product Management VP UI Engineering VP Discovery Engineering VP Platform Directors Product Directors Development Directors Development Directors Platform Code, independently updated continuous delivery Developers + DevOps Developers + DevOps Developers + DevOps Denormalized, independently updated and scaled data UI Data Sources Discovery Data Sources Platform Data Sources AWS AWS AWS Cloud, independently updated and scaled infrastructure Four Transitions • Management: Integrated Roles in a Single Organization – Business, Development, Operations -> BusDevOps • Developers: Denormalized Data – NoSQL – Decentralized, scalable, available, polyglot • Responsibility from Ops to Dev: Continuous Delivery – Decentralized small daily production updates • Responsibility from Ops to Dev: Agile Infrastructure - Cloud – Hardware in minutes, provisioned directly by developers Decentralized Deployment Asgard http://techblog.netflix.com/2012/06/asgard-web-based-cloud-management-and.html Ephemeral Instances • Largest services are autoscaled • Average lifetime of an instance is 36 hours Autoscale Up Autoscale Down P u s h Leveraging Public Scale 1,000 Instances Public Startups 100,000 Instances Grey Area Netflix Private Google How big is Public? AWS Maximum Possible Instance Count 3.7 Million Growth >10x in Three Years, >2x Per Annum AWS upper bound estimate based on the number of public IP Addresses Every provisioned instance gets a public IP by default Managing Multi-Region Availability AWS Route53 DynECT DNS UltraDNS Denominator Regional Load Balancers Regional Load Balancers Zone A Zone B Zone C Zone A Zone B Zone C Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas Denominator – manage traffic via multiple DNS providers Cloud Native Big Data Netflix Dataoven From cloud Services ~100 Billion Events/day From C* Terabytes of Dimension data RDS Ursula Metadata Aegisthus Data Pipelines Data Warehouse Over 2 Petabytes Gateways Hadoop Clusters – AWS EMR 1300 nodes 800 nodes Multiple 150 nodes Nightly Tools A Cloud Native Open Source Platform Inspiration Netflix Platform Evolution 2009-2010 Bleeding Edge Innovation 2011-2012 Common Pattern 2013-2014 Shared Pattern Netflix ended up several years ahead of the industry, but it’s becoming commoditized now Establish our solutions as Best Practices / Standards Hire, Retain and Engage Top Engineers Goals Build up Netflix Technology Brand Benefit from a shared ecosystem How does it all fit together? Example Application – RSS Reader Boosting the @NetflixOSS Ecosystem Judges Aino Corry Program Chair for Qcon/GOTO Werner Vogels CTO Amazon Simon Wardley Strategist Joe Weinman SVP Telx, Author “Cloudonomics” Martin Fowler Chief Scientist Thoughtworks Yury Izrailevsky VP Cloud Netflix Github Registration Opened March 13 Github Apache Licensed Contributions Github Close Entries September 15 AWS Re:Invent Six Judges Award Ceremony Dinner November Winners $10K cash $5K AWS Netflix Engineering Ten Prize Categories Trophy AWS Re:Invent Tickets Entrants Conforms to Rules Nominations Categories Working Code Community Traction Functionality and scale now, portability coming Moving from parts to a platform in 2013 Netflix is fostering a cloud native ecosystem Rapid Evolution - Low MTBIAMSH (Mean Time Between Idea And Making Stuff Happen) Takeaway NetflixOSS makes it easier for everyone to become Cloud Native Open Source is not just the default, it’s a strategic weapon @adrianco #netflixcloud @NetflixOSS
© Copyright 2025