“DevOps for Big Data”

By Mr. Supaket (ศุภเกศ วงศ์คำภู)
Solution Architect at Enersys.co.th
Big Data & Analytics seminar by Data Cube (facebook.com/datacube.th)
Software Every Thing @ Enersys
• FICO (Thailand) (Past)
• DST (Thailand) (Past)
• Thomson Reuters (Thailand) (Past)
• Meta Genesis Development (Past)
@Supaket
http://facebook.com/supaket
https://www.linkedin.com/in/supaket
DevOps for Big Data by @Supaket
4 April 2015
Software Engineering practice
Time to market
Dev
build faster
test in a production-like environment
reduce time to test
virtualize dev & test
deploy faster
deploy often
increase coverage
Ops
http://newrelic.com/devops/lifecycle
What is DevOps? - In Simple English
http://www.youtube.com/watch?v=_I94-tJlovg
DevOps
DevOps
(a portmanteau of "development" and "operations")
is a concept dealing with, among other things: software development,
operations, and services. It emphasises communication, collaboration,
and integration between software developers and information technology
(IT) operations personnel.
en.wikipedia.org/wiki/DevOps
DevOps
Culture
Tools
A mindset of adopting culture, process, and tools to
build higher-quality software and to develop, test, and
release faster, speeding up time to market
Process
supaket
2014 State of DevOps report
Strong IT performance is a competitive advantage. Firms with high-performing IT
organisations were twice as likely to exceed their profitability, market share and
productivity goals
2014 State of DevOps report
DevOps practices improve IT performance. IT performance strongly correlates
with well-known DevOps practices such as use of version control and continuous
delivery. The longer an organization has implemented — and continues to improve
upon — DevOps practices, the better it performs. And better IT performance
correlates to higher performance for the entire organization.
2014 State of DevOps report
Organizational culture matters. Organizational culture is one of the strongest
predictors of both IT performance and overall performance of the organization.
High-trust organizations encourage good information flow, cross-functional collaboration,
shared responsibilities, learning from failures and new ideas; they are also the most
likely to perform at a high level. These cultural practices and norms found in
high-trust organizations are also at the heart of DevOps, which helps explain why DevOps
practices correlate so strongly with high organizational performance.
2014 State of DevOps report
Job satisfaction is the No. 1 predictor of organisational performance. We all know
how job satisfaction feels: It’s about doing work that’s challenging and meaningful,
and being empowered to exercise our skills and judgment. We also know that where
there’s job satisfaction, employees bring the best of themselves to work: their
engagement, their creativity and their strongest thinking. That makes for more
innovation in any area of the business, including IT.
Production vs Development environment
What is the problem?
Common Problems
It works on my machine
http://newrelic.com/devops/lifecycle
Common Problems
Reproducible
http://newrelic.com/devops/lifecycle
Common Problems
20 people join the team: how can they start developing on day one?
http://newrelic.com/devops/lifecycle
Common Problems
Production-like environment
http://www.blue-agility.com/important-lesson-getting-code-production/
Introduction to Virtualization
Production Environment
Production-like environment
Developer Machine
http://newrelic.com/devops/lifecycle
What about virtualization?
Hypervisor
Container
What are Vagrant & Docker?
What is Vagrant?
Vagrant is a tool for building complete development environments. With an
easy-to-use workflow and focus on automation, Vagrant lowers development
environment setup time, increases development/production parity, and
makes the "works on my machine" excuse a relic of the past.
vagrantup.com
• A VM management tool
• Automate the setup of your environment (Dev & QA)
Vagrant
Vagrant Commands
• init
• up
• halt
• reload
• suspend
• resume
• destroy
• package
http://newrelic.com/devops/lifecycle
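How these commands move a VM through its lifecycle can be sketched as a small state table. This is a simplified model written for illustration, not Vagrant's internals: the state labels (`not_created`, `running`, `poweroff`, `saved`) and the `next_state` helper are assumptions, `init` and `package` are left out because they do not change the VM state, and the CLI command for pausing a VM is `suspend`.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified state table: (current state, command) -> new state.
# Real providers have more states and edge cases than shown here.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("not_created", "up"): "running",   # create and boot the VM
    ("poweroff", "up"): "running",      # boot a halted VM
    ("running", "halt"): "poweroff",    # shut the VM down
    ("running", "reload"): "running",   # restart and re-apply the configuration
    ("running", "suspend"): "saved",    # save the VM state to disk
    ("saved", "resume"): "running",     # restore a suspended VM
}


def next_state(state: str, command: str) -> str:
    """Return the VM state after running `vagrant <command>` in this toy model."""
    if command == "destroy":
        return "not_created"  # destroy removes the VM from any state
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError(f"'{command}' is not modelled for state '{state}'")


state = "not_created"
history: List[str] = []
for command in ["up", "suspend", "resume", "halt", "up", "destroy"]:
    state = next_state(state, command)
    history.append(state)

print(history)
```

A typical working day uses only `up`, `suspend` or `halt`, and occasionally `reload` after editing the Vagrantfile; `destroy` followed by `up` rebuilds the environment from scratch.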
Vagrant - Big Picture
Vagrant - Network Mode
Vagrant for Developer Machine
New Joiner
• Someone joins your project…
• They pick up their laptop…
• Then spend the next 1-2 days following instructions on setting up
  their environment, tools, etc.
What is Docker?
Docker is an open platform for developers and sysadmins to build, ship, and run
distributed applications. Consisting of Docker Engine, a portable, lightweight
runtime and packaging tool, and Docker Hub, a cloud service for sharing
applications and automating workflows, Docker enables apps to be quickly
assembled from components and eliminates the friction between development, QA,
and production environments. As a result, IT can ship faster and run the same app,
unchanged, on laptops, data center VMs, and any cloud.
Solomon Hykes, Docker’s Founder & CTO, gives an overview of Docker in this short video (7:16).
Docker for shipping an immutable environment
Apache Spark
An introduction to Spark and Spark Streaming
What is Apache Spark?
• Cluster computing engine designed to be fast and general-purpose
• Good for processing streaming data
• Good for machine learning tasks
• Unified platform
Spark Components
Spark Core
• Basic functionality of Spark, including components for task scheduling,
  memory management, fault recovery, interacting with storage
  systems, and more
• Provides the API for resilient distributed datasets (RDDs)
Concept: Resilient distributed datasets (RDDs)
• Immutable collections of objects spread across a cluster
• Built through parallel transformations (map, filter, etc.)
• Controllable persistence (e.g. caching in RAM)
• Automatically rebuilt on failure
• Contain any type of Python, Java, or Scala objects, including user-defined classes.
Key idea: Write programs in terms of transformations on
distributed datasets
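The key idea can be illustrated without a cluster. The sketch below is a toy, single-process stand-in and not the Spark API: the `ToyRDD` class and its `drop_cache` method are invented for illustration. It keeps only the properties listed above: each transformation returns a new dataset, nothing is computed until a result is requested, persistence is opt-in, and a lost cache is rebuilt from the recorded lineage.

```python
from typing import Any, Callable, List, Optional


class ToyRDD:
    """Toy model of an RDD: partitions plus the lineage needed to rebuild them."""

    def __init__(self,
                 source: Optional[List[List[Any]]] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[List[Any]], List[Any]]] = None) -> None:
        self._source = source      # raw partitions, only set on a root dataset
        self._parent = parent      # lineage: the dataset this one derives from
        self._fn = fn              # per-partition transformation
        self._persist = False
        self._cache: Optional[List[List[Any]]] = None

    def map(self, f: Callable[[Any], Any]) -> "ToyRDD":
        # Returns a new dataset; the parent is never modified.
        return ToyRDD(parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        return ToyRDD(parent=self, fn=lambda part: [x for x in part if pred(x)])

    def persist(self) -> "ToyRDD":
        self._persist = True       # keep computed partitions in memory
        return self

    def drop_cache(self) -> None:
        self._cache = None         # simulate losing cached partitions

    def _partitions(self) -> List[List[Any]]:
        if self._cache is not None:
            return self._cache
        if self._parent is None:
            parts = [list(p) for p in (self._source or [])]
        else:
            # Recompute from lineage, one partition at a time.
            parts = [self._fn(p) for p in self._parent._partitions()]
        if self._persist:
            self._cache = parts
        return parts

    def collect(self) -> List[Any]:
        return [x for part in self._partitions() for x in part]


numbers = ToyRDD(source=[[1, 2, 3], [4, 5, 6]])   # two "partitions"
even_squares = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0).persist()

first = even_squares.collect()
even_squares.drop_cache()          # pretend a node holding the cache failed
rebuilt = even_squares.collect()   # recomputed from the lineage
print(first, rebuilt)
```

In real Spark the partitions live on different machines and the per-partition functions run in parallel, but the program is written the same way: as a chain of transformations on a dataset.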
Spark Streaming (1)
• Spark component that enables processing of live streams of data,
  e.g. production log files, queues
• Provides an API for manipulating data streams (DStream)
• Same fault tolerance, throughput, and scalability as Spark Core
• Spark's built-in machine learning algorithms and graph processing
  algorithms can be applied to data streams
Spark Streaming (2)
• Chop up the live stream into batches of X seconds
• Spark treats each batch of data as RDDs
  and processes them using RDD operations
• Finally, the processed results of
  the RDD operations are returned in batches
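The micro-batching steps above can be sketched in plain Python. This is an illustration of the idea only, not Spark Streaming code: the `micro_batches` helper, the event tuples, and the two-second interval are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[float, str]  # (arrival time in seconds, payload)


def micro_batches(events: Iterable[Event], batch_seconds: float) -> List[List[str]]:
    """Group events into consecutive windows of `batch_seconds`.

    Simplification: windows with no events are skipped here, whereas a real
    streaming engine would still emit an empty batch for that interval.
    """
    buckets: Dict[int, List[str]] = defaultdict(list)
    for timestamp, payload in events:
        buckets[int(timestamp // batch_seconds)].append(payload)
    return [buckets[index] for index in sorted(buckets)]


# Hypothetical stream of request lines with arrival times.
stream = [
    (0.5, "GET /"),
    (1.2, "GET /a"),
    (2.1, "POST /b"),
    (2.9, "GET /"),
    (5.0, "GET /c"),
]

batches = micro_batches(stream, batch_seconds=2)

# Each batch is processed with an ordinary batch operation, one result per batch.
get_counts = [sum(1 for request in batch if request.startswith("GET"))
              for batch in batches]
print(get_counts)
```

The batch interval is the main tuning knob: a shorter interval lowers latency but leaves less data per batch to amortize the scheduling overhead.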
Log anomaly detection in production
Apache Spark
Input Reader
APACHE LOG Reader
JsonMesage
DSTREAM
PredictionModel
production environment
RDD
FileOutPut
YARN
Result Output
Vagrant
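The stages named in this diagram (log reader, JSON message, prediction model, result output) can be sketched as a plain Python pipeline. Everything specific here is an assumption: the slides do not describe the log format, the message fields, or the model, so the sketch assumes the Apache common log format and uses a placeholder rule (HTTP status 500 or above) where the real `PredictionModel` would go.

```python
import json
import re
from typing import List, Optional

# Assumed input format: Apache common log format.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def to_json_message(line: str) -> Optional[str]:
    """Log reader stage: parse one log line into a JSON message, or None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # skip lines that do not look like access-log entries
    record = match.groupdict()
    record["status"] = int(record["status"])
    return json.dumps(record)


def predict_anomaly(message: str) -> bool:
    """Placeholder for the prediction model: flags server errors only."""
    return json.loads(message)["status"] >= 500


# Hypothetical sample input.
lines = [
    '10.0.0.5 - - [04/Apr/2015:10:00:00 +0700] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.7 - - [04/Apr/2015:10:00:01 +0700] "POST /checkout HTTP/1.1" 500 512',
    'not a log line',
]

messages: List[str] = [m for m in map(to_json_message, lines) if m is not None]
anomalies: List[str] = [json.loads(m)["path"] for m in messages if predict_anomaly(m)]
print(anomalies)
```

In the architecture shown, the same parse and predict steps would run inside Spark on each batch of the stream, with the results written to file output instead of printed.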
Log anomaly detection in Development
Apache Spark
Input Reader
APACHE LOG Reader
JsonMesage
DSTREAM
PredictionModel
developer machine
RDD
FileOutPut
YARN
Docker
Docker
Result Output
Vagrant
Show case
Running Demo
Q&A
Thank you
Reference
http://www.devopsdays.in.th
http://www.devopsdays.org
http://devopscafe.org
http://vimeo.com/devopsdays
http://newrelic.com/devops/lifecycle
http://www.slideshare.net/search/slideshow?searchfrom=header&q=devops