Introduction to Cask Data Application Platform

The Introduction to Cask Data Application Platform (CDAP) course delivers the key
concepts and expertise needed to build real-time and batch applications on CDAP. The
course combines lectures and labs to reinforce core concepts.
By the end of the course, participants will be able to write CDAP applications, integrate
them into their CI environments, and test and debug them.
Course Duration
8 Hours
Audience & Prerequisites
This course is designed for developers and architects who want to use CDAP.
Participants must be familiar with the basics of Java programming and debugging,
and should have basic knowledge of HDFS and MapReduce.
Materials Required
Laptop with Mac, Windows or Linux installed.
For Windows
● VirtualBox installed (https://www.virtualbox.org/wiki/Downloads)
● CDAP Standalone VM (will be provided by the instructor)
For Mac / Linux
● JDK 1.6 or 1.7 installed
● Maven 3.1.1 or higher installed
● CDAP Standalone ZIP (will be provided by the instructor)
Course Overview
● CDAP Concepts & Capabilities
● Data Ingestion & Exploration
● Understanding Datasets
● Data Serving
● Batch Processing
● Scheduling & Sequencing Jobs using Workflow
● Real-time Processing
● Testing and Debugging Strategies
Course Outline
1. The Motivation for CDAP
● Context
● The Hadoop Ecosystem Today
● Challenges with Hadoop
● The need for CDAP
● CDAP benefits for developers
2. CDAP Overview
● Functional view of CDAP
● CDAP architecture
● Building Blocks
3. Lab 1 - Application Lifecycle
● Lifecycle management for CDAP applications
4. Data Ingestion
● Ingesting data in real-time
● Ingesting data in batch
● Managing the data lifecycle
5. Data Exploration
● Exploring ingested data
● Attaching schemas
6. Lab 2 - Ingestion & Exploration
● Ingesting data in real-time & batch
● Attaching and refining schemas
7. Introduction to Datasets
● The need for Datasets
● Data abstraction using Datasets
● Using Datasets in applications
8. Datasets on HBase
● Introduction to Table
● Using Table Datasets
● Comparison to relational databases
9. Datasets on HDFS
● Creating and using file based Datasets
● Partitioned file Datasets
10. Data Serving
● Using services in applications
● Writing handlers to serve data
11. Lab 3 - Datasets and Services
● Reading and Writing Datasets using Services
12. Batch Processing
● Brief intro to MapReduce
● Writing batch jobs in CDAP
● Reading from Streams & Datasets
● Writing data to Datasets
13. Scheduling using Workflow
● Scheduling jobs using workflows
● Creating time-based and size-based triggers
14. Lab 4 - MapReduce & Workflow
● Writing a simple MapReduce job and scheduling using Workflow
15. Real-time Processing
● Introduction to Tigon
● Reading from Streams & Datasets
● Writing to Datasets from Tigon Flows
16. Lab 5 - Simple real-time Flows
● Writing a simple flow that reads from streams and writes to Datasets
17. Testing and Debugging
● Unit testing CDAP applications
● Integrating testing in CI
● Debugging CDAP applications in clusters
18. Lab 6 - Writing Unit tests
● Writing unit tests for CDAP applications
Cask Data, Inc.
150 Grant Ave, Palo Alto, CA 94306