TRAINING PROGRAM ON BIGDATA/HADOOP

Course: Training on Bigdata/Hadoop with Hands-on
Course Duration / Dates / Time: 4 Days / 24th - 27th June 2015 / 9:30 - 17:30 Hrs
Venue: Eagle Photonics Pvt Ltd, First Floor, Plot No 31, Sector 19C, Vashi, Navi Mumbai - 400705
Ph: 022 27841425

Fee Details:
For Indian participants: 20,000 INR
For Foreign participants: 350 USD

Course Description:
Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools or processing applications. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets using simple programming models. This hands-on course equips participants to manage Big Data using Hadoop.

Who should attend?
This course is meant for software developers/programmers who are interested in Bigdata/Hadoop.

Key benefits:
On completing the course, participants will be knowledgeable about managing Big Data and comfortable working with the Hadoop Distributed File System and related components.

Course Outline:

Module 1: Introduction to Big Data

Session 1: Introduction to Big Data
• So What Is Big Data?
• History of Data Management: Evolution of Big Data
• Structuring of Big Data
• Types of Big Data
• Elements of Big Data
• Application of Big Data in the Business Context
• Careers in Big Data

Session 2: Business Application of Big Data
• Significance of Social Network Data
• Uses of Social Network Data Analysis
• Financial Fraud and Big Data
• Preventing Fraud Using Big Data Analytics
• Use of Big Data in the Retail Industry

Session 3: Technologies for Handling Big Data
• Distributed and Parallel Computing for Big Data
• Virtualization and its Importance to Big Data
• Introducing Hadoop
• Cloud Computing and Big Data
• Features of Cloud Computing
• Providers in Big Data Cloud Market
• Issues in Using Cloud Services
• In-Memory Technology for Big Data

Session 4: Understanding the Hadoop Ecosystem
• The Hadoop Ecosystem
• Processing Data with Hadoop MapReduce
• Managing Resources and Applications with Hadoop YARN
• Storing Big Data with HBase
• Using Hive for Querying Big Databases
• Interacting with Hadoop Ecosystem

Session 5: MapReduce Fundamentals
• Origins of MapReduce
• Characteristics of MapReduce
• How MapReduce Works
• More about Map and Reduce Functions
• Optimization Techniques for MapReduce Jobs
• Hardware/Network Topology
• Applications of MapReduce
• Role of HBase in Processing Big Data
• Mining Big Data with Hive

Module 2: Managing an Enterprise Wide Big Data Ecosystem

Session 1: Big Data Technology Foundations
• Exploring the Big Data Stack
• Virtualization and Big Data
• Processor and Memory Virtualization
• Data and Storage Virtualization
• Managing Virtualization with Hypervisor
• Abstraction and Virtualization
• Implementing Virtualization to Work with Big Data

Session 2: Big Data Management Systems – Databases and Warehouses
• RDBMSs and Big Data Environment
• PostgreSQL Relational Database
• Nonrelational Databases
• Key-Value Pair Databases
• Document Databases
• Columnar Databases
• Graph Databases
• Spatial Databases
• Polyglot Persistence
• Integrating Big Data with Traditional Data Warehouse
• Rethinking Extraction, Transformation, and Loading
• Big Data Analysis and Data Warehouse
• Changing Deployment Models in Big Data Era

Session 3: Analytics and Big Data
• Using Big Data to Get Results
• What Constitutes Big Data
• Exploring Unstructured Data
• Understanding Text Analytics
• Building New Models and Approaches to Support Big Data

Session 4: Integrating Data, Real-Time Data and Implementing Big Data
• Stages in Big Data Analysis
• Fundamentals of Big Data Integration
• Streaming Data and Complex Event Processing
• Making Big Data a Part of Your Operational Process
• Ensuring Validity, Veracity, and Volatility of Big Data
• Data Validity and Veracity
• Data Volatility

Session 5: Big Data Solutions and Data in Motion
• Big Data as a Business Strategy Tool
• Analysis in Real-Time: Adding New Dimensions to the Cycle
• The Need for Data in Motion
• Case 1: Using Streaming Data for Environmental Impact
• Case 2: Using Streaming Data for Public Policy
• Case 3: Use of Streaming Data in Health Care Industry
• Case 4: Use of Streaming Data in Energy Industry
• Case 5: Improving Customer Experience with Real-Time Text Analytics
• Case 6: Using Real-Time Data in Finance Industry
• Case 7: Using Real-Time Data for Insurance Fraud Prevention

Module 3: Storing and Processing Data – HDFS and MapReduce

Session 1: Storing Data in Hadoop
• HDFS, HBase
• Combining HDFS and HBase for Effective Data Storage
• Choosing an Appropriate Hadoop Data Organization for Your Applications

Session 2: Processing Your Data with MapReduce
• Getting to Know MapReduce
• Your First MapReduce Application
• Designing MapReduce Implementations

Session 3: Customizing MapReduce Execution
• Controlling MapReduce Execution with Input Format
• Reading Data Your Way with Custom Record Reader
• Organizing Output Data with Custom Output Formats
• Optimizing Your MapReduce Execution with a Combiner
• Controlling Reducer Execution with Partitioners

Session 4: Testing and Debugging MapReduce Applications
• Unit Testing MapReduce Applications
• Local Application Testing with Eclipse
• Using Logging for Hadoop Testing
• Reporting Metrics with Job Counters
• Defensive Programming in MapReduce

Session 5: Implementing MapReduce Wordcount Program: A Case Study

Module 4: Increasing Efficiency with Hadoop Tools: Hive and Pig

Session 1: Exploring Hive
• Introducing Hive
• Starting Hive
• Executing Hive Queries from Files
• Data Types
• Hive Built-In Functions
• Compressed Data Storage
• Data Manipulation in Hive

Session 2: Advanced Querying with Hive
• Queries
• Manipulating Column Values Using Functions
• JOINS in Hive
• Hive Best Practices
• Performance Tuning and Query Optimizations
• Various Execution Types
• Hive File and Record Formats
• HiveThrift Service
• Security in Hive

Session 3: Analyzing Data with Pig
• Introduction to Pig
• Installing Pig
• Properties of Pig
• Running Pig
• Pig Latin Application Flow
• Beginning with Pig Latin
• Relational Operators in Pig

Module 5: Additional Hadoop Tools: Sqoop, Flume, YARN and Storm

Session 1: Efficiently Transferring Bulk Data Using Sqoop
• Introducing Sqoop
• Using Sqoop 1
• Importing Data with Sqoop
• Controlling Parallelism
• Encoding NULL Values
• Importing Data into Hive Tables
• Importing Data into HBase
• Exporting Data
• Exporting Data into Subset of Columns
• Drivers and Connectors in Sqoop
• Sqoop Architecture Overview
• Sqoop 2

Session 2: Flume
• Introducing Flume
• The Flume Architecture
• Setting Up Flume
• Building Flume

Session 3: Beyond MapReduce – YARN
• Why YARN?
• The YARN Ecosystem
• A YARN API Example
• Mesos versus YARN

Session 4: Storm on YARN
• Storm and Hadoop
• Overview of Storm
• The Storm API
• Storm on YARN
• Installing Storm on YARN
• An Example of Storm on YARN

Module 6: Leveraging NoSQL, Hadoop Security, Hadoop on Cloud and Real Time

Session 1: Hello NoSQL
• Two Simple Examples
• Storing and Accessing Data
• Storing and Accessing Data in MongoDB
• Storing and Accessing Data in HBase
• Storing and Accessing Data in Apache Cassandra
• Language Bindings for NoSQL Data Stores

Session 2: Working with NoSQL
• Creating Records
• Accessing Data
• Updating and Deleting Data
• MongoDB Query Language Capabilities
• Accessing Data from Column-Oriented Databases Like HBase

Session 3: Hadoop Security
• Hadoop Security Challenges
• Authentication
• Delegated Security Credentials
• Authorization

Session 4: Running Hadoop Applications on AWS
• Getting to Know AWS
• Options for Running Hadoop on AWS
• Understanding the EMR–Hadoop Relationship
• Using AWS S3
• Automating EMR Job Flow Creation and Job Execution
• Orchestrating Job Execution in EMR

Session 5: Real-Time Hadoop
• Real-Time Hadoop Applications
• Using Specialized Real-Time Hadoop Query Systems
• Using Hadoop-Based Event-Processing Systems

Trainer Profile
Mr Biswajyoti Kar holds an A.M.I.E from the Institution of Engineers (India), Gokhale Road, Calcutta, and a B.Sc in Physics from the University of Calcutta. He is a Senior Architect with over 19 years of experience and a proven record in architecting, designing and implementing systems software. He has experience in Big Data analytics, UNIX/Linux kernel-mode development, and data structures and algorithm development in C. His areas of interest are building solutions around Big Data and Analytics and IP creation in the Big Data space.

Training Experience
• Big Data and the Hadoop Distributed File System at Dell
• Algorithms and Data Structures in C/C++, UNIX/Linux advanced programming, and shell scripting at Dell
• Algorithms and Data Structures in C at Proton Solutions

Project Experiences
1. Big Data Work: Led a project that involved setting up the Hadoop Distributed File System (HDFS) on a Linux box to test the elasticity aspect of cloud computing. Benchmarked the Hadoop system for crunching terabytes of data using the macro-programming model called Pig Latin. Statistical analysis was done using the R language.
2. Parallel Network File System: Led a project that involved setting up a pNFS client-server file and block layout configuration, and working out the pros and cons of each configuration in HPC and NAS environments.
3. Big Data Analytics: Providing consulting in the area of Big Data analysis to a credit rating agency.