:  +91 8099 133 133 (India)


Course Description

Get trained in Big Data and Hadoop. The course covers in-depth knowledge of core concepts along with their implementation on varied industry use cases.


What you will learn:


  • Apache Hadoop, HDFS, and Hadoop cluster administration.
  • Internals of Hadoop 2.0, YARN, MapReduce, and HDFS.
  • Identifying the hardware and infrastructure for setting up a Hadoop cluster.
  • Planning, deploying, loading data, and running applications in a Hadoop cluster.
  • Hadoop cluster configuration and performance analysis.
  • Managing, maintaining, monitoring, and troubleshooting a Hadoop cluster.
  • Securing a deployment and understanding backup and recovery.
  • Administration of Oozie, Hive, and HBase.
  • Backup options, and diagnosing and recovering from node failures in a Hadoop cluster.
  • The internals of MapReduce and HDFS, and how to write MapReduce code.
  • Mastering the concepts of the Hadoop Distributed File System and the MapReduce framework.
  • Best practices for Hadoop development, debugging, and implementation of workflows and common algorithms.
  • How to leverage Hive, Pig, Sqoop, Flume, Oozie, and other Hadoop ecosystem projects.
  • Creating custom components such as WritableComparables and InputFormats to manage complex data types.
  • Writing and executing joins to link data sets in MapReduce.
  • Advanced Hadoop API topics required for real-world data analysis.
  • Writing complex MapReduce programs.
  • Programming in YARN.
  • Performing data analytics using Pig and Hive.
  • Implementing HBase, MapReduce integration, advanced usage, and advanced indexing.
  • Scheduling jobs using Oozie.
  • Implementing best practices for Hadoop development.
  • Implementing a Hadoop project.
  • Implementing real-world projects on Big Data analytics.


Upon completion of the course, attendees receive a certificate from GlobalEdx. This certificate will be a great differentiator in this field, providing employers and customers with solid evidence of your skills and expertise.

Prerequisites:

  • Some prior experience in Core Java and good analytical skills.
  • Basic knowledge of UNIX and SQL scripting.
  • Prior knowledge of Apache Hadoop is not required.

Why Learn Hadoop and Big Data:

Hadoop is a framework for running applications at very large scale on clusters of commodity hardware. It is open-source software maintained by the Apache Software Foundation, and it is very useful for storing and processing huge amounts of data inexpensively and efficiently. At its core, Hadoop stores large volumes of data and processes it using MapReduce.
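
The map-shuffle-reduce flow mentioned above can be sketched with the classic word count. The following is a conceptual Python simulation of what the framework does, not actual Hadoop API code, and the input lines are made up:

```python
from collections import defaultdict

# Hypothetical input: lines of text standing in for an HDFS file split.
lines = ["big data big hadoop", "hadoop stores big data"]

# Map phase: emit a (word, 1) pair for every word in every line.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle phase: group values by key, as the framework does between map and reduce.
grouped = defaultdict(list)
for key, value in mapped:
    grouped[key].append(value)

# Reduce phase: sum the counts for each word.
counts = {word: sum(values) for word, values in grouped.items()}
print(counts)  # {'big': 3, 'data': 2, 'hadoop': 2, 'stores': 1}
```

In real Hadoop the map and reduce functions run on different nodes and the shuffle moves data across the network, but the logical flow is the same.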

If you are a Hadoop professional, there are plenty of jobs involving Hadoop and related technologies. Companies such as Google, Yahoo, Apple, Hortonworks, eBay, Facebook, Oracle, IBM, Microsoft, and Cisco are looking for skilled professionals capable of managing the Big Data in their organizations. These companies hire Hadoop professionals at different levels: database administrators, Hadoop operations specialists, Hadoop engineers and senior Hadoop engineers, Big Data engineers, Hadoop developers, and Java engineers (DSE teams).

IDC research shows that Big Data market revenue will grow at 31.7 percent a year, hitting the $23.8 billion mark in 2016. According to recent market research, the worldwide Hadoop and Big Data market is expected to grow to about $13.9 billion by 2017.

Hadoop Developers: A Hadoop developer is someone who loves programming and has knowledge of Core Java, SQL, and other languages, along with strong analytical skills.

Companies Using Hadoop:

Amazon Web Services, IBM, Hortonworks, Cloudera, Intel, Microsoft, Pivotal, Twitter, Salesforce, AT&T, StumbleUpon, eBay, Yahoo, Facebook, Hulu, etc.

Career Opportunities after Hadoop course:

Google Trends shows exponential growth in Hadoop jobs. Check the top job websites for Hadoop openings:

Indeed: 11,000+

SimplyHired: 12,000+

LinkedIn: 4,500–8,000+

Rating: 8.8/10 (96 reviews)
Course Contents

What is Hadoop?

  • Hadoop vs. other traditional systems
  • Hadoop history
  • Hadoop core components and architecture


Hadoop Distributed File System (HDFS)

  • HDFS overview and design
  • Hadoop cluster planning
  • HDFS architecture
  • HDFS file storage
  • Component failures and recoveries
  • Block placement
  • Balancing the Hadoop cluster
  • Planning your Hadoop cluster
  • Hadoop software and hardware configuration
  • HDFS Block replication and rack awareness
  • Network topology for Hadoop cluster
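
The effect of block size and replication on raw storage can be sketched with simple arithmetic. This assumes the Hadoop 2.x defaults of 128 MB blocks and a replication factor of 3:

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024   # default HDFS block size in Hadoop 2.x (128 MB)
REPLICATION = 3                  # default replication factor

def raw_storage(file_size_bytes):
    """Return (number of HDFS blocks, raw bytes stored across the cluster)."""
    num_blocks = math.ceil(file_size_bytes / BLOCK_SIZE)
    # Every block is replicated, so the cluster stores the file three times over.
    return num_blocks, file_size_bytes * REPLICATION

blocks, stored = raw_storage(1 * 1024**3)  # a 1 GB file
print(blocks, stored)  # 8 3221225472  (8 blocks, 3 GB of raw storage)
```

Rack awareness then decides *where* those replicas go: by default one replica on the writer's node, a second on a different rack, and a third on another node of that second rack.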


Hadoop Deployment

  • Different Hadoop deployment types
  • Hadoop distribution options
  • Hadoop competitors
  • Hadoop installation procedure
  • Distributed cluster architecture
  • Hands-On session on Hadoop Installation


Working with HDFS

  • Different ways of accessing data in HDFS
  • HDFS operations and commands
  • Internals of a file read in HDFS
  • Data copying with ‘distcp’
  • Hands-on session on working with HDFS
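
As a rough illustration of a file read, a client first works out which block holds a given byte offset and then reads from a DataNode holding that block. This is a conceptual sketch of the arithmetic only (assuming the default 128 MB block size), not the HDFS client API:

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed default HDFS block size (128 MB)

def block_for_offset(offset):
    """Return (block index, offset within that block) for a byte offset."""
    return offset // BLOCK_SIZE, offset % BLOCK_SIZE

# Reading byte 300,000,000 of a file lands in the third block (index 2).
idx, within = block_for_offset(300_000_000)
print(idx, within)  # 2 31564544
```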


Map-Reduce Abstraction

  • What is MapReduce?
  • MapReduce Evolution
  • MapReduce process and terminology
  • MapReduce components failures and recoveries
  • Hands-on practice session on working with MapReduce


Hadoop Cluster Configuration

  • Hadoop configuration overview
  • Hadoop configuration file
  • Configuration parameters and values
  • HDFS parameters
  • MapReduce parameters
  • Hadoop environment setup
  • ‘Include’ and ‘Exclude’ configuration files
  • Hands-on practice session on MapReduce performance tuning


Hadoop Administration and Maintenance

  • Namenode/Datanode directory structures and files
  • File system image and Edit log
  • The Checkpoint Procedure
  • Namenode failure and recovery procedure
  • Safe mode
  • Metadata and data backup
  • Potential problems and solutions / what to look for
  • Adding and removing nodes
  • Hands-on practice session on MapReduce file system recovery


Hadoop Monitoring and Troubleshooting

  • Best practices of monitoring a Hadoop cluster
  • Using logs and stack traces for monitoring and troubleshooting
  • Using open-source tools to monitor a Hadoop cluster

Job Scheduling

  • How to schedule Hadoop jobs on the same cluster
  • The default Hadoop FIFO scheduler
  • The Fair Scheduler and its configuration


Hadoop Multi Node Cluster Setup on Amazon EC2

  • Hadoop multi-node cluster setup using Amazon EC2 – creating a 4-node cluster
  • Running MapReduce jobs on the cluster
  • High Availability, Federation, YARN, and security


Object Oriented Programming & Linux

  • Concepts of Object Oriented Programming
  • Class, Object, Data Abstraction, Data Encapsulation, Inheritance and Polymorphism
  • Identifying Real world examples
  • Core Java: overview of Java and environment setup
  • Basic Syntax, Objects and Classes
  • Basic Data types, Variable types and Modifier types
  • Basic Operators, Loop control and Decision making
  • Methods
  • Implementing abstraction, encapsulation, inheritance, and polymorphism
  • Characters, String, Arrays
  • Packages, Interfaces, Collections, Generics and Basics of Java design patterns
  • Files and I/O
  • Serialization
  • Exceptions
  • RegEx
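
The OOP concepts listed above (abstraction, encapsulation, inheritance, polymorphism) can be illustrated with a minimal sketch. The course teaches them in Java, but the same ideas are shown here in Python with made-up class names:

```python
# Minimal OOP sketch; the Shape/Rectangle/Circle names are illustrative only.
class Shape:
    def area(self):
        raise NotImplementedError  # abstraction: subclasses must implement

class Rectangle(Shape):            # inheritance: Rectangle is-a Shape
    def __init__(self, w, h):
        self.w, self.h = w, h      # encapsulated state
    def area(self):
        return self.w * self.h

class Circle(Shape):
    def __init__(self, r):
        self.r = r
    def area(self):
        return 3.14159 * self.r ** 2

# Polymorphism: one call site, different behaviour per concrete class.
shapes = [Rectangle(3, 4), Circle(1)]
print([round(s.area(), 2) for s in shapes])  # [12, 3.14]
```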

Introduction to the Linux Operating System

UNIX Commands and Concepts


Introduction to Big Data

  • What is Big Data? Limitations of existing solutions for Big Data
  • What is Hadoop? Advantages and disadvantages
  • History of Hadoop
  • Why Hadoop, and its real-time use cases
  • Where exactly does Hadoop fit into the application architecture world?
  • The core components of Hadoop 2.x and the Hadoop architecture
  • The different ecosystems of Hadoop


Hadoop Architecture

  • What is a cluster? What is a node?
  • Hadoop 2.x Cluster Architecture – Federation and High Availability
  • Cluster Modes
  • Common Configuration files


HDFS

  • What is HDFS? HDFS Architecture
  • Features of HDFS
  • Hadoop daemons and their functionalities
  • Introduction to Data Storage – Blocks and Data Replication
  • Accessing HDFS – CLI, Admin commands and Java based approach


Hadoop MapReduce

  • Why MapReduce? Traditional way vs MapReduce way
  • Hadoop 2.x MapReduce Architecture and its Components
  • YARN MR Application Execution Flow and YARN Workflow
  • Anatomy of MapReduce program
  • Basics of MapReduce – Driver, Mapper, and Reducer; Input Splits; the relation between Input Splits and HDFS Blocks
  • Writing basic MR Job, Running MR jobs in Local mode and Distributed mode
  • Data types in MapReduce
  • Input Formatters and its associated Record Readers
    • Text Input Formatter
    • Key Value Text Input Formatter
    • Sequence File Input Formatter
    • Writing Custom Input Formatters and Its Record Readers
  • Output Formatters and its associated Record Writers
    • Text Output Formatter
    • Sequence File Output Formatter
    • Writing Custom Output Formatters and Its Record Writers
  • Combiners (Mini Reducers), Partitioners and uses of them
  • Importance of Distributed Cache, Counters and how to use Counters
  • MR Joins – Map Side and Reducer Side
  • Use of Secondary Sorting
    • Importance of Writable and Writable Comparable APIs
    • Writing MR Keys and Values
  • Use of Compression Techniques – Snappy, LZO and Zip
  • Schedulers and the importance of schedulers – FIFO, Capacity, and Fair
  • Debug MR Jobs in Local and Pseudo Cluster modes
  • MR Streaming and Pipelining
  • Identifying Performance Bottlenecks and fine tuning MR Jobs
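
Secondary sorting, listed above, can be sketched conceptually: the sort comparator orders map output by a composite (primary, secondary) key, while the grouping comparator groups on the primary key alone, so each reducer sees its values already ordered. A small Python simulation with made-up records shows the effect:

```python
from itertools import groupby

# Made-up (customer, timestamp, amount) tuples standing in for map output.
records = [
    ("alice", 3, 30), ("bob", 1, 15), ("alice", 1, 10), ("bob", 2, 25),
]

# Sort comparator: order by the composite (primary, secondary) key.
ordered = sorted(records, key=lambda r: (r[0], r[1]))

# Grouping comparator: group on the primary key alone; values arrive
# at each "reducer" already sorted by timestamp.
grouped = {k: [r[2] for r in g] for k, g in groupby(ordered, key=lambda r: r[0])}
print(grouped)  # {'alice': [10, 30], 'bob': [15, 25]}
```

In real MapReduce this is done with a WritableComparable composite key plus custom sort and grouping comparators; the sketch only shows the resulting ordering guarantee.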



Apache Pig

  • Introduction to Apache Pig
  • MapReduce vs Apache Pig, SQL vs Apache Pig
  • Types of Use Cases where we can use Apache Pig
  • Different data types in Pig and its Limitations
  • Pig Latin programming structure
  • Modes of execution in Pig
    • Grunt Shell
    • Script
    • Embedded
  • Transformations in Pig
  • UDFs in Pig



Apache Hive

  • Introduction, Use cases where Hive can be used and its Limitations
  • Hive Architecture and Components – Driver, Compiler and Semantic Analyzer
  • Installation and Configuration of Hive on Single node and Multi node Cluster
  • Hive Integration with Hadoop
  • Hive Query Language (Hive QL)
  • SQL vs Hive QL
  • Hive execution – MapReduce and local mode
  • Hive DDL and DML operations
  • Hive Metastore
    • Embedded Metastore Configuration
    • External Metastore Configuration
  • Hive Data types and Data Models
  • Partitions and Buckets
  • Hive Tables
    • Managed Tables
    • External Tables
  • Importing, Querying and Managing Data
  • Hive Script
  • Hive UDFs and Hive UDAFs
  • Hive SerDe
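
Bucketing, listed above, can be sketched as hashing the CLUSTERED BY column into a fixed number of buckets so that joins and sampling can target individual buckets. The hash function and user IDs below are illustrative stand-ins, not Hive's actual implementation:

```python
NUM_BUCKETS = 4  # e.g. CLUSTERED BY (user_id) INTO 4 BUCKETS

def stable_hash(s):
    # Deterministic string hash (Java-style 31x multiplier), for illustration.
    h = 0
    for ch in s:
        h = (h * 31 + ord(ch)) % 2**31
    return h

def bucket_for(user_id):
    # Rows with the same key always land in the same bucket.
    return stable_hash(user_id) % NUM_BUCKETS

rows = ["u001", "u002", "u003", "u004"]
buckets = {r: bucket_for(r) for r in rows}
print(buckets)  # {'u001': 0, 'u002': 1, 'u003': 2, 'u004': 3}
```

Partitions, by contrast, split data into directories by column *value*; buckets split each partition into a fixed number of files by column *hash*.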



Apache HBase

  • HBase Introduction, How HBase works
    • Column Families and Scans
  • Use cases where HBase can be used
  • HBase Installation
  • HBase Architecture
    • Storage
    • Write-Ahead Log
    • Log Structured Merge Trees
  • HBase integration with MapReduce
  • HBase Basics
    • Schema Definition
    • Basic CRUD operations
  • HBase Clients
    • REST
    • Thrift
    • Avro
    • Web based UI
  • Tools you will learn during Hadoop Training – FLUME, SQOOP, OOZIE and Zookeeper
  • FLUME – During HDFS
    • Introduction
    • How FLUME will be used – Flume agents
  • OOZIE – During MapReduce
    • Introduction – Batch vs Real Time
    • How Oozie will be used – Scheduling jobs and Executing workflow jobs
  • SQOOP – During Hive
    • Introduction
    • MySQL Client and Server Installation
    • Connecting to RDBMS using SQOOP
    • SQOOP Commands
  • Zookeeper – During HBase
    • Introduction
    • Zookeeper Data Model
    • Zookeeper Services
  • Proof of Concept (POC)
  • Twitter application – from Day 1, start developing a Twitter application by applying what you have learned in each phase
  • Real-time use cases on each topic
  • Highlights of the training
  • Real-time project execution – hands-on