This course is for system administrators and others responsible for managing Apache Hadoop clusters in production or development environments. Prior knowledge of Apache Hadoop is not required. The course is designed to provide the knowledge and skills you need to become a successful Hadoop administrator.
The course includes many challenging, practical, and focused hands-on exercises. On completion, you will be able to solve real-world Hadoop cluster problems in industry.
You will learn:
- Apache Hadoop, HDFS, and Hadoop Cluster Administration.
- Internals of Hadoop 2.0, YARN, MapReduce, and HDFS.
- Identifying the hardware and infrastructure for setting up a Hadoop cluster.
- Plan and deploy a Hadoop cluster, load data, and run applications.
- Hadoop Cluster Configuration and Performance Analysis.
- Manage, Maintain, Monitor and Troubleshoot a Hadoop Cluster.
- Secure a deployment and understand Backup and Recovery.
- Administration of Oozie, Hive, and HBase.
- Backup options, and how to diagnose and recover from node failures in a Hadoop cluster.
Upon completion of the course, attendees receive a certificate from GlobalEdx. This certificate is a strong differentiator in the field, giving employers and customers good evidence of your skills and expertise.
This course is appropriate for system administrators who will be setting up or maintaining a Hadoop cluster. Good working knowledge of Linux, fundamental Linux scripting skills, and some basic Linux system administration experience are prerequisites. Prior knowledge of Hadoop is not required.
- Why a Hadoop Administration course?
Administrators are leading the transition from traditional databases and data warehouses to more flexible, scalable systems built on Apache Hadoop. We provide all the training needed for you to drive Big Data strategy, from Hadoop implementation and cluster monitoring through security and operation at massive scale.
- Why lead the Hadoop Movement?
Hadoop is quickly becoming the most important component in the data stack, enabling open management and rapid processing of data at petabyte scale. Prepare to lead at every step in the data value chain and develop the skills to transform complex data sets and enable high-value analytics alongside core systems maintenance and utilization.
- Who should attend this course?
- Professionals aspiring to make a career in Big Data Analytics using Hadoop Framework.
- System Administrators and Support Engineers who will maintain and troubleshoot Hadoop clusters in production or development environments.
Introduction to Hadoop
- What is Hadoop?
- Hadoop vs. traditional systems
- Hadoop history
- Hadoop core components and architecture
Hadoop Distributed File System (HDFS)
- HDFS overview and design
- Hadoop cluster planning
- HDFS architecture
- HDFS file storage
- Component failures and recoveries
- Block placement
- Balancing the Hadoop cluster
- Planning your Hadoop cluster
- Hadoop software and hardware configuration
- HDFS Block replication and rack awareness
- Network topology for Hadoop cluster
- Different Hadoop deployment types
- Hadoop distribution options
- Hadoop competitors
- Hadoop installation procedure
- Distributed cluster architecture
- Hands-on session on Hadoop installation
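The "block replication and rack awareness" topic above can be previewed with a small sketch of HDFS's default placement policy for three replicas: the first replica goes on the writer's node, the second on a node in a different rack, and the third on another node in that same remote rack. The node and rack names below are hypothetical, and the real NameNode also weighs load and free space:

```python
# Sketch of HDFS's default rack-aware placement for 3 replicas:
# replica 1 on the client's node, replica 2 on a node in a different
# rack, replica 3 on a different node in the same remote rack.
def place_replicas(client_node, topology):
    """topology: dict mapping rack name -> list of node names."""
    client_rack = next(r for r, nodes in topology.items() if client_node in nodes)
    replicas = [client_node]
    # Second replica: a node on a remote rack.
    remote_rack = next(r for r in topology if r != client_rack)
    replicas.append(topology[remote_rack][0])
    # Third replica: a different node on that same remote rack.
    third = next(n for n in topology[remote_rack] if n not in replicas)
    replicas.append(third)
    return replicas

topology = {"/rack1": ["node1", "node2"], "/rack2": ["node3", "node4"]}
print(place_replicas("node1", topology))  # ['node1', 'node3', 'node4']
```

This spread survives the loss of any single node or any single rack while keeping two of the three replicas close together for write efficiency.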
Working with HDFS
- Different ways of accessing data in HDFS
- HDFS operations and commands
- Internals of a file read in HDFS
- Data copying with ‘distcp’
- Hands-on session on working with HDFS
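The "internals of a file read" topic can be sketched as follows: the client asks the NameNode for the list of blocks making up a file, then streams each block from a DataNode holding a replica. The helper below only computes the block boundaries, using the common Hadoop 2.x default block size of 128 MB:

```python
# HDFS splits each file into fixed-size blocks (128 MB by default in
# Hadoop 2.x); a read fetches the block list from the NameNode and then
# streams each block from a DataNode holding a replica.
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB

def block_ranges(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs, one per HDFS block of the file."""
    ranges = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        ranges.append((offset, length))
        offset += length
    return ranges

# A 300 MB file occupies two full 128 MB blocks plus a 44 MB final block.
print(len(block_ranges(300 * 1024 * 1024)))  # 3
```

Note that the final block only occupies as much space as it needs; a 44 MB tail does not consume a full 128 MB on disk.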
MapReduce
- What is MapReduce?
- MapReduce Evolution
- MapReduce process and terminology
- MapReduce components failures and recoveries
- Hands-on practice session on working with MapReduce
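The MapReduce process and terminology above (map, shuffle/sort, reduce) can be illustrated with the classic word count, sketched here in plain Python rather than the Hadoop Java API:

```python
from itertools import groupby
from operator import itemgetter

# Map phase: emit a (word, 1) pair for every word in every input line.
def map_phase(lines):
    return [(word, 1) for line in lines for word in line.split()]

# Shuffle/sort phase: sort intermediate pairs and group them by key.
def shuffle(pairs):
    pairs.sort(key=itemgetter(0))
    return {k: [v for _, v in g] for k, g in groupby(pairs, key=itemgetter(0))}

# Reduce phase: sum the counts for each word.
def reduce_phase(grouped):
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["hadoop stores data", "hadoop processes data"]
print(reduce_phase(shuffle(map_phase(lines))))
# {'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}
```

In a real cluster the map tasks run in parallel on the nodes holding the input blocks, and the shuffle moves each key's pairs across the network to the reducer responsible for it.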
Hadoop Cluster Configuration
- Hadoop configuration overview
- Hadoop configuration file
- Configuration parameters and values
- HDFS parameters
- MapReduce parameters
- Hadoop environment setup
- ‘Include’ and ‘Exclude’ configuration files
- Hands-on practice session on MapReduce performance tuning
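Hadoop's configuration files (core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml) share one XML layout: a `<configuration>` root containing `<property>` elements with `<name>` and `<value>` children. A small parser sketch; the sample values shown are the common defaults (`dfs.replication` of 3, 128 MB `dfs.blocksize`), not tuning recommendations:

```python
import xml.etree.ElementTree as ET

# All Hadoop *-site.xml files use the same <configuration>/<property>
# layout; this reads one into a plain dict of name -> value strings.
def parse_hadoop_config(xml_text):
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value")
            for p in root.iter("property")}

sample = """
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
</configuration>
"""
conf = parse_hadoop_config(sample)
print(conf["dfs.replication"])  # 3
```

A value set in a *-site.xml file overrides the shipped *-default.xml value for that parameter, which is why administrators edit the site files rather than the defaults.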
Hadoop Administration and Maintenance
- Namenode/Datanode directory structures and files
- File system image and Edit log
- The Checkpoint Procedure
- Namenode failure and recovery procedure
- Safe Mode
- Metadata and Data backup
- Potential problems and solutions / what to look for
- Adding and removing nodes
- Hands-on practice session on HDFS file system recovery
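The checkpoint procedure above merges the on-disk file system image with the edit log to produce a fresh fsimage, after which the log can be truncated. It can be sketched as replaying logged operations onto a namespace snapshot; the operation names here are simplified stand-ins, not the actual edit-log opcodes:

```python
# The NameNode keeps the namespace in an fsimage file plus an edit log
# of changes since the last checkpoint; a checkpoint replays the edits
# onto the image so the log can be truncated.
def checkpoint(fsimage, edit_log):
    """fsimage: set of paths; edit_log: list of (op, path) tuples."""
    image = set(fsimage)
    for op, path in edit_log:
        if op == "create":
            image.add(path)
        elif op == "delete":
            image.discard(path)
    return image  # the new fsimage; the edit log can now be cleared

old_image = {"/data/a"}
edits = [("create", "/data/b"), ("delete", "/data/a"), ("create", "/data/c")]
print(sorted(checkpoint(old_image, edits)))  # ['/data/b', '/data/c']
```

This is also why NameNode recovery needs both files: the fsimage alone is stale, and the edit log alone has no starting state.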
Hadoop Monitoring and Troubleshooting
- Best practices of monitoring a Hadoop cluster
- Using logs and stack traces for monitoring and troubleshooting
- Using open-source tools to monitor Hadoop cluster
- How to schedule Hadoop Jobs on the same cluster
- Default Hadoop FIFO scheduler
- Fair Scheduler and its configuration
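The difference between the default FIFO scheduler and the Fair Scheduler can be sketched with a toy simulation: FIFO drains whole jobs in arrival order, while fair scheduling interleaves task slots across jobs so short jobs are not stuck behind long ones. The job and task names are made up for illustration, and real fair scheduling uses weighted pools rather than strict round-robin:

```python
from itertools import zip_longest

# FIFO: run every task of job 1, then every task of job 2, and so on.
def fifo_order(jobs):
    return [task for job in jobs for task in job]

# Fair (round-robin sketch): give each job one task slot per cycle,
# so small jobs finish without waiting for large ones to drain.
def fair_order(jobs):
    return [t for group in zip_longest(*jobs) for t in group if t is not None]

jobs = [["A1", "A2", "A3", "A4"], ["B1", "B2"]]
print(fifo_order(jobs))  # ['A1', 'A2', 'A3', 'A4', 'B1', 'B2']
print(fair_order(jobs))  # ['A1', 'B1', 'A2', 'B2', 'A3', 'A4']
```

Under FIFO the two-task job B finishes last; under the fair interleaving it finishes after four slots, which is the behavior the Fair Scheduler configuration is tuned to provide on shared clusters.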
Hadoop Multi Node Cluster Setup on Amazon EC2
- Hadoop multi-node cluster setup using Amazon EC2 – creating a 4-node cluster
- Running MapReduce jobs on the cluster
- High Availability, Federation, YARN, and Security