Big Data & Hadoop (Administrator)

About The Course

Big Data Hadoop Administration training gives participants the skills needed to operate and maintain a Hadoop cluster, from planning, installation and configuration through load balancing, security and tuning. The course provides hands-on preparation for the real-world challenges faced by Hadoop administrators. The curriculum follows the Apache Hadoop distribution.

Course Objectives

During the Hadoop Administration Online training, you'll master:

i) Hadoop Architecture, HDFS, Hadoop Cluster and Hadoop Administrator's role

ii) Plan and Deploy a Hadoop Cluster

iii) Load Data and Run Applications

iv) Configuration and Performance Tuning

v) How to Manage, Maintain, Monitor and Troubleshoot a Hadoop Cluster

vi) Cluster Security, Backup and Recovery

vii) Insights on Hadoop 2.0, NameNode High Availability, HDFS Federation, YARN, MapReduce v2

viii) Oozie, HCatalog/Hive and HBase Administration, and a Hands-On Project

Who should go for this course?

The Hadoop Administration course is best suited to professionals with IT Admin experience such as:

i) Linux / Unix Administrator

ii) Database Administrator

iii) Windows Administrator

iv) Infrastructure Administrator

v) System Administrator

What are the prerequisites for this course?

This course requires basic Linux knowledge; prior knowledge of Apache Hadoop is not required.

We also offer a complimentary course on "Linux Fundamentals" to all Hadoop Administration course participants.

How will I do practicals in Online Training?

Practical Set Up: We will help you set up a virtual machine on your system. For VM installation, 8GB RAM is required. You can also create an account with AWS EC2 and use servers eligible for 'Free tier usage'. This is currently the preferred option, as most deployments now happen in the cloud, and a step-by-step procedure guide is available on the LMS. Additionally, our 24x7 expert support team is available to assist you with any queries.

Why Learn Hadoop Administration?

The Big Data & Hadoop market is expected to reach $99.31B by 2022, growing at a CAGR of 42.1% from 2015 (Forbes).

McKinsey predicts that by 2018 there will be a shortage of 1.5M data experts (McKinsey Report).

The average Hadoop administrator salary is $123k (Salary Data).

Which Case-Studies will be a part of the Course?

Towards the end of the course, you will work on a live project that uses different Hadoop ecosystem components together in a Hadoop implementation to solve big data problems.

1. Set up a Hadoop cluster with at least 2 nodes

Node 1 - NameNode, JobTracker, DataNode, TaskTracker

Node 2 - Secondary NameNode, DataNode, TaskTracker

2. Create a simple text file and copy it to HDFS.

Find out which node it was stored on.

Find out on which DataNode the output files are written.

3. Create a large text file and copy it to HDFS with a block size of 256 MB. Keep all other files at the default block size and find out how block size affects performance.

4. Set a spaceQuota of 200MB on a projects directory and copy a 70MB file into it with replication=2.

Identify why the system does not let you copy the file.

How will you solve this problem without increasing the spaceQuota?

5. Configure Rack Awareness and copy the file to HDFS.

Find its rack distribution and identify the command used to display it.

Find out how to change the replication factor of an existing file.
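Tasks 3 and 4 both come down to block arithmetic. The following Python sketch is a simplified model, not Hadoop code, built on two rules from the HDFS quota documentation: every replica of a block counts against a space quota, and a block allocation fails if the remaining quota could not hold a full block (block size times replication). The function names are hypothetical, and the model ignores metadata, other files in the directory and concurrent writers.

```python
MB = 1024 * 1024


def num_blocks(file_size, block_size):
    """Number of HDFS blocks a file occupies (the last block may be partial)."""
    return -(-file_size // block_size)  # integer ceiling division


def fits_space_quota(file_size, block_size, replication, quota, used=0):
    """Simplified model of writing one file into a directory with a space quota.

    Before each block is allocated, the quota must still have room for a FULL
    block times the replication factor; afterwards only the bytes actually
    written (times replication) are charged against the quota.
    """
    remaining = file_size
    while remaining > 0:
        if used + block_size * replication > quota:
            return False  # HDFS would fail the write with DSQuotaExceededException
        written = min(block_size, remaining)
        used += written * replication
        remaining -= written
    return True


if __name__ == "__main__":
    # A 1 GB file: 16 blocks at the Hadoop 1.x default of 64MB, 4 blocks at 256MB.
    print(num_blocks(1024 * MB, 64 * MB), num_blocks(1024 * MB, 256 * MB))  # 16 4
    # Task 4: 70MB file, replication 2, 200MB quota, 64MB blocks.
    print(fits_space_quota(70 * MB, 64 * MB, 2, 200 * MB))  # False
```

Trying other block sizes or replication factors in the model is one way to reason about question 4 before testing on the cluster. On a real Hadoop 1.x cluster, a per-upload block size can be passed as a generic option, for example hadoop fs -D dfs.block.size=268435456 -put big.txt /data (the file and directory names are placeholders). Larger blocks generally mean fewer map tasks and less block metadata held in NameNode memory.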

The final certification project is based on real-world use cases, as follows:

Problem Statement 1:

1. Set up a single-node or 2-node Hadoop cluster in which all daemons (NameNode, DataNode, JobTracker, TaskTracker and Secondary NameNode) run, with block size = 128MB.

2. Write down the namespace ID of the cluster, and create a directory with a namespace quota of 10 and a space quota of 100MB.

3. Use the distcp command to copy data within the same cluster or to a different cluster, and create a list of the DataNodes participating in the cluster.
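The namespace ID is stored in the VERSION file under the NameNode's current/ directory (inside dfs.name.dir in Hadoop 1.x). The Python sketch below, an illustration rather than a reference solution, parses that file's key=value format and assembles the Hadoop 1.x commands for the quota, DistCp and DataNode-listing steps as argument lists without running them. Note that a namespace quota of 10 also counts the directory itself, so at most 9 files or subdirectories fit beneath it. The VERSION contents, paths and host names are made-up placeholders.

```python
def read_namespace_id(version_text):
    """Return namespaceID from the text of a NameNode VERSION file (key=value lines, '#' comments)."""
    for raw in version_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "namespaceID":
            return value.strip()
    raise KeyError("namespaceID not found in VERSION file")


def problem1_commands(quota_dir, src, dst):
    """Hadoop 1.x CLI invocations as argument lists (not executed here)."""
    space_quota_bytes = 100 * 1024 * 1024  # 100MB
    return [
        ["hadoop", "dfsadmin", "-setQuota", "10", quota_dir],
        ["hadoop", "dfsadmin", "-setSpaceQuota", str(space_quota_bytes), quota_dir],
        ["hadoop", "fs", "-count", "-q", quota_dir],  # shows the quotas just set
        ["hadoop", "distcp", src, dst],               # copy within or across clusters
        ["hadoop", "dfsadmin", "-report"],            # reports the cluster's DataNodes
    ]


# Made-up VERSION contents, for illustration only.
SAMPLE_VERSION = """#Mon Jan 01 00:00:00 UTC 2018
namespaceID=1234567890
cTime=0
storageType=NAME_NODE
layoutVersion=-32
"""

if __name__ == "__main__":
    print(read_namespace_id(SAMPLE_VERSION))
    for cmd in problem1_commands("/user/admin/projects",
                                 "hdfs://nn1:8020/data", "hdfs://nn2:8020/backup"):
        print(" ".join(cmd))
```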

Problem Statement 2:

1. Save the NameNode's namespace without using the Secondary NameNode, ensuring that the edits file is merged without stopping the NameNode daemon.

2. Set up an include file so that no other nodes can talk to the NameNode.

3. Set the cluster rebalancer threshold to 40%.

4. Set the map and reduce slots to 4 and 2 respectively for each node.
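Problem Statement 2 maps onto a handful of Hadoop 1.x settings and commands. The Python sketch below collects them: a small helper renders the mapred-site.xml and hdfs-site.xml properties (mapred.tasktracker.map.tasks.maximum, mapred.tasktracker.reduce.tasks.maximum and dfs.hosts), and a list holds the dfsadmin and balancer commands in the order they would be run. It is an illustrative outline under those assumptions, not a tested runbook, and the include-file path is a hypothetical placeholder.

```python
import xml.etree.ElementTree as ET


def hadoop_conf_xml(props):
    """Render a Hadoop <configuration> document from a name -> value mapping."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Step 4: slots per TaskTracker (mapred-site.xml; TaskTrackers must be restarted).
MAPRED_SITE = {
    "mapred.tasktracker.map.tasks.maximum": 4,
    "mapred.tasktracker.reduce.tasks.maximum": 2,
}

# Step 2: only hosts listed in the include file may connect (hdfs-site.xml).
# Hypothetical path; the file lists one allowed host per line.
HDFS_SITE = {"dfs.hosts": "/etc/hadoop/conf/dfs.include"}

# Steps 1-3 as shell commands, in order.
ADMIN_STEPS = [
    "hadoop dfsadmin -safemode enter",  # saveNamespace requires safe mode
    "hadoop dfsadmin -saveNamespace",   # saves the namespace and resets the edits log
    "hadoop dfsadmin -safemode leave",
    "hadoop dfsadmin -refreshNodes",    # re-reads the include/exclude files
    "hadoop balancer -threshold 40",
]

if __name__ == "__main__":
    print(hadoop_conf_xml(MAPRED_SITE))
    print(hadoop_conf_xml(HDFS_SITE))
    print("\n".join(ADMIN_STEPS))
```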

1. Understanding Big Data and Hadoop

Learning Objectives: In this module, you will understand what big data and Apache Hadoop are. You will also learn how Hadoop solves big data problems, the Hadoop cluster architecture, its core components & ecosystem, Hadoop's data loading & reading mechanism, and the role of a Hadoop cluster administrator.

Topics: Introduction to big data, limitations of existing solutions, Hadoop architecture, Hadoop components and ecosystem, data loading & reading from HDFS, replication rules, rack awareness theory, Hadoop cluster administrator: roles and responsibilities.
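The replication rules and rack awareness theory in this module can be made concrete with a small simulation. The Python sketch below is a deterministic simplification of HDFS's documented default placement for a replication factor of 3: the first replica on the writer's node (when the writer is a DataNode), the second on a node in a different rack, and the third on a different node in the same rack as the second. Real HDFS picks nodes randomly and also weighs load and free space; the function, node and rack names are hypothetical.

```python
def place_replicas(writer, topology, replication=3):
    """Pick DataNodes for a new block's replicas.

    writer: the client's node name, or None if the client is off-cluster.
    topology: mapping of DataNode name -> rack path.
    Where HDFS chooses randomly, this sketch takes the first eligible node
    in sorted order so the result is reproducible.
    """
    nodes = sorted(topology)
    replication = min(replication, len(nodes))
    if replication < 1:
        return []

    # 1st replica: the writer's own node if it is a DataNode.
    chosen = [writer if writer in topology else nodes[0]]

    # 2nd replica: a node on a different (remote) rack, if one exists.
    if replication >= 2:
        remote = [n for n in nodes if topology[n] != topology[chosen[0]]]
        unused = [n for n in nodes if n not in chosen]
        chosen.append(remote[0] if remote else unused[0])

    # 3rd replica: another node on the same rack as the 2nd replica.
    if replication >= 3:
        unused = [n for n in nodes if n not in chosen]
        same_rack = [n for n in unused if topology[n] == topology[chosen[1]]]
        chosen.append(same_rack[0] if same_rack else unused[0])

    # Any further replicas: remaining unused nodes.
    for n in nodes:
        if len(chosen) >= replication:
            break
        if n not in chosen:
            chosen.append(n)
    return chosen


TOPOLOGY = {"dn1": "/rack1", "dn2": "/rack1", "dn3": "/rack2", "dn4": "/rack2"}

if __name__ == "__main__":
    print(place_replicas("dn2", TOPOLOGY))  # ['dn2', 'dn3', 'dn4']
    print(place_replicas(None, TOPOLOGY))   # ['dn1', 'dn3', 'dn4']
```

The placement spreads a block across two racks, so losing a whole rack still leaves at least one replica, while only one of the replica transfers has to cross between racks.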

2. Hadoop Architecture and Cluster Setup

Learning Objectives: In this module, you will understand the different Hadoop components, how HDFS works, Hadoop cluster modes, configuration files and more. You will also learn Hadoop 1.0 cluster setup and configuration, set up Hadoop clients using Hadoop 1.0, and resolve problems simulated from real-world environments.

Topics: Hadoop server roles and their usage, Hadoop installation and initial configuration, deploying Hadoop in pseudo-distributed mode, deploying a multi-node Hadoop cluster, installing Hadoop clients, understanding how HDFS works and resolving simulated problems.
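For pseudo-distributed mode, the Hadoop 1.x single-node setup guide uses three small configuration files. The Python sketch below writes them with the standard library; the values (localhost, ports 9000 and 9001, replication 1) are the ones that guide uses and may need adjusting for your environment, and write_conf is a hypothetical helper, not part of Hadoop.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Values from the Hadoop 1.x single-node (pseudo-distributed) setup guide.
PSEUDO_DISTRIBUTED = {
    "core-site.xml": {"fs.default.name": "hdfs://localhost:9000"},
    "hdfs-site.xml": {"dfs.replication": "1"},  # only one DataNode, so one replica
    "mapred-site.xml": {"mapred.job.tracker": "localhost:9001"},
}


def write_conf(conf_dir, files=PSEUDO_DISTRIBUTED):
    """Write each file as a Hadoop <configuration> of <property> name/value pairs."""
    os.makedirs(conf_dir, exist_ok=True)
    for filename, props in files.items():
        root = ET.Element("configuration")
        for name, value in props.items():
            prop = ET.SubElement(root, "property")
            ET.SubElement(prop, "name").text = name
            ET.SubElement(prop, "value").text = value
        path = os.path.join(conf_dir, filename)
        ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    target = tempfile.mkdtemp()  # stand-in for $HADOOP_HOME/conf
    write_conf(target)
    print(sorted(os.listdir(target)))  # ['core-site.xml', 'hdfs-site.xml', 'mapred-site.xml']
```

With the files in place, the same guide formats HDFS with bin/hadoop namenode -format and starts all daemons with bin/start-all.sh.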

3. Hadoop Cluster Administration & Understanding MapReduce

Learning Objectives: In this module, you will understand how the Secondary NameNode works, work with a distributed Hadoop cluster, enable rack awareness, use the maintenance mode of a Hadoop cluster, add or remove nodes in both the ad hoc and the recommended way, and understand the MapReduce programming model and schedulers from a Hadoop administrator's perspective.

Topics: Understanding the Secondary NameNode, working with a distributed Hadoop cluster, decommissioning and commissioning nodes, understanding MapReduce, understanding and enabling schedulers.
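To enable rack awareness, Hadoop 1.x reads the topology.script.file.name property from core-site.xml (net.topology.script.file.name in Hadoop 2.x). Hadoop runs that script with one or more IP addresses or host names as arguments and reads one rack path per argument from its output; without a script, every node is placed in /default-rack. The Python script below is a minimal sketch of such a topology script; the IP addresses and rack paths are hypothetical and would come from your own inventory.

```python
#!/usr/bin/env python3
"""Hypothetical rack-topology script for HDFS rack awareness.

Hadoop passes one or more host names / IP addresses as arguments and
expects one rack path per argument on stdout.
"""
import sys

# Hypothetical host -> rack mapping; replace with your own inventory.
RACKS = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.11": "/dc1/rack2",
    "10.0.2.12": "/dc1/rack2",
}
DEFAULT_RACK = "/default-rack"  # used for any host not in the mapping


def resolve(hosts):
    """Map each host to its rack path, in the same order as the input."""
    return [RACKS.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    print(" ".join(resolve(sys.argv[1:])))
```

After making the script executable and pointing the property at it, restart the NameNode. Running hadoop fsck /path -files -blocks -racks then shows the rack of every replica, and hadoop fs -setrep -w 3 /path changes the replication factor of an existing file.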

4. Backup, Recovery and Maintenance

Learning Objectives: In this module, you will understand day-to-day cluster administration tasks: balancing data in the cluster, protecting data by enabling trash, attempting a manual failover, creating backups within or across clusters, safeguarding your metadata and performing metadata recovery or a manual NameNode failover, and restricting HDFS usage in terms of file count and volume.

Topics: Key admin commands such as Balancer, Trash, Import Checkpoint and DistCp; data backup and recovery, enabling trash, namespace (count) quotas and space quotas, manual failover and metadata recovery.
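The Balancer considers the cluster balanced when every DataNode's utilization (used space divided by capacity) is within the threshold, in percentage points, of the utilization of the whole cluster. The default threshold is 10, and hadoop balancer -threshold 40 loosens it to 40. The Python sketch below models only that rule; it does not plan or move blocks, and the node names and sizes are hypothetical.

```python
def classify_datanodes(nodes, threshold_pct=10.0):
    """Label each DataNode relative to the cluster's overall utilization.

    nodes: mapping of DataNode name -> (used_bytes, capacity_bytes).
    threshold_pct: allowed deviation in percentage points (Balancer default: 10).
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(cap for _, cap in nodes.values())
    cluster_util = 100.0 * total_used / total_capacity

    labels = {}
    for name, (used, cap) in nodes.items():
        util = 100.0 * used / cap
        if util > cluster_util + threshold_pct:
            labels[name] = "over-utilized"    # a source the Balancer moves blocks from
        elif util < cluster_util - threshold_pct:
            labels[name] = "under-utilized"   # a target the Balancer moves blocks to
        else:
            labels[name] = "balanced"
    return labels


GB = 1024 ** 3
NODES = {
    "dn1": (850 * GB, 1000 * GB),  # 85% full
    "dn2": (150 * GB, 1000 * GB),  # 15% full
    "dn3": (500 * GB, 1000 * GB),  # 50% full; cluster average is also 50%
}

if __name__ == "__main__":
    print(classify_datanodes(NODES))                    # default threshold of 10
    print(classify_datanodes(NODES, threshold_pct=40))  # as with -threshold 40
```

With a 40-point threshold, the same three nodes all count as balanced, which is why a large threshold makes the Balancer finish quickly but leaves data less evenly spread.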

5. Hadoop 2.0 Cluster: Planning and Management

Learning Objectives: In this module, you will gain insights into cluster planning and management: the aspects to keep in mind when planning a new cluster, capacity sizing, understanding recommendations and comparing different Hadoop distributions, understanding workload and usage patterns, and some examples from the world of big data.

Topics: Planning a Hadoop 2.0 cluster, cluster sizing, hardware, network and software considerations, popular Hadoop distributions, workload and usage patterns, industry recommendations.

6. Hadoop 2.0 and Its Features

Learning Objectives: In this module, you will learn about the new features of Hadoop 2.0: HDFS High Availability, the YARN framework and job execution flow, MRv2 and federation. You will also learn the limitations of Hadoop 1.x and how to set up a Hadoop 2.0 cluster in pseudo-distributed and distributed modes.

Topics: Limitations of Hadoop 1.x, features of Hadoop 2.0, YARN framework, MRv2, Hadoop high availability and federation, YARN ecosystem and Hadoop 2.0 cluster setup.
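In YARN, fixed map and reduce slots are replaced by containers sized in memory (and vcores). The Python sketch below is a simplified, memory-only model of how many containers a NodeManager can run. It assumes container requests are rounded up to a multiple of yarn.scheduler.minimum-allocation-mb and that requests above yarn.scheduler.maximum-allocation-mb are rejected, which matches common default behaviour; details differ between schedulers and versions. The 1024 MB and 8192 MB defaults are YARN's stock values, and the node and request sizes are hypothetical.

```python
def normalize_request(request_mb, min_alloc_mb=1024, max_alloc_mb=8192):
    """Round a container memory request up to a multiple of the minimum allocation."""
    if request_mb > max_alloc_mb:
        # YARN rejects requests above yarn.scheduler.maximum-allocation-mb
        raise ValueError(f"{request_mb} MB exceeds the {max_alloc_mb} MB maximum allocation")
    steps = max(1, -(-request_mb // min_alloc_mb))  # ceiling division, at least one step
    return steps * min_alloc_mb


def containers_per_node(node_memory_mb, request_mb, min_alloc_mb=1024, max_alloc_mb=8192):
    """How many equal-sized containers fit in yarn.nodemanager.resource.memory-mb."""
    return node_memory_mb // normalize_request(request_mb, min_alloc_mb, max_alloc_mb)


if __name__ == "__main__":
    # A node offering 48 GB to YARN; each map task asks for 1500 MB.
    print(normalize_request(1500))           # 2048
    print(containers_per_node(49152, 1500))  # 24
```

The rounding step is why a request just over a multiple of the minimum allocation wastes memory: 1500 MB requests are charged as 2048 MB each.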

7. Setting up Hadoop 2.x with High Availability and Upgrading Hadoop

Learning Objectives: In this module, you will learn to set up Hadoop 2 with high availability, upgrade from v1 to v2, import data from an RDBMS into HDFS, and understand why Oozie, Hive and HBase are used and how these components work.

Topics: Configuring Hadoop 2 with high availability, upgrading to Hadoop 2, working with Sqoop, understanding Oozie, working with Hive, working with HBase.
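With Quorum Journal Manager HA in Hadoop 2, clients address a logical nameservice instead of a single NameNode host, and the active and standby NameNodes share their edit log through a set of JournalNodes. The Python sketch below assembles the core properties from the HDFS HA documentation into core-site.xml and hdfs-site.xml dictionaries. The nameservice ID and host names are hypothetical, the ports are Hadoop 2.x defaults, and automatic failover (ZooKeeper) settings are left out.

```python
def ha_properties(nameservice, namenodes, journalnodes,
                  rpc_port=8020, http_port=50070, jn_port=8485):
    """Core HDFS HA (Quorum Journal Manager) properties.

    nameservice:  logical name clients use instead of a single NameNode host
    namenodes:    mapping of NameNode ID -> host name
    journalnodes: host names of the JournalNodes (an odd number, at least 3)
    """
    core_site = {"fs.defaultFS": f"hdfs://{nameservice}"}
    hdfs_site = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
    }
    for nn_id, host in namenodes.items():
        hdfs_site[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = f"{host}:{rpc_port}"
        hdfs_site[f"dfs.namenode.http-address.{nameservice}.{nn_id}"] = f"{host}:{http_port}"
    journals = ";".join(f"{host}:{jn_port}" for host in journalnodes)
    hdfs_site["dfs.namenode.shared.edits.dir"] = f"qjournal://{journals}/{nameservice}"
    hdfs_site[f"dfs.client.failover.proxy.provider.{nameservice}"] = (
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
    )
    # sshfence also needs dfs.ha.fencing.ssh.private-key-files to be set
    hdfs_site["dfs.ha.fencing.methods"] = "sshfence"
    return core_site, hdfs_site


if __name__ == "__main__":
    core, hdfs = ha_properties(
        "mycluster",
        {"nn1": "master1.example.com", "nn2": "master2.example.com"},
        ["jn1.example.com", "jn2.example.com", "jn3.example.com"],
    )
    for name, value in {**core, **hdfs}.items():
        print(f"{name} = {value}")
```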

8. Project: Cloudera Manager and Cluster Setup, Overview of Kerberos

Learning Objectives: In this module, you will learn to use Cloudera Manager to set up a cluster, optimize Hadoop/HBase/Hive performance parameters, and understand the basics of Kerberos. You will also learn to set up Pig in local or distributed mode to perform data analytics.

Topics: Cloudera Manager and cluster setup, Hive administration, HBase architecture, HBase setup, Hadoop/Hive/HBase performance optimization, Pig setup and working with Grunt, why Kerberos is needed and how it helps.
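Kerberos identifies users and services by principals of the form primary/instance@REALM. Hadoop service settings such as dfs.namenode.kerberos.principal commonly use the placeholder _HOST (for example nn/_HOST@EXAMPLE.COM), which each daemon replaces with its own fully qualified host name, so one configuration file can be shared across the cluster. The Python sketch below only illustrates that naming scheme; parse_principal and expand_host are hypothetical helpers, not part of Hadoop or any Kerberos library, and EXAMPLE.COM and the host names are placeholders.

```python
import re

# primary[/instance]@REALM, e.g. nn/master1.example.com@EXAMPLE.COM
PRINCIPAL_RE = re.compile(r"^(?P<primary>[^/@]+)(?:/(?P<instance>[^/@]+))?@(?P<realm>[^/@]+)$")


def parse_principal(principal):
    """Split a Kerberos principal into (primary, instance or None, realm)."""
    match = PRINCIPAL_RE.match(principal)
    if match is None:
        raise ValueError(f"not a Kerberos principal: {principal!r}")
    return match.group("primary"), match.group("instance"), match.group("realm")


def expand_host(template, fqdn):
    """Substitute the _HOST placeholder used in Hadoop principal settings."""
    primary, instance, realm = parse_principal(template)
    if instance == "_HOST":
        instance = fqdn
    return f"{primary}/{instance}@{realm}" if instance else f"{primary}@{realm}"


if __name__ == "__main__":
    print(parse_principal("alice@EXAMPLE.COM"))                         # ('alice', None, 'EXAMPLE.COM')
    print(expand_host("nn/_HOST@EXAMPLE.COM", "master1.example.com"))  # nn/master1.example.com@EXAMPLE.COM
```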

Jonathon Smith

Wordpress Teacher

Answers to all of your questions

We are committed to giving you an excellent learning experience through world-class content and best-in-class instructors. Through this training we will build an ecosystem that enables you to convert opportunities into job offers by demonstrating your skills at interview time. We can help you with resume building and also share important interview questions once you have completed the training. However, please understand that we do not provide job placement.

Yes, you can enroll in the early bird batches and may join the classes later.


You will never miss a lecture. You can choose either of two options: 1. View the recorded session of the class in your LMS. 2. Attend the missed session in any other live batch.

All our instructors are working industry professionals with at least 10-12 years of relevant experience in various domains. They are subject-matter experts trained to deliver online training so that participants get a great learning experience.

Yes, you will have lifetime access to the course material once you have enrolled in the course.

These classes are fully online, live, instructor-led interactive sessions. A chat option is available for discussing your queries with the instructor during class.

Depending on the batch you select, your live classes will be held either every weekend for 5 weeks or on 15 weekdays. You will typically need 6-7 hours of effort each week after the live sessions, consisting of hands-on assignments.

An internet connection of at least 1 Mbps is recommended for attending the LIVE classes.

You can pay by Credit Card, Debit Card or Net Banking from all the leading banks. For USD payment, you can pay by PayPal. We also have EMI options available.

You can call us at +1-828-5448230 or email us at

Customer Reviews