HADOOP

Hadoop is a Java-based programming framework that supports the processing of large data sets in a distributed computing environment.

HADOOP Online Training

Big data Hadoop makes it possible to run applications on systems with thousands of nodes handling thousands of terabytes of data. Its distributed file system facilitates rapid data transfer among nodes and allows the system to continue operating in the event of a node failure.
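The fault tolerance described above comes from block replication: HDFS splits each file into fixed-size blocks and stores multiple copies of every block on different nodes. The following is a toy Python sketch of that idea only, not real Hadoop code; the block size, node names, and placement policy are simplified assumptions for illustration.

```python
# Toy illustration of HDFS-style block replication (not real Hadoop code):
# a file is split into fixed-size blocks, each block is copied to several
# nodes, so the data survives a single node failure.

BLOCK_SIZE = 4     # bytes per block (the real HDFS default is 128 MB)
REPLICATION = 3    # copies of each block (the HDFS default is 3)

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Split raw bytes into fixed-size blocks, as the HDFS client does."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes (simple round-robin;
    real HDFS placement is rack-aware)."""
    placement = {}
    for i in range(len(blocks)):
        placement[i] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement

def read_file(blocks, placement, live_nodes):
    """Reassemble the file using any surviving replica of each block."""
    out = b""
    for i, block in enumerate(blocks):
        if not any(n in live_nodes for n in placement[i]):
            raise IOError("block %d lost: all replicas down" % i)
        out += block
    return out

data = b"hello hadoop cluster"
nodes = ["node1", "node2", "node3", "node4"]
blocks = split_into_blocks(data)
placement = place_blocks(blocks, nodes)

# Simulate a node failure: node2 goes down, yet the file is still readable
# because every block has replicas on other nodes.
survivors = {"node1", "node3", "node4"}
assert read_file(blocks, placement, survivors) == data
```

With a replication factor of 3 and four nodes, every block survives any single node failure, which is why HDFS can keep serving reads while a DataNode is down.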

What you will learn

Our Hadoop online training classes are offered with flexible timings. All our trainers are real-time professionals, so the training covers real-time scenarios throughout.


HADOOP Course Content


  • What is Big Data & Why Hadoop
  • What is Big Data
  • Characteristics of big data
  • Traditional data management systems and their limitations
  • What is Hadoop
  • Why is Hadoop used
  • The Hadoop eco-system
  • Big data/Hadoop use cases
  • HDFS (Hadoop Distributed File System) and installing Hadoop on single node
  • HDFS Architecture
  • HDFS internals and use cases
  • HDFS Daemons
  • Files and blocks
  • Namenode memory concerns
  • Secondary namenode
  • HDFS access options
  • Installing and configuring Hadoop
  • Hadoop daemons
  • Basic Hadoop commands
  • Hands-on exercise
  • Advanced HDFS concepts
  • HDFS workshop
  • HDFS API
  • How to use configuration class
  • Using HDFS in MapReduce and programmatically
  • HDFS permission and security
  • Additional HDFS tasks
  • HDFS web-interface
  • Hands-on exercise
  • Cloud computing overview and installing Hadoop on multiple nodes
  • Cloud computing overview
  • SaaS/PaaS/IaaS
  • Characteristics of cloud computing
  • Cluster configurations
  • Configuring Masters and Slaves
  • Introduction to MapReduce
  • MapReduce basics
  • Functional programming concepts
  • List processing
  • Mapping and reducing lists
  • Putting them together in MapReduce
  • Word Count example application
  • Understanding the driver, mapper and reducer
  • Closer look at MapReduce data flow
  • Additional MapReduce functionality
  • Fault tolerance
  • Hands-on exercises


  • MapReduce workshop
  • Hands-on work on MapReduce
  • Advanced MapReduce concepts
  • Understand combiners & partitioners
  • Understand input and output formats
  • Distributed cache
  • Understanding counters
  • Chaining, listing and killing jobs
  • Hands-On Exercise
  • Using Pig and Hive for data analysis
  • Pig program structure and execution process
  • Joins & filtering using Pig
  • Group & co-group
  • Schema merging and redefining functions
  • Pig functions
  • Understanding Hive
  • Using Hive command line interface
  • Data types and file formats
  • Basic DDL operations
  • Schema design
  • Hands-on examples
  • Introduction to HBase, Zookeeper & Sqoop
  • HBase overview, architecture & installation
  • HBase administration basics
  • HBase data access
  • Overview of Zookeeper
  • Sqoop overview and installation
  • Importing and exporting data in Sqoop
  • Hands-on exercise
  • Introduction to Oozie, Flume and advanced Hadoop concepts
  • Overview of Oozie and Flume
  • Oozie features and challenges
  • How does Flume work
  • Connecting Flume with HDFS
  • YARN
  • HDFS Federation
  • Authentication and high availability in Hadoop
  • Building a web-log analysis POC using MapReduce & project discussion
  • Designing structures for POC
  • Developing MapReduce code
  • Push data using Flume into HDFS
  • Run MapReduce code
  • Analyse the output
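The core of the web-log analysis POC above is a map step that extracts the requested URL from each log line and a reduce step that counts hits per URL. The sketch below shows that logic in plain Python; the Common Log Format fields are an assumption for illustration, and in the actual POC this logic runs as MapReduce code over data that Flume has pushed into HDFS.

```python
# Toy sketch of the web-log analysis logic: count hits per URL.
# The log format here (Common Log Format) is an assumption for
# illustration; a real job would run this as a mapper/reducer pair.
from collections import Counter

def parse_request_path(log_line):
    """Extract the request path from a Common Log Format line, or None
    if the line does not contain a quoted request section."""
    try:
        # The request is the quoted section: "GET /index.html HTTP/1.1"
        request = log_line.split('"')[1]
        return request.split()[1]
    except IndexError:
        return None

def count_hits(log_lines):
    """Map each line to its path, then reduce by counting per path."""
    paths = (parse_request_path(line) for line in log_lines)
    return Counter(path for path in paths if path is not None)

logs = [
    '10.0.0.1 - - [01/Jan/2024:10:00:00] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2024:10:00:01] "GET /about.html HTTP/1.1" 200 256',
    '10.0.0.1 - - [01/Jan/2024:10:00:02] "GET /index.html HTTP/1.1" 304 0',
]
hits = count_hits(logs)
# e.g. hits["/index.html"] == 2
```

Analysing the output then amounts to inspecting the per-URL counts, for example sorting them to find the most-requested pages.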