Monday, 31 March 2014

Hadoop Administration Course Content

Hadoop Admin

  • How the Hadoop Distributed File System and MapReduce work
  • What hardware configurations are optimal for Hadoop clusters
  • How to configure Hadoop’s options for best cluster performance
  • How to configure NameNode High Availability
  • How to configure NameNode Federation
  • How to configure the Fair Scheduler to provide service-level agreements for multiple users of a cluster
  • How to install and implement Kerberos-based security for your cluster
  • What system administration issues exist with other Hadoop projects such as Hive, Pig, and HBase

Introduction

  • A brief history of Hadoop
  • Core Hadoop components
  • Fundamental concepts

The Hadoop Distributed File System

  • HDFS features
  • HDFS design assumptions
  • Overview of HDFS architecture
  • Writing and reading files
  • NameNode considerations
  • An overview of HDFS security

MapReduce

  • What is MapReduce?
  • Features of MapReduce
  • Basic MapReduce concepts
  • Architectural overview
  • Failure recovery

Hadoop Ecosystem

  • What is the Hadoop ecosystem?
  • Integration tools
  • Analysis tools
  • Hive
  • HBase
  • Sqoop
  • ZooKeeper
  • Pig

Hadoop Cluster Prerequisites

  • General planning considerations
  • Choosing the right hardware
  • Network considerations
  • Configuring nodes

Hadoop Installation

  • Installing Hadoop
  • Basic configuration parameters
  • Advanced configuration

Advanced Configuration

  • Configuring rack awareness
  • Configuring Federation
  • Configuring High Availability

Managing and Scheduling Jobs

  • Managing running jobs
  • The FIFO Scheduler
  • The Fair Scheduler

Cluster Maintenance

  • Checking HDFS status
  • Copying data between clusters
  • Adding and removing cluster nodes
  • Rebalancing the cluster
  • NameNode metadata backup
  • Upgrading the cluster
