Hadoop Online Training


  • Identifying the business benefits of Hadoop
  • Surveying the Hadoop ecosystem
  • Selecting a suitable distribution

Parallelizing Program Execution

Meeting the challenges of parallel programming

  • Investigating parallelizable challenges: algorithms, data and information exchange
  • Estimating the storage and complexity of Big Data

Parallel programming with MapReduce

  • Dividing and conquering large-scale problems
  • Uncovering jobs suitable for MapReduce
  • Solving typical business problems

Implementing Real-World MapReduce Jobs

Applying the Hadoop MapReduce paradigm

  • Configuring the development environment
  • Exploring the Hadoop distribution
  • Creating the components of MapReduce jobs
  • Introducing the Hadoop daemons
  • Analyzing the stages of MapReduce processing: splitting, mapping, shuffling and reducing
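
The four stages named in the last item can be illustrated with a small single-process simulation. This is only a sketch in plain Python, not Hadoop code: the function names (`split_input`, `map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, and a real job would run each stage across many cluster nodes.

```python
from collections import defaultdict


def split_input(text, num_splits):
    """Splitting: divide the input lines into roughly equal chunks."""
    lines = text.splitlines()
    size = max(1, -(-len(lines) // num_splits))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def map_phase(chunk):
    """Mapping: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle_phase(mapped_chunks):
    """Shuffling: group all values by key across every mapper's output."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reducing: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(text, num_splits=2):
    """Run the four stages in order on an in-memory string."""
    splits = split_input(text, num_splits)
    mapped = [map_phase(chunk) for chunk in splits]
    return reduce_phase(shuffle_phase(mapped))
```

Calling `word_count("big data\nbig cluster\ndata data")` walks a three-line input through two splits. Each mapper works only on its own split, and the shuffle step is the only point where data from different mappers is brought together.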

Building complex MapReduce jobs

  • Selecting and employing multiple mappers and reducers
  • Leveraging built-in mappers, reducers and partitioners
  • Coordinating jobs with the Oozie workflow scheduler
  • Streaming tasks through various programming languages

Customizing MapReduce

Solving common data manipulation problems

  • Executing algorithms: parallel sorts, joins and searches
  • Analyzing log files, social media data and e-mails

Implementing partitioners and combiners

  • Identifying network bound, CPU bound and disk I/O bound parallel algorithms
  • Reducing network traffic with combiners
  • Dividing the workload efficiently using partitioners
  • Collecting metrics with counters
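
As a rough illustration of the combiner, partitioner and counter ideas listed above, the following plain-Python sketch models a single mapper's output path. It is not the Hadoop API: `combine`, `partition` and `run_mapper` are hypothetical names, the counter labels are made up for this example, and CRC32 is used here only as a stable stand-in for a key hash.

```python
import zlib
from collections import Counter, defaultdict


def combine(pairs):
    """Combiner: sum values per key on the mapper side before anything is sent."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return sorted(local.items())


def partition(key, num_reducers):
    """Partitioner: map a key to a reducer index using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_mapper(words, num_reducers, counters):
    """Emit (word, 1) pairs, combine them locally, then bucket them by reducer."""
    raw = [(word, 1) for word in words]
    counters["map_output_records"] += len(raw)

    combined = combine(raw)
    counters["combine_output_records"] += len(combined)

    buckets = defaultdict(list)
    for key, value in combined:
        buckets[partition(key, num_reducers)].append((key, value))
    return dict(buckets)
```

For example, passing `["a", "b", "a", "a"]` with a `Counter()` records four map output records but only two combined records, which is the reduction in network traffic a combiner aims for. Because the partition function is deterministic, every occurrence of a key is routed to the same reducer regardless of which mapper emitted it.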

Persisting Big Data with Distributed Data Stores

Making the case for distributed data

  • Achieving high-performance data throughput
  • Recovering from media failure through redundancy

Interfacing with Hadoop Distributed File System (HDFS)

  • Breaking down the structure and organization of HDFS
  • Loading raw data and retrieving results
  • Reading and writing data programmatically
  • Partitioning text or binary data
  • Manipulating Hadoop SequenceFile types

Structuring data with HBase

  • Migrating from structured to unstructured storage
  • Applying NoSQL concepts with schema on read
  • Connecting to HBase from MapReduce jobs
  • Comparing HBase to other types of NoSQL data stores

Simplifying Data Analysis with Query Languages

Unleashing the power of SQL with Hive

  • Structuring data with the Hive MetaStore
  • Extracting, Transforming and Loading (ETL) data
  • Querying with HiveQL
  • Accessing Hive servers through JDBC
  • Extending HiveQL with User-Defined Functions (UDF)

Executing workflows with Pig

  • Developing Pig Latin scripts to consolidate workflows
  • Integrating Pig queries with Java
  • Interacting with data through the Grunt shell
  • Extending Pig with User-Defined Functions (UDF)

Managing and Deploying Big Data Solutions

Testing and debugging Hadoop code

  • Logging significant events for auditing and debugging
  • Debugging in local mode
  • Validating requirements with MRUnit

Deploying, monitoring and tuning performance

  • Deploying to a production cluster
  • Optimizing performance with administrative tools
  • Monitoring job execution through web user interfaces

