Streaming

Nov 5, 2020 21:24 · John Smith454

Learn about Hadoop Streaming using Python, with examples: how Hadoop Streaming works, the key commands involved, and Hadoop Pipes.

Hadoop Streaming uses UNIX standard streams as the interface between Hadoop and your program, so you can write a MapReduce program in any language that can read from standard input and write to standard output. Hadoop offers several mechanisms to support non-Java development.

The primary mechanisms are Hadoop Pipes, which provides a native C++ interface to Hadoop, and Hadoop Streaming, which permits any program that reads standard input and writes standard output to be used for map and reduce tasks.
With this utility, one can create and run MapReduce jobs with any executable or script as the mapper and/or the reducer.

Hadoop Streaming supports any programming language that can read from standard input and write to standard output. To illustrate, consider the classic word-count problem: the mapper and the reducer are written as Python scripts to be run under Hadoop.


Mapper Code (mapper.py):

#!/usr/bin/python3
import sys

for line in sys.stdin:                   # input arrives on standard input
    line = line.strip()                  # remove whitespace on either side
    words = line.split()                 # break the line into words
    for word in words:                   # iterate over the word list
        print('%s\t%s' % (word, 1))     # write "word<TAB>1" to standard output

Reducer Code (reducer.py):

#!/usr/bin/python3
import sys

current_word = ''
current_count = 0

for line in sys.stdin:                   # input arrives on standard input
    line = line.strip()                  # remove whitespace on either side
    word, count = line.split('\t', 1)    # split the line emitted by mapper.py
    try:
        count = int(count)               # convert the count to an integer
    except ValueError:
        continue                         # count was not a number; silently skip this line
    if current_word == word:
        current_count += count
    else:
        if current_word:
            print('%s\t%s' % (current_word, current_count))  # write result to standard output
        current_count = count
        current_word = word

if current_word:                         # do not forget to output the last word
    print('%s\t%s' % (current_word, current_count))

The mapper and reducer code should be saved as mapper.py and reducer.py in the Hadoop home directory.
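Before submitting the job to a cluster, the pipeline can be checked locally, since Streaming scripts are just programs that read stdin and write stdout. The sketch below is not the Hadoop job itself; it simulates the usual `cat input | mapper.py | sort | reducer.py` smoke test in pure Python, with the sort standing in for Hadoop's shuffle phase. The function names and sample text are illustrative, not part of any Hadoop API.

```python
import io

def run_mapper(text):
    # Emit one "word<TAB>1" line per word, as mapper.py does.
    lines = []
    for line in io.StringIO(text):
        for word in line.strip().split():
            lines.append('%s\t%s' % (word, 1))
    return lines

def run_reducer(sorted_lines):
    # Sum counts for consecutive identical keys, as reducer.py does.
    # This only works because the input is sorted (the "shuffle").
    results = []
    current_word, current_count = None, 0
    for line in sorted_lines:
        word, count = line.split('\t', 1)
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                results.append((current_word, current_count))
            current_word, current_count = word, int(count)
    if current_word is not None:
        results.append((current_word, current_count))
    return results

text = "deer bear river\ncar car river\ndeer car bear\n"
print(run_reducer(sorted(run_mapper(text))))
# [('bear', 2), ('car', 3), ('deer', 2), ('river', 2)]
```

If this local simulation produces the expected counts, the same scripts can be handed to the Streaming utility unchanged.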

How Does Hadoop Streaming Work?
  • Both the mapper and the reducer read input from standard input and emit output to standard output. The utility creates a MapReduce job, submits it to an appropriate cluster, and monitors its progress until completion.
  • When a script is specified for the mappers, each mapper task launches it as a separate process as the mapper initializes. The mapper task converts its inputs into lines and feeds them to the process's standard input; line-oriented outputs are then collected from the process's standard output, and every line is turned into a key/value pair, which is collected as the output of the mapper.
  • Similarly, when a script is specified for the reducers, each reducer task launches it as a separate process as the reducer initializes. As the reducer task runs, its input key/value pairs are converted into lines and fed to the standard input (STDIN) of the process.
  • Each line of the line-oriented output collected from the standard output (STDOUT) of the process is converted back into a key/value pair, which is collected as the output of the reducer.
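The key/value conversion in the steps above follows a simple line protocol: by default, everything up to the first tab character in a line is the key and the remainder is the value; a line with no tab becomes a key with an empty value. (The separator is configurable, e.g. via stream.map.output.field.separator.) A minimal sketch of that default split, with an illustrative function name:

```python
def parse_streaming_line(line):
    # Everything before the first tab is the key; the rest is the value.
    key, sep, value = line.rstrip('\n').partition('\t')
    # No tab present: the whole line is the key, and the value is empty.
    return (key, value) if sep else (key, '')

print(parse_streaming_line('hadoop\t1'))   # ('hadoop', '1')
print(parse_streaming_line('justakey\n'))  # ('justakey', '')
```

This is why the mapper and reducer above emit and split on '\t': the tab is what Streaming uses to recover keys and values from plain text lines.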


Hadoop Pipes

Hadoop Pipes is the name of the C++ interface to Hadoop MapReduce. Unlike Hadoop Streaming, which uses standard I/O to communicate with the map and reduce code, Pipes uses sockets as the channel over which the task tracker communicates with the process running the C++ map or reduce function; JNI is not used.

That’s all for this section of the Hadoop tutorial. Let’s move on to the next one on Pig!

Source: Free Guest Posting Articles from ArticlesFactory.com

About Article Author

John Smith454

From the best industry experts of Big Data at Intellipaat.
