Friday, 20 October 2017

How to run MapReduce job in Hadoop:

Step 1: Download and install Eclipse from the Eclipse website.

Step 2: File -> New -> Java Project
            Give the project a name.

Step 3: Create your MapReduce application in Eclipse.

Step 4: Remember to create separate Mapper, Reducer, and Driver classes.
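In an actual Hadoop project, these classes extend org.apache.hadoop.mapreduce.Mapper and Reducer, and the Driver configures and submits the Job. As a minimal, self-contained sketch of the map -> shuffle -> reduce flow a word-count job performs (JDK only; the class and method names here are illustrative, not Hadoop's API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// JDK-only sketch of the word-count map/shuffle/reduce flow.
// Real Hadoop classes would extend org.apache.hadoop.mapreduce.Mapper
// and Reducer; the names here are illustrative stand-ins.
public class WordCountSketch {

    // "Mapper": emit a (word, 1) pair for every token in a line of input.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // "Shuffle + Reducer": group the emitted pairs by key and sum the
    // counts, which is what Hadoop's shuffle phase and a sum-reducer
    // accomplish together.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // "Driver": wire the phases together over some sample input lines.
        String[] lines = {"hello world", "hello hadoop"};
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.addAll(map(line));
        }
        Map<String, Integer> counts = reduce(mapped);
        counts.forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

Run as a plain Java program, this prints each word followed by a tab and its count, which is also the key-tab-value layout Hadoop writes into the output part files.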

Step 5: Right-click the project name -> Export -> JAR file -> Next -> give the JAR file a name -> Finish

Step 6: Keep your input data file ready.

Step 7: Create an input directory on HDFS:
            hdfs dfs -mkdir /input

Step 8: Copy your data file from the local file system to HDFS:
            hdfs dfs -put [path of data file] /input

Step 9: Run the yarn command:
            yarn jar [jar-file-name]  [package-name.Driver-class-name] [input path] [output path]

            EG: yarn jar wcount.jar com.amir.WordCount /input /output

            The job will then run through its map and reduce phases. If anything goes
            wrong, the MapReduce job will fail and print the error to the console.
         
Step 10: After the job completes successfully, check the output path.
              hdfs dfs -ls /output

             By default, two files are created in the output directory:
             1) _SUCCESS
             2) part-r-00000

             Here 'r' signifies that the output came from a Reducer (a map-only job
             would produce part-m-00000 instead), and the numeric suffix is the
             reducer's index.
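Each line in part-r-00000 (written by Hadoop's default TextOutputFormat) is the key, a tab character, then the value. A small sketch of parsing that format; the sample lines below are assumed word-count output, not taken from a real run:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: parse lines in the part-r-00000 layout produced by Hadoop's
// default TextOutputFormat, where each line is "key<TAB>value".
public class PartFileReader {

    // Split one "key<TAB>value" line into a (key, value) pair.
    static Map.Entry<String, Long> parseLine(String line) {
        int tab = line.indexOf('\t');
        String key = line.substring(0, tab);
        long value = Long.parseLong(line.substring(tab + 1));
        return Map.entry(key, value);
    }

    public static void main(String[] args) {
        // Assumed sample standing in for the contents of /output/part-r-00000,
        // e.g. as displayed by: hdfs dfs -cat /output/part-r-00000
        String[] sample = {"hadoop\t1", "hello\t2", "world\t1"};
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : sample) {
            Map.Entry<String, Long> e = parseLine(line);
            counts.put(e.getKey(), e.getValue());
        }
        System.out.println(counts); // prints {hadoop=1, hello=2, world=1}
    }
}
```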