Friday, 20 October 2017

How to run MapReduce job in Hadoop:

Step 1: Download and install Eclipse from the Eclipse website.

Step 2: File -> New -> Java Project
            Give the project a name.

Step 3: Create your MapReduce application in Eclipse.

Step 4: Remember to create separate Mapper, Reducer, and Driver classes.
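In an actual Hadoop project, these classes extend org.apache.hadoop.mapreduce.Mapper and Reducer, and the Driver configures and submits the Job. As a minimal, self-contained sketch of the map -> shuffle -> reduce flow a word-count job performs (JDK only; the class and method names here are illustrative, not Hadoop's API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// JDK-only sketch of the word-count map/shuffle/reduce flow.
// Real Hadoop classes would extend org.apache.hadoop.mapreduce.Mapper
// and Reducer; the names here are illustrative stand-ins.
public class WordCountSketch {

    // "Mapper": emit a (word, 1) pair for every token in a line of input.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // "Shuffle + Reducer": group the emitted pairs by key and sum the
    // counts, which is what Hadoop's shuffle phase and a sum-reducer
    // accomplish together.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // "Driver": wire the phases together over some sample input lines.
        String[] lines = {"hello world", "hello hadoop"};
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.addAll(map(line));
        }
        Map<String, Integer> counts = reduce(mapped);
        counts.forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

Run as a plain Java program, this prints each word followed by a tab and its count, which is also the key-tab-value layout Hadoop writes into the output part files.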

Step 5: Right-click the project name -> Export -> JAR file -> Next -> give the JAR file a name -> Finish

Step 6: Keep your input data file ready.

Step 7: Create an input directory on HDFS:
            hdfs dfs -mkdir /input

Step 8: Copy your data file from the local file system to HDFS:
            hdfs dfs -put [path of data file] /input

Step 9: Run the yarn command:
            yarn jar [jar-file-name]  [package-name.Driver-class-name] [input path] [output path]

            EG: yarn jar wcount.jar com.amir.WordCount /input /output

            The job will then run through its map and reduce phases. If anything goes
            wrong, the MapReduce job will fail and print the error to the console.
         
Step 10: After the job completes successfully, check the output path.
              hdfs dfs -ls /output

             By default, two files are created in the output directory:
             1) _SUCCESS
             2) part-r-00000

             Here 'r' signifies that the output came from a Reducer (a map-only job
             would produce part-m-00000 instead), and the numeric suffix is the
             reducer's index.
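Each line in part-r-00000 (written by Hadoop's default TextOutputFormat) is the key, a tab character, then the value. A small sketch of parsing that format; the sample lines below are assumed word-count output, not taken from a real run:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: parse lines in the part-r-00000 layout produced by Hadoop's
// default TextOutputFormat, where each line is "key<TAB>value".
public class PartFileReader {

    // Split one "key<TAB>value" line into a (key, value) pair.
    static Map.Entry<String, Long> parseLine(String line) {
        int tab = line.indexOf('\t');
        String key = line.substring(0, tab);
        long value = Long.parseLong(line.substring(tab + 1));
        return Map.entry(key, value);
    }

    public static void main(String[] args) {
        // Assumed sample standing in for the contents of /output/part-r-00000,
        // e.g. as displayed by: hdfs dfs -cat /output/part-r-00000
        String[] sample = {"hadoop\t1", "hello\t2", "world\t1"};
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : sample) {
            Map.Entry<String, Long> e = parseLine(line);
            counts.put(e.getKey(), e.getValue());
        }
        System.out.println(counts); // prints {hadoop=1, hello=2, world=1}
    }
}
```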