Comments on the Hadoop 2.7.2 setup official documentation
Tags: BigData
Hadoop provides an official page describing how to set up a single-node cluster. But that page has some minor defects that can confuse users who are completely new to Hadoop. I'd like to point those defects out to help new Hadoop users.
$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
$ cat output/*
The piece of code above is from the official page and runs the first example job on Hadoop. In pseudo-distributed mode, its purpose is to make a directory called 'input' in HDFS, copy the configuration files from the local etc/hadoop into that 'input' directory, run the grep job against HDFS, and cat the result stored in HDFS.
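To see what the example job actually extracts, you can preview the effect of the regular expression 'dfs[a-z.]+' with plain grep on a sample line (a rough local illustration only; the real job runs as MapReduce over the files in HDFS, and the sample input here is made up for demonstration):

```shell
# Extract every match of 'dfs[a-z.]+' from some sample XML-like input.
# This mimics what the hadoop-mapreduce-examples grep job does to each line.
printf '<name>dfs.replication</name>\n<name>dfs.webhdfs.enabled</name>\n' \
  | grep -oE 'dfs[a-z.]+'
# → dfs.replication
# → dfs.webhdfs.enabled
```

The MapReduce job additionally counts how often each match occurs, which is what ends up in the output directory.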
This is where new users are likely to get confused, because the snippet mixes commands that operate on the local file system with ones that should operate on HDFS. If a new user just follows those commands verbatim, they will fail to run the example. It should be as follows:
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>
$ bin/hdfs dfs -mkdir /user/<username>/input
$ bin/hdfs dfs -put etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
$ bin/hdfs dfs -cat /user/<username>/output/*
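As an alternative to catting the result directly in HDFS, the official page also shows that you can copy the output directory back to the local file system and examine it there (this assumes the job has completed and left its result in /user/<username>/output):

```shell
# Copy the job output from HDFS to a local directory, then view it locally.
$ bin/hdfs dfs -get output output
$ cat output/*
```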