Hadoop Installation/Configuration Guide for Ubuntu Linux With Wordcount Example

In this tutorial, I am going to guide you through setting up a Hadoop environment on Ubuntu.


Prerequisites

  • Ubuntu 19.04 (or any stable release)
  • Java 8 (or any stable release)
  • Hadoop 2.9.2 (or any stable release)

Download Link

Note – You can also download and install Java and Hadoop through terminal commands, as shown below.
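For example, the following is a terminal-only sketch (the Apache archive URL below is one known mirror, and OpenJDK 8 from the Ubuntu repositories can stand in for the Oracle JRE tarball; if you use OpenJDK, set JAVA_HOME to /usr/lib/jvm/java-8-openjdk-amd64 in Step 3 instead of the JRE path):

Command: sudo apt-get update && sudo apt-get install -y openjdk-8-jdk

Command: wget https://archive.apache.org/dist/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz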

Step 1 – Add Hadoop Group and User (Optional)

Create a dedicated (non-root) user account for Hadoop:

Command: sudo adduser hduser

Command: sudo adduser hduser sudo

After the user is created, log out and log back into Ubuntu as hduser.

Step 2 – Installing Java 8 and Hadoop 2.9.2 on Linux machine

First, download Java 8 and Hadoop 2.9.2 from the download links given above and move the downloaded files to your Ubuntu HOME directory.

  1. hadoop-2.9.2.tar.gz
  2. jre-8u221-linux-x64.tar.gz

Once these two files are in your Ubuntu HOME directory, extract them into the current directory as follows.

Open a Terminal and run the following commands:

Command: tar -xvf jre-8u221-linux-x64.tar.gz

Command: tar -xvf hadoop-2.9.2.tar.gz
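To confirm the archives extracted correctly, list the new directories (the names below assume the default archive layout):

Command: ls ~/hadoop-2.9.2 ~/jre1.8.0_221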

Step 3 – Setup Hadoop and JAVA Environment Variables

Open the .bashrc file.

Command: nano ~/.bashrc 

OR

Command: sudo gedit ~/.bashrc 

Now, add Hadoop and Java Path as shown below.

#Hadoop variables

export HADOOP_HOME=/home/your-username/hadoop-2.9.2

export HADOOP_INSTALL=$HADOOP_HOME

export HADOOP_MAPRED_HOME=$HADOOP_HOME

export HADOOP_COMMON_HOME=$HADOOP_HOME

export HADOOP_HDFS_HOME=$HADOOP_HOME

export YARN_HOME=$HADOOP_HOME

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

#Java variables

export JAVA_HOME=/home/your-username/jre1.8.0_221

Change your-username to your actual username, then save the file and close it.

Note – Use the whoami command to display your username.

To apply these changes to the current Terminal, execute the source command.

Command: source ~/.bashrc
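To quickly confirm the variables took effect (assuming the paths above), print them:

Command: echo $HADOOP_HOME $JAVA_HOME

Both paths should be printed; if either is empty, re-check the lines you added to .bashrc.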

Now edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh file and set JAVA_HOME environment variable.

Change the Java path to match the install location on your system.

Command: cd $HADOOP_HOME/etc/hadoop/

Command: nano hadoop-env.sh

Or

Command: sudo gedit hadoop-env.sh

In the file, add the following line and save it:

export JAVA_HOME=/home/your-username/jre1.8.0_221
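If you prefer not to open an editor, a one-line sed substitution works too (a sketch that assumes the stock hadoop-env.sh still contains its default export JAVA_HOME line):

Command: sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/home/your-username/jre1.8.0_221|' hadoop-env.sh

Command: grep '^export JAVA_HOME' hadoop-env.sh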

Step 4 – Verifying Java and Hadoop Installation

To make sure that Java and Hadoop have been properly installed on your system and can be accessed through the Terminal, execute the java -version and hadoop version commands.

Command: java -version

Command: hadoop version
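If both are installed correctly, the output should look roughly like the following (build details and exact update numbers will vary):

java version "1.8.0_221"
Hadoop 2.9.2

A "command not found" error means the PATH entries from Step 3 are not active in the current Terminal.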

Step 5 – Setup SSH Certificate In Hadoop

Command: sudo apt-get install ssh (optional: skip if ssh is already installed)

Command: ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

Command: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Command: chmod 0600 ~/.ssh/authorized_keys

Verify key based login

Command: ssh localhost

Command: exit
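To confirm the login is truly passwordless, you can also run a one-off remote command (if this prompts for a password, re-check the authorized_keys permissions above):

Command: ssh localhost 'echo passwordless ssh works'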

Step 6 – Setup Hadoop Configuration Files

We need to configure a basic Hadoop single-node cluster as per the requirements of your Hadoop infrastructure.

Command: cd $HADOOP_HOME/etc/hadoop

Command: ls

All the Hadoop configuration files are located in hadoop-2.9.2/etc/hadoop directory.

Open core-site.xml and edit the property mentioned below inside the configuration tag:

Command: gedit core-site.xml

<configuration>

<property>

<name>fs.defaultFS</name>

<value>hdfs://localhost:9000</value>

</property>

</configuration>

Edit hdfs-site.xml and add the properties mentioned below inside the configuration tag. First create the NameNode and DataNode directories:

Command: cd

Command: mkdir -p ~/hadoop/hdfs/namenode

Command: mkdir -p ~/hadoop/hdfs/datanode

Command: cd $HADOOP_HOME/etc/hadoop

Command: gedit hdfs-site.xml

<configuration>

<property>

<name>dfs.replication</name>

<value>1</value>

</property>

<property>

<name>dfs.namenode.name.dir</name>

<value>file:/home/your-username/hadoop/hdfs/namenode</value>

</property>

<property>

<name>dfs.datanode.data.dir</name>

<value>file:/home/your-username/hadoop/hdfs/datanode</value>

</property>

</configuration>

Edit the mapred-site.xml file and edit the property mentioned below inside the configuration tag:

In some cases, the mapred-site.xml file is not present. If so, create it by copying the mapred-site.xml.template file, then edit the copy:

Command: cp mapred-site.xml.template mapred-site.xml

Command: gedit mapred-site.xml

<configuration>

<property>

<name>mapreduce.framework.name</name>

<value>yarn</value>

</property>

</configuration>

Edit yarn-site.xml and edit the property mentioned below inside the configuration tag:

Command: gedit yarn-site.xml

<configuration>

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce_shuffle</value>

</property>

</configuration> 

Step 7 – Format Namenode

Go to your home directory and format the NameNode.

Command: cd

Command: hdfs namenode -format

This formats the HDFS via the NameNode. This command should only be executed the first time you set up the cluster. Formatting the file system means initializing the directory specified by the dfs.namenode.name.dir property.
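If the format succeeds, the log output should include a line similar to the following (the exact path reflects your dfs.namenode.name.dir setting):

Storage directory /home/your-username/hadoop/hdfs/namenode has been successfully formatted.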

Never format an up-and-running Hadoop filesystem; you will lose all the data stored in HDFS.

Step 8 – Start Hadoop Cluster

You can either start all the daemons with a single command or start them individually.

Command: start-all.sh

The above command (deprecated in Hadoop 2.x but still functional) is a combination of start-dfs.sh and start-yarn.sh.

Or you can run all the services individually as below:

Command: start-dfs.sh

Command: start-yarn.sh

To check that all the Hadoop services are up and running, run the below command.

Command: jps
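On a healthy single-node cluster, the jps listing should show entries like the following (process IDs will differ):

NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps

If any daemon is missing, check its log file under $HADOOP_HOME/logs.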

Step 9 – Access Hadoop Services in Browser

The Hadoop NameNode web UI listens on port 50070 by default. Access it in your favorite web browser:

http://localhost:50070/

Now access port 8088, the ResourceManager web UI, for information about the cluster and all applications:

http://localhost:8088/
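If you are working on a headless server, you can check both web UIs from the Terminal instead (this assumes curl is installed and the default ports are unchanged):

Command: curl -s -o /dev/null -w '%{http_code}\n' http://localhost:50070

Command: curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8088

A 200 (or a 302 redirect) response code means the service is up.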

Step 10 – Running a MapReduce Example Job on a Single-Node Cluster

Command: cd $HADOOP_HOME

Command: touch input.txt

Command: gedit input.txt (add some sample text for the MapReduce task)

Command: hdfs dfs -mkdir -p input

Command: hdfs dfs -put input.txt input

Command: hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount input output

Command: hdfs dfs -ls output

Command: hdfs dfs -cat output/part-r-00000
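The wordcount output lists one word per line followed by a tab and its count; for example (the actual words and counts depend on what you typed into input.txt):

hello	3
hadoop	2

MapReduce refuses to overwrite an existing output directory, so remove it before re-running the job:

Command: hdfs dfs -rm -r output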

That’s all! If you have any questions regarding the installation of Hadoop on Ubuntu Linux, feel free to ask in the comment box below, and don’t forget to like and share this article.
