Set up the master node first, following the single-node guide.

Part 1: Preparing the other nodes

First we need to make sure that each of our Raspberry Pis can be reached by hostname. To do that, we configure the /etc/hosts file on every node that we want to use.

sudo nano /etc/hosts  

What you fill in here depends heavily on your network setup. Make sure that the IP addresses assigned to your Hadoop nodes are static (or at least very unlikely to change). As a small example, you can add one line per node to your /etc/hosts file, pairing each node's IP address with one of these hostnames: master, slave-1, slave-2, slave-3.  
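To double-check the result, a small helper along these lines can list the hostnames that are still missing from a hosts file. This is only a sketch: the function name missing_hosts is made up for illustration, and it simply looks for each name in the columns after the address.

```shell
#!/usr/bin/env bash
# Report which cluster hostnames are missing from a hosts file.
# Usage: missing_hosts <hosts-file> <hostname>...
# (missing_hosts is a hypothetical helper, not part of Hadoop or Raspbian.)
missing_hosts() {
    local hosts_file="$1"
    shift
    local name
    for name in "$@"; do
        # Look for the name as a whole field after the address column,
        # skipping lines that are commented out.
        if ! awk -v n="$name" '
            /^[[:space:]]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit (found ? 0 : 1) }' "$hosts_file"; then
            echo "$name"
        fi
    done
}

# Check the real file for the four names used in this post.
missing_hosts /etc/hosts master slave-1 slave-2 slave-3
```

Every name it prints still needs a line in /etc/hosts on that node.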

You may want to verify that all your Pi nodes are using the same Java version. Things go a bit smoother that way. I am running Oracle Java 8 by the way.

java -version  

Every node needs to have a Hadoop user, so we can reuse a bunch of commands from the single node part.

sudo addgroup hadoop  
sudo adduser --ingroup hadoop hduser  
sudo adduser hduser sudo  

We want to be able to SSH as hduser to the other nodes. We start by installing openssh-server, in case no SSH server is installed yet.

sudo apt-get -y install openssh-server  

There is a good chance that it is already installed (I am pretty sure it ships with the default Raspbian).
To get everything going smoothly we want to enable passwordless SSH from the master to the slaves. Log in to the node that you used in the single-node posts and switch to hduser.

su hduser  
cat ~/.ssh/ >> ~/.ssh/authorized_keys  
chmod 0600 ~/.ssh/authorized_keys  
ssh-copy-id hduser@slave-1   # repeat for each slave node  
ssh hduser@slave-1  

Repeat this step on every node. The final ssh command ensures that we can log in to the other nodes and that their signatures are added to the known_hosts list. Make sure to also SSH to the node you're working on; its signature has to be in the known_hosts file as well.

ssh hduser@<current_node>  
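The key setup above assumes that hduser already has a key pair. If that is not the case, something like the following could create one and authorize it locally. Treat it as a sketch: the function name setup_key is made up, and the file names id_rsa and id_rsa.pub are an assumption on my part, so adjust them if your key lives elsewhere.

```shell
# Create a passphrase-less key pair (if missing) and authorize it locally.
# Assumption: the key is an RSA key stored as id_rsa / id_rsa.pub.
# Usage: setup_key [ssh-directory]   (defaults to ~/.ssh)
setup_key() {
    ssh_dir="${1:-$HOME/.ssh}"
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
    if [ ! -f "$ssh_dir/id_rsa" ]; then
        ssh-keygen -q -t rsa -N "" -f "$ssh_dir/id_rsa"
    fi
    # Append the public key only once, so re-running the function is harmless.
    touch "$ssh_dir/authorized_keys"
    if ! grep -qxF -- "$(cat "$ssh_dir/id_rsa.pub")" "$ssh_dir/authorized_keys"; then
        cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
    fi
    chmod 0600 "$ssh_dir/authorized_keys"
}
```

Run setup_key as hduser before the ssh-copy-id step, so that there is a public key to copy to the slaves.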

Now we want to copy our Hadoop installation to the other nodes. We're still working on our original Hadoop node. Start by zipping the Hadoop installation directory.

zip -r hadoop-2.7.3-configured.zip /opt/hadoop-2.7.3/  

The archive holds about 210 megabytes of Hadoop data. Thanks to our passwordless SSH setup we can easily transfer it to the other nodes.

scp hadoop-2.7.3-configured.zip hduser@slave-1:~  
ssh hduser@slave-1  
sudo unzip hadoop-2.7.3-configured -d /  
sudo chown -R hduser:hadoop /opt/hadoop-2.7.3/  
exit   # back to the master before copying .bashrc  
scp .bashrc hduser@slave-1:~/.bashrc
ssh hduser@slave-1
source ~/.bashrc

Repeat this for each node that you want to add.
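If you have more than a couple of nodes, the repetition can be wrapped in a loop. The sketch below only prints the commands by default, so you can review them first; set DRY_RUN=0 to really run them. The helper names run and deploy_node are made up, and the archive name is taken from the unzip step above.

```shell
# Print (or run) the copy-and-unpack steps for every slave node.
# By default the commands are only printed; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
ARCHIVE="hadoop-2.7.3-configured.zip"   # name assumed from the unzip step

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

deploy_node() {
    node="$1"
    # An empty path after the colon means the remote home directory.
    run scp "$ARCHIVE" .bashrc "hduser@$node:"
    # -t allocates a terminal so sudo can ask for a password on the slave.
    run ssh -t "hduser@$node" \
        "sudo unzip $ARCHIVE -d / && sudo chown -R hduser:hadoop /opt/hadoop-2.7.3/"
}

for node in slave-1 slave-2 slave-3; do
    deploy_node "$node"
done
```

Run it from hduser's home directory on the master, since that is where the archive and .bashrc are expected to be.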

Create the /hdfs/tmp folder on each node you added:

sudo mkdir -p /hdfs/tmp  
sudo chown hduser:hadoop /hdfs/tmp  
chmod 750 /hdfs/tmp  
hdfs namenode -format 

Part 2: Wiping HDFS

rm -rf /hdfs/tmp/*  

Part 3: Configuring Hadoop again (all nodes)

Each node can now run as a single node cluster, but that's not really the point of what we're doing here. In order to get them working together as a real cluster we need to do some more configuring stuff. I hope you're ready for some more XML files...

Unfortunately, most of these changes need to happen on all our nodes. We start by editing yarn-site.xml (found in /opt/hadoop-2.7.3/etc/hadoop/) and adding these properties to the XML file.
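As a rough sketch of what such properties might look like, the snippet below writes a candidate yarn-site.xml to a scratch directory so you can review it before copying anything into place. The two properties are real YARN settings, but picking exactly these two, and pointing the ResourceManager at the host named master, are assumptions on my part.

```shell
# Write a candidate yarn-site.xml to a scratch directory for review.
# Assumptions: the ResourceManager runs on the host called "master", and
# these two properties are a plausible minimum for a small cluster.
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"

cat > "$OUT_DIR/yarn-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF

echo "Wrote $OUT_DIR/yarn-site.xml"
```

Compare it with the file you already have from the single-node setup and merge by hand rather than overwriting blindly.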


We also want to edit our core-site.xml on all our nodes so that it looks like this:
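A minimal sketch of such a core-site.xml, again written to a scratch directory first. The temp directory matches the /hdfs/tmp folder created earlier; the NameNode host master and the port 9000 are assumptions, so use whatever your single-node setup used.

```shell
# Write a candidate core-site.xml to a scratch directory for review.
# Assumptions: the NameNode runs on "master" and listens on port 9000.
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"
NAMENODE_PORT="${NAMENODE_PORT:-9000}"   # assumed port, adjust to your setup

cat > "$OUT_DIR/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:${NAMENODE_PORT}</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hdfs/tmp</value>
  </property>
</configuration>
EOF

echo "Wrote $OUT_DIR/core-site.xml"
```

The important part is that every node points at the same NameNode address instead of at localhost.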


Part 4: Configuring the master (master only)

Two files must be edited on the master only: slaves and masters. The slaves file tells the master node which other nodes can be used for this cluster. Just add the nodes that you want to use for data processing to this file, perhaps even including the master node itself.

slave-n (all nodes)  

Create a file named masters with this content:


(Again) make sure that your system can resolve these hostnames. This file only goes on the master node, the other nodes don't need it.
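Both files are plain lists of hostnames, one per line. The sketch below writes them to a scratch directory; it assumes the four hostnames from Part 1, that the master also does data processing, and that the masters file simply names the master node.

```shell
# Write candidate slaves and masters files to a scratch directory.
# Assumptions: hostnames as in Part 1, and the master doubles as a worker.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# One hostname per line: every node that should process data.
printf '%s\n' master slave-1 slave-2 slave-3 > "$CONF_DIR/slaves"

# Assumed content: just the master's hostname.
printf '%s\n' master > "$CONF_DIR/masters"

echo "Wrote $CONF_DIR/slaves and $CONF_DIR/masters"
```

Once they look right, copy them next to the other configuration files on the master.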

Part 5: Booting the cluster from master node  
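Since shutting down runs yarn first and dfs last, booting presumably goes the other way around: dfs first, then yarn. A small sketch, assuming the start scripts live in the sbin folder of the installation used in this post (the function name start_cluster is made up):

```shell
# Start HDFS first, then YARN.
# Assumption: HADOOP_HOME points at the installation used in this post.
HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop-2.7.3}"

start_cluster() {
    # Refuse to start anything unless both scripts are present.
    for script in start-dfs.sh start-yarn.sh; do
        if [ ! -x "$HADOOP_HOME/sbin/$script" ]; then
            echo "missing $HADOOP_HOME/sbin/$script" >&2
            return 1
        fi
    done
    "$HADOOP_HOME/sbin/start-dfs.sh" && "$HADOOP_HOME/sbin/start-yarn.sh"
}
```

Call start_cluster as hduser on the master. Running jps on each node afterwards shows which Hadoop daemons actually came up.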

Part 6: Shutting down the cluster from master node

To shut down the cluster we just need to run the stop... scripts in reverse order. We start by shutting down yarn and finally dfs.  
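Sketched as a small function, assuming the stop scripts live in the sbin folder of the installation used in this post (the function name stop_cluster is made up):

```shell
# Stop YARN first, then HDFS.
# Assumption: HADOOP_HOME points at the installation used in this post.
HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop-2.7.3}"

stop_cluster() {
    # Refuse to stop anything unless both scripts are present.
    for script in stop-yarn.sh stop-dfs.sh; do
        if [ ! -x "$HADOOP_HOME/sbin/$script" ]; then
            echo "missing $HADOOP_HOME/sbin/$script" >&2
            return 1
        fi
    done
    "$HADOOP_HOME/sbin/stop-yarn.sh" && "$HADOOP_HOME/sbin/stop-dfs.sh"
}
```

Call stop_cluster as hduser on the master.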


There we have it. You can expect more material somewhere in the future.