Set up the master node by following the single-node guide first; this article builds on that setup.
Part 1: preparing the other nodes
First we need to make sure that each of our Raspberry Pis can be reached via a hostname. We do this by editing the hosts file on every node:
sudo nano /etc/hosts
What you fill in here depends entirely on your network setup. Make sure that the IP addresses assigned to your Hadoop nodes are static (or at least very unlikely to change). Here is an example of the lines you can add to your /etc/hosts file:
192.168.0.188 master
192.168.0.189 slave-1
192.168.0.190 slave-2
192.168.0.191 slave-3
You may want to verify that all your Pi nodes are using the same Java version. Things go a bit smoother that way. I am running Oracle Java 8 by the way.
Every node needs to have a Hadoop user, so we can reuse a bunch of commands from the single node part.
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
sudo adduser hduser sudo
We want to be able to SSH as hduser to the other nodes. Start by installing openssh-server if it is not already present:
sudo apt-get -y install openssh-server
There is a good chance that it already is (I am pretty sure it is included in the default Raspbian image).
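You can check for yourself before installing anything. A quick sketch for Raspbian/Debian (the service name is `ssh` there; it may differ on other distros):

```shell
# Is the SSH service running? (Raspbian calls the service "ssh")
service ssh status

# Is the package installed at all?
dpkg -s openssh-server | grep Status
```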
To get everything going smoothly we want to enable passwordless SSH from the master to the slaves. Log in to the node that you used in the single-node article and switch to the hduser:
su hduser
ssh-keygen
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
ssh-copy-id hduser@slave-1 (repeat for each slave node)
ssh hduser@slave-1
Repeat this step on every node. The final ssh command ensures that we can successfully log in to the other nodes and that their signature is added to the known_hosts list. Make sure to also SSH to the node you're working on; its signature has to be in the known_hosts file as well.
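To quickly verify that passwordless SSH really works everywhere, you can loop over the hostnames from the /etc/hosts example above (adjust the list to your own nodes). BatchMode makes ssh fail immediately instead of falling back to a password prompt:

```shell
# Run as hduser; each node should print its own hostname.
for host in master slave-1 slave-2 slave-3; do
  ssh -o BatchMode=yes hduser@"$host" hostname \
    || echo "passwordless SSH to $host failed"
done
```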
Now we want to copy our Hadoop installation to the other nodes. We're still working on our original Hadoop node. Start by zipping the Hadoop installation directory.
zip -r hadoop-2.7.3-configured.zip /opt/hadoop-2.7.3/
The archive is about 210 megabytes of Hadoop data. Thanks to our passwordless-SSH setup we can easily transfer this to the other nodes:
scp hadoop-2.7.3-configured.zip hduser@slave-1:~
ssh hduser@slave-1
sudo unzip hadoop-2.7.3-configured.zip -d /
sudo chown -R hduser:hadoop /opt/hadoop-2.7.3/
rm hadoop-2.7.3-configured.zip
exit
scp .bashrc hduser@slave-1:~/.bashrc
Repeat this for each node that you want to add.
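If you have more than a couple of slaves, the repetition gets tedious. A sketch that scripts the whole transfer from the master, assuming the slave hostnames from Part 1 (the -t flag gives sudo a terminal to prompt on, in case hduser needs a password for sudo):

```shell
for host in slave-1 slave-2 slave-3; do
  # Copy the archive and the .bashrc in one go.
  scp hadoop-2.7.3-configured.zip .bashrc hduser@"$host":~
  # Unpack, fix ownership, and clean up on the remote node.
  ssh -t hduser@"$host" 'sudo unzip hadoop-2.7.3-configured.zip -d / \
    && sudo chown -R hduser:hadoop /opt/hadoop-2.7.3/ \
    && rm hadoop-2.7.3-configured.zip'
done
```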
Create the /hdfs/tmp folder on each node that you added:
sudo mkdir -p /hdfs/tmp
sudo chown hduser:hadoop /hdfs/tmp
chmod 750 /hdfs/tmp
hdfs namenode -format
Part 2: Wiping HDFS
rm -rf /hdfs/tmp/*
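This has to happen on every node. With passwordless SSH in place you can do it in one go from the master; a sketch assuming the hostnames from Part 1:

```shell
# Wipe the HDFS data directory on every node in the cluster.
for host in master slave-1 slave-2 slave-3; do
  ssh hduser@"$host" 'rm -rf /hdfs/tmp/*'
done

# After wiping, the namenode typically needs to be formatted again:
hdfs namenode -format
```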
Part 3: Configuring Hadoop again (all nodes)
Each node can now run as a single node cluster, but that's not really the point of what we're doing here. In order to get them working together as a real cluster we need to do some more configuration. I hope you're ready for some more XML files...
The unfortunate part is that most of these changes need to happen on all our nodes. We start by editing yarn-site.xml (found in /opt/hadoop-2.7.3/etc/hadoop/) and adding these properties to the XML file:
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>master:8025</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>master:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>master:8040</value>
</property>
We also want to edit our core-site.xml on all our nodes so that it looks like this:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hdfs/tmp</value>
  </property>
</configuration>
Part 4: Configuring the master (master only)
Two files must be edited on the master only: slaves and masters. The slaves file tells the master node which other nodes can be used for this cluster. Just add the nodes that you want to use for data processing to this file, perhaps even including the master node itself.
master
slave-1
slave-2
slave-3
Create the file masters with the hostname of your master node as its only content:

master
(Again) make sure that your system can resolve these hostnames. This file only goes on the master node, the other nodes don't need it.
Part 5: booting the cluster from master node
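Booting works with the standard Hadoop scripts, assuming they are on hduser's PATH as in the single-node setup. Run as hduser on the master; the scripts read the slaves file and start the worker daemons on every listed node over SSH:

```shell
# Start HDFS first: the NameNode here, DataNodes on the slaves.
start-dfs.sh

# Then start YARN: the ResourceManager here, NodeManagers on the slaves.
start-yarn.sh

# List the Java daemons running on this node to verify the result.
jps
```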
Part 6: shutting down the cluster from master node
To shut down the cluster we just need to run the stop scripts in reverse order: first shut down YARN and finally DFS.
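Concretely, assuming the same standard Hadoop scripts on the PATH, that comes down to two commands run as hduser on the master:

```shell
# Stop the ResourceManager and all NodeManagers first...
stop-yarn.sh

# ...then the NameNode and all DataNodes.
stop-dfs.sh
```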
There we have it. You can expect more material sometime in the future.