Hadoop Environment Deployment

Published: 2019-03-22 06:37:44

I. Configure the Network Environment (hostname)

 

1. Check the IP address with ip addr show or ifconfig.

2. Change the hostname: vi /etc/sysconfig/network

NETWORKING=yes

HOSTNAME=hdpvm1         # set this to the name you want, e.g. master

 

 

3. Run hostname to check whether the change took effect.

If it did not, reboot the machine. (On systemd-based distributions you can instead run hostnamectl set-hostname <name>.)

II. JDK Installation

1. Extract the JDK archive (this article assumes it ends up in /home/ubuntu/java/jdk1.8.0_151).

2. Configure the Java environment variables:

vim ~/.bashrc
# append at the end of the file:
export JAVA_HOME=/home/ubuntu/java/jdk1.8.0_151
export PATH=$JAVA_HOME/bin:$PATH

source ~/.bashrc   # reload the configuration
java -version      # verify: prints the Java version
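The append-to-.bashrc step above can be made idempotent, so re-running the setup does not stack duplicate export lines. A minimal sketch (append_java_env is a hypothetical helper name; the JDK path is the one used in this article):

```shell
# Append the Java exports to a profile file only if not already present.
append_java_env() {
  local profile="$1"
  local jdk_dir="${2:-/home/ubuntu/java/jdk1.8.0_151}"
  if ! grep -q "JAVA_HOME=$jdk_dir" "$profile" 2>/dev/null; then
    printf 'export JAVA_HOME=%s\nexport PATH=$JAVA_HOME/bin:$PATH\n' \
      "$jdk_dir" >> "$profile"
  fi
}
```

Usage: append_java_env ~/.bashrc, then source ~/.bashrc as above.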
 

 

III. Disable the Firewall

 

# check the firewall status

service iptables status

# stop the firewall

service iptables stop

# check whether the firewall starts on boot

chkconfig iptables --list

# disable firewall autostart

chkconfig iptables off

Note: these commands are for CentOS 6 / SysV init; on CentOS 7 and later the default firewall is firewalld, managed with systemctl (systemctl stop firewalld; systemctl disable firewalld).
 

 

 

IV. Passwordless SSH Configuration

1. First, configure the Master machine

1-1. Enter the .ssh directory: [root@Hadoop-Master ~]# cd ~/.ssh. If the directory does not exist, run ssh localhost once first rather than creating it by hand; a hand-made directory can leave you still being prompted for a password after everything is configured.

1-2. Generate a key pair with ssh-keygen: ssh-keygen -t rsa, pressing Enter through every prompt. This produces two files, id_rsa and id_rsa.pub.

1-3. Create the authorized_keys file: [root@Hadoop-Master .ssh]# cat id_rsa.pub >> authorized_keys

1-4. Generate public/private key pairs on the other two machines, Slave1 and Slave2, in the same way.

1-5. Copy Slave1's id_rsa.pub to the Master machine: [root@Slave1 .ssh]# scp id_rsa.pub root@Hadoop-Master:~/.ssh/id_rsa.pub_s1

1-6. Copy Slave2's id_rsa.pub to the Master machine: [root@Slave2 .ssh]# scp id_rsa.pub root@Hadoop-Master:~/.ssh/id_rsa.pub_s2

1-7. Then switch to the Master machine and merge the keys into authorized_keys:

 [root@Hadoop-Master .ssh]# cat id_rsa.pub_s1 >> authorized_keys

 [root@Hadoop-Master .ssh]# cat id_rsa.pub_s2 >> authorized_keys

1-8. Copy authorized_keys to the Slave1 and Slave2 machines:

 [root@Hadoop-Master .ssh]# scp authorized_keys root@Hadoop-Slave1:~/.ssh/

 [root@Hadoop-Master .ssh]# scp authorized_keys root@Hadoop-Slave2:~/.ssh/

1-9. On every machine, set the .ssh directory permissions to 700 and authorized_keys to 600 (or 644):

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
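Steps 1-7 through 1-9 (merge the collected public keys, then tighten permissions) can be sketched as one helper. merge_authorized_keys is a hypothetical name; the file layout matches the steps above:

```shell
# Merge one or more public-key files into authorized_keys and set the
# permissions sshd insists on: 700 for ~/.ssh, 600 for authorized_keys.
merge_authorized_keys() {
  local ssh_dir="$1"; shift
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  cat "$@" >> "$ssh_dir/authorized_keys"
  chmod 600 "$ssh_dir/authorized_keys"
}
```

On the Master this would be called as, e.g., merge_authorized_keys ~/.ssh id_rsa.pub_s1 id_rsa.pub_s2.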

1-10. Verify SSH:

[root@Hadoop-Master .ssh]# ssh Hadoop-Slave1
Welcome to aliyun Elastic Compute Service!
[root@Hadoop-Slave1 ~]# exit
logout
Connection to Hadoop-Slave1 closed.
[root@Hadoop-Master .ssh]# ssh Hadoop-Slave2
Welcome to aliyun Elastic Compute Service!
[root@Hadoop-Slave2 ~]# exit
logout
Connection to Hadoop-Slave2 closed.

 

V. Hadoop Deployment

3.1 Installing Hadoop

 

3.1.1 Download

(1) Download Hadoop from the official site; this article uses hadoop-3.0.0.tar.gz.
(2) As with the JDK, create a hadoop directory under the home directory (/home/ubuntu/): mkdir hadoop, then extract: tar -zxvf hadoop-3.0.0.tar.gz

 

3.1.2 Configure environment variables

Run the following:

 

vim ~/.bashrc
# append at the end of the file:
export HADOOP_HOME=/home/ubuntu/hadoop/hadoop-3.0.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

source ~/.bashrc   # reload the configuration

 

3.1.3 Create the data directories

 

mkdir /home/ubuntu/hadoop/tmp
mkdir /home/ubuntu/hadoop/dfs
mkdir /home/ubuntu/hadoop/dfs/data
mkdir /home/ubuntu/hadoop/dfs/name
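Since mkdir -p creates parent directories as needed and is harmless to re-run, the four commands above collapse into one call. A sketch (make_hadoop_dirs is a hypothetical helper; the base path is the one used in this article):

```shell
# Create Hadoop's tmp and DFS directories in one call; -p creates
# parents and succeeds even if the directories already exist.
make_hadoop_dirs() {
  local base="${1:-/home/ubuntu/hadoop}"
  mkdir -p "$base/tmp" "$base/dfs/name" "$base/dfs/data"
}
```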

 

3.2 Configuring Hadoop

Enter the hadoop-3.0.0 configuration directory with cd /home/ubuntu/hadoop/hadoop-3.0.0/etc/hadoop, then edit hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and the workers file in turn.

 

3.2.1 Configure hadoop-env.sh

 

vim hadoop-env.sh

export JAVA_HOME=/home/ubuntu/java/jdk1.8.0_151    # find JAVA_HOME in hadoop-env.sh and set it to your install path

 

3.2.2 Configure core-site.xml (adjust the values for your own nodes)

 

vim core-site.xml

 

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://mpi-1:9000</value>
        <description>URI of HDFS: filesystem://namenode-host:port</description>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/home/ubuntu/hadoop/tmp</value>
        <description>Local temporary directory for Hadoop on the namenode</description>
        </property>
        <property>
                <name>io.file.buffer.size</name>
                <value>131072</value>
        <description>Size of read/write buffer used in SequenceFiles</description>
        </property>
</configuration>

 

3.2.3 Configure hdfs-site.xml

 

vim hdfs-site.xml

 

<configuration>
        <property>
                <name>dfs.replication</name>
                <value>2</value>
        <description>Replication factor: how many copies of each block the cluster keeps. Higher values give better redundancy but use more storage.</description>
        </property>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>file:///home/ubuntu/hadoop/dfs/name</value>
        <description>Where the namenode stores the HDFS namespace metadata</description>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>file:///home/ubuntu/hadoop/dfs/data</value>
        <description>Physical location of data blocks on the datanodes</description>
        </property>
        <property>
                <name>dfs.namenode.secondary.http-address</name>
                <value>mpi-1:50090</value>
        </property>
        <property>
                <name>dfs.webhdfs.enabled</name>
                <value>true</value>
        </property>
        <property>
                <name>dfs.permissions</name>
                <value>false</value>
        <description>With dfs.permissions set to false, files can be created on DFS without permission checks. Convenient, but it removes protection against accidental deletion; set it to true (or simply delete this property, since true is the default) if you want the checks.</description>
        </property>
</configuration>

 

3.2.4 Configure mapred-site.xml

 

vim mapred-site.xml

Note: earlier versions required cp mapred-site.xml.template mapred-site.xml; hadoop-3.0.0 ships mapred-site.xml directly. (The jobtracker-related properties below, mapreduce.jobtracker.http.address and mapred.job.tracker, are legacy MRv1 settings and are effectively ignored when running on YARN.)

 

<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        <description>The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn.</description>
            <final>true</final>
        </property>
        <property>
                <name>mapreduce.jobtracker.http.address</name>
                <value>mpi-1:50030</value>
        </property>
        <property>
                <name>mapreduce.jobhistory.address</name>
                <value>mpi-1:10020</value>
        </property>
        <property>
                <name>mapreduce.jobhistory.webapp.address</name>
                <value>mpi-1:19888</value>
        </property>
        <property>
                <name>mapred.job.tracker</name>
                <value>http://mpi-1:9001</value>
        </property>
</configuration>
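One caveat for Hadoop 3.x: MapReduce jobs submitted to YARN also need the MapReduce classpath, or they can fail with class-not-found errors at runtime. The official single-node/cluster setup guide adds a property along these lines (a sketch; $HADOOP_MAPRED_HOME must resolve to your Hadoop install directory):

```xml
<property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
```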

 

3.2.5 Configure yarn-site.xml

 

vim yarn-site.xml

 

<configuration>
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>mpi-1</value>
        <description>The hostname of the RM.</description>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
        <property>
                <name>yarn.resourcemanager.address</name>
                <value>mpi-1:8032</value>
        <description>${yarn.resourcemanager.hostname}:8032</description>
        </property>
        <property>
                <name>yarn.resourcemanager.scheduler.address</name>
                <value>mpi-1:8030</value>
        </property>
        <property>
                <name>yarn.resourcemanager.resource-tracker.address</name>
                <value>mpi-1:8031</value>
        </property>
        <property>
                <name>yarn.resourcemanager.admin.address</name>
                <value>mpi-1:8033</value>
        </property>
        <property>
                <name>yarn.resourcemanager.webapp.address</name>
                <value>mpi-1:8088</value>
        </property>
</configuration>

 

3.2.6 Configure the workers file (earlier versions used slaves instead; check which one your version has)

 

vim workers
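The workers file simply lists one worker hostname per line; the start scripts use it to launch DataNodes and NodeManagers over SSH. For the cluster in this article it might look like this (a sketch; mpi-2 and mpi-3 are assumed names for the two slave nodes):

```
mpi-2
mpi-3
```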
Source: https://blog.csdn.net/secyb/article/details/80170804

 

 

 

4 Running the Hadoop Cluster

4.1 Format the namenode

hdfs namenode -format    # HDFS must be formatted before first use (format only once)

 

 

 

VI. If the cluster does not come up, you will see errors like these:

Starting namenodes on [mpi-1]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [mpi-1]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
2019-03-14 11:40:27,925 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@mpi-1 hadoop]# start-yarn.sh
Starting resourcemanager
ERROR: Attempting to operate on yarn resourcemanager as root
ERROR: but there is no YARN_RESOURCEMANAGER_USER defined. Aborting operation.
Starting nodemanagers
ERROR: Attempting to operate on yarn nodemanager as root
ERROR: but there is no YARN_NODEMANAGER_USER defined. Aborting operation.
 

Solution:

 

Two notes up front:
 1. Modify all four files (start-dfs.sh, stop-dfs.sh, start-yarn.sh, stop-yarn.sh) on both the master and the slaves.
 2. If your Hadoop is started by some other user, replace root with that user.

After formatting HDFS, starting dfs produced the following errors:

 

[root@master sbin]# ./start-dfs.sh
Starting namenodes on [master]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [slave1]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.

A web search turned up a blog post with an FAQ for exactly this problem, which I followed and record here.
Reference: https://blog.csdn.net/u013725455/article/details/70147331

In the hadoop/sbin directory:
Add the following parameters at the top of both start-dfs.sh and stop-dfs.sh:

 

#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

Likewise, add the following at the top of start-yarn.sh and stop-yarn.sh:

 

#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root


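Patching four scripts by hand is error-prone; the edit (insert the *_USER lines right after the shebang) can be sketched as a helper. add_hadoop_users is a hypothetical name; you would run it against each of the four scripts in $HADOOP_HOME/sbin:

```shell
# Insert the given VAR=value lines after the first line (the shebang)
# of a start/stop script, leaving the rest of the script untouched.
add_hadoop_users() {
  local script="$1"; shift
  local tmp
  tmp=$(mktemp)
  head -n 1 "$script" > "$tmp"     # keep the #! line first
  printf '%s\n' "$@" >> "$tmp"     # then the *_USER definitions
  tail -n +2 "$script" >> "$tmp"   # then the original script body
  mv "$tmp" "$script"
}
```

For example: add_hadoop_users $HADOOP_HOME/sbin/start-dfs.sh HDFS_NAMENODE_USER=root HDFS_DATANODE_USER=root HDFS_SECONDARYNAMENODE_USER=root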

After the changes, re-run ./start-dfs.sh. Success!

[root@master sbin]# ./start-dfs.sh
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [master]
Last login: Sun Jun  3 03:01:37 CST 2018 from slave1 on pts/2
master: Warning: Permanently added 'master,192.168.43.161' (ECDSA) to the list of known hosts.
Starting datanodes
Last login: Sun Jun  3 04:09:05 CST 2018 on pts/1
Starting secondary namenodes [slave1]
Last login: Sun Jun  3 04:09:08 CST 2018 on pts/1

 

 

VII. Final Check in the Browser

Check port 9870 (note: it is 9870 now, no longer 50070).
Open the master's IP at port 9870 in a browser. (Screenshot omitted.)

Test YARN:
Open the master's IP at port 8088 in a browser. (Screenshot omitted.)

Note: to make port 8088 reachable from outside the machine, bind to 0.0.0.0 instead of mpi-1 (i.e. instead of a local address). For example, change the following in yarn-site.xml:

        <property>
                <name>yarn.resourcemanager.webapp.address</name>
                <value>mpi-1:8088</value>
        </property>

to:

 

        <property>
                <name>yarn.resourcemanager.webapp.address</name>
                <value>0.0.0.0:8088</value>
        </property>

You can also consult the default configuration files on the Hadoop website when editing; hdfs-site.xml, for example, has detailed parameter descriptions there. Finally, you can browse the storage with the hdfs dfs commands, e.g. hdfs dfs -ls /.
