Video tutorial

Official documentation

How Hadoop HA automatic failover works

 

IP           HOSTNAME   Services
64.115.3.30  hadoop30   NameNode / DataNode / NodeManager / JournalNode / ZooKeeper / ZKFC
64.115.3.40  hadoop40   NameNode / ResourceManager / DataNode / NodeManager / JournalNode / ZooKeeper / ZKFC
64.115.3.50  hadoop50   NameNode / ResourceManager / DataNode / NodeManager / JournalNode / ZooKeeper / ZKFC

Notes:

1. Passwordless SSH login must be configured between all three servers, in both directions.

2. On a minimal CentOS 7 install, the psmisc package must be installed; ZKFC uses it when switching the active NameNode.

 

I. Initialize the servers

The following changes must be made on every server. They are all basic operations, so only one example is shown below.

  • Configure the host IP
$ vim /etc/sysconfig/network-scripts/ifcfg-ens33

TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=static
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=ens33
UUID=1fce55bf-9a16-45ab-abce-386039d76568
DEVICE=ens33
ONBOOT=yes

IPADDR=64.115.3.30
GATEWAY=64.115.3.1
NETMASK=255.255.255.0
DNS1=114.114.114.114
DNS2=8.8.8.8
  • Set the hostname
 hostnamectl set-hostname hadoop30
  • Configure name resolution
$ vim /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
64.115.3.30  hadoop30
64.115.3.40  hadoop40
64.115.3.50  hadoop50
  • Disable the firewall
systemctl stop firewalld
systemctl disable firewalld
  • Configure passwordless SSH login

Passwordless SSH must be configured between all three servers, in both directions.

# Run on hadoop30, hadoop40, and hadoop50
ssh-keygen
ssh-copy-id  root@hadoop30
ssh-copy-id  root@hadoop40
ssh-copy-id  root@hadoop50

  • Install dependencies
# ZKFC needs this package during automatic failover; a minimal CentOS 7 install does not include it, which makes automatic failover fail.
yum install psmisc
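Before moving on, it is worth confirming the passwordless SSH setup above actually works from this node to all three hosts. The sketch below is a hypothetical helper, not part of Hadoop; the `SSH_CMD` override exists only so the loop can be exercised without a live cluster, and `BatchMode=yes` makes ssh fail instead of prompting for a password.

```shell
#!/bin/bash
# Check passwordless SSH from this node to every node in the cluster.
# SSH_CMD can be overridden (e.g. for testing); by default it is a
# non-interactive ssh that fails rather than asking for a password.
check_ssh() {
  local host rc=0
  for host in hadoop30 hadoop40 hadoop50; do
    if ${SSH_CMD:-ssh -o BatchMode=yes -o ConnectTimeout=5} "root@$host" true 2>/dev/null; then
      echo "$host: ok"
    else
      echo "$host: FAILED"   # rerun ssh-copy-id for this host
      rc=1
    fi
  done
  return $rc
}
```

Run `check_ssh` on each of the three servers; every line should report ok before you continue.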

II. Download the packages, configure environment variables, install ZooKeeper

My software installation directory is /software

mkdir /software

Download links:

JAVA JDK : https://www.oracle.com/java/technologies/javase/javase-jdk8-downloads.html

HADOOP : https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz

Zookeeper : see the separate installation and configuration guide (link)

Note: when downloading ZooKeeper, pick the archive whose name contains -bin, i.e. apache-zookeeper-3.7.0-bin.tar.gz

  • Extract the archives
  tar -zxvf jdk-8u271-linux-x64.tar.gz
  tar -zxvf hadoop-3.3.0.tar.gz

  • Configure environment variables
$ vim /etc/profile   # append the following at the end, then apply it with: source /etc/profile
#java
export JAVA_HOME=/software/jdk1.8.0_271
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

#hadoop
export HADOOP_HOME=/software/hadoop-3.3.0
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_HOME/lib/native"
#zookeeper
export ZOOKEEPER_HOME=/software/apache-zookeeper-3.7.0-bin
export PATH=$PATH:$ZOOKEEPER_HOME/bin
#scripts
export PATH=/root:$PATH
  • Write two small helper scripts, used later for syncing files and checking processes
$ cd /root
$ vim xsync

#!/bin/bash
# Check the argument count
if [ $# -lt 1 ]
then
  echo "No arguments given!"
  exit
fi

# Loop over every machine in the cluster
for host in 64.115.3.30  64.115.3.40  64.115.3.50
do
   echo "============================= $host ==========================="
   # Loop over all files/directories and send them one by one
   for file in "$@"
   do
       # Check that the file exists
       if [ -e "$file" ]
          then
               # Resolve the parent directory
               pdir=$( cd -P "$(dirname "$file")"; pwd )
               # Get the file name itself
               fname=$(basename "$file")
               ssh "$host" "mkdir -p $pdir"
               rsync -av "$pdir/$fname"  "$host:$pdir"
           else
               echo "$file does not exist!"
         fi
    done
done
$ vim jpsall

#!/bin/bash
for host in hadoop30 hadoop40 hadoop50
do
 echo =============== $host ===============
 ssh $host    /software/jdk1.8.0_271/bin/jps
done

 

# Make both scripts executable
$ chmod +x xsync jpsall
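The only non-obvious part of xsync is how it turns each argument into an absolute parent directory plus a base name before handing both to rsync. That logic can be exercised on its own, without any remote hosts:

```shell
#!/bin/bash
# Reproduce xsync's path resolution: resolve an argument to an absolute
# parent directory (following symlinks, via cd -P) plus a base name.
resolve() {
  local file=$1
  local pdir fname
  pdir=$( cd -P "$(dirname "$file")"; pwd )
  fname=$(basename "$file")
  echo "$pdir/$fname"
}

mkdir -p /tmp/xsync-demo && touch /tmp/xsync-demo/a.txt
cd /tmp
resolve xsync-demo/a.txt   # relative path comes back absolute
resolve /etc/hosts         # absolute paths pass through
```

This is why xsync can be called with either relative or absolute paths: `mkdir -p $pdir` on the remote side always receives an absolute directory.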

III. Modify the Hadoop cluster configuration

Almost every setting below carries a comment and is easy to follow.

  1. Configure hadoop-env.sh

    # Set to the root of the Java installation
    export JAVA_HOME=/software/jdk1.8.0_271
    export HADOOP_PID_DIR=/software/hadoop-3.3.0/tmp
    
    # Run all daemons as the root user
    export HDFS_NAMENODE_USER=root
    export HDFS_DATANODE_USER=root
    export HDFS_SECONDARYNAMENODE_USER=root
    export YARN_RESOURCEMANAGER_USER=root
    export YARN_NODEMANAGER_USER=root
    export HDFS_JOURNALNODE_USER=root
    export HDFS_ZKFC_USER=root
    
  2. Configure hdfs-site.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
        <!-- Replication factor, default 3 -->
        <property>
                <name>dfs.replication</name>
                <value>3</value>
        </property>
        <!-- HDFS nameservice (logical cluster) name -->
        <property>
          <name>dfs.nameservices</name>
          <value>mycluster</value>
        </property>
        <!-- NameNodes within the nameservice -->
        <property>
          <name>dfs.ha.namenodes.mycluster</name>
          <value>nn30,nn40,nn50</value>
        </property>
        <!-- NameNode RPC addresses -->
        <property>
          <name>dfs.namenode.rpc-address.mycluster.nn30</name>
          <value>hadoop30:8020</value>
        </property>
        <property>
          <name>dfs.namenode.rpc-address.mycluster.nn40</name>
          <value>hadoop40:8020</value>
        </property>
        <property>
          <name>dfs.namenode.rpc-address.mycluster.nn50</name>
          <value>hadoop50:8020</value>
        </property>
        <!-- NameNode web UI addresses; the ids must match dfs.ha.namenodes.mycluster -->
        <property>
          <name>dfs.namenode.http-address.mycluster.nn30</name>
          <value>hadoop30:9870</value>
        </property>
        <property>
          <name>dfs.namenode.http-address.mycluster.nn40</name>
          <value>hadoop40:9870</value>
        </property>
        <property>
          <name>dfs.namenode.http-address.mycluster.nn50</name>
          <value>hadoop50:9870</value>
        </property>
        <!-- Where NameNode edit logs are stored on the JournalNodes -->
        <property>
          <name>dfs.namenode.shared.edits.dir</name>
          <value>qjournal://hadoop30:8485;hadoop40:8485;hadoop50:8485/mycluster</value>
        </property>
          <!-- Local storage directory for the JournalNode -->
        <property>
          <name>dfs.journalnode.edits.dir</name>
          <value>/data/hadoop/journalnode</value>
        </property>
        <!-- Java class HDFS clients use to find the active NameNode -->
        <property>
          <name>dfs.client.failover.proxy.provider.mycluster</name>
          <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        </property>
        <!-- Scripts or Java classes used to fence the active NameNode during failover -->
        <property>
          <name>dfs.ha.fencing.methods</name>
          <value>sshfence</value> <!-- SSH to the active NameNode and kill the process, so only one node serves at a time -->
        </property>
        <property>
          <name>dfs.ha.fencing.ssh.private-key-files</name>
          <value>/root/.ssh/id_rsa</value> <!-- private key used for the SSH login -->
        </property>  
        <!-- Enable automatic failover -->
        <property>
          <name>dfs.ha.automatic-failover.enabled</name>
          <value>true</value>
        </property>
    
        <!-- Whitelist (allowed hosts)
        <property>
           <name>dfs.hosts</name>
           <value>/software/hadoop-3.3.0/etc/hadoop/whitelist</value>
        </property>
        -->
        <!-- Blacklist (excluded hosts)
        <property>
           <name>dfs.hosts.exclude</name>
           <value>/software/hadoop-3.3.0/etc/hadoop/blacklist</value>
        </property>
         -->
    </configuration>
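After editing a file like this, it is easy to misplace a value. Hadoop's own `hdfs getconf -confKey` can check a key, but it needs the installation in place; the sketch below is a small hypothetical helper that pulls a property out of any *-site.xml file with awk, assuming the flat one-name-one-value layout used above.

```shell
#!/bin/bash
# get_prop FILE NAME: print the <value> of the <property> whose <name> matches.
# Assumes the flat name/value layout of Hadoop *-site.xml files.
get_prop() {
  local file=$1 prop=$2
  awk -v p="$prop" 'BEGIN { RS = "</property>" }
    index($0, "<name>" p "</name>") {
      if (match($0, /<value>[^<]*<\/value>/))
        print substr($0, RSTART + 7, RLENGTH - 15)   # strip the <value> tags
    }' "$file"
}

# demo against a minimal sample file
cat > /tmp/sample-site.xml <<'EOF'
<configuration>
  <property><name>dfs.nameservices</name><value>mycluster</value></property>
  <property><name>dfs.replication</name><value>3</value></property>
</configuration>
EOF
get_prop /tmp/sample-site.xml dfs.nameservices   # -> mycluster
get_prop /tmp/sample-site.xml dfs.replication    # -> 3
```

Used against the real files, `get_prop $HADOOP_HOME/etc/hadoop/hdfs-site.xml dfs.ha.namenodes.mycluster` is a quick way to confirm the NameNode ids before starting anything.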
    
  3. Configure core-site.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
      <!-- Default file system: point clients at the nameservice -->
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://mycluster</value>
        </property>
      <!-- ZooKeeper quorum addresses -->
      <property>
        <name>ha.zookeeper.quorum</name>
        <value>hadoop30:2181,hadoop40:2181,hadoop50:2181</value>
      </property>
      <!-- Change Hadoop's default data storage location -->
      <property>
            <name>hadoop.tmp.dir</name>
            <value>/data/hadoop</value>
      </property>  
      <!-- Enable the trash feature; deleted files are kept for 5 minutes -->
        <property>
             <name>fs.trash.interval</name>
             <value>5</value>
        </property>
      <!-- Set the static web UI user to root -->
      <property>
        <name>hadoop.http.staticuser.user</name>
        <value>root</value>
      </property>
    
      <!-- Disable HDFS permission checking (conventionally set in hdfs-site.xml, but picked up here as well) -->
      <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
      </property>
    </configuration>
    
  4. Configure yarn-site.xml

    <?xml version="1.0"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    <!-- Site specific YARN configuration properties -->
    <configuration>
            <!-- Enable the MapReduce shuffle auxiliary service -->
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <!-- Enable YARN ResourceManager HA -->
        <property>
          <name>yarn.resourcemanager.ha.enabled</name>
          <value>true</value>
        </property>
        <property>
          <name>yarn.resourcemanager.cluster-id</name>
          <value>yarn-cluster</value>
        </property>
        <property>
          <name>yarn.resourcemanager.ha.rm-ids</name>
          <value>rm1,rm2</value>
        </property>
        <property>
          <name>yarn.resourcemanager.hostname.rm1</name>
          <value>hadoop40</value>
        </property>
        <property>
          <name>yarn.resourcemanager.hostname.rm2</name>
          <value>hadoop50</value>
        </property>
        <!-- ResourceManager web UI addresses -->
        <property>
          <name>yarn.resourcemanager.webapp.address.rm1</name>
          <value>hadoop40:8088</value>
        </property>
        <property>
          <name>yarn.resourcemanager.webapp.address.rm2</name>
          <value>hadoop50:8088</value>
        </property>
        <!-- ZooKeeper quorum addresses -->
        <property>
          <name>hadoop.zk.address</name>
          <value>hadoop30:2181,hadoop40:2181,hadoop50:2181</value>
        </property>
    
     <!-- The settings below are tuning options; adjust them to your own hardware. For a pure HA experiment they can be omitted. -->
    
        <!-- Scheduler choice; CapacityScheduler is the default -->
        <property>
            <description>The class to use as the resource scheduler.</description>
            <name>yarn.resourcemanager.scheduler.class</name>
            <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
        </property>
        <!-- Threads the ResourceManager uses to handle scheduler requests, default 50. Increase it if more than 50
             jobs are submitted, but stay below 3 nodes * 4 threads = 12 (in practice no more than 8, leaving room for other processes) -->
        <property>
            <description>Number of threads to handle scheduler interface.</description>
            <name>yarn.resourcemanager.scheduler.client.thread-count</name>
            <value>8</value>
        </property>
    
        <!-- Whether YARN auto-detects hardware for its configuration, default false. If the node runs many other
             applications, configure it manually; if not, auto-detection works fine -->
        <property>
            <description>Enable auto-detection of node capabilities such as memory and CPU.</description>
            <name>yarn.nodemanager.resource.detect-hardware-capabilities</name>
            <value>false</value>
        </property>
        <!-- Whether to count logical processors as CPU cores, default false (use physical cores) -->
        <property>
            <description>Flag to determine if logical processors(such as
        hyperthreads) should be counted as cores. Only applicable on Linux
        when yarn.nodemanager.resource.cpu-vcores is set to -1 and
        yarn.nodemanager.resource.detect-hardware-capabilities is true.
        </description>
            <name>yarn.nodemanager.resource.count-logical-processors-as-cores</name>
            <value>false</value>
        </property>
        <!-- Multiplier from physical cores to vcores, default 1.0 -->
        <property>
            <description>Multiplier to determine how to convert phyiscal cores to
        vcores. This value is used if yarn.nodemanager.resource.cpu-vcores
        is set to -1(which implies auto-calculate vcores) and
        yarn.nodemanager.resource.detect-hardware-capabilities is set to true. 
        The number of vcores will be calculated as number of CPUs * multiplier.
            </description>
            <name>yarn.nodemanager.resource.pcores-vcores-multiplier</name>
            <value>1.0</value>
        </property>
        <!-- Memory available to the NodeManager, default 8 GB, reduced here to 4 GB -->
        <property>
            <description>Amount of physical memory, in MB, that can be allocated 
        for containers. If set to -1 and
        yarn.nodemanager.resource.detect-hardware-capabilities is true, it is
        automatically calculated(in case of Windows and Linux).
        In other cases, the default is 8192MB.
            </description>
            <name>yarn.nodemanager.resource.memory-mb</name>
            <value>4096</value>
        </property>
        <!-- NodeManager CPU vcores, default 8 when not auto-detected, reduced here to 4 -->
        <property>
            <description>Number of vcores that can be allocated
        for containers. This is used by the RM scheduler when allocating
        resources for containers. This is not used to limit the number of
        CPUs used by YARN containers. If it is set to -1 and
        yarn.nodemanager.resource.detect-hardware-capabilities is true, it is
        automatically determined from the hardware in case of Windows and Linux.
        In other cases, number of vcores is 8 by default.</description>
            <name>yarn.nodemanager.resource.cpu-vcores</name>
            <value>4</value>
        </property>
        <!-- Minimum container memory, default 1 GB -->
        <property>
            <description>The minimum allocation for every container request at the 
        RM in MBs. Memory requests lower than this will be set to the value of 
        this property. Additionally, a node manager that is configured to have 
        less memory than this value will be shut down by the resource manager.
            </description>
            <name>yarn.scheduler.minimum-allocation-mb</name>
            <value>1024</value>
        </property>
        <!-- Maximum container memory, default 8 GB, reduced here to 2 GB -->
        <property>
            <description>The maximum allocation for every container request at the 
        RM in MBs. Memory requests higher than this will throw an
        InvalidResourceRequestException.
            </description>
            <name>yarn.scheduler.maximum-allocation-mb</name>
            <value>2048</value>
        </property>
        <!-- Minimum container vcores, default 1 -->
        <property>
            <description>The minimum allocation for every container request at the 
        RM in terms of virtual CPU cores. Requests lower than this will be set to 
        the value of this property. Additionally, a node manager that is configured 
        to have fewer virtual cores than this value will be shut down by the 
        resource manager.
            </description>
            <name>yarn.scheduler.minimum-allocation-vcores</name>
            <value>1</value>
        </property>
        <!-- Maximum container vcores, default 4, reduced here to 2 -->
        <property>
            <description>The maximum allocation for every container request at the 
        RM in terms of virtual CPU cores. Requests higher than this will throw an
        InvalidResourceRequestException.</description>
            <name>yarn.scheduler.maximum-allocation-vcores</name>
            <value>2</value>
        </property>
        <!-- Virtual memory check, on by default, disabled here -->
        <property>
            <description>Whether virtual memory limits will be enforced for
        containers.</description>
            <name>yarn.nodemanager.vmem-check-enabled</name>
            <value>false</value>
        </property>
        <!-- Ratio of virtual to physical memory, default 2.1 -->
        <property>
            <description>Ratio between virtual memory to physical memory when
        setting memory limits for containers. Container allocations are
        expressed in terms of physical memory, and virtual memory usage is 
        allowed to exceed this allocation by this ratio.
        </description>
            <name>yarn.nodemanager.vmem-pmem-ratio</name>
            <value>2.1</value>
        </property>
        <!-- Maximum application priority -->
        <property>
            <name>yarn.cluster.max-application-priority</name>
            <value>5</value>
        </property>
    
    </configuration>
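One consistency rule is worth checking by hand whenever these limits are edited: every per-container scheduler limit must fit inside what a single NodeManager offers, or containers can never be placed. The short check below uses the values from this example configuration:

```shell
#!/bin/bash
# Sanity-check the YARN sizing used above: every scheduler limit must fit
# inside one NodeManager's resources (values copied from this yarn-site.xml).
nm_mem_mb=4096        # yarn.nodemanager.resource.memory-mb
nm_vcores=4           # yarn.nodemanager.resource.cpu-vcores
max_alloc_mb=2048     # yarn.scheduler.maximum-allocation-mb
max_alloc_vcores=2    # yarn.scheduler.maximum-allocation-vcores
min_alloc_mb=1024     # yarn.scheduler.minimum-allocation-mb

if [ "$max_alloc_mb" -le "$nm_mem_mb" ] && [ "$max_alloc_vcores" -le "$nm_vcores" ] \
   && [ "$min_alloc_mb" -le "$max_alloc_mb" ]; then
  echo "yarn sizing consistent"
else
  echo "yarn sizing INCONSISTENT"
fi
```

With the numbers above a single NodeManager can host up to two maximum-size containers (2 x 2048 MB within 4096 MB), which is plenty for an HA experiment.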
    
  5. Configure mapred-site.xml

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
        <!-- Run MapReduce jobs on YARN -->
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
        <!-- Inherit environment variables -->
        <property>
          <name>yarn.app.mapreduce.am.env</name>
          <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
        </property>
        <property>
          <name>mapreduce.map.env</name>
          <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
        </property>
        <property>
          <name>mapreduce.reduce.env</name>
          <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
        </property>
        <!-- JobHistory server address -->
        <property>
                <name>mapreduce.jobhistory.address</name>
                <value>hadoop50:10020</value>
        </property>
        <!-- JobHistory server web UI (HTTP) address -->
        <property>
                <name>mapreduce.jobhistory.webapp.address</name>
                <value>hadoop50:19888</value>
        </property>
    </configuration>
    
  6. Configure workers (called slaves in older versions)

    hadoop30
    hadoop40
    hadoop50
    

     

IV. Initialize and start the cluster

  1. Create the data directory

    mkdir -p /data/hadoop
    
  2. Copy Hadoop to the other machines

    Once the configuration files are ready, use the xsync script written earlier to copy Hadoop to the other machines.

    # Sync the Hadoop installation and configuration
    xsync /software/hadoop-3.3.0
    # Sync the Hadoop data directory
    xsync /data/hadoop
    
  3. Start the JournalNodes

    # start the journalnode on hadoop30
    hadoop-daemon.sh start journalnode
    
    # start the journalnode on hadoop40
    hadoop-daemon.sh start journalnode
    
    # start the journalnode on hadoop50
    hadoop-daemon.sh start journalnode
    

  4. Pick one NameNode and format it

    # format the namenode on hadoop30
    hdfs namenode -format
    

  5. Start the formatted NameNode

    # start the namenode on hadoop30
    hadoop-daemon.sh start namenode
    

  6. Sync the metadata to the other NameNodes

    # sync the namenode metadata on hadoop40
    hdfs namenode -bootstrapStandby
    
    # sync the namenode metadata on hadoop50
    hdfs namenode -bootstrapStandby
    

  7. Format the HA state in ZooKeeper

    # run on hadoop30 to initialize the HA state in the ZooKeeper cluster
    hdfs zkfc -formatZK
    

  8. Start the Hadoop HA cluster

    start-dfs.sh
    start-yarn.sh
    

  9. Start the history server

    # run on hadoop50
    mapred --daemon start historyserver
    

V. Web interfaces

Once the Hadoop cluster is up and running, check each component's web UI as described below:

Daemon                       Web interface           Notes
NameNode                     http://nn_host:port/    The default HTTP port is 9870.
ResourceManager              http://rm_host:port/    The default HTTP port is 8088.
MapReduce JobHistory Server  http://jhs_host:port/   The default HTTP port is 19888.

Namenode

Resourcemanager

JobHistory

VI. Verify the cluster

  • Stop the NameNode that is in the active state

    One of the other two NameNodes will become active.

    Restart the first NameNode with hadoop-daemon.sh start namenode; it will come back in the standby state.

  • Stop the ZKFC on the host whose NameNode is active

    One of the other two ZKFCs will promote its local NameNode to active.

    The NameNode whose ZKFC was stopped is demoted to standby.

    After restarting the ZKFC with hadoop-daemon.sh start zkfc, that NameNode remains standby.

  • Commands to verify YARN high availability

     $ yarn rmadmin -getServiceState rm1
     active
    
     $ yarn rmadmin -getServiceState rm2
     standby
    

    If rm1 is stopped, rm2 becomes active.
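The manual RM check can also be scripted. The sketch below is a hypothetical helper: it asks each configured RM id for its state and prints the active one. The `RMADMIN` variable defaults to the real `yarn rmadmin` command but can be stubbed out where no cluster is available.

```shell
#!/bin/bash
# Print the id of the active ResourceManager (rm1/rm2, as configured in
# yarn-site.xml). RMADMIN defaults to the real "yarn rmadmin" command.
find_active_rm() {
  local rm state
  for rm in rm1 rm2; do
    state=$(${RMADMIN:-yarn rmadmin} -getServiceState "$rm" 2>/dev/null)
    if [ "$state" = "active" ]; then
      echo "$rm"
      return 0
    fi
  done
  echo "no active ResourceManager found" >&2
  return 1
}
```

Run `find_active_rm` on any cluster node before and after stopping an RM to watch the active role move.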
