Last edited by Seven_2017 on 2021-6-17 11:05
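I. Problem description
Both the Hadoop data warehouse and the traditional data warehouse currently load data into HDFS paths with hdfs dfs -put (hadoop fs -put) rather than through the Java API. Running the put command failed with the following errors: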
ERR>2021-06-11 09:34:30,000 WARN hdfs.DFSUtilClient: Namenode for hacluster remains unresolved for ID 608. Check your hdfs-site.xml file to ensure namenodes are configured properly.
ERR>2021-06-11 09:34:30,002 WARN hdfs.DFSUtilClient: Namenode for hacluster remains unresolved for ID 609. Check your hdfs-site.xml file to ensure namenodes are configured properly.
...
ERR>2021-06-11 09:38:39,284 INFO retry.RetryInvocationHandler: Invalid host name: local host is: (unknown); destination host is: "-":25000; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over -:25000 after 19 failover attempts. Trying to failover after sleeping for 10276ms.
ERR>2021-06-11 09:38:49,560 WARN ha.BlackListingFailoverProxyProvider: All proxies are added to blacklist: [-:25000, -:25000] ,hence clearing blackListing
ERR>test: Invalid host name: local host is: (unknown); destination host is: "-":25000; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost
ERR>/*.sh: line 13: kill: (20551) - No such process
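At first glance the error messages suggest the following:
1. hdfs-site.xml does not configure the corresponding NameNode
2. The NameNode's DNS name is unknown (it cannot be resolved)
3. The IP address this host accesses has been added to the blacklist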
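To separate the root error from its side effects, it can help to pull the exception class and the destination out of the log. The following is a minimal sketch, not part of the original troubleshooting; the two regular expressions and the function name summarize are assumptions made for illustration. The host shows up as "-" in the log as posted (it may have been masked by the poster), so the sketch simply reports whatever is there. java.net.UnknownHostException means the client could not turn a host name into an IP address, which points at name resolution rather than at the blacklist.

```python
import re

# Fully qualified Java exception names, e.g. java.net.UnknownHostException.
EXC_RE = re.compile(r'\b((?:[a-z_]\w*\.)+[A-Z]\w*Exception)\b')
# The 'destination host is: "<host>":<port>' fragment of the Hadoop message.
DEST_RE = re.compile(r'destination host is: "([^"]*)":(\d+)')

def summarize(log_text):
    """Return (sorted exception names, sorted (host, port) destinations) found in the log."""
    exceptions = sorted(set(EXC_RE.findall(log_text)))
    destinations = sorted({(h, int(p)) for h, p in DEST_RE.findall(log_text)})
    return exceptions, destinations

# One line copied from the log above.
sample = (
    'ERR>test: Invalid host name: local host is: (unknown); '
    'destination host is: "-":25000; java.net.UnknownHostException; '
    'For more details see: http://wiki.apache.org/hadoop/UnknownHost'
)

exceptions, destinations = summarize(sample)
print("exceptions:", exceptions)
print("destinations:", destinations)
```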
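II. Troubleshooting
1. Check whether /etc/hosts contains the corresponding host name entries (no problem found)
2. Check whether hdfs-site.xml configures the NameNodes correctly (no problem found)
3. ping the IP of the master NameNode (no problem found)
4. telnet to port 25000 on the IP of the master NameNode (no problem found)
5. ping/telnet the domain name of the master NameNode (the name that maps to that IP) (failed)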
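Steps 3 to 5 can be approximated in one script when many hosts need checking. This is a rough sketch under stated assumptions: it uses a TCP connect in place of ping and telnet, the function names resolve, port_open and diagnose are made up for illustration, and the host name and address in the last comment are hypothetical placeholders, not values from the cluster in this post.

```python
import socket

def resolve(name):
    """Return the sorted IP addresses the local resolver gives for name, or [] on failure."""
    try:
        infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def diagnose(name, ip, port):
    """Compare reachability by IP with what the host name resolves to."""
    by_ip = port_open(ip, port)
    addrs = resolve(name)
    if not by_ip:
        return "port not reachable by IP: suspect network or the service itself"
    if not addrs:
        return "IP reachable but name does not resolve: suspect DNS or /etc/hosts"
    if ip not in addrs:
        return "name resolves to a different address: " + ", ".join(addrs)
    return "name and IP both look fine"

# Example call (hypothetical name and address):
# print(diagnose("nn1.example.com", "192.0.2.10", 25000))
```

In the situation described here, a check like this would be expected to report the second case on the affected hosts: the port answers by IP, but the name does not resolve.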
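III. Solution
Pinging the master node's IP worked but pinging its domain name did not, which was odd... and only some hosts were affected.
The cluster-configuration and blacklist messages are normal side effects: a NameNode that cannot be reached is added to the blacklist, and the blacklist stops applying once the connection is restored, so these messages are misleading.
Resolution: the root cause was a network upgrade cutover that affected some of the DNS servers.
We therefore disabled the affected DNS server addresses on the affected hosts, that is, the DNS server entries in /etc/resolv.conf.
After that, ping and telnet to the master node's domain name both succeeded, and the problem was solved.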
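The post does not show the resolver file itself, so the following is only an illustrative sketch of what disabling a DNS server entry can look like, assuming the usual Linux format where each server is a "nameserver <ip>" line and lines starting with "#" are comments. The function name comment_out_nameservers and the addresses (taken from the documentation-only ranges) are hypothetical. It works on text only and does not touch the real file.

```python
def comment_out_nameservers(resolv_text, bad_servers):
    """Return resolver-config text with the nameserver lines for bad_servers commented out."""
    bad = set(bad_servers)
    out = []
    for line in resolv_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver" and fields[1] in bad:
            out.append("# " + line)  # disable this server, keep the line for reference
        else:
            out.append(line)
    return "\n".join(out) + "\n"

# Hypothetical file content; 192.0.2.53 stands in for a DNS server hit by the cutover.
sample = "search example.com\nnameserver 192.0.2.53\nnameserver 198.51.100.53\n"
print(comment_out_nameservers(sample, ["192.0.2.53"]), end="")
```

Commenting the line out rather than deleting it keeps the old address visible, which makes it easy to restore once the DNS server is healthy again.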