dncp_block_verification log file increases size in HDFS

Time: 2021-09-19 14:59:52

We are using Cloudera CDH 5.3. I am facing a problem where the sizes of "/dfs/dn/current/Bp-12345-IpAddress-123456789/dncp-block-verification.log.curr" and "dncp-block-verification.log.prev" keep growing to TBs within hours. I have read in some blogs that this is an HDFS bug. A temporary workaround is to stop the DataNode service and delete these files. But we have observed that the log file grows again on one DataNode or another (even on the same node after it has been deleted), so it requires continuous monitoring.

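For reference, the temporary workaround described above might look roughly like the following on a package-based CDH 5 install. The service name is an assumption (Cloudera Manager deployments would stop the DataNode role through the CM UI instead), and the block-pool directory below reuses the question's placeholder:

    # Stop the DataNode so the scanner releases the verification log files
    # (service name assumes a package-based install, not Cloudera Manager).
    sudo service hadoop-hdfs-datanode stop

    # Delete both the .curr and .prev verification logs; the block-pool
    # directory name here is the question's placeholder, not a real ID.
    sudo rm /dfs/dn/current/Bp-12345-IpAddress-123456789/dncp-block-verification.log.*

    # Bring the DataNode back up; the files will be recreated and start
    # growing again, which is why this is only a stopgap.
    sudo service hadoop-hdfs-datanode start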

Does anyone have a permanent solution to this problem?


1 solution

#1



One solution, although slightly drastic, is to disable the block scanner entirely by setting the key dfs.datanode.scan.period.hours to 0 in the HDFS DataNode configuration (the default is 504 hours). The negative effect is that your DNs will no longer auto-detect corrupted block files (detection would instead have to wait for a future client to read the block); this isn't a big deal if your average replication factor is around 3, but you can treat the change as a short-term one until you upgrade to a release that fixes the issue.

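On a cluster whose configs are managed by hand, the change would be a property in hdfs-site.xml on each DataNode, roughly as sketched below; under Cloudera Manager the same key would instead go into the DataNode's hdfs-site.xml advanced configuration snippet (safety valve). The property name is the real HDFS key; the rest is illustrative:

    <!-- Disable the DataNode block scanner. The default of 504 hours
         means each block is verified roughly every three weeks; per the
         answer above, 0 turns periodic block verification off. -->
    <property>
      <name>dfs.datanode.scan.period.hours</name>
      <value>0</value>
    </property>

The DataNode needs a restart for the change to take effect.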

Note that this problem will not occur once you upgrade to CDH 5.4.x or a higher release, which includes the HDFS-7430 rewrite and its associated bug fixes. That rewrite did away with the use of this local file entirely, thereby removing the problem.

