Pros and cons of HDFS replica storage and erasure coding (EC) storage in Hadoop

Date: 2023-03-09 01:13:24


Hadoop 3.0.0-alpha1 and later provide support for storing data with erasure coding (EC). Users can choose replica storage or EC storage according to the scenario and requirements; each scheme has its own advantages, disadvantages and suitable use cases.

1 Replica storage

Take writing a file of size 2.5G into HDFS as an example;

$ ls -ltr
-rw-r--r-- 1 sywu sywu 2.5G Mar 4 14:15 data000

The HDFS block size is 512M and the default replication factor is 3. The 2.5G file is split into 5 blocks (B1, B2, B3, B4, B5); once a node has stored a block it is replicated to other nodes, giving a final total of 15 blocks and 7.5G of storage consumed.
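A hedged way to confirm the block count and replica placement (the path /data/rep/data000 is an assumed location for the example file above):

# list the file's blocks and the datanodes holding each of the 3 replicas
hdfs fsck /data/rep/data000 -files -blocks -locations
# on recent releases the second column of du reports raw space consumed
# across all replicas (about 7.5G here)
hdfs dfs -du -h /data/rep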

[Figure: the 2.5G file split into 5 blocks, each stored as 3 replicas across the cluster]

1.1 Advantages

  1. Replicas keep data available: when a block on one node is lost or corrupted, the DataNode detects it and the block is automatically restored from a healthy copy on another node;
  2. Replicas increase job parallelism, since tasks can run on whichever nodes hold a copy.

1.2 Disadvantages

  1. Every replica adds 100% storage overhead, so the more replicas, the higher the storage cost (see the sketch after this list);
  2. Replica synchronization consumes a large amount of network and disk I/O.
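If that per-replica overhead is the main concern for a given data set, the replication factor of existing files can be lowered; a minimal sketch, assuming the example file above lives at the hypothetical path /data/rep/data000:

# drop the file to 2 replicas and wait for the change to complete;
# raw space consumed falls from 3 x 2.5G = 7.5G to 2 x 2.5G = 5G
hdfs dfs -setrep -w 2 /data/rep/data000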

2 Erasure coding (EC) storage

HDFS uses erasure coding (EC, hereafter) to address the space and resource costs of replica copying and replica storage. EC replaces replicas while providing the same level of fault tolerance, with a storage overhead of no more than 50%.
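The 50% figure follows directly from the policy parameters; as a rough sketch, for a block group of $k$ data blocks and $m$ parity blocks:

\[
\text{overhead} = \frac{m}{k}, \qquad
\text{RS-6-3: } \frac{3}{6} = 50\%, \qquad
\text{3-way replication: } \frac{2}{1} = 200\%.
\]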

2.1 Structure of erasure coding

[Figure: an EC block group, showing data blocks, parity blocks, cells and stripes]

  1. EC consists of two parts: data and parity. The parity is produced from the data by the EC algorithm at write time; producing it is called encoding, and rebuilding lost blocks from it is called decoding.
  2. Unlike an ordinary HDFS file, whose basic unit is the block, EC is organized into block groups, blocks and cells. Every block group holds the same number of data blocks and parity blocks as the other groups; the cell is the smallest storage unit inside EC, and a sequence of cells forms a stripe, which is stored across the blocks.
  3. EC supports two write layouts: contiguous (Contiguous Layout) and striped (Striping Layout). In the contiguous layout, data is written cell by cell into one block and only moves on to the next block once the current one is full; per the official documentation this layout is still under development. The striped layout, supported since Hadoop 3.0.0-alpha1, builds stripes out of sequences of equally sized cells; data is written into the cells of a stripe in turn, a new stripe starts once the current one is full, and the cells of one stripe live on different data blocks.
  4. The number of data and parity blocks is determined by the erasure coding policy. For example, the policy in the figure above, RS-3-2-1024k, means each block group consists of 3 data blocks and 2 parity blocks with a cell size of 1024k; at most 2 blocks per group may be lost, and losing more than 2 makes the data unrecoverable. The built-in policies can be listed as shown below.
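A hedged sketch of listing the policies shipped with recent Hadoop 3 releases (the exact set and output format depend on the version):

# list all known erasure coding policies and whether they are enabled;
# typical built-ins: RS-3-2-1024k, RS-6-3-1024k, RS-10-4-1024k,
# RS-LEGACY-6-3-1024k and XOR-2-1-1024k
hdfs ec -listPolicies
# a policy other than the default usually has to be enabled before use
hdfs ec -enablePolicy -policy RS-3-2-1024k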

2.2 Erasure coding algorithms

Two EC algorithms are currently supported: XOR and Reed-Solomon;

2.2.1 The XOR algorithm

XOR (exclusive OR) generates a single parity cell from any number of data cells. If a data block is lost, it is recovered by XOR-ing the remaining data bits with the parity bits.
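As a worked example, assuming three one-bit data cells $d_1, d_2, d_3$:

\[
p = d_1 \oplus d_2 \oplus d_3, \qquad d_2 = d_1 \oplus d_3 \oplus p
\]

With $d_1 = 1$, $d_2 = 0$, $d_3 = 1$ the parity is $p = 0$; if $d_2$ is lost it is rebuilt as $1 \oplus 1 \oplus 0 = 0$.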

[Figure: XOR parity generated from the data cells]

Because there is only one parity block, XOR tolerates only a single block failure, so it is not suitable for a system like HDFS that has to survive multiple failures.

2.2.2 The Reed-Solomon algorithm

Reed-Solomon (RS) takes two parameters, k and m, written RS(k, m). It multiplies the k data cells by a generator matrix to produce an extended codeword with k data cells and m parity cells, and it can tolerate the loss of up to m blocks. If blocks are lost, the data is recovered, provided that at least k of the k + m cells are still available, by multiplying the inverse of the generator matrix with the extended codeword. Since multiple lost blocks can be tolerated, RS is the better fit for production environments.
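In matrix form (a sketch of the textbook construction, not necessarily HDFS's exact implementation), let $d$ be the vector of $k$ data cells and $G$ a $(k+m) \times k$ generator matrix over a finite field, chosen (e.g. Vandermonde- or Cauchy-based) so that every $k \times k$ submatrix is invertible:

\[
c = G\,d, \qquad d = G_s^{-1}\,c_s
\]

where $c$ is the extended codeword of $k + m$ cells, and $G_s$, $c_s$ keep only the rows of $G$ and the entries of $c$ for any $k$ surviving cells.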

2.3 Storing data with the EC RS algorithm

Hadoop 3.0.0-alpha1 and later ship with EC enabled by default. First set the directory's policy to use the RS algorithm, with 6 data blocks and 3 parity blocks per block group;

hdfs ec -setPolicy -path /data/ec -policy RS-6-3-1024k

Write the 2.5G file into the EC directory. The data ends up in a single block group containing 9 internal blocks (6 data plus 3 parity). The logical file size is still 2.5G; with the parity the raw storage consumed is about 3.75G, compared with 7.5G under 3-way replication.
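A hedged check of the directory's policy and of the raw space actually consumed:

# confirm which EC policy applies to the directory
hdfs ec -getPolicy -path /data/ec
# first column: logical file size; second column (on recent releases):
# raw space consumed, including parity
hdfs dfs -du -h /data/ec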

[Figure: the 2.5G file stored as a single RS-6-3-1024k block group of 6 data blocks and 3 parity blocks]

Check the file's status;
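The report below looks like hdfs fsck output; presumably it was produced with something along these lines:

# show the block group, its internal blocks and the datanodes storing them
hdfs fsck /data/ec/split000 -files -blocks -locations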

/data/ec/split000 2684354560 bytes, erasure-coded: policy=RS-6-3-1024k, 1 block(s):  OK
0. BP-1016852637-192.168.1.192-1597819912196:blk_-9223372036854775792_4907534 len=2684354560 Live_repl=9 [blk_-9223372036854775792:DatanodeInfoWithStorage[192.168.1.192:1019,DS-3b10bc66-c5c9-47f8-b4ea-99441fc5df04,DISK], blk_-9223372036854775791:DatanodeInfoWithStorage[192.168.1.196:1019,DS-c618bb83-03f2-4007-a3a9-cbd11eb2a15a,DISK], blk_-9223372036854775790:DatanodeInfoWithStorage[192.168.1.193:1019,DS-6554db41-d285-4f31-9fde-2017cc211b0c,DISK], blk_-9223372036854775789:DatanodeInfoWithStorage[192.168.1.188:1019,DS-09a7d2b0-6018-43af-8a63-6be8ba9217e6,DISK], blk_-9223372036854775788:DatanodeInfoWithStorage[192.168.1.199:1019,DS-b24cf7ce-1235-478f-884d-19cde4a03c9e,DISK], blk_-9223372036854775787:DatanodeInfoWithStorage[192.168.1.187:1019,DS-43b686d2-e83c-49a3-b8e5-64c4e5edcb53,DISK], blk_-9223372036854775786:DatanodeInfoWithStorage[192.168.1.195:1019,DS-eb42e686-1ca9-44c3-9891-868c67d9d1fa,DISK], blk_-9223372036854775785:DatanodeInfoWithStorage[192.168.1.194:1019,DS-15635fea-314c-4703-a7ab-6f81db1e52cd,DISK], blk_-9223372036854775784:DatanodeInfoWithStorage[192.168.1.198:1019,DS-73bc78d2-e218-40a9-ae9c-d24706a0bc31,DISK]]

Status: HEALTHY
Number of data-nodes: 10
Number of racks: 1
Total dirs: 0
Total symlinks: 0

Replicated Blocks:
Total size: 0 B
Total files: 0
Total blocks (validated): 0
Minimally replicated blocks: 0
Over-replicated blocks: 0
Under-replicated blocks: 0
Mis-replicated blocks: 0
Default replication factor: 3
Average block replication: 0.0
Missing blocks: 0
Corrupt blocks: 0
Missing replicas: 0

Erasure Coded Block Groups:
Total size: 2684354560 B
Total files: 1
Total block groups (validated): 1 (avg. block group size 2684354560 B)
Minimally erasure-coded block groups: 1 (100.0 %)
Over-erasure-coded block groups: 0 (0.0 %)
Under-erasure-coded block groups: 0 (0.0 %)
Unsatisfactory placement block groups: 0 (0.0 %)
Average block group size: 9.0
Missing block groups: 0
Corrupt block groups: 0
Missing internal blocks: 0 (0.0 %)

Now delete three of the internal blocks, blk_-9223372036854775792, blk_-9223372036854775791 and blk_-9223372036854775790, directly from the datanode disks (at most 3 lost blocks are tolerated).

rm /data/current/BP-1016852637-192.168.1.192-1597819912196/current/finalized/subdir0/subdir0/blk_-9223372036854775792
rm /data/current/BP-1016852637-192.168.1.192-1597819912196/current/finalized/subdir0/subdir0/blk_-9223372036854775791
rm /data/current/BP-1016852637-192.168.1.192-1597819912196/current/finalized/subdir0/subdir0/blk_-9223372036854775790

Now try to read the file;

hdfs dfs -get  /data/ec/split000 .

The datanode first reports a block-not-found exception;

21/03/05 16:55:39 WARN impl.BlockReaderFactory: I/O error constructing remote block reader.
java.io.IOException: Got error, status=ERROR, status message opReadBlock BP-1016852637-192.168.1.192-1597819912196:blk_-9223372036854775792_4907534 received exception java.io.FileNotFoundException: BlockId -9223372036854775792 is not valid., for OP_READ_BLOCK, self=/192.168.1.192:42168, remote=/192.168.1.192:1019, for file /data/ec/split000, for pool BP-1016852637-192.168.1.192-1597819912196 block -9223372036854775792_4907534
at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:134)
at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:110)
at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.checkSuccess(BlockReaderRemote.java:447)
at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.newBlockReader(BlockReaderRemote.java:415)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:860)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:756)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:390)
at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:657)
at org.apache.hadoop.hdfs.DFSStripedInputStream.createBlockReader(DFSStripedInputStream.java:256)
at org.apache.hadoop.hdfs.StripeReader.readChunk(StripeReader.java:293)
at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:323)
at org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:318)
at org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:391)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:828)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:94)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:68)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:129)
at org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:497)
at org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:419)
at org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:354)
at org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:289)
at org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:274)
at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:331)
at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:303)
at org.apache.hadoop.fs.shell.CommandWithDestination.processPathArgument(CommandWithDestination.java:269)
at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:285)
at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:269)
at org.apache.hadoop.fs.shell.CommandWithDestination.processArguments(CommandWithDestination.java:240)
at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:120)
at org.apache.hadoop.fs.shell.Command.run(Command.java:176)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:328)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:391)

The client then automatically recovers the data with the RS algorithm;

21/03/05 16:55:39 hdfs.DFSClient: refreshLocatedBlock for striped blocks, offset=0. Obtained block LocatedStripedBlock{BP-1016852637-192.168.1.192-1597819912196:blk_-9223372036854775792_4907534; getBlockSize()=2684354560; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[192.168.1.192:1019,DS-3b10bc66-c5c9-47f8-b4ea-99441fc5df04,DISK], DatanodeInfoWithStorage[192.168.1.196:1019,DS-c618bb83-03f2-4007-a3a9-cbd11eb2a15a,DISK], DatanodeInfoWithStorage[192.168.1.193:1019,DS-6554db41-d285-4f31-9fde-2017cc211b0c,DISK], DatanodeInfoWithStorage[192.168.1.188:1019,DS-09a7d2b0-6018-43af-8a63-6be8ba9217e6,DISK], DatanodeInfoWithStorage[192.168.1.199:1019,DS-b24cf7ce-1235-478f-884d-19cde4a03c9e,DISK], DatanodeInfoWithStorage[192.168.1.187:1019,DS-43b686d2-e83c-49a3-b8e5-64c4e5edcb53,DISK], DatanodeInfoWithStorage[192.168.1.195:1019,DS-eb42e686-1ca9-44c3-9891-868c67d9d1fa,DISK], DatanodeInfoWithStorage[192.168.1.194:1019,DS-15635fea-314c-4703-a7ab-6f81db1e52cd,DISK], DatanodeInfoWithStorage[192.168.1.198:1019,DS-73bc78d2-e218-40a9-ae9c-d24706a0bc31,DISK]]; indices=[0, 1, 2, 3, 4, 5, 6, 7, 8]}, idx=0
21/03/05 16:55:39 WARN hdfs.DFSClient: [DatanodeInfoWithStorage[192.168.1.192:1019,DS-3b10bc66-c5c9-47f8-b4ea-99441fc5df04,DISK]] are unavailable and all striping blocks on them are lost. IgnoredNodes = null
21/03/05 16:55:39 hdfs.DFSClient: refreshLocatedBlock for striped blocks, offset=0. Obtained block LocatedStripedBlock{BP-1016852637-192.168.1.192-1597819912196:blk_-9223372036854775792_4907534; getBlockSize()=2684354560; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[192.168.1.192:1019,DS-3b10bc66-c5c9-47f8-b4ea-99441fc5df04,DISK], DatanodeInfoWithStorage[192.168.1.196:1019,DS-c618bb83-03f2-4007-a3a9-cbd11eb2a15a,DISK], DatanodeInfoWithStorage[192.168.1.193:1019,DS-6554db41-d285-4f31-9fde-2017cc211b0c,DISK], DatanodeInfoWithStorage[192.168.1.188:1019,DS-09a7d2b0-6018-43af-8a63-6be8ba9217e6,DISK], DatanodeInfoWithStorage[192.168.1.199:1019,DS-b24cf7ce-1235-478f-884d-19cde4a03c9e,DISK], DatanodeInfoWithStorage[192.168.1.187:1019,DS-43b686d2-e83c-49a3-b8e5-64c4e5edcb53,DISK], DatanodeInfoWithStorage[192.168.1.195:1019,DS-eb42e686-1ca9-44c3-9891-868c67d9d1fa,DISK], DatanodeInfoWithStorage[192.168.1.194:1019,DS-15635fea-314c-4703-a7ab-6f81db1e52cd,DISK], DatanodeInfoWithStorage[192.168.1.198:1019,DS-73bc78d2-e218-40a9-ae9c-d24706a0bc31,DISK]]; indices=[0, 1, 2, 3, 4, 5, 6, 7, 8]}, idx=1
21/03/05 16:55:39 hdfs.DFSClient: refreshLocatedBlock for striped blocks, offset=0. Obtained block LocatedStripedBlock{BP-1016852637-192.168.1.192-1597819912196:blk_-9223372036854775792_4907534; getBlockSize()=2684354560; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[192.168.1.192:1019,DS-3b10bc66-c5c9-47f8-b4ea-99441fc5df04,DISK], DatanodeInfoWithStorage[192.168.1.196:1019,DS-c618bb83-03f2-4007-a3a9-cbd11eb2a15a,DISK], DatanodeInfoWithStorage[192.168.1.193:1019,DS-6554db41-d285-4f31-9fde-2017cc211b0c,DISK], DatanodeInfoWithStorage[192.168.1.188:1019,DS-09a7d2b0-6018-43af-8a63-6be8ba9217e6,DISK], DatanodeInfoWithStorage[192.168.1.199:1019,DS-b24cf7ce-1235-478f-884d-19cde4a03c9e,DISK], DatanodeInfoWithStorage[192.168.1.187:1019,DS-43b686d2-e83c-49a3-b8e5-64c4e5edcb53,DISK], DatanodeInfoWithStorage[192.168.1.195:1019,DS-eb42e686-1ca9-44c3-9891-868c67d9d1fa,DISK], DatanodeInfoWithStorage[192.168.1.194:1019,DS-15635fea-314c-4703-a7ab-6f81db1e52cd,DISK], DatanodeInfoWithStorage[192.168.1.198:1019,DS-73bc78d2-e218-40a9-ae9c-d24706a0bc31,DISK]]; indices=[0, 1, 2, 3, 4, 5, 6, 7, 8]}, idx=6

In the end the 3 lost blocks are recovered and the file is read back normally.
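A hedged way to confirm the group is healthy again once the NameNode has scheduled reconstruction of the deleted internal blocks onto other DataNodes:

# "Missing internal blocks" should return to 0 (0.0 %) after reconstruction completes
hdfs fsck /data/ec/split000 -files -blocks -locations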

2.4 Advantages

  1. Compared with replica storage, EC greatly reduces the storage and I/O resources used;
  2. The XOR and RS algorithms keep the data safe and recover from block corruption or loss within the tolerated limits.

2.5 Disadvantages

  1. Recovering data requires reading the remaining data blocks and parity blocks, which costs I/O and network bandwidth;
  2. EC encoding and decoding consume CPU;
  3. Nodes that store large amounts of data, with many blocks concentrated on them, carry a heavier load when jobs run;
  4. EC files do not support hflush, hsync, concat, setReplication, truncate or append (see the sketch after this list).
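For instance, appending to an EC file is expected to be rejected (a sketch; the exact error text depends on the release, and more_data.txt is a hypothetical local file):

# append is listed as unsupported for erasure-coded files, so this should fail
hdfs dfs -appendToFile more_data.txt /data/ec/split000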

3 Read/write comparison of replica storage and EC storage

Writing a file;

| File size | 3-replica storage (s) | EC storage (s) |
| --- | --- | --- |
| 2.5G | 6.492 | 9.093 |
| 5G | 29.778 | 17.245 |
| 12G | 65.156 | 30.989 |

Reading a file;

| File size | 3-replica storage (s) | EC storage (s) |
| --- | --- | --- |
| 2.5G | 4.915 | 4.755 |
| 5G | 7.737 | 7.119 |
| 12G | 14.493 | 13.775 |

4 Summary

Weighing the pros and cons of both schemes, EC storage is better suited to backups and to cold data that is accessed infrequently, while replica storage is better suited to hot data that needs appends and is analyzed often.
