[DRBD-user] Warning: Data Corruption Issue Discovered in DRBD 8.4 and 9.0

Eric Robinson eric.robinson at psmnv.com
Wed Oct 18 19:32:23 CEST 2017



Okay, here’s the latest.

I ran the TrimTester tool on two separate servers in DRBD standalone mode, and each wrote 24TB to disk without any file corruption errors. I then connected and synced the nodes and ran the test again with replication enabled. This time TrimTester reported file corruption after a few hours.

The supposedly "corrupted" file is 4.9GB in size (over 4GB, as expected). The file name is '50'. It contains a repeating sequence of characters that looks like this:

# od -b 50 | more
0000000 001 002 003 004 005 006 007 010 011 012 013 014 015 016 017 020
0000020 021 022 023 024 025 026 027 030 031 032 033 034 035 036 037 040
0000040 041 042 043 044 045 046 047 050 051 052 053 054 055 056 057 060
0000060 061 062 063 064 065 066 067 070 071 072 073 074 075 076 077 100
0000100 101 102 103 104 105 106 107 110 111 112 113 114 115 116 117 120
0000120 121 122 123 124 125 126 127 130 131 132 133 134 135 136 137 140
0000140 141 142 143 144 145 146 147 150 151 152 153 154 155 156 157 160
0000160 161 162 163 164 165 166 167 170 171 172 173 174 175 176 177 200
0000200 201 202 203 204 205 206 207 210 211 212 213 214 215 216 217 220
0000220 221 222 223 224 225 226 227 230 231 232 233 234 235 236 237 240
0000240 241 242 243 244 245 246 247 250 251 252 253 254 255 256 257 260
0000260 261 262 263 264 265 266 267 270 271 272 273 274 275 276 277 300
0000300 301 302 303 304 305 306 307 310 311 312 313 314 315 316 317 320
0000320 321 322 323 324 325 326 327 330 331 332 333 334 335 336 337 340
0000340 341 342 343 344 345 346 347 350 351 352 353 354 355 356 357 360
0000360 361 362 363 364 365 366 367 370 371 372 373 374 375 376 377 000
<snip>

The same sequence repeats to the end of the file. As you can see, each 256-byte iteration of the sequence ends with a single zero (null) byte, and there are no zeroes anywhere else in the file.
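For anyone who wants to reproduce the observation without a 4.9GB file, here is a minimal Python sketch of the pattern as it appears in the od output (bytes 1 through 255 followed by 0, repeating). The function name cyclic_pattern is made up for illustration and is not part of TrimTester.

```python
def cyclic_pattern(length: int) -> bytes:
    """Return `length` bytes of the repeating sequence 1, 2, ..., 255, 0."""
    cycle = bytes((k + 1) % 256 for k in range(256))
    repeats, remainder = divmod(length, 256)
    return cycle * repeats + cycle[:remainder]


data = cyclic_pattern(4096)

# Offsets of every zero byte: exactly one per 256-byte cycle, at its last position.
zero_offsets = [k for k, b in enumerate(data) if b == 0]
print(zero_offsets[:3])      # [255, 511, 767]

# Two zero bytes never sit next to each other, so a 512-byte run of zeroes
# cannot occur in an intact file.
print(b"\x00\x00" in data)   # False
```

Since the zero bytes are always 256 bytes apart, a file that matches this pattern throughout cannot contain a zeroed sector.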

I guess it is safe to conclude that your theory is right. The file does not appear to really be corrupted. The TrimTester tool is reporting a false positive.

--Eric


From: drbd-user-bounces at lists.linbit.com [mailto:drbd-user-bounces at lists.linbit.com] On Behalf Of Lars Ellenberg
Sent: Tuesday, October 17, 2017 9:24 AM
To: drbd-user at lists.linbit.com
Subject: Re: [DRBD-user] Warning: Data Corruption Issue Discovered in DRBD 8.4 and 9.0

> Most importantly: once the trimtester (or *any* "corruption detecting"
> tool) claims that a certain corruption is found, you look at what supposedly is
> corrupt, and double check if it in fact is.
>
> Before doing anything else.
>
I did that, but I don't know what a "good" file is supposed to look like, so I can't tell whether it is really corrupted.
The last time TrimTester reported a corrupt file, I checked it manually and it looked fine to me, but I don't know what I'm looking for.

For it to be reported as corrupted by trimtester, it would need to contain at least one aligned 512-byte sector full of zeroes. Did it? Did it contain even two zero bytes next to each other?

Or simply the same 256-byte cyclic pattern that is written by this tool?

If so, then obviously the tool erroneously claimed to find a corruption that clearly was not there.
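The two questions above can be answered mechanically. The following is only a rough Python sketch of such a check, not the logic trimtester itself uses; the function names and the 1 MiB read size are assumptions chosen for illustration.

```python
SECTOR = 512


def find_zero_sectors(path, sector=SECTOR):
    """Return the byte offsets of aligned sectors that consist only of zeroes."""
    zero = bytes(sector)
    hits = []
    offset = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(sector)
            if not chunk:
                break
            # A short final chunk is not a full sector, so it is not counted.
            if len(chunk) == sector and chunk == zero:
                hits.append(offset)
            offset += len(chunk)
    return hits


def has_adjacent_zeros(path, chunk_size=1 << 20):
    """Return True if the file contains two zero bytes next to each other."""
    prev_ended_with_zero = False
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            # Also catch a pair that straddles two consecutive reads.
            if (prev_ended_with_zero and chunk[0] == 0) or b"\x00\x00" in chunk:
                return True
            prev_ended_with_zero = chunk[-1] == 0
    return False
```

Run against the reported file, e.g. find_zero_sectors("50") and has_adjacent_zeros("50"): an empty list and False would mean the file holds nothing the tool should have flagged. Reading 512 bytes at a time is slow on a multi-gigabyte file, but it keeps the sketch simple.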

Feel free to keep using that tool, but fix it first: change the unsigned i and j to size_t or uint64_t. And maybe have it really check all files before removing them.
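To see why the width of those variables matters for a file over 4GB, here is a small Python sketch that emulates storing a byte offset in a 32-bit versus a 64-bit unsigned integer. It assumes that unsigned is 32 bits wide (typical on Linux) and that i and j end up holding offsets into the file; how trimtester actually uses them is not shown in this thread. The helper names are hypothetical.

```python
def as_uint32(value: int) -> int:
    """Emulate a 32-bit unsigned integer: values wrap modulo 2**32."""
    return value & 0xFFFFFFFF


def as_uint64(value: int) -> int:
    """Emulate a 64-bit unsigned integer: values wrap modulo 2**64."""
    return value & 0xFFFFFFFFFFFFFFFF


# An offset near the end of a 4.9GB file (assuming decimal gigabytes).
offset = 4_900_000_000

print(as_uint32(offset))   # 605032704: wrapped around after 4 GiB
print(as_uint64(offset))   # 4900000000: preserved

# The first offset that no longer fits into 32 bits wraps back to zero.
print(as_uint32(2**32))    # 0
```

With a 32-bit counter, any offset past 4 GiB silently maps back into the start of the range, so a check can end up looking at the wrong position or comparing against the wrong expected value. A 64-bit type such as uint64_t (or size_t on a 64-bit platform) avoids the wraparound for any realistic file size.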
