[DRBD-user] Reproducible IO errors

Ben RUBSON ben.rubson at gmail.com
Sat Apr 12 23:55:30 CEST 2014

Hello,

I am encountering IO errors on my new DRBD installation.

Software: Debian 7, DRBD 8.4.4, kernel 3.10.23.
Hardware: LSI MegaRAID SAS 9271-8i.
Layout: DRBD on top of RAID arrays, LVM on top of DRBD.

Main DRBD configuration items:
disk {
    disk-barrier no;
    disk-flushes no;
    md-flushes no;
}
net {
    protocol C;
    max-buffers 131072;
}
resource r0 {
    volume 0 {
        device /dev/drbd0;
        disk /dev/c0v0;
        meta-disk internal;
    }
    volume 1 {
        device /dev/drbd1;
        disk /dev/c0v1;
        meta-disk internal;
    }
    on server1 {
        address 192.168.0.1:7788;
    }
    on server2.urbackups.com {
        address 192.168.0.2:7788;
    }
}

The issue is reproducible.
Twice on the primary, when I created, mounted, and exercised an ext4 FS
(on top of an LV), I got the following errors:

# Apr 12 12:30:56 server1 kernel: EXT4-fs (dm-1): barriers disabled
# Apr 12 12:30:56 server1 kernel: EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: nobarrier,user_xattr
# Apr 12 12:30:57 server1 kernel: block drbd1: local WRITE IO error sector 10512+4088 on sdb
# Apr 12 12:30:57 server1 kernel: block drbd1: disk( UpToDate -> Failed )
# Apr 12 12:30:57 server1 kernel: block drbd1: Local IO failed in __req_mod. Detaching...
# Apr 12 12:30:57 server1 kernel: block drbd1: bitmap WRITE of 0 pages took 0 jiffies
# Apr 12 12:30:57 server1 kernel: block drbd1: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
# Apr 12 12:30:57 server1 kernel: block drbd1: disk( Failed -> Diskless )
# Apr 12 12:30:57 server1 kernel: drbd r0: sock was shut down by peer
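
To get a feel for the size of the failing requests, the "sector A+B" values in these messages can be converted to byte offsets and lengths. The following is only a minimal sketch: it assumes that A is the start sector, B is a count of sectors, and sectors are 512 bytes. The function name sector_ranges is made up for illustration and is not part of DRBD.

```python
import re

SECTOR_SIZE = 512  # assumption: DRBD log messages count 512-byte sectors

# Matches e.g. "local WRITE IO error sector 10512+4088 on sdb" and
# "IO ERROR: neither local nor remote data, sector 10256+8".
# \s+ also tolerates a line break between "sector" and the numbers,
# which happens when the log has been wrapped by a mail client.
SECTOR_RE = re.compile(r"sector\s+(\d+)\+(\d+)")


def sector_ranges(log_text):
    """Return a list of (start_byte, length_bytes) for each 'sector A+B'."""
    return [
        (int(start) * SECTOR_SIZE, int(count) * SECTOR_SIZE)
        for start, count in SECTOR_RE.findall(log_text)
    ]


if __name__ == "__main__":
    sample = "block drbd1: local WRITE IO error sector 10512+4088 on sdb"
    for offset, length in sector_ranges(sample):
        print(f"offset={offset} bytes, length={length} bytes")
```

Under these assumptions, the failed WRITE at sector 10512+4088 would be a single request of 4088 * 512 = 2093056 bytes, i.e. just under 2 MiB.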

# Apr 12 12:51:16 server1 kernel: EXT4-fs (dm-1): barriers disabled
# Apr 12 12:51:16 server1 kernel: EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: nobarrier,user_xattr
# Apr 12 12:52:02 server1 kernel: block drbd0: Remote failed to finish a request within ko-count * timeout
# Apr 12 12:52:02 server1 kernel: block drbd0: peer( Secondary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown )
# Apr 12 12:52:02 server1 kernel: block drbd1: Remote failed to finish a request within ko-count * timeout
# Apr 12 12:52:02 server1 kernel: block drbd1: peer( Secondary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown )
# Apr 12 12:52:02 server1 kernel: block drbd0: new current UUID 8EDD6F6B8037052F:B4686C2EAED0C5BD:0583520A709450C9:0582520A709450C9
# Apr 12 12:52:02 server1 kernel: drbd r0: asender terminated
# Apr 12 12:52:02 server1 kernel: drbd r0: Terminating drbd_a_r0
# Apr 12 12:52:02 server1 kernel: block drbd1: helper command: /sbin/drbdadm pri-on-incon-degr minor-1
# Apr 12 12:52:02 server1 kernel: block drbd1: helper command: /sbin/drbdadm pri-on-incon-degr minor-1 exit code 0 (0x0)
# Apr 12 12:52:02 server1 kernel: drbd r0: Connection closed
# Apr 12 12:52:02 server1 kernel: dm-1: WRITE SAME failed. Manually zeroing.
# Apr 12 12:52:02 server1 kernel: block drbd1: 3 messages suppressed in /usr/src/drbd-8.4.4/drbd/drbd_req.c:1198.
# Apr 12 12:52:02 server1 kernel: block drbd1: IO ERROR: neither local nor remote data, sector 10256+8
# Apr 12 12:52:02 server1 kernel: EXT4-fs error (device dm-1): ext4_wait_block_bitmap:466: comm kworker/u82:2: Cannot read block bitmap - block_group = 1, block_bitmap = 1026
# Apr 12 12:52:02 server1 kernel: block drbd1: IO ERROR: neither local nor remote data, sector 10512+560
# Apr 12 12:52:02 server1 kernel: block drbd1: IO ERROR: neither local nor remote data, sector 11072+560
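
The sequence of state changes in these logs (disk, peer, conn, pdsk) is easier to follow when pulled out of the surrounding messages. The snippet below is a rough sketch that extracts them with a regular expression. The pattern is inferred only from the "name( Old -> New )" form seen in the excerpts above, and the helper name state_transitions is hypothetical.

```python
import re

# Matches DRBD state changes as they appear in the kernel log excerpts,
# e.g. "disk( UpToDate -> Failed )" or "conn( Connected -> Timeout )".
# The pattern is inferred from those excerpts, not from DRBD documentation.
TRANSITION_RE = re.compile(r"(\w+)\(\s*(\w+)\s*->\s*(\w+)\s*\)")


def state_transitions(log_text):
    """Return a list of (field, old_state, new_state) tuples in log order."""
    return TRANSITION_RE.findall(log_text)


if __name__ == "__main__":
    sample = (
        "block drbd0: peer( Secondary -> Unknown ) "
        "conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown )"
    )
    for field, old, new in state_transitions(sample):
        print(f"{field}: {old} -> {new}")
```

Run against the first excerpt, this would list the local disk going from UpToDate to Failed and then to Diskless; in the second excerpt it shows the connection to the peer timing out.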

I then had some difficulty reproducing the issue,
so I decided to create another new FS and to promote the secondary to primary.
When I mounted the FS on the new primary, I got:

# Apr 12 22:45:39 server2 kernel: block drbd0: role( Secondary -> Primary )
# Apr 12 22:45:39 server2 kernel: block drbd0: new current UUID EF985892FD50BA49:48CD646E1D04AC5C:919314E021E03BF4:919214E021E03BF5
# Apr 12 22:45:39 server2 kernel: block drbd1: role( Secondary -> Primary )
# Apr 12 22:45:39 server2 kernel: block drbd1: new current UUID 782430D91ABEAF91:480750117C05BF9C:41A69EE3B4A47B3A:41A59EE3B4A47B3B
# Apr 12 22:45:48 server2 kernel: bio: create slab <bio-2> at 2
# Apr 12 22:46:02 server2 kernel: EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
# Apr 12 22:46:02 server2 kernel: block drbd1: local WRITE IO error sector 16808192+4096 on sdb
# Apr 12 22:46:02 server2 kernel: block drbd1: disk( UpToDate -> Failed )
# Apr 12 22:46:02 server2 kernel: block drbd1: Local IO failed in __req_mod. Detaching...
# Apr 12 22:46:02 server2 kernel: block drbd1: helper command: /sbin/drbdadm pri-on-incon-degr minor-1
# Apr 12 22:46:02 server2 kernel: dm-1: WRITE SAME failed. Manually zeroing.
# Apr 12 22:46:02 server2 kernel: block drbd1: IO ERROR: neither local nor remote data, sector 16808192+560
# Apr 12 22:46:02 server2 kernel: block drbd1: IO ERROR: neither local nor remote data, sector 16808752+560
# Apr 12 22:46:02 server2 kernel: block drbd1: IO ERROR: neither local nor remote data, sector 16809312+560
# Apr 12 22:46:02 server2 kernel: block drbd1: IO ERROR: neither local nor remote data, sector 16809872+560
# Apr 12 22:46:02 server2 kernel: block drbd1: IO ERROR: neither local nor remote data, sector 16810432+560
# Apr 12 22:46:02 server2 kernel: block drbd1: helper command: /sbin/drbdadm pri-on-incon-degr minor-1 exit code 0 (0x0)
# Apr 12 22:46:02 server2 kernel: block drbd1: bitmap WRITE of 1 pages took 0 jiffies
# Apr 12 22:46:02 server2 kernel: block drbd1: 4 KB (1 bits) marked out-of-sync by on disk bit-map.
# Apr 12 22:46:02 server2 kernel: block drbd1: disk( Failed -> Diskless )

There is no issue at all on the RAID controllers' side; their logs are clean.

This sounds like the same issue as this one:
http://lists.linbit.com/pipermail/drbd-user/2013-November/020419.html

Is this a known issue?
With DRBD 8.4.4?
With kernel 3.10.x?
With both?

Could you help me, please?

Thank you very much,

Best regards,

Ben