Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hello,

I am encountering I/O errors on my new DRBD installation.

Software: Debian 7, DRBD 8.4.4, kernel 3.10.23.
Hardware: LSI MegaRAID SAS 9271-8i.
Layout: DRBD on top of the RAID arrays, LVM on top of DRBD.

Main DRBD configuration items:

disk {
        disk-barrier no;
        disk-flushes no;
        md-flushes no;
}
net {
        protocol C;
        max-buffers 131072;
}
resource r0 {
        volume 0 {
                device    /dev/drbd0;
                disk      /dev/c0v0;
                meta-disk internal;
        }
        volume 1 {
                device    /dev/drbd1;
                disk      /dev/c0v1;
                meta-disk internal;
        }
        on server1 {
                address 192.168.0.1:7788;
        }
        on server2.urbackups.com {
                address 192.168.0.2:7788;
        }
}

The issue is reproducible. Twice on the primary, when I created/mounted/played with an ext4 FS (on top of an LV), I got the following errors:

Apr 12 12:30:56 server1 kernel: EXT4-fs (dm-1): barriers disabled
Apr 12 12:30:56 server1 kernel: EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: nobarrier,user_xattr
Apr 12 12:30:57 server1 kernel: block drbd1: local WRITE IO error sector 10512+4088 on sdb
Apr 12 12:30:57 server1 kernel: block drbd1: disk( UpToDate -> Failed )
Apr 12 12:30:57 server1 kernel: block drbd1: Local IO failed in __req_mod. Detaching...
Apr 12 12:30:57 server1 kernel: block drbd1: bitmap WRITE of 0 pages took 0 jiffies
Apr 12 12:30:57 server1 kernel: block drbd1: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Apr 12 12:30:57 server1 kernel: block drbd1: disk( Failed -> Diskless )
Apr 12 12:30:57 server1 kernel: drbd r0: sock was shut down by peer
Apr 12 12:51:16 server1 kernel: EXT4-fs (dm-1): barriers disabled
Apr 12 12:51:16 server1 kernel: EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: nobarrier,user_xattr
Apr 12 12:52:02 server1 kernel: block drbd0: Remote failed to finish a request within ko-count * timeout
Apr 12 12:52:02 server1 kernel: block drbd0: peer( Secondary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown )
Apr 12 12:52:02 server1 kernel: block drbd1: Remote failed to finish a request within ko-count * timeout
Apr 12 12:52:02 server1 kernel: block drbd1: peer( Secondary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown )
Apr 12 12:52:02 server1 kernel: block drbd0: new current UUID 8EDD6F6B8037052F:B4686C2EAED0C5BD:0583520A709450C9:0582520A709450C9
Apr 12 12:52:02 server1 kernel: drbd r0: asender terminated
Apr 12 12:52:02 server1 kernel: drbd r0: Terminating drbd_a_r0
Apr 12 12:52:02 server1 kernel: block drbd1: helper command: /sbin/drbdadm pri-on-incon-degr minor-1
Apr 12 12:52:02 server1 kernel: block drbd1: helper command: /sbin/drbdadm pri-on-incon-degr minor-1 exit code 0 (0x0)
Apr 12 12:52:02 server1 kernel: drbd r0: Connection closed
Apr 12 12:52:02 server1 kernel: dm-1: WRITE SAME failed. Manually zeroing.
Apr 12 12:52:02 server1 kernel: block drbd1: 3 messages suppressed in /usr/src/drbd-8.4.4/drbd/drbd_req.c:1198.
Apr 12 12:52:02 server1 kernel: block drbd1: IO ERROR: neither local nor remote data, sector 10256+8
Apr 12 12:52:02 server1 kernel: EXT4-fs error (device dm-1): ext4_wait_block_bitmap:466: comm kworker/u82:2: Cannot read block bitmap - block_group = 1, block_bitmap = 1026
Apr 12 12:52:02 server1 kernel: block drbd1: IO ERROR: neither local nor remote data, sector 10512+560
Apr 12 12:52:02 server1 kernel: block drbd1: IO ERROR: neither local nor remote data, sector 11072+560

I then had some difficulty reproducing the issue, so I decided to create another new FS and to promote the secondary to primary.
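For reference, the switch-over went roughly like this (the VG/LV and mount-point names below are placeholders, not necessarily the exact ones I used):

## on server1, the old primary
lvcreate -n test -L 10G vg0        # the "new FS" mentioned above (size/name are placeholders)
mkfs.ext4 /dev/vg0/test
vgchange -an vg0                   # deactivate the VG before demoting
drbdadm secondary r0

## on server2, the new primary
drbdadm primary r0
vgchange -ay vg0
mount /dev/vg0/test /mnt/test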
When I mounted the FS on the new primary, I got:

Apr 12 22:45:39 server2 kernel: block drbd0: role( Secondary -> Primary )
Apr 12 22:45:39 server2 kernel: block drbd0: new current UUID EF985892FD50BA49:48CD646E1D04AC5C:919314E021E03BF4:919214E021E03BF5
Apr 12 22:45:39 server2 kernel: block drbd1: role( Secondary -> Primary )
Apr 12 22:45:39 server2 kernel: block drbd1: new current UUID 782430D91ABEAF91:480750117C05BF9C:41A69EE3B4A47B3A:41A59EE3B4A47B3B
Apr 12 22:45:48 server2 kernel: bio: create slab <bio-2> at 2
Apr 12 22:46:02 server2 kernel: EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
Apr 12 22:46:02 server2 kernel: block drbd1: local WRITE IO error sector 16808192+4096 on sdb
Apr 12 22:46:02 server2 kernel: block drbd1: disk( UpToDate -> Failed )
Apr 12 22:46:02 server2 kernel: block drbd1: Local IO failed in __req_mod. Detaching...
Apr 12 22:46:02 server2 kernel: block drbd1: helper command: /sbin/drbdadm pri-on-incon-degr minor-1
Apr 12 22:46:02 server2 kernel: dm-1: WRITE SAME failed. Manually zeroing.
Apr 12 22:46:02 server2 kernel: block drbd1: IO ERROR: neither local nor remote data, sector 16808192+560
Apr 12 22:46:02 server2 kernel: block drbd1: IO ERROR: neither local nor remote data, sector 16808752+560
Apr 12 22:46:02 server2 kernel: block drbd1: IO ERROR: neither local nor remote data, sector 16809312+560
Apr 12 22:46:02 server2 kernel: block drbd1: IO ERROR: neither local nor remote data, sector 16809872+560
Apr 12 22:46:02 server2 kernel: block drbd1: IO ERROR: neither local nor remote data, sector 16810432+560
Apr 12 22:46:02 server2 kernel: block drbd1: helper command: /sbin/drbdadm pri-on-incon-degr minor-1 exit code 0 (0x0)
Apr 12 22:46:02 server2 kernel: block drbd1: bitmap WRITE of 1 pages took 0 jiffies
Apr 12 22:46:02 server2 kernel: block drbd1: 4 KB (1 bits) marked out-of-sync by on disk bit-map.
Apr 12 22:46:02 server2 kernel: block drbd1: disk( Failed -> Diskless )

There is no issue at all on the RAID controllers' side; their logs are clean.

It sounds like my issue is the same as this one:
http://lists.linbit.com/pipermail/drbd-user/2013-November/020419.html

Is this a known issue? With DRBD 8.4.4? With kernel 3.10.x? With both?

Could you help me, please?

Thank you very much,
Best regards,

Ben
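P.S. Since both logs also show "WRITE SAME failed. Manually zeroing.", here is how I understand the advertised WRITE SAME limit can be read at each layer of the stack (0 should mean the layer does not support it; device names are the ones appearing in the logs above):

cat /sys/block/sdb/queue/write_same_max_bytes     # backing disk of drbd1 as reported in the logs
cat /sys/block/drbd1/queue/write_same_max_bytes   # DRBD device
cat /sys/block/dm-1/queue/write_same_max_bytes    # LV on top of DRBD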