[DRBD-user] DRBD Trouble (block drbd0: local WRITE IO error sector)

Seiichirou Hiraoka seiichirou.hiraoka at gmail.com
Fri Feb 3 07:32:39 CET 2017



Hello.

I use DRBD in the following environment.

OS: Red Hat Enterprise Linux 7.1
Pacemaker: 1.1.12 (CentOS Repository)
Corosync: 2.3.4 (CentOS Repository)
DRBD: 8.4.9 (ELRepo)
# rpm -qi drbd84-utils
Name        : drbd84-utils
Version     : 8.9.2
Release     : 2.el7.elrepo
Architecture: x86_64
Vendor      : The ELRepo Project (http://elrepo.org)
# rpm -qi kmod-drbd84
Name        : kmod-drbd84
Version     : 8.4.9
Release     : 1.el7.elrepo
Architecture: x86_64

DRBD runs on two servers (server1 and server2).
The following error messages suddenly appeared,
and after that it was no longer possible to write to the DRBD device.

. server1 (master)
Jan 20 10:41:16 server1 kernel: block drbd0: local WRITE IO error
sector 118616936+40 on dm-0
Jan 20 10:41:16 server1 kernel: block drbd0: disk( UpToDate -> Failed )
Jan 20 10:41:16 server1 kernel: block drbd0: Local IO failed in
__req_mod. Detaching...
Jan 20 10:41:16 server1 kernel: block drbd0: 0 KB (0 bits) marked
out-of-sync by on disk bit-map.
Jan 20 10:41:16 server1 kernel: block drbd0: disk( Failed -> Diskless )
Jan 20 10:41:16 server1 kernel: block drbd0: Got NegDReply; Sector
117512416s, len 4096.
Jan 20 10:41:16 server1 kernel: drbd0: WRITE SAME failed. Manually zeroing.
Jan 20 10:41:16 server1 kernel: block drbd0: Should have called
drbd_al_complete_io(, 118616936, 20480), but my Disk seems to have
failed :(
Jan 20 10:41:16 server1 kernel: block drbd0: pdsk( UpToDate -> Failed )
Jan 20 10:41:16 server1 kernel: block drbd0: pdsk( Failed -> Diskless )
Jan 20 10:41:16 server1 kernel: block drbd0: drbd_req_destroy: Logic
BUG rq_state = 0x90040, completion_ref = 0
Jan 20 10:41:16 server1 kernel: block drbd0: IO ERROR: neither local
nor remote data, sector 117512416+8
Jan 20 10:41:16 server1 kernel: EXT4-fs warning (device drbd0):
__ext4_read_dirblock:1375: error reading directory block (ino 3690583,
block 10)
Jan 20 10:41:16 server1 kernel: block drbd0: helper command:
/sbin/drbdadm pri-on-incon-degr minor-0
Jan 20 10:41:16 server1 kernel: EXT4-fs (drbd0): Delayed block
allocation failed for inode 4326138 at logical offset 1 with max
blocks 1 with error 5
Jan 20 10:41:16 server1 kernel: EXT4-fs (drbd0): This should not
happen!! Data will be lost
Jan 20 10:41:16 server1 kernel: block drbd0: IO ERROR: neither local
nor remote data, sector 117512384+8
Jan 20 10:41:16 server1 kernel: EXT4-fs warning (device drbd0):
__ext4_read_dirblock:1375: error reading directory block (ino 3690583,
block 6)
Jan 20 10:41:16 server1 kernel: block drbd0: IO ERROR: neither local
nor remote data, sector 117512296+8
Jan 20 10:41:16 server1 kernel: EXT4-fs warning (device drbd0):
__ext4_read_dirblock:1375: error reading directory block (ino 3670162,
block 1)
Jan 20 10:41:16 server1 kernel: block drbd0: IO ERROR: neither local
nor remote data, sector 117512296+8
Jan 20 10:41:16 server1 kernel: EXT4-fs warning (device drbd0):
__ext4_read_dirblock:1375: error reading directory block (ino 3670162,
block 1)
Jan 20 10:41:16 server1 kernel: block drbd0: IO ERROR: neither local
nor remote data, sector 117512296+8
Jan 20 10:41:16 server1 kernel: EXT4-fs warning (device drbd0):
__ext4_read_dirblock:1375: error reading directory block (ino 3670162,
block 1)
Jan 20 10:41:16 server1 kernel: Aborting journal on device drbd0-8.
Jan 20 10:41:16 server1 kernel: Buffer I/O error on dev drbd0, logical
block 25722880, lost sync page write
Jan 20 10:41:16 server1 kernel: JBD2: Error -5 detected when updating
journal superblock for drbd0-8.
Jan 20 10:41:16 server1 kernel: Buffer I/O error on dev drbd0, logical
block 0, lost sync page write
Jan 20 10:41:16 server1 kernel: EXT4-fs error (device drbd0):
ext4_journal_check_start:56: Detected aborted journal
Jan 20 10:41:16 server1 kernel: EXT4-fs (drbd0): Remounting filesystem read-only
Jan 20 10:41:16 server1 kernel: EXT4-fs (drbd0): previous I/O error to
superblock detected
Jan 20 10:41:16 server1 kernel: Buffer I/O error on dev drbd0, logical
block 0, lost sync page write
Jan 20 10:41:16 server1 kernel: block drbd0: helper command:
/sbin/drbdadm pri-on-incon-degr minor-0 exit code 0 (0x0)
Jan 20 10:41:25 server1 kernel: block drbd0: 122 messages suppressed
in /home/build/akemi/drbd84/el7/BUILD/drbd-8.4.9-1/drbd/drbd_req.c:1447.
Jan 20 10:41:25 server1 kernel: block drbd0: IO ERROR: neither local
nor remote data, sector 134218048+8
Jan 20 10:41:25 server1 kernel: Buffer I/O error on dev drbd0, logical
block 16777256, lost async page write
Jan 20 10:41:25 server1 kernel: Buffer I/O error on dev drbd0, logical
block 14680354, lost async page write
Jan 20 10:41:25 server1 kernel: Buffer I/O error on dev drbd0, logical
block 14680421, lost async page write
Jan 20 10:41:25 server1 kernel: Buffer I/O error on dev drbd0, logical
block 14680422, lost async page write
Jan 20 10:41:25 server1 kernel: Buffer I/O error on dev drbd0, logical
block 14680437, lost async page write
Jan 20 10:41:25 server1 kernel: Buffer I/O error on dev drbd0, logical
block 14680491, lost async page write
Jan 20 10:41:25 server1 kernel: Buffer I/O error on dev drbd0, logical
block 14680492, lost async page write
Jan 20 10:41:25 server1 kernel: Buffer I/O error on dev drbd0, logical
block 14680493, lost async page write
Jan 20 10:41:25 server1 kernel: Buffer I/O error on dev drbd0, logical
block 14680494, lost async page write
Jan 20 10:41:25 server1 kernel: Buffer I/O error on dev drbd0, logical
block 14680495, lost async page write
Jan 20 10:46:50 server1 kernel: block drbd0: 168 messages suppressed
in /home/build/akemi/drbd84/el7/BUILD/drbd-8.4.9-1/drbd/drbd_req.c:1447.
Jan 20 10:46:50 server1 kernel: block drbd0: IO ERROR: neither local
nor remote data, sector 117512416+8
Jan 20 10:46:50 server1 kernel: EXT4-fs warning (device drbd0):
__ext4_read_dirblock:1375: error reading directory block (ino 3690583,
block 10)
Jan 20 10:47:03 server1 kernel: block drbd0: IO ERROR: neither local
nor remote data, sector 117512440+8
Jan 20 10:47:03 server1 kernel: EXT4-fs warning (device drbd0):
__ext4_read_dirblock:1375: error reading directory block (ino 3690583,
block 13)
Jan 20 10:47:03 server1 kernel: block drbd0: IO ERROR: neither local
nor remote data, sector 117512440+8
Jan 20 10:47:03 server1 kernel: EXT4-fs warning (device drbd0):
__ext4_read_dirblock:1375: error reading directory block (ino 3690583,
block 13)
Jan 20 10:47:03 server1 kernel: block drbd0: IO ERROR: neither local
nor remote data, sector 117512440+8
Jan 20 10:47:03 server1 kernel: EXT4-fs warning (device drbd0):
__ext4_read_dirblock:1375: error reading directory block (ino 3690583,
block 13)
Jan 20 10:47:03 server1 kernel: block drbd0: IO ERROR: neither local
nor remote data, sector 117450240+8
Jan 20 10:47:03 server1 kernel: block drbd0: IO ERROR: neither local
nor remote data, sector 117450248+8
Jan 20 10:47:03 server1 kernel: EXT4-fs error (device drbd0):
__ext4_get_inode_loc:3979: inode #3688997: block 14681282: comm
xymond_history: unable to read itable block
Jan 20 10:47:03 server1 kernel: EXT4-fs (drbd0): previous I/O error to
superblock detected
Jan 20 10:47:03 server1 kernel: buffer_io_error: 159 callbacks suppressed
Jan 20 10:47:03 server1 kernel: Buffer I/O error on dev drbd0, logical
block 0, lost sync page write
Jan 20 10:47:03 server1 kernel: EXT4-fs error (device drbd0):
__ext4_get_inode_loc:3979: inode #3688997: block 14681282: comm
xymond_history: unable to read itable block
Jan 20 10:47:03 server1 kernel: EXT4-fs (drbd0): previous I/O error to
superblock detected
Jan 20 10:47:03 server1 kernel: Buffer I/O error on dev drbd0, logical
block 0, lost sync page write
Jan 20 10:47:03 server1 kernel: EXT4-fs error (device drbd0):
__ext4_get_inode_loc:3979: inode #3688997: block 14681282: comm
xymond_history: unable to read itable block
Jan 20 10:47:03 server1 kernel: EXT4-fs (drbd0): previous I/O error to
superblock detected
Jan 20 10:47:03 server1 kernel: Buffer I/O error on dev drbd0, logical
block 0, lost sync page write
Jan 20 10:51:59 server1 kernel: block drbd0: 100 messages suppressed
in /home/build/akemi/drbd84/el7/BUILD/drbd-8.4.9-1/drbd/drbd_req.c:1447.
Jan 20 10:51:59 server1 kernel: block drbd0: IO ERROR: neither local
nor remote data, sector 4194312+8
Jan 20 10:51:59 server1 kernel: EXT4-fs error (device drbd0):
ext4_wait_block_bitmap:494: comm xymond_history: Cannot read block
bitmap - block_group = 17, block_bitmap = 524289
Jan 20 10:51:59 server1 kernel: EXT4-fs (drbd0): previous I/O error to
superblock detected
Jan 20 10:51:59 server1 kernel: block drbd0: IO ERROR: neither local
nor remote data, sector 0+8
Jan 20 10:51:59 server1 kernel: Buffer I/O error on dev drbd0, logical
block 0, lost sync page write
Jan 20 10:51:59 server1 kernel: EXT4-fs error (device drbd0):
ext4_discard_preallocations:3995: comm xymond_history: Error loading
buddy information for 17
Jan 20 10:51:59 server1 kernel: EXT4-fs (drbd0): previous I/O error to
superblock detected
Jan 20 10:51:59 server1 kernel: block drbd0: IO ERROR: neither local
nor remote data, sector 0+8
Jan 20 10:51:59 server1 kernel: Buffer I/O error on dev drbd0, logical
block 0, lost sync page write
Jan 20 10:54:42 server1 kernel: xymond_history[14970]: segfault at 0
ip 00007fb37d6b5b3d sp 00007ffc0a184090 error 4 in
libc-2.17.so[7fb37d648000+1b7000]
Jan 20 10:55:42 server1 kernel: xymond_history[16324]: segfault at 0
ip 00007f18816dab3d sp 00007ffd91bd2b40 error
4 in libc-2.17.so[7f188166d000+1b7000]
Jan 20 10:55:58 server1 kernel: block drbd0: IO ERROR: neither local
nor remote data, sector 118411408+8
Jan 20 10:55:58 server1 kernel: block drbd0: IO ERROR: neither local
nor remote data, sector 118411408+8
Jan 20 10:55:58 server1 kernel: block drbd0: IO ERROR: neither local
nor remote data, sector 117506384+8
Jan 20 10:55:58 server1 kernel: EXT4-fs error (device drbd0):
ext4_find_entry:1312: inode #3670213: comm httpd: reading directory
lblock 0
Jan 20 10:55:58 server1 kernel: EXT4-fs (drbd0): previous I/O error to
superblock detected
Jan 20 10:55:58 server1 kernel: block drbd0: IO ERROR: neither local
nor remote data, sector 0+8
Jan 20 10:55:58 server1 kernel: Buffer I/O error on dev drbd0, logical
block 0, lost sync page write
Jan 20 10:55:58 server1 kernel: block drbd0: IO ERROR: neither local
nor remote data, sector 118411408+8
Jan 20 10:56:10 server1 kernel: block drbd0: 13 messages suppressed in
/home/build/akemi/drbd84/el7/BUILD/drbd-8.4.9-1/drbd/drbd_req.c:1447.
Jan 20 10:56:10 server1 kernel: block drbd0: IO ERROR: neither local
nor remote data, sector 117854864+8
Jan 20 10:56:10 server1 kernel: block drbd0: IO ERROR: neither local
nor remote data, sector 117854864+8
Jan 20 10:56:14 server1 kernel: block drbd0: 2 messages suppressed in
/home/build/akemi/drbd84/el7/BUILD/drbd-8.4.9-1/drbd/drbd_req.c:1447.
Jan 20 10:56:14 server1 kernel: block drbd0: IO ERROR: neither local
nor remote data, sector 117854864+8

...

I use Xymon, and its data is stored on the DRBD device.
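As an aid to reading the server1 log, the DRBD sector numbers can be matched against the EXT4 block numbers. DRBD prints extents as `START+LEN` in 512-byte sectors. Assuming the ext4 filesystem on drbd0 uses 4 KiB blocks and starts at sector 0 (the log is consistent with this: `sector 0+8` coincides with `logical block 0`), a block number is sector × 512 / 4096. The helper names below are hypothetical; this is a minimal sketch under those assumptions, not a DRBD tool.

```python
# Minimal sketch for correlating DRBD and EXT4 messages in the log.
# Assumptions: DRBD extents are printed as "START+LEN" in 512-byte sectors,
# and the ext4 filesystem on drbd0 starts at sector 0 with 4 KiB blocks.
SECTOR_SIZE = 512      # bytes per sector in DRBD log messages
FS_BLOCK_SIZE = 4096   # assumed ext4 block size on drbd0


def parse_extent(text):
    """Split a DRBD extent such as '118616936+40' into (start, length) in sectors."""
    start, length = text.split("+")
    return int(start), int(length)


def sectors_to_bytes(sectors):
    """Convert a sector count to bytes."""
    return sectors * SECTOR_SIZE


def sector_to_fs_block(sector):
    """Map a drbd0 sector to the ext4 block that contains it."""
    return sector * SECTOR_SIZE // FS_BLOCK_SIZE


if __name__ == "__main__":
    # Extents copied from the server1 log above.
    for extent in ["118616936+40", "4194312+8", "134218048+8", "0+8"]:
        start, length = parse_extent(extent)
        print(f"{extent}: {sectors_to_bytes(length)} bytes, "
              f"ext4 block {sector_to_fs_block(start)}")
```

Under these assumptions, `4194312+8` maps to block 524289 (the `block_bitmap` in the EXT4 error), `134218048+8` maps to block 16777256 (a `lost async page write`), and `118616936+40` is 20480 bytes, the length shown in the `drbd_al_complete_io` message.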

. server2 (slave)
Jan 20 10:41:16 server2 kernel: block drbd0: disk( UpToDate -> Failed )
Jan 20 10:41:16 server2 kernel: block drbd0: Local IO failed in
drbd_endio_write_sec_final. Detaching...
Jan 20 10:41:16 server2 kernel: block drbd0: pdsk( UpToDate -> Failed )
Jan 20 10:41:16 server2 kernel: block drbd0: Can not satisfy peer's
read request, no local data.
Jan 20 10:41:16 server2 kernel: block drbd0: pdsk( Failed -> Diskless )
Jan 20 10:41:16 server2 kernel: block drbd0: 20 KB (5 bits) marked
out-of-sync by on disk bit-map.
Jan 20 10:41:16 server2 kernel: block drbd0: disk( Failed -> Diskless )
Jan 20 11:29:25 server2 kernel: drbd r0: peer( Primary -> Unknown )
conn( Connected -> Disconnecting ) pdsk( Diskless -> DUnknown )
Jan 20 11:29:25 server2 kernel: drbd r0: ack_receiver terminated
Jan 20 11:29:25 server2 kernel: drbd r0: Terminating drbd_a_r0
Jan 20 11:29:25 server2 kernel: drbd r0: Connection closed
Jan 20 11:29:25 server2 kernel: drbd r0: conn( Disconnecting -> StandAlone )
Jan 20 11:29:25 server2 kernel: drbd r0: receiver terminated
Jan 20 11:29:25 server2 kernel: drbd r0: Terminating drbd_r_r0
Jan 20 11:29:25 server2 kernel: drbd r0: Terminating drbd_w_r0

...


After stopping Pacemaker on server2,
I rebooted the OS of server1.
Then, after stopping Pacemaker on server2,
I rebooted the OS of server2.
After that, the system recovered.

The disks of both servers are located on SAN storage,
and the servers run on a virtual infrastructure,
so we do not believe a physical disk failure is the cause.

Could you advise how to isolate the cause of this problem
and how to prevent it from recurring?

Best regards!


