[DRBD-user] DRBD drive failure handling

matth ingersoll matthingersoll at gmail.com
Wed Jul 21 06:03:52 CEST 2010



I recently had an issue with a drive failure under DRBD.  The setup is:
 - 2 nodes with 12 drives each
 - Each drive is a single volume (no RAID) mirrored to the partner
   node, for 12 DRBD resources in total.

Order of events:
1. A drive failed and was replaced. DRBD began resyncing onto the
replacement.
2. The resync onto the replacement drive failed and caused every DRBD
resource, and effectively the entire node, to hang.
3. I forced a failover (which worked), then rebooted the hung node with
'reboot -f -n -d' (I was already logged in).

The time between step 2 and step 3 was about 25 minutes.

I'm not sure where the freeze took place and was hoping somebody would
have insight.  Is this a hardware, kernel, DRBD, or other issue?

Some logs and info pertaining to the situation:

kernel: 2.6.28.10 w/SCST patches
distro: Debian 5.0
scheduler: deadline

version: 8.3.7 (api:88/proto:86-91)
GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root@node01, 2010-04-18 20:24:44

From node01 /var/log/kern.log (this went on for a few seconds and then
completely stopped, along with everything else):

Jul 21 00:33:54 node01 kernel: sd 0:0:2:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Jul 21 00:33:54 node01 kernel: sd 0:0:2:0: [sdc] Sense Key : Hardware Error [current]
Jul 21 00:33:54 node01 kernel: sd 0:0:2:0: [sdc] Add. Sense: Internal target failure
Jul 21 00:33:54 node01 kernel: end_request: I/O error, dev sdc, sector 795260268
Jul 21 00:33:54 node01 kernel: block drbd1: p read: error=-5
Jul 21 00:33:54 node01 kernel: dev_vdisk: ***ERROR*** cmd ffff88022d4ba408 returned error -5
Jul 21 00:33:54 node01 kernel: sd 0:0:2:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Jul 21 00:33:54 node01 kernel: sd 0:0:2:0: [sdc] Sense Key : Hardware Error [current]
Jul 21 00:33:54 node01 kernel: sd 0:0:2:0: [sdc] Add. Sense: Internal target failure
Jul 21 00:33:54 node01 kernel: end_request: I/O error, dev sdc, sector 42568
Jul 21 00:33:54 node01 kernel: block drbd1: p write: error=-5
Jul 21 00:33:54 node01 kernel: block drbd1: Local WRITE failed sec=42505s size=512
Jul 21 00:33:54 node01 kernel: block drbd1: Resync aborted.
Jul 21 00:33:54 node01 kernel: block drbd1: conn( SyncTarget -> Connected ) disk( Inconsistent -> Failed )
Jul 21 00:33:54 node01 kernel: block drbd1: Local IO failed in __req_mod.Detaching...
Jul 21 00:33:54 node01 kernel: sd 0:0:2:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Jul 21 00:33:54 node01 kernel: sd 0:0:2:0: [sdc] Sense Key : Hardware Error [current]
Jul 21 00:33:54 node01 kernel: sd 0:0:2:0: [sdc] Add. Sense: Internal target failure
Jul 21 00:33:54 node01 kernel: end_request: I/O error, dev sdc, sector 895352191
Jul 21 00:33:54 node01 kernel: block drbd1: write: error=-5 s=895352128s
Jul 21 00:33:54 node01 kernel: block drbd1: write: error=-5 s=895352192s
Jul 21 00:33:54 node01 kernel: block drbd1: write: error=-5 s=895352256s
Jul 21 00:33:54 node01 kernel: block drbd1: write: error=-5 s=895352320s
Jul 21 00:33:54 node01 kernel: sd 0:0:2:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Jul 21 00:33:54 node01 kernel: sd 0:0:2:0: [sdc] Sense Key : Hardware Error [current]
Jul 21 00:33:54 node01 kernel: sd 0:0:2:0: [sdc] Add. Sense: Internal target failure
Jul 21 00:33:54 node01 kernel: end_request: I/O error, dev sdc, sector 895352447
Jul 21 00:33:54 node01 kernel: block drbd1: write: error=-5 s=895352384s
Jul 21 00:33:54 node01 kernel: block drbd1: write: error=-5 s=895352448s
Jul 21 00:33:54 node01 kernel: block drbd1: write: error=-5 s=895352512s
Jul 21 00:33:54 node01 kernel: block drbd1: write: error=-5 s=895352576s
Jul 21 00:33:54 node01 kernel: block drbd1: write: error=-5 s=895352640s
Jul 21 00:33:54 node01 kernel: block drbd1: write: error=-5 s=895352704s
.....
Jul 21 00:33:56 node01 kernel: block drbd1: write: error=-5 s=895462528s
Jul 21 00:33:56 node01 kernel: sd 0:0:2:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Jul 21 00:33:56 node01 kernel: sd 0:0:2:0: [sdc] Sense Key : Hardware Error [current]
Jul 21 00:33:56 node01 kernel: sd 0:0:2:0: [sdc] Add. Sense: Internal target failure
Jul 21 00:33:56 node01 kernel: end_request: I/O error, dev sdc, sector 895462655
Jul 21 00:33:56 node01 kernel: block drbd1: write: error=-5 s=895462592s
Jul 21 00:33:56 node01 kernel: block drbd1: write: error=-5 s=895462656s
Jul 21 00:33:56 node01 kernel: sd 0:0:2:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Jul 21 00:33:56 node01 kernel: sd 0:0:2:0: [sdc] Sense Key : Hardware Error [current]
Jul 21 00:33:56 node01 kernel: sd 0:0:2:0: [sdc] Add. Sense: Internal target failure
Jul 21 00:33:56 node01 kernel: end_request: I/O error, dev sdc, sector 935371503
Jul 21 00:33:56 node01 kernel: block drbd1: p write: error=-5
Jul 21 00:33:56 node01 kernel: block drbd1: Local WRITE failed sec=935371440s size=512
Jul 21 00:33:56 node01 kernel: sd 0:0:2:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Jul 21 00:33:56 node01 kernel: sd 0:0:2:0: [sdc] Sense Key : Hardware Error [current]
Jul 21 00:33:56 node01 kernel: sd 0:0:2:0: [sdc] Add. Sense: Internal target failure
Jul 21 00:33:56 node01 kernel: end_request: I/O error, dev sdc, sector 935372360
Jul 21 00:33:56 node01 kernel: block drbd1: p write: error=-5
Jul 21 00:33:56 node01 kernel: block drbd1: Local WRITE failed sec=935372297s size=512
Jul 21 00:33:56 node01 kernel: sd 0:0:2:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Jul 21 00:33:56 node01 kernel: sd 0:0:2:0: [sdc] Sense Key : Hardware Error [current]
Jul 21 00:33:56 node01 kernel: sd 0:0:2:0: [sdc] Add. Sense: Internal target failure
Jul 21 00:33:56 node01 kernel: end_request: I/O error, dev sdc, sector 613191678
Jul 21 00:33:56 node01 kernel: block drbd1: p write: error=-5
Jul 21 00:33:56 node01 kernel: block drbd1: Local WRITE failed sec=613191615s size=512

From node02 during that window:

Jul 21 00:33:54 node02 kernel: block drbd1: Got NegAck packet. Peer is in troubles?
Jul 21 00:33:54 node02 kernel: block drbd1: Got NegAck packet. Peer is in troubles?
Jul 21 00:33:54 node02 kernel: block drbd1: Got NegAck packet. Peer is in troubles?
Jul 21 00:33:54 node02 kernel: block drbd1: Got NegAck packet. Peer is in troubles?
Jul 21 00:33:54 node02 kernel: block drbd1: Got NegAck packet. Peer is in troubles?
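
For anyone triaging similar logs, here is a minimal Python sketch that
tallies the messages shown in the excerpts above: which sectors the
block layer reported as failed per device, how many read/write errors
each DRBD device logged, and how many NegAck packets the peer saw. It
is only an illustration. The regular expressions are assumptions based
solely on the message shapes in these excerpts (other kernel or DRBD
versions may word them differently), and the function name summarize
is made up for this example.

```python
import re
from collections import Counter

# Assumed message shapes, taken only from the log excerpts in this post.
BLOCK_ERR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
DRBD_ERR = re.compile(r"block (drbd\d+): (?:p )?(read|write): error=(-?\d+)")
NEGACK = re.compile(r"block (drbd\d+): Got NegAck packet")


def summarize(lines):
    """Return (failed sectors per disk, DRBD error counts, NegAck counts)."""
    sectors = {}             # e.g. {"sdc": [795260268, 42568]}
    drbd_errors = Counter()  # keyed by (drbd device, "read" or "write")
    negacks = Counter()      # keyed by drbd device
    for line in lines:
        m = BLOCK_ERR.search(line)
        if m:
            sectors.setdefault(m.group(1), []).append(int(m.group(2)))
            continue
        m = DRBD_ERR.search(line)
        if m:
            drbd_errors[(m.group(1), m.group(2))] += 1
            continue
        m = NEGACK.search(line)
        if m:
            negacks[m.group(1)] += 1
    return sectors, drbd_errors, negacks


# A few lines copied from the excerpts above, used as sample input.
sample = [
    "Jul 21 00:33:54 node01 kernel: end_request: I/O error, dev sdc, sector 795260268",
    "Jul 21 00:33:54 node01 kernel: block drbd1: p read: error=-5",
    "Jul 21 00:33:54 node01 kernel: end_request: I/O error, dev sdc, sector 42568",
    "Jul 21 00:33:54 node01 kernel: block drbd1: p write: error=-5",
    "Jul 21 00:33:54 node01 kernel: block drbd1: write: error=-5 s=895352128s",
    "Jul 21 00:33:54 node02 kernel: block drbd1: Got NegAck packet. Peer is",
]

sectors, drbd_errors, negacks = summarize(sample)
for dev, secs in sectors.items():
    print(dev, "failed sectors from", min(secs), "to", max(secs))
print(dict(drbd_errors))
print(dict(negacks))
```

In practice the input would be the lines of /var/log/kern.log from each
node rather than the inline sample. Comparing the lowest and highest
failed sector gives a rough idea of whether the errors are clustered or
spread across the drive.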


Regards,

Matthew Ingersoll


