I recently had an issue with a drive failure + DRBD. The setup is:

- 2 nodes with 12 drives
- Each drive is a single volume (no raid) mirrored to the other partner node. 12 DRBD resources total.

Order of events:

1. Drive failed and was replaced. DRBD began rebuilding.
2. The replacement drive resync failed and caused every DRBD resource/the entire node to hang.
3. I forced a failover (which worked). Rebooted the hung node with 'reboot -f -n -d' (was already logged in).

The time between step 2 and step 3 was about 25 minutes. I'm not sure where the freeze took place and was hoping somebody would have insight. Is this hardware, kernel, DRBD or another issue?

Some logs and info pertaining to the situation:

kernel: 2.6.28.10 w/SCST patches
distro: Debian 5.0
scheduler: deadline

version: 8.3.7 (api:88/proto:86-91)
GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root at node01, 2010-04-18 20:24:44

From node01 /var/log/kern.log (this went on for a few seconds and then completely stopped... along with everything else):

Jul 21 00:33:54 node01 kernel: sd 0:0:2:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Jul 21 00:33:54 node01 kernel: sd 0:0:2:0: [sdc] Sense Key : Hardware Error [current]
Jul 21 00:33:54 node01 kernel: sd 0:0:2:0: [sdc] Add. Sense: Internal target failure
Jul 21 00:33:54 node01 kernel: end_request: I/O error, dev sdc, sector 795260268
Jul 21 00:33:54 node01 kernel: block drbd1: p read: error=-5
Jul 21 00:33:54 node01 kernel: dev_vdisk: ***ERROR*** cmd ffff88022d4ba408 returned error -5
Jul 21 00:33:54 node01 kernel: sd 0:0:2:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Jul 21 00:33:54 node01 kernel: sd 0:0:2:0: [sdc] Sense Key : Hardware Error [current]
Jul 21 00:33:54 node01 kernel: sd 0:0:2:0: [sdc] Add. Sense: Internal target failure
Jul 21 00:33:54 node01 kernel: end_request: I/O error, dev sdc, sector 42568
Jul 21 00:33:54 node01 kernel: block drbd1: p write: error=-5
Jul 21 00:33:54 node01 kernel: block drbd1: Local WRITE failed sec=42505s size=512
Jul 21 00:33:54 node01 kernel: block drbd1: Resync aborted.
Jul 21 00:33:54 node01 kernel: block drbd1: conn( SyncTarget -> Connected ) disk( Inconsistent -> Failed )
Jul 21 00:33:54 node01 kernel: block drbd1: Local IO failed in __req_mod.Detaching...
Jul 21 00:33:54 node01 kernel: sd 0:0:2:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Jul 21 00:33:54 node01 kernel: sd 0:0:2:0: [sdc] Sense Key : Hardware Error [current]
Jul 21 00:33:54 node01 kernel: sd 0:0:2:0: [sdc] Add. Sense: Internal target failure
Jul 21 00:33:54 node01 kernel: end_request: I/O error, dev sdc, sector 895352191
Jul 21 00:33:54 node01 kernel: block drbd1: write: error=-5 s=895352128s
Jul 21 00:33:54 node01 kernel: block drbd1: write: error=-5 s=895352192s
Jul 21 00:33:54 node01 kernel: block drbd1: write: error=-5 s=895352256s
Jul 21 00:33:54 node01 kernel: block drbd1: write: error=-5 s=895352320s
Jul 21 00:33:54 node01 kernel: sd 0:0:2:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Jul 21 00:33:54 node01 kernel: sd 0:0:2:0: [sdc] Sense Key : Hardware Error [current]
Jul 21 00:33:54 node01 kernel: sd 0:0:2:0: [sdc] Add. Sense: Internal target failure
Jul 21 00:33:54 node01 kernel: end_request: I/O error, dev sdc, sector 895352447
Jul 21 00:33:54 node01 kernel: block drbd1: write: error=-5 s=895352384s
Jul 21 00:33:54 node01 kernel: block drbd1: write: error=-5 s=895352448s
Jul 21 00:33:54 node01 kernel: block drbd1: write: error=-5 s=895352512s
Jul 21 00:33:54 node01 kernel: block drbd1: write: error=-5 s=895352576s
Jul 21 00:33:54 node01 kernel: block drbd1: write: error=-5 s=895352640s
Jul 21 00:33:54 node01 kernel: block drbd1: write: error=-5 s=895352704s
.....
Jul 21 00:33:56 node01 kernel: block drbd1: write: error=-5 s=895462528s
Jul 21 00:33:56 node01 kernel: sd 0:0:2:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Jul 21 00:33:56 node01 kernel: sd 0:0:2:0: [sdc] Sense Key : Hardware Error [current]
Jul 21 00:33:56 node01 kernel: sd 0:0:2:0: [sdc] Add. Sense: Internal target failure
Jul 21 00:33:56 node01 kernel: end_request: I/O error, dev sdc, sector 895462655
Jul 21 00:33:56 node01 kernel: block drbd1: write: error=-5 s=895462592s
Jul 21 00:33:56 node01 kernel: block drbd1: write: error=-5 s=895462656s
Jul 21 00:33:56 node01 kernel: sd 0:0:2:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Jul 21 00:33:56 node01 kernel: sd 0:0:2:0: [sdc] Sense Key : Hardware Error [current]
Jul 21 00:33:56 node01 kernel: sd 0:0:2:0: [sdc] Add. Sense: Internal target failure
Jul 21 00:33:56 node01 kernel: end_request: I/O error, dev sdc, sector 935371503
Jul 21 00:33:56 node01 kernel: block drbd1: p write: error=-5
Jul 21 00:33:56 node01 kernel: block drbd1: Local WRITE failed sec=935371440s size=512
Jul 21 00:33:56 node01 kernel: sd 0:0:2:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Jul 21 00:33:56 node01 kernel: sd 0:0:2:0: [sdc] Sense Key : Hardware Error [current]
Jul 21 00:33:56 node01 kernel: sd 0:0:2:0: [sdc] Add. Sense: Internal target failure
Jul 21 00:33:56 node01 kernel: end_request: I/O error, dev sdc, sector 935372360
Jul 21 00:33:56 node01 kernel: block drbd1: p write: error=-5
Jul 21 00:33:56 node01 kernel: block drbd1: Local WRITE failed sec=935372297s size=512
Jul 21 00:33:56 node01 kernel: sd 0:0:2:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Jul 21 00:33:56 node01 kernel: sd 0:0:2:0: [sdc] Sense Key : Hardware Error [current]
Jul 21 00:33:56 node01 kernel: sd 0:0:2:0: [sdc] Add. Sense: Internal target failure
Jul 21 00:33:56 node01 kernel: end_request: I/O error, dev sdc, sector 613191678
Jul 21 00:33:56 node01 kernel: block drbd1: p write: error=-5
Jul 21 00:33:56 node01 kernel: block drbd1: Local WRITE failed sec=613191615s size=512

From node02 during that window:

Jul 21 00:33:54 node02 kernel: block drbd1: Got NegAck packet. Peer is in troubles?
Jul 21 00:33:54 node02 kernel: block drbd1: Got NegAck packet. Peer is in troubles?
Jul 21 00:33:54 node02 kernel: block drbd1: Got NegAck packet. Peer is in troubles?
Jul 21 00:33:54 node02 kernel: block drbd1: Got NegAck packet. Peer is in troubles?
Jul 21 00:33:54 node02 kernel: block drbd1: Got NegAck packet. Peer is in troubles?

Regards,
Matthew Ingersoll