[DRBD-user] Disk failure on secondary brings down primary

Wed Jul 6 18:33:54 CEST 2011

I asked about this before, but maybe I did it in the wrong way. I'll try again,
and be brief.

Setup: Two systems; hypatia is primary, orestes is secondary. OS is Scientific
Linux 5.5: kernel 2.6.18-194.26.1.el5xen; DRBD version drbd-8.3.8.1-30.el5.

On both systems: /dev/sdc1 and /dev/sdd1 make a software RAID1, /dev/md2. DRBD
resource "admin" is device /dev/drbd1 in a Primary/Secondary configuration,
formed from /dev/md2 on both systems.
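
Since DRBD here sits on top of md, one thing that could be checked on each node is whether /dev/md2 is running degraded. Below is a minimal Python sketch, assuming the usual /proc/mdstat layout. The sample text is made up for illustration (the sizes and the md1 array are not output from hypatia or orestes), and degraded_arrays is a hypothetical helper name.

```python
import re

# Hypothetical /proc/mdstat content, NOT taken from the real systems.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md2 : active raid1 sdd1[1] sdc1[0](F)
      976759936 blocks [2/1] [_U]

md1 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

unused devices: <none>
"""

# "md2 : active raid1 sdd1[1] sdc1[0]" style header line.
ARRAY_RE = re.compile(r"^(md\d+) : (\w+) (\w+) (.*)$")
# "[2/1] [_U]" style status: expected/active members, then a per-slot map.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return {array: (expected, active, slot_map)} for degraded arrays."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            expected = int(status.group(1))
            active = int(status.group(2))
            if active < expected:
                degraded[current] = (expected, active, status.group(3))
            current = None
    return degraded


if __name__ == "__main__":
    for name, (expected, active, slots) in degraded_arrays(SAMPLE_MDSTAT).items():
        print(f"{name}: {active}/{expected} members active, slots {slots}")
```

On a real node the same function could be fed the contents of /proc/mdstat instead of the sample string.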

Here's the problem. There was a hardware failure on one of the RAID1 drives on
the secondary:

Jun  8 01:04:04 orestes kernel: ata4.00: exception Emask 0x40 SAct 0x0 SErr 0x800 action 0x6 frozen

and so on. But for some reason, this led to a problem on the primary:

Jun  8 01:04:39 hypatia kernel: block drbd1: [drbd1_worker/6650] sock_sendmsg time expired, ko = 4294967295
Jun  8 01:04:45 hypatia kernel: block drbd1: [drbd1_worker/6650] sock_sendmsg time expired, ko = 4294967294
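
A side note on the ko values: both sit just below 2^32, which would be consistent with an unsigned 32-bit counter being decremented from zero (for instance if ko-count is left unset). That reading is an assumption on my part, not something confirmed from the DRBD source. A minimal Python sketch of the arithmetic, where as_signed32 and ko_values are names made up for illustration:

```python
import re

# The two messages from the primary, each joined onto a single line.
LOG_LINES = [
    "Jun  8 01:04:39 hypatia kernel: block drbd1: [drbd1_worker/6650] "
    "sock_sendmsg time expired, ko = 4294967295",
    "Jun  8 01:04:45 hypatia kernel: block drbd1: [drbd1_worker/6650] "
    "sock_sendmsg time expired, ko = 4294967294",
]

KO_PATTERN = re.compile(r"sock_sendmsg time expired, ko = (\d+)")


def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def ko_values(lines):
    """Return (raw, signed) pairs for every ko value found in the lines."""
    pairs = []
    for line in lines:
        match = KO_PATTERN.search(line)
        if match:
            raw = int(match.group(1))
            pairs.append((raw, as_signed32(raw)))
    return pairs


if __name__ == "__main__":
    for raw, signed in ko_values(LOG_LINES):
        print(f"ko = {raw} (as signed 32-bit: {signed})")
```

Read that way, the two messages show the counter at -1 and then -2, i.e. it keeps counting down rather than reaching a limit.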

From googling, I gather this means DRBD on the primary timed out sending data
to its peer, so writes to drbd1 could no longer complete.

Any ideas of how this could happen, or anything I could test?

Config file:

global {
	usage-count yes;
}

common {
	protocol A;

	handlers {
		pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
		pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
		local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
	}
	startup {
	}
	disk {
	}
	net {
		ping-timeout 11;
	}
	syncer {
		rate 15M;
	}
}

resource admin {
  device    /dev/drbd1;
  disk      /dev/md2;

  net {
    after-sb-0pri discard-zero-changes;
    after-sb-1pri consensus;
    after-sb-2pri disconnect;
  }
  startup {
    wfc-timeout 60;
    degr-wfc-timeout 60;
    outdated-wfc-timeout 60;
  }
  handlers {
    split-brain "/usr/lib/drbd/notify-split-brain.sh sysadmin@nevis.columbia.edu";
  }

  meta-disk internal;

  on hypatia.nevis.columbia.edu {
    address   192.168.100.7:7789;
  }
  on orestes.nevis.columbia.edu {
    address   192.168.100.6:7789;
  }
}

-- 
Bill Seligman             | Phone: (914) 591-2823
Nevis Labs, Columbia Univ | mailto://seligman@nevis.columbia.edu
PO Box 137                |
Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/
