I'm working on getting a few DRBD servers up and running, but I'm having a ton of trouble with them. I've set up DRBD/heartbeat on some smaller servers before without problems, but this batch is behaving strangely, and no error messages are reported anywhere.

Basically, any application (so far pound, nfsd, mysql, and ldap, along with a number of filesystem tools) will eventually lock up while working with the mounted drbd volume. It can happen with something as simple as 'ls' or 'touch'. Once a process locks up, the drbd volume is considered busy and can't be released without fencing the node. After a reboot, this happens again in less than 3 hours. Killing the stuck process does nothing without -9; with -9 it sometimes becomes defunct and sometimes nothing happens.

/proc/drbd shows nothing unusual. The output below was taken after my second and third drbd volumes locked up; I did manage to unmount one volume and make it secondary. The second volume is locked by mysql and the third by umount.
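A process that ignores a plain kill, and that even -9 does not reliably remove, is usually blocked in uninterruptible sleep (state D in ps): it is waiting inside the kernel for I/O to complete, and even SIGKILL stays pending until that wait ends. The sketch below is a minimal illustration, assuming Linux's /proc/<pid>/stat format; it lists processes in state D so that their /proc/<pid>/wchan can be checked. The helper names parse_stat and uninterruptible_processes are hypothetical and not part of any existing tool.

```python
import os
from typing import List, Tuple


def parse_stat(line: str) -> Tuple[int, str, str]:
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the last ')' instead of on whitespace.
    """
    left, _, right = line.rpartition(")")
    pid_text, _, comm = left.partition(" (")
    state = right.split()[0]
    return int(pid_text), comm, state


def uninterruptible_processes(proc: str = "/proc") -> List[Tuple[int, str]]:
    """Return (pid, comm) for every process currently in state 'D'."""
    found: List[Tuple[int, str]] = []
    if not os.path.isdir(proc):
        return found  # not Linux, or /proc is not mounted
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process exited (or became unreadable) mid-scan
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        # /proc/<pid>/wchan names the kernel function the process sleeps in.
        print(f"{pid}\t{comm}\tsee /proc/{pid}/wchan")
```

Running it while a volume is hung shows whether mysql, umount, and the other stuck processes are all sleeping in the same kernel function.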
[root at backend1 ben]# cat /proc/drbd
version: 8.0.5 (api:86/proto:86)
SVN Revision: 3011 build by ben at frontend2.scholar, 2007-08-24 11:35:47
 0: cs:Connected st:Secondary/Secondary ds:UpToDate/UpToDate C r---
    ns:61448 nr:0 dw:8 dr:61533 al:0 bm:27 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:3831 misses:9 starving:0 dirty:0 changed:9
        act_log: used:0/907 hits:2 misses:0 starving:0 dirty:0 changed:0
 1: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
    ns:201216 nr:0 dw:512 dr:203845 al:1 bm:44 lo:0 pe:2 ua:0 ap:2
        resync: used:0/31 hits:12522 misses:22 starving:0 dirty:0 changed:22
        act_log: used:1/907 hits:127 misses:1 starving:0 dirty:0 changed:1
 2: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
    ns:352612 nr:0 dw:356 dr:672073 al:0 bm:48 lo:0 pe:1 ua:0 ap:1
        resync: used:0/31 hits:21992 misses:24 starving:0 dirty:0 changed:24
        act_log: used:1/907 hits:89 misses:0 starving:0 dirty:0 changed:0

I have no unusual log messages to show: no SELinux messages, no bus I/O errors, no failed drives. If anyone can help, I'd appreciate it; I'm about to toss out DRBD in a situation where it would be ideal.

These are the same machines that prompted my 'WFBitMapT and WFBitMapS' message earlier this week, which has not received a response: Dell 2950s with hardware RAID (PERC 5s on the megaraid driver), x86_64, RHEL5, SELinux enabled. The DRBD volumes live on top of LVM logical volumes. The kernel is 2.6.18-8.1.8 and DRBD is 8.0.5; all other software mentioned is the CentOS5 or RHEL5 version of the package. My more complicated drbd.conf follows, though I have also seen this on a machine with only one drbd volume and otherwise similar settings.
Thanks,
Ben

common {
  net {
    after-sb-0pri discard-least-changes;
    after-sb-1pri call-pri-lost-after-sb;
    after-sb-2pri disconnect;
  }
  startup {
    wfc-timeout 120;
    degr-wfc-timeout 120;
  }
  handlers {
    pri-lost-after-sb "reboot";
  }
  syncer {
    rate 400M;
    al-extents 907;
  }
}

resource drbd-ldap {
  protocol C;
  on backend1.scholar {
    device /dev/drbd0;
    disk /dev/mapper/VolGroup00-LogVol02;
    address 192.168.65.237:7788;
    flexible-meta-disk internal;
  }
  on backend2.scholar {
    device /dev/drbd0;
    disk /dev/mapper/VolGroup00-LogVol02;
    address 192.168.65.238:7788;
    meta-disk internal;
  }
}

resource drbd-mysql {
  protocol C;
  on backend1.scholar {
    device /dev/drbd1;
    disk /dev/mapper/VolGroup00-LogVol03;
    address 192.168.65.237:7789;
    flexible-meta-disk internal;
  }
  on backend2.scholar {
    device /dev/drbd1;
    disk /dev/mapper/VolGroup00-LogVol03;
    address 192.168.65.238:7789;
    meta-disk internal;
  }
}

resource drbd-nfs {
  protocol C;
  on backend1.scholar {
    device /dev/drbd2;
    disk /dev/mapper/VolGroup00-LogVol04;
    address 192.168.65.237:7790;
    flexible-meta-disk internal;
  }
  on backend2.scholar {
    device /dev/drbd2;
    disk /dev/mapper/VolGroup00-LogVol04;
    address 192.168.65.238:7790;
    meta-disk internal;
  }
}

Ben Lavender
Systems Design
MC Dean Europe - Stuttgart
+49(0)711 849 50179
+1 703 803 6231 x 7179
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20070829/83695555/attachment.htm>
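One detail in the /proc/drbd output in this post may be worth a second look. In DRBD 8.0, pe counts requests sent to the peer that have not yet been answered, and ap counts block I/O requests handed to DRBD that DRBD has not yet completed. The two volumes reported as locked (minors 1 and 2, both Primary) show non-zero pe and ap, while the released minor 0 shows zeros. The sketch below is a minimal illustration, assuming the usual multi-line DRBD 8.0 layout (a header line per minor followed by an "ns:" counter line); parse_proc_drbd and devices_with_outstanding_requests are hypothetical names.

```python
import re
from typing import Dict, List

# Per-device header line in DRBD 8.0's /proc/drbd, e.g.
#  " 1: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---"
DEVICE_RE = re.compile(r"^\s*(\d+):\s+cs:(\S+)\s+st:(\S+)")
# "name:integer" counters on the line that follows, e.g. "pe:2" or "ap:2".
COUNTER_RE = re.compile(r"\b([a-z]+):(\d+)\b")


def parse_proc_drbd(text: str) -> Dict[int, dict]:
    """Map each DRBD minor to its connection state, roles and counters."""
    devices: Dict[int, dict] = {}
    current = None
    for line in text.splitlines():
        header = DEVICE_RE.match(line)
        if header:
            current = {"cs": header.group(2), "st": header.group(3)}
            devices[int(header.group(1))] = current
            continue
        stripped = line.strip()
        # Only the "ns:..." line carries the device counters; the resync and
        # act_log lines reuse generic names such as "used" and are skipped.
        if current is not None and stripped.startswith("ns:"):
            for name, value in COUNTER_RE.findall(stripped):
                current[name] = int(value)
    return devices


def devices_with_outstanding_requests(devices: Dict[int, dict]) -> List[int]:
    """Minors whose pe (sent to peer, unanswered) or ap (application
    requests not yet completed by DRBD) counter is non-zero."""
    return sorted(
        minor
        for minor, fields in devices.items()
        if fields.get("pe", 0) > 0 or fields.get("ap", 0) > 0
    )


if __name__ == "__main__":
    try:
        with open("/proc/drbd") as f:
            snapshot = parse_proc_drbd(f.read())
    except FileNotFoundError:
        print("/proc/drbd not found (is the drbd module loaded?)")
    else:
        for minor in devices_with_outstanding_requests(snapshot):
            fields = snapshot[minor]
            print(f"drbd{minor}: st={fields['st']} pe={fields.get('pe')} ap={fields.get('ap')}")
```

A single reading can catch ordinary in-flight I/O, so the check is most meaningful when repeated a few seconds apart on a volume that should be idle; counters that never drop back to zero would point at requests that are not completing.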