Hello all.

I have been fighting a strange problem for more than three weeks.

What we have: 2x Dell 2950 running Debian 5.0 with the 2.6.32-bpo.5-amd64 kernel from backports, with DRBD included, plus OCFS2.

I generate a heavy load with iozone on the OCFS2 partition:

    iozone -RK -t 4 -s 10g -i 0 -i 1 -i 2 -b /tmp/`hostname`.xls

on both nodes. After 1-3 hours, both servers reboot.

It must be DRBD or OCFS2 related, because it does not happen on a normal partition. The OCFS2 developers looked at the stack trace I captured (attached) and said it is not an OCFS2 problem.

I started to suspect the hardware or the system. I tried the 2.6.26 kernel and upgrading to testing; neither helped at all. So it is either hardware or DRBD.

Could you please help me find out where the problem is?

Configs below:

mail01:~# cat /etc/drbd.d/drbd0.res
resource drbd0 {
    on mail01.fxclub.org {
        device /dev/drbd0;
        disk /dev/sda9;
        address 192.168.1.1:7789;
        meta-disk internal;
    }
    on mail02.fxclub.org {
        device /dev/drbd0;
        disk /dev/sda9;
        address 192.168.1.2:7789;
        meta-disk internal;
    }
}

mail01:~# cat /etc/drbd.d/global_common.conf
global {
    usage-count yes;
    # minor-count dialog-refresh disable-ip-verification
}

common {
    protocol C;

    handlers {
        pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
        outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
        # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        split-brain "/usr/lib/drbd/notify-split-brain.sh root";
        # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
        # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
        # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
    }

    startup {
        wfc-timeout 60;
        degr-wfc-timeout 30;
        outdated-wfc-timeout 15;
        become-primary-on both;
        # wait-after-sb;
    }

    disk {
        fencing resource-and-stonith;
        no-disk-flushes;
        no-md-flushes;
        no-disk-barrier;
        # on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
        # no-disk-drain no-md-flushes max-bio-bvecs
    }

    net {
        cram-hmac-alg sha1;
        shared-secret "password";
        allow-two-primaries;
        ping-timeout 20;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
        data-integrity-alg sha1;
        # Tuning
        max-buffers 8000;
        max-epoch-size 8000;
        sndbuf-size 0;
        # snd.buf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
        # max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
        # after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
    }

    syncer {
        rate 60M;
        al-extents 3389;
        # rate after al-extents use-rle cpu-mask verify-alg csums-alg
    }
}

P.S. I started to think the handlers could be the cause and commented them out; that did not help.

--
Best regards,
Proskurin Kirill

-------------- next part --------------
A non-text attachment was scrubbed...
Name: debian-bug.jpg
Type: image/jpeg
Size: 37244 bytes
Desc: not available
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20100920/666ab588/attachment.jpg>