Someone?...

On 20/09/10 15:43, Proskurin Kirill wrote:
> Hello all.
>
> I have been fighting a strange problem for more than 3 weeks.
>
> What we have:
> 2x Dell 2950 with Debian 5.0, kernel 2.6.32-bpo.5-amd64 from backports,
> with DRBD inside + OCFS2
>
> I generate a heavy load with iozone on the OCFS2 partition:
> iozone -RK -t 4 -s 10g -i 0 -i 1 -i 2 -b /tmp/`hostname`.xls
> on both nodes.
>
> And after 1-3 hours both servers reboot. It is DRBD or OCFS2 related,
> because it does not happen on a normal partition. The OCFS2 developers
> looked at the stack trace I caught and said that it is not an OCFS2 problem.
> I tried to send you a screenshot of the stack trace but ran into the 40kb
> message limit (it is 32kb). Below is some of the trace that I got on the
> console.
>
> I started to think that it is the hardware or the system. I tried the
> 2.6.26 kernel and updating to testing - it did not help at all.
>
> So - it is the hardware or DRBD. Could you please help me find out where
> the problem is?
>
>
>
> Configs below:
>
> mail01:~# cat /etc/drbd.d/drbd0.res
> resource drbd0 {
>
>     on mail01.fxclub.org {
>         device /dev/drbd0;
>         disk /dev/sda9;
>         address 192.168.1.1:7789;
>         meta-disk internal;
>     }
>
>     on mail02.fxclub.org {
>         device /dev/drbd0;
>         disk /dev/sda9;
>         address 192.168.1.2:7789;
>         meta-disk internal;
>     }
>
> }
>
>
> mail01:~# cat /etc/drbd.d/global_common.conf
> global {
>     usage-count yes;
>     # minor-count dialog-refresh disable-ip-verification
> }
>
> common {
>     protocol C;
>
>     handlers {
>         pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh;
> /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ;
> reboot -f";
>         pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh;
> /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ;
> reboot -f";
>         local-io-error "/usr/lib/drbd/notify-io-error.sh;
> /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger
> ; halt -f";
>         outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
>         # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>         split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>         # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
>         # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p
> 15 -- -c 16k";
>         # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
>     }
>
>     startup {
>         wfc-timeout 60;
>         degr-wfc-timeout 30;
>         outdated-wfc-timeout 15;
>         become-primary-on both;
>         # wait-after-sb;
>     }
>
>     disk {
>         fencing resource-and-stonith;
>         no-disk-flushes;
>         no-md-flushes;
>         no-disk-barrier;
>         # on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
>         # no-disk-drain no-md-flushes max-bio-bvecs
>     }
>
>     net {
>         cram-hmac-alg sha1;
>         shared-secret "password";
>         allow-two-primaries;
>         ping-timeout 20;
>         after-sb-0pri discard-zero-changes;
>         after-sb-1pri discard-secondary;
>         after-sb-2pri disconnect;
>         data-integrity-alg sha1;
>         # Tuning
>         max-buffers 8000;
>         max-epoch-size 8000;
>         sndbuf-size 0;
>         # snd.buf-size rcvbuf-size timeout connect-int ping-int ping-timeout
> max-buffers
>         # max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
>         # after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
>     }
>
>     syncer {
>         rate 60M;
>         al-extents 3389;
>         # rate after al-extents use-rle cpu-mask verify-alg csums-alg
>     }
> }
>
> P.S. I started to think that it could be the handlers and commented them
> out - it did not help.
>
> Message from syslogd at mail02 at Sep 16 09:03:19 ...
> kernel:[92182.173794] ------------[ cut here ]------------
>
> Message from syslogd at mail02 at Sep 16 09:03:19 ...
> kernel:[92182.173872] invalid opcode: 0000 [#1] SMP
>
> Message from syslogd at mail02 at Sep 16 09:03:19 ...
> kernel:[92182.173899] last sysfs file: /sys/module/ocfs2/refcnt
>
>
> Message from syslogd at mail01 at Sep 16 15:18:37 ...
> kernel:[ 1432.310479] ------------[ cut here ]------------
>
> Message from syslogd at mail01 at Sep 16 15:18:37 ...
> kernel:[ 1432.310648] invalid opcode: 0000 [#1] SMP
>
> Message from syslogd at mail01 at Sep 16 15:18:37 ...
> kernel:[ 1432.310801] last sysfs file: /sys/fs/o2cb/interface_revision
>
> Message from syslogd at mail01 at Sep 16 15:18:37 ...
> kernel:[ 1432.312251] Stack:
>
> Message from syslogd at mail01 at Sep 16 15:18:37 ...
> kernel:[ 1432.312251] Call Trace:
>
> Message from syslogd at mail01 at Sep 16 15:18:37 ...
> kernel:[ 1432.312251] Code: 83 c3 08 48 83 3b 00 eb ec 48 83 fd 10 0f 86
> 89 00 00 00 48 89 ef e8 b9 e8 ff ff 48 89 c7 48 8b 00 84 c0 78 13 66 a9
> 00 c0 75 04 <0f> 0b eb fe 5b 5d 41 5c e9 94 58 fd ff 48 8b 4c 24 18 4c
> 8b 4f
>

--
Best regards,
Proskurin Kirill
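A note on reading the "Code:" line in the quoted oops: the byte in angle brackets marks where the CPU faulted. On x86 the kernel's BUG() macro is built on the ud2 instruction (bytes 0f 0b), so "cut here" followed by "invalid opcode" with <0f> 0b at the marker usually points to a BUG() assertion firing in kernel code rather than a random hardware fault. If that is the case here, the complete kernel log would normally also carry a "kernel BUG at ..." line naming the source file. The sketch below only checks the bytes; the function names are made up for illustration, and it does not identify which module hit the assertion.

```python
import re

# "Code:" line from the mail01 oops in the message, joined onto one line.
CODE_LINE = (
    "Code: 83 c3 08 48 83 3b 00 eb ec 48 83 fd 10 0f 86 89 00 00 00 "
    "48 89 ef e8 b9 e8 ff ff 48 89 c7 48 8b 00 84 c0 78 13 66 a9 "
    "00 c0 75 04 <0f> 0b eb fe 5b 5d 41 5c e9 94 58 fd ff 48 8b 4c 24 18 4c "
    "8b 4f"
)

UD2 = bytes([0x0F, 0x0B])  # x86 "ud2" opcode, which BUG() is built on


def faulting_bytes(code_line, count=2):
    """Return `count` bytes starting at the byte marked <..> in a Code: line.

    Hypothetical helper written for this note, not part of any kernel tool.
    """
    tokens = code_line.split(":", 1)[1].split()
    for i, tok in enumerate(tokens):
        m = re.fullmatch(r"<([0-9a-fA-F]{2})>", tok)
        if m:
            raw = [m.group(1)] + tokens[i + 1 : i + count]
            return bytes(int(b, 16) for b in raw)
    raise ValueError("no <..> marker found in Code: line")


def looks_like_bug_trap(code_line):
    """True if the faulting instruction bytes are ud2 (0f 0b)."""
    return faulting_bytes(code_line) == UD2


if __name__ == "__main__":
    fb = faulting_bytes(CODE_LINE)
    print(" ".join(f"{b:02x}" for b in fb))  # bytes at the fault marker
    print(looks_like_bug_trap(CODE_LINE))
```

For the line above this reports the marker bytes as 0f 0b. Only emergency-level messages are broadcast to the console by syslogd, so the lines that name the failing file and function are more likely to be found in a serial console or netconsole capture.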