Hi,

I have a problem with a DRBD/Heartbeat setup on a Red Hat AS 4 cluster. The cluster runs fine for some time. Then, if I stop the services that use resources on the /dev/drbdX file system and simply try to umount that file system, the system hangs hard, although it does not seem to kernel panic. Unmounting seems to keep working without problems for a few days (I am currently trying to determine how long exactly).

When the hang occurs, the cursor still blinks on the screen, but I lose all services on the network: logging in by SSH is no longer possible, yet the system still answers ping, which is weird. The keyboard is unresponsive (Ctrl-Alt-Del is unusable), but the Num Lock key still seems functional. I have to reboot the server manually (i.e. by unplugging it) to bring it back up. I cannot get any information on the exact cause of the crash, since I lose almost every interface to the server when it happens.

I am not sure whether it comes from DRBD directly, but if I unmount another file system (local, not located on a DRBD device), I do not get the problem. I also tried remounting the file system read-only before unmounting it: the remount succeeded without any problem, but I got the same crash at unmount. So it does not seem to come from write access to the device.

Versions on both nodes of my cluster:
- Red Hat AS 4 (up to date)
- Linux kernel: kernel-hugemem-2.6.9-42.0.2.EL (most current Red Hat version)
- Linux kernel header files: kernel-hugemem-devel-2.6.9-42.0.2.EL
- DRBD software: 0.7.22 (compiled by hand)
- File system type: ext3 (mount options: defaults,rw)

The most recent Red Hat kernel revision (kernel-hugemem-2.6.9-42.0.3.EL) has been deployed since, and DRBD 0.7.22 has been recompiled for this kernel and installed. I cannot yet tell whether that makes the error go away.

One more piece of information regarding my configuration: it is an active/active cluster (2 DRBD resources), and the crash is symmetrical (it happens on both resources/servers).
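The sequence that triggers the hang can be written down as a small dry-run script, which may help anyone trying to reproduce it. This is only a sketch: the mount point `/mnt/res1` and the service name `httpd` are placeholders that do not come from my setup, and the helper names `unmount_plan` and `run` are made up for illustration. By default it only prints the commands; nothing is executed unless `dry_run=False` is passed.

```python
import shlex
import subprocess


def unmount_plan(mount_point, services):
    """Return the commands, in order, for the sequence described above."""
    # 1. Stop every service that uses the file system on the DRBD device.
    plan = [["service", name, "stop"] for name in services]
    # 2. Remount read-only (this step succeeds in the report).
    plan.append(["mount", "-o", "remount,ro", mount_point])
    # 3. Unmount (this is the step after which the machine hangs).
    plan.append(["umount", mount_point])
    return plan


def run(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print("+", " ".join(shlex.quote(arg) for arg in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder mount point and service name, not taken from the real cluster.
    run(unmount_plan("/mnt/res1", ["httpd"]))
```

Keeping the plan separate from its execution makes it possible to review the exact command order before running it on a node that may hang.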
Please find attached to this mail the DRBD configuration.

So the problem seems to happen only after some time, and only on the DRBD devices. Googling for it has not turned up any interesting clues so far. Has anyone already seen something like this, and does anyone have a suggestion for debugging it further or for solving it?

Thanks in advance.

Charles-Antoine Guillat-Guignard

-------------- next part --------------
global {
    minor-count 2;
    dialog-refresh 1;
    # You might disable one of drbdadm's sanity checks.
    # disable-ip-verification;
}

resource res1 {
    protocol C;
    incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";

    startup {
        wfc-timeout 90;
        degr-wfc-timeout 180;   # 3 minutes.
    }

    disk {
        on-io-error detach;
    }

    net {
        sndbuf-size 512k;
        timeout 90;             # 9 seconds (unit = 0.1 seconds)
        connect-int 20;         # 20 seconds (unit = 1 second)
        ping-int 10;            # 10 seconds (unit = 1 second)
        max-buffers 4092;
        max-epoch-size 2048;
        ko-count 0;
        on-disconnect reconnect;
    }

    syncer {
        rate 125M;
        group 1;
        al-extents 257;
    }

    on prod0n {
        device /dev/drbd0;
        disk /dev/sdb10;
        address 172.17.69.1:7788;
        meta-disk /dev/sdb9[0];
    }

    on prod1n {
        device /dev/drbd0;
        disk /dev/sda2;
        address 172.17.69.2:7788;
        meta-disk /dev/sda1[0];
    }
}

resource res2 {
    protocol C;
    incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";

    startup {
        wfc-timeout 90;
        degr-wfc-timeout 180;   # 3 minutes.
    }

    disk {
        on-io-error detach;
    }

    net {
        sndbuf-size 512k;
        timeout 90;             # 9 seconds (unit = 0.1 seconds)
        connect-int 20;         # 20 seconds (unit = 1 second)
        ping-int 10;            # 10 seconds (unit = 1 second)
        max-buffers 4092;
        max-epoch-size 2048;
        ko-count 0;
        on-disconnect reconnect;
    }

    syncer {
        rate 125M;
        group 2;
        al-extents 257;
    }

    on prodn {
        device /dev/drbd1;
        disk /dev/sda2;
        address 172.17.96.1:7789;
        meta-disk /dev/sda1[0];
    }

    on prodn {
        device /dev/drbd1;
        disk /dev/sdb10;
        address 172.17.96.2:7789;
        meta-disk /dev/sdb9[0];
    }
}
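
As a cross-check of the net timings in the configuration above, here is a minimal sketch that converts the configured values into seconds. It assumes the units stated in the configuration's own comments (`timeout` in tenths of a second, `connect-int` and `ping-int` in seconds). The names `UNIT_SECONDS`, `NET_SETTINGS` and `to_seconds` are made up for this example and are not part of DRBD.

```python
# Units as stated in the comments of the configuration above (assumed correct):
# `timeout` counts tenths of a second, the two intervals count whole seconds.
UNIT_SECONDS = {"timeout": 0.1, "connect-int": 1.0, "ping-int": 1.0}

# Values copied from the `net` sections of res1 and res2 (identical in both).
NET_SETTINGS = {"timeout": 90, "connect-int": 20, "ping-int": 10}


def to_seconds(settings):
    """Convert each configured value to seconds using its unit."""
    return {
        name: round(value * UNIT_SECONDS[name], 1)
        for name, value in settings.items()
    }


effective = to_seconds(NET_SETTINGS)
for name, seconds in effective.items():
    print(f"{name}: {seconds} s")
```

Doing the conversion explicitly is a quick way to confirm that the second counts written next to each setting match the values actually configured.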