[DRBD-user] Hosts freezes during 2 or 3 minutes

Jerome Delamarche jd at trickytools.com
Tue Oct 4 11:10:09 CEST 2005



Thanks for your response Philipp.
Here are my comments:

>> I installed DRBD 0.7.13 on 2 Dell Poweredge 2850.
>> I often experience a freeze of the Primary lasting several minutes.
>> There is no message logged anywhere, no message on the system console.
>> There is no network problem, and no messages are logged on the Secondary.
>>
>> After the freeze, everything restarts OK.
>>
>> I know I lack information to help debugging, but anyway, here is the
>> configuration of the servers:
>> - OS = Linux RedHat ES4 Update 1
>> - uses a megaraid controller
>> - has 2 GB of RAM and two Xeons at 3.2 GHz

> What do you mean by freeze:
> * Does it respond to key-strokes ?
No

> * Does it respond to pings ?
No

> * Does it respond to the "Num-Lock" key by toggling the "Num-Lock" LED ?
I did not check it

> * Is the screen blank, or has it the same content as before the freeze ?
It's hard to say because I use a KVM: even in the normal case, I need to
strike a key to make the screen redisplay, and the keyboard seems to be
locked....

> Could you retry and boot the kernel with the parameter "nmi_watchdog=1"
> on the kernel's command line ?
Yes, I could, but according to the documentation, this parameter either
kills the processes that hang the host for more than 5 seconds or reboots
the host.
I prefer to wait 2 or 3 minutes and recover normal activity: the users can
then resume their network sessions, for example.
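
If the parameter is tried after all, it may help to confirm that the running
kernel actually booted with it. The following is only a minimal sketch: the
helper names kernel_param and read_cmdline are hypothetical, and it assumes
the kernel exposes its boot command line in /proc/cmdline.

```python
def kernel_param(cmdline, name):
    """Return the value of `name` on a kernel command line.

    Returns None if the parameter is absent and "" if it is present
    without a value (a bare flag such as "quiet").
    """
    for token in cmdline.split():
        # Split on the first "=" only, so values such as root=LABEL=/ survive.
        key, sep, value = token.partition("=")
        if key == name:
            return value if sep else ""
    return None


def read_cmdline(path="/proc/cmdline"):
    """Read the boot command line (the path is an assumption: Linux with /proc)."""
    with open(path) as f:
        return f.read().strip()
```

Calling kernel_param(read_cmdline(), "nmi_watchdog") on the Primary should
return "1" if the parameter was passed at boot, and None otherwise.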

>> The former version of DRBD fixed a similar bug.
>> But maybe another bug with the same impact still exists ?

> We have had a new test cluster up and running here for about 10 days
> now. It mirrored many, many terabytes of device mapper zero targets
> in the last week.

>  The cluster is built out of two 4-way Xeon machines. I do not have
>  the feeling that there is a spinlock deadlock in the latest release.

I did not say the bug was in DRBD; it could be a hardware, driver (RAID
controller) or kernel problem.

Does anyone have successful experience with the latest RedHat kernel
(2.6.9-11EL-SMP) ?
I'm OK with downgrading to another version of the kernel and/or DRBD.

Thanks,

Jerome





