[DRBD-user] LVM crash maybe due to a drbd issue (Maxence DUNNEWIND)

Heribert Tockner drucko76 at gmail.com
Tue Feb 9 10:16:47 CET 2010


Thanks, Lars, for your answer. I can now give you some updates.




> it will be used:
>
version: 8.3.7 (api:88/proto:86-91)
GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root at vserver3-backup, 2010-02-08 10:59:23


> > The problem comes with time, not only with load and requests. I ran
> > tests with dbench and recursive parallel processes against the installed
> > Apache on different sites on the VServer on DRBD. This produces a load
> > of about 70 and more.
> > There was no problem. After 5 days the system hangs with DRBD. I will
> > let you know about my tests.
>
> Are you sure it would not hang without DRBD?
>

Yes. Without DRBD the systems have no problems.

> Funny memleaks somewhere?

I have run some memtests. Also, there are other production standalone
VMware guests without any problems.


> TCP stack mistuned to break?
> Does it also hang with DRBD unconnected?
>


> What happens just before the hang?
>
> If you can reproduce with 2.6.22.19-grsec2.1.11-vs2.2.0.7,
> please try to also reproduce with something closer to kernel.org,
> just so we can rule out any strange side effects there.
>
> Good luck.
>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com


Yesterday I installed Linux-VServer 2.6.22.19-vs2.2.0.7 without
grsecurity, and DRBD version 8.3.7 from source.

Last night the secondary node became disconnected, with the following log
on the primary server:

Feb  8 21:18:04 vserver3-backup kernel: [40842.249486] block drbd1: PingAck did not arrive in time.
Feb  8 21:18:04 vserver3-backup kernel: [40842.249678] block drbd1: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Feb  8 21:18:04 vserver3-backup kernel: [40842.249700] block drbd1: asender terminated
Feb  8 21:18:04 vserver3-backup kernel: [40842.249702] block drbd1: Terminating asender thread
Feb  8 21:18:04 vserver3-backup kernel: [40842.249780] block drbd1: short read expecting header on sock: r=-512
Feb  8 21:18:04 vserver3-backup kernel: [40842.249988] block drbd1: Creating new current UUID
Feb  8 21:18:04 vserver3-backup kernel: [40842.250522] block drbd1: Connection closed
Feb  8 21:18:04 vserver3-backup kernel: [40842.250531] block drbd1: conn( NetworkFailure -> Unconnected )
Feb  8 21:18:04 vserver3-backup kernel: [40842.250535] block drbd1: receiver terminated
Feb  8 21:18:04 vserver3-backup kernel: [40842.250537] block drbd1: Restarting receiver thread
Feb  8 21:18:04 vserver3-backup kernel: [40842.250541] block drbd1: receiver (re)started
Feb  8 21:18:04 vserver3-backup kernel: [40842.250546] block drbd1: conn( Unconnected -> WFConnection )
Feb  8 21:18:11 vserver3-backup kernel: [40849.221564] block drbd0: PingAck did not arrive in time.
Feb  8 21:18:11 vserver3-backup kernel: [40849.221658] block drbd0: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Feb  8 21:18:11 vserver3-backup kernel: [40849.221675] block drbd0: asender terminated
Feb  8 21:18:11 vserver3-backup kernel: [40849.221701] block drbd0: Terminating asender thread
Feb  8 21:18:11 vserver3-backup kernel: [40849.221746] block drbd0: short read expecting header on sock: r=-512
Feb  8 21:18:11 vserver3-backup kernel: [40849.221903] block drbd0: Creating new current UUID
Feb  8 21:18:11 vserver3-backup kernel: [40849.222606] block drbd0: Connection closed
Feb  8 21:18:11 vserver3-backup kernel: [40849.222618] block drbd0: conn( NetworkFailure -> Unconnected )
Feb  8 21:18:11 vserver3-backup kernel: [40849.222623] block drbd0: receiver terminated
Feb  8 21:18:11 vserver3-backup kernel: [40849.222628] block drbd0: Restarting receiver thread
Feb  8 21:18:11 vserver3-backup kernel: [40849.222633] block drbd0: receiver (re)started
Feb  8 21:18:11 vserver3-backup kernel: [40849.222640] block drbd0: conn( Unconnected -> WFConnection )
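
In case it helps anyone comparing such logs across nodes, here is a minimal
Python sketch that pulls the conn( A -> B ) transitions out of kernel log
lines like the ones above. It is only an illustration: it assumes each log
entry sits on a single line (not wrapped by the mail client), and the names
LINE_RE, CONN_RE, conn_transitions and SAMPLE are made up for this example
and are not part of DRBD or any tool.

```python
import re

# Hypothetical helper patterns, written for the syslog layout shown above:
#   "<Mon> <day> <hh:mm:ss> <host> kernel: [<uptime>] block drbdN: <message>"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+kernel:\s+"
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+block\s+(?P<dev>drbd\d+):\s+(?P<msg>.*)$"
)
# Matches the connection-state part of a message, e.g.
#   "conn( Connected -> NetworkFailure )"
CONN_RE = re.compile(r"conn\(\s*(\w+)\s*->\s*(\w+)\s*\)")


def conn_transitions(lines):
    """Yield (uptime_seconds, device, old_state, new_state) for each
    line that contains a conn( A -> B ) transition; skip everything else."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a DRBD kernel line in the assumed layout
        c = CONN_RE.search(m.group("msg"))
        if c:
            yield float(m.group("uptime")), m.group("dev"), c.group(1), c.group(2)


# A few lines copied from the log above (unwrapped).
SAMPLE = [
    "Feb  8 21:18:04 vserver3-backup kernel: [40842.249486] block drbd1: PingAck did not arrive in time.",
    "Feb  8 21:18:04 vserver3-backup kernel: [40842.249678] block drbd1: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )",
    "Feb  8 21:18:04 vserver3-backup kernel: [40842.250531] block drbd1: conn( NetworkFailure -> Unconnected )",
    "Feb  8 21:18:04 vserver3-backup kernel: [40842.250546] block drbd1: conn( Unconnected -> WFConnection )",
]

if __name__ == "__main__":
    for uptime, dev, old, new in conn_transitions(SAMPLE):
        print(f"{uptime:.6f} {dev}: {old} -> {new}")
```

Feeding it the whole log from a file instead of SAMPLE works the same way,
since an open file can be iterated line by line.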

After this point the secondary server was not reachable over the network.
Log from the secondary node before the network crash:

Feb  8 21:08:52 vserver3-produktiv ntpd[2008]: synchronized to 85.233.96.33, stratum 3
Feb  8 21:17:01 vserver3-produktiv /USR/SBIN/CRON[6004]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)

The server was unreachable for 30 minutes; after that, pings reached the
machine again (according to Nagios).

Today I checked the network connection between the hosts. Pings from the
primary to the secondary arrived. The secondary system behaved very
strangely: one ping returned, and afterwards the command could only be
interrupted with Control-C.

I tried to disable the network interface used for DRBD with
"ifconfig eth1 down". It did not seem to work, and I interrupted it with
Control-C. Only a reboot of the secondary system got the synchronization
and the network working again.


Kind regards, and thanks for your help,

Heribert Tockner
KT-NET Communications GmbH