[DRBD-user] Loss of Connection

Holgilein lists at loomsday.co.nz
Mon Dec 4 23:31:52 CET 2006


> Just wondering, is drbd set up to reconnect upon a connection failure
> or to go stand alone?

The devices are configured to go standalone in case of a disconnect.

When I initially set up DRBD on those machines, I wanted to avoid
a resync with swapped Primary <=> Secondary roles, i.e. one that
overwrites the more recent side of the DRBD device pair after a
connection loss. So I decided to let the devices go StandAlone in case
of a disconnect, and to resync them manually.

In hindsight, I am no longer sure that was the best decision:
if DRBD made such a mistake during an automatic resync, it would most
certainly do exactly the same thing when I reconnect the devices by
hand.

> Perhaps a network failure is occurring and your
> nodes are getting split-brain, thus no failover is being done (just a
> thought here)

Currently, I have 8 DRBD device pairs on this server. Only one loses
its connection sporadically, and Heartbeat does not kick in (which
is GOOD, to say the least).

Nevertheless, I had a combination of errors once:

1) Loss of connection, leading to "StandAlone" vs "Unknown" state
2) A server problem that caused a reboot
3) After the reboot, the failover node came up on the "Unknown" (aka old)
    side of the device pair.
4) After manually resyncing, I had finally finished off the good side,
    as the bad side was now considered to be the most recent one
    and became SyncSource.
5) Having daily disk backups based on rsync is wise.

I was lucky enough to detect this right after the reboot; otherwise
I would have been in real trouble, having lost everything written
since the devices had lost their connection.

Anyway, nowadays I am using SEC (http://simple-evcorr.sourceforge.net)
to monitor the syslog and send me an email if there are any DRBD
related errors popping up. It's been a life-saver...
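
For anyone without SEC at hand, here is a rough sketch of the same idea
in Python. It is not my SEC rule set, just an illustration: the two
patterns are modelled on the drbd-0.7 kernel messages quoted further
down, and they assume every message sits on a single line, as it does in
the real syslog. The function name scan() and the wording of the alerts
are made up for this example.

```python
import re

# Patterns modelled on the drbd-0.7.x kernel messages quoted in this mail;
# other DRBD versions may well log these events differently.
CSTATE_RE = re.compile(r"kernel: (drbd\d+): .*cstate (\w+) --> (\w+)")
KO_RE = re.compile(r"kernel: (drbd\d+): .*sock_sendmsg time expired, ko = (\d+)")


def scan(lines):
    """Return a list of (device, message) alerts for worrying DRBD events."""
    alerts = []
    for line in lines:
        m = CSTATE_RE.search(line)
        if m:
            dev, old, new = m.groups()
            if new == "StandAlone":
                # The device gave up on its peer and waits for a manual reconnect.
                alerts.append((dev, f"{old} --> {new}: manual reconnect needed"))
            elif old == "Connected":
                # First transition away from a healthy connection.
                alerts.append((dev, f"{old} --> {new}: connection lost"))
            continue
        m = KO_RE.search(line)
        if m:
            # Send timeout countdown that precedes the NetworkFailure state.
            dev, ko = m.groups()
            alerts.append((dev, f"send timeout, ko = {ko}"))
    return alerts
```

Feeding it the lines of the syslog file (the path depends on the
distribution) and mailing out whatever scan() returns would give roughly
the kind of notification I get from SEC today.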

To recap, the question remains: Shall I simply use the reconnect
option for these devices and relax, or shall I be concerned about
the periodic loss of connection?

> On 12/4/06, Holgilein <lists at loomsday.co.nz> wrote:
>> Hi there,
>>
>> I am running drbd-0.7.22 on kernel 2.6.17.11 (including Vserver patch
>> set v2.0.2-rc31), and I am using eth1 (tg3 driver on Broadcom BCM5780
>> Gigabit adapter) on a 2-node-cluster to synchronise the DRBD devices.
>>
>> This network interface is also used to pull backups across the
>> two nodes, and when the link is under heavy load I observed that
>> sometimes (every 5-6 days) DRBD loses its inter-node connection:
>>
>> Dec  5 05:40:45 wgr-host1 kernel: drbd0: [reiserfs/3/1830] sock_sendmsg time expired, ko = 3
>> Dec  5 05:40:48 wgr-host1 kernel: drbd0: [reiserfs/3/1830] sock_sendmsg time expired, ko = 2
>> Dec  5 05:40:51 wgr-host1 kernel: drbd0: [reiserfs/3/1830] sock_sendmsg time expired, ko = 1
>> Dec  5 05:40:54 wgr-host1 kernel: drbd0: reiserfs/3 [1830]: cstate Connected --> NetworkFailure
>> Dec  5 05:40:54 wgr-host1 kernel: drbd0: drbd0_receiver [10068]: cstate NetworkFailure --> BrokenPipe
>> Dec  5 05:40:54 wgr-host1 kernel: drbd0: short read expecting header on sock: r=-512
>> Dec  5 05:40:54 wgr-host1 kernel: drbd0: asender terminated
>> Dec  5 05:40:54 wgr-host1 kernel: drbd0: worker terminated
>> Dec  5 05:40:54 wgr-host1 kernel: drbd0: drbd0_receiver [10068]: cstate BrokenPipe --> Unconnected
>> Dec  5 05:40:54 wgr-host1 kernel: drbd0: Connection lost.
>> Dec  5 05:40:54 wgr-host1 kernel: drbd0: drbd0_receiver [10068]: cstate Unconnected --> StandAlone
>> Dec  5 05:40:54 wgr-host1 kernel: drbd0: receiver terminated
>>
>> Is this known behaviour, and is there anything I can do to remedy it?
>>
>> Many thanks,
>>
>> Holger