Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On 6/23/06, Lars Ellenberg <Lars.Ellenberg at linbit.com> wrote:
> / 2006-06-23 11:37:19 +0200
> \ Andreas Schader:
> > after a network failure on the crossover between the primary and
> > secondary nodes the secondary went to inconsistent.
> as soon as synchronization starts, the sync target becomes
> inconsistent, and becomes consistent only once the synchronization
> is successfully completed.
that is exactly where my problem lies. The synchronization does not start.
> anything about drbd in the kernel messages?
this is what I found in the kernel log after secondary came up again
after the network failure/power loss:
drbd0: 0 KB marked out-of-sync by on disk bit-map.
drbd0: Found 6 transactions (10 active extents) in activity log.
drbd0: drbdsetup [6402]: cstate Unconfigured --> StandAlone
drbd0: drbdsetup [6438]: cstate StandAlone --> Unconnected
drbd0: drbd0_receiver [6439]: cstate Unconnected --> WFConnection
drbd0: drbd0_receiver [6439]: cstate WFConnection --> WFReportParams
drbd0: Handshake successful: DRBD Network Protocol version 74
drbd0: Connection established.
drbd0: I am(S): 1:00000002:00000003:00000041:00000002:00
drbd0: Peer(P): 1:00000002:00000003:00000042:00000002:10
drbd0: drbd0_receiver [6439]: cstate WFReportParams --> WFBitMapT
drbd0: Secondary/Unknown --> Secondary/Primary
and then a lot of these messages repeat again and again:
drbd0: [drbd0_receiver/6439] sock_sendmsg time expired, ko = 4294967295
drbd0: [drbd0_receiver/6439] sock_sendmsg time expired, ko = 4294967294
> what does /proc/drbd show on both nodes?
primary shows:
version: 0.7.18 (api:78/proto:74)
SVN Revision: 2176 build by root at nas1, 2006-06-22 22:04:56
0: cs:WFBitMapS st:Primary/Secondary ld:Consistent
ns:1052672 nr:0 dw:614491 dr:1063896 al:460 bm:754 lo:0 pe:0 ua:0 ap:0
secondary shows:
version: 0.7.18 (api:78/proto:74)
SVN Revision: 2176 build by root at nas2, 2006-06-22 22:05:30
0: cs:WFBitMapT st:Secondary/Primary ld:Inconsistent
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
> do the figures move? -> watch -n1 cat /proc/drbd
no, nothing is changing at all. both nodes seem to hang
> how big are the devices?
> how much ram do you have?
> how much ram is "free"?
the df data for drbd0:
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/drbd0 233373575 93897165 139476410 41% /data1
primary has 2GB RAM, secondary has 1GB
the output of free:
[root at nas1:~]# free
total used free shared buffers cached
Mem: 2075924 1613984 461940 0 57096 1380424
-/+ buffers/cache: 176464 1899460
Swap: 1951888 0 1951888
[root at nas2:~]# free
total used free shared buffers cached
Mem: 1035760 93348 942412 0 4044 36836
-/+ buffers/cache: 52468 983292
Swap: 1951888 0 1951888
> which drbd version?
> kernel version?
0.7.18 on both nodes, both nodes are debian etch.
primary: Linux nas1 2.6.16-1-686-smp #2 SMP
secondary: Linux nas2 2.6.15-1-686 #2
primary has a newer kernel because of a marvel yukon 2 nic that is not
supported in older kernels.
> what applications are running on the boxes currently?
there is nothing running besides drbd and the nfs-kernel-server. the
cluster is supposed to be a just fileserver.
> anything "hanging" on the primary / secondary?
as soon as drbd goes into inconsistent state no access to /data1 is
possible. every file access hangs.
best regards,
Andreas