On 6/23/06, Lars Ellenberg <Lars.Ellenberg at linbit.com> wrote:
> / 2006-06-23 11:37:19 +0200
> \ Andreas Schader:
> > after a network failure on the crossover between the primary and
> > secondary nodes the secondary went to inconsistent.
> as soon as synchronization starts, the sync target becomes
> inconsistent, and becomes consistent only once the synchronization
> is successfully completed.

that is exactly where my problem lies. The synchronization does not start.

> anything about drbd in the kernel messages?

this is what I found in the kernel log after secondary came up again
after the network failure/power loss:

drbd0: 0 KB marked out-of-sync by on disk bit-map.
drbd0: Found 6 transactions (10 active extents) in activity log.
drbd0: drbdsetup [6402]: cstate Unconfigured --> StandAlone
drbd0: drbdsetup [6438]: cstate StandAlone --> Unconnected
drbd0: drbd0_receiver [6439]: cstate Unconnected --> WFConnection
drbd0: drbd0_receiver [6439]: cstate WFConnection --> WFReportParams
drbd0: Handshake successful: DRBD Network Protocol version 74
drbd0: Connection established.
drbd0: I am(S): 1:00000002:00000003:00000041:00000002:00
drbd0: Peer(P): 1:00000002:00000003:00000042:00000002:10
drbd0: drbd0_receiver [6439]: cstate WFReportParams --> WFBitMapT
drbd0: Secondary/Unknown --> Secondary/Primary

and then a lot of these messages repeat again and again:

drbd0: [drbd0_receiver/6439] sock_sendmsg time expired, ko = 4294967295
drbd0: [drbd0_receiver/6439] sock_sendmsg time expired, ko = 4294967294

> what does /proc/drbd show on both nodes?

primary shows:

version: 0.7.18 (api:78/proto:74)
SVN Revision: 2176 build by root at nas1, 2006-06-22 22:04:56
 0: cs:WFBitMapS st:Primary/Secondary ld:Consistent
    ns:1052672 nr:0 dw:614491 dr:1063896 al:460 bm:754 lo:0 pe:0 ua:0 ap:0

secondary shows:

version: 0.7.18 (api:78/proto:74)
SVN Revision: 2176 build by root at nas2, 2006-06-22 22:05:30
 0: cs:WFBitMapT st:Secondary/Primary ld:Inconsistent
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0

> do the figures move? -> watch -n1 cat /proc/drbd

no, nothing is changing at all. both nodes seem to hang.

> how big are the devices?
> how much ram do you have?
> how much ram is "free"?

the df data for drbd0:

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/drbd0           233373575  93897165 139476410  41% /data1

primary has 2GB RAM, secondary has 1GB.

the output of free:

[root at nas1:~]# free
             total       used       free     shared    buffers     cached
Mem:       2075924    1613984     461940          0      57096    1380424
-/+ buffers/cache:     176464    1899460
Swap:      1951888          0    1951888

[root at nas2:~]# free
             total       used       free     shared    buffers     cached
Mem:       1035760      93348     942412          0       4044      36836
-/+ buffers/cache:      52468     983292
Swap:      1951888          0    1951888

> which drbd version?
> kernel version?

0.7.18 on both nodes, both nodes are debian etch.

primary:   Linux nas1 2.6.16-1-686-smp #2 SMP
secondary: Linux nas2 2.6.15-1-686 #2

primary has a newer kernel because of a Marvell Yukon 2 NIC that is not
supported in older kernels.

> what applications are running on the boxes currently?

there is nothing running besides drbd and the nfs-kernel-server.
the cluster is supposed to be just a fileserver.

> anything "hanging" on the primary / secondary?

as soon as drbd goes into inconsistent state no access to /data1 is
possible. every file access hangs.

best regards,
Andreas
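
PS: for anyone who wants to script the "do the figures move?" check rather than watch it by eye, here is a minimal sketch in Python. It assumes the /proc/drbd layout shown above for 0.7.18 (a device line starting with the minor number and `cs:`, followed by an indented counter line). The function names `parse_proc_drbd` and `changed_counters` are made up for this example and are not part of DRBD.

```python
import re

# Sample text copied from the primary's /proc/drbd output in this message.
SAMPLE_PRIMARY = """\
version: 0.7.18 (api:78/proto:74)
SVN Revision: 2176 build by root at nas1, 2006-06-22 22:04:56
 0: cs:WFBitMapS st:Primary/Secondary ld:Consistent
    ns:1052672 nr:0 dw:614491 dr:1063896 al:460 bm:754 lo:0 pe:0 ua:0 ap:0
"""


def parse_proc_drbd(text):
    """Return {minor: {key: value}} for each device block in /proc/drbd text.

    Assumes the 0.7.x layout quoted above; other versions may differ.
    """
    devices = {}
    current = None
    for line in text.splitlines():
        match = re.match(r"\s*(\d+):\s+(cs:.*)$", line)
        if match:
            # Device line, e.g. " 0: cs:WFBitMapS st:Primary/Secondary ..."
            current = devices.setdefault(int(match.group(1)), {})
            rest = match.group(2)
        elif current is not None and line.startswith((" ", "\t")):
            # Indented continuation line holding the counters.
            rest = line
        else:
            # Header lines ("version:", "SVN Revision:") are skipped.
            continue
        for key, value in re.findall(r"(\w+):(\S+)", rest):
            current[key] = int(value) if value.isdigit() else value
    return devices


def changed_counters(before, after):
    """List the numeric counters that differ between two samples of one device."""
    return sorted(
        key for key, value in after.items()
        if isinstance(value, int) and before.get(key) != value
    )
```

To use it on a live node, read /proc/drbd twice a second or so apart, parse both samples, and call `changed_counters` on the two dictionaries for minor 0. An empty list while `cs` stays at WFBitMapS or WFBitMapT would match the hang described above.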