[DRBD-user] secondary node is inconsistent

Andreas Schader andreas.schader at gmail.com
Fri Jun 23 14:53:53 CEST 2006



On 6/23/06, Lars Ellenberg <Lars.Ellenberg at linbit.com> wrote:
> / 2006-06-23 11:37:19 +0200
> \ Andreas Schader:
> > after a network failure on the crossover link between the primary and
> > secondary nodes, the secondary went into the Inconsistent state.

> as soon as synchronization starts, the sync target becomes
> inconsistent, and becomes consistent only once the synchronization
> is successfully completed.

That is exactly where my problem lies: the synchronization does not start.
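Lars's description amounts to a two-state rule for the sync target's disk. The sketch below models that rule in Python for illustration only; the class and method names are hypothetical, and this is not how DRBD implements it. The point it shows is that an interrupted or never-completed sync leaves the target Inconsistent.

```python
from enum import Enum


class DiskState(Enum):
    CONSISTENT = "Consistent"
    INCONSISTENT = "Inconsistent"


class SyncTarget:
    """Hypothetical model of a sync target's disk state; not DRBD code."""

    def __init__(self) -> None:
        self.disk = DiskState.CONSISTENT

    def start_sync(self) -> None:
        # Resync copies blocks out of their original write order, so a
        # partially synced target is not a usable image: mark it inconsistent.
        self.disk = DiskState.INCONSISTENT

    def finish_sync(self, success: bool) -> None:
        # Only a successfully completed sync restores consistency;
        # an interrupted one leaves the target inconsistent.
        if success:
            self.disk = DiskState.CONSISTENT


target = SyncTarget()
target.start_sync()
target.finish_sync(success=False)
print(target.disk.value)  # Inconsistent

target.start_sync()
target.finish_sync(success=True)
print(target.disk.value)  # Consistent
```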

> anything about drbd in the kernel messages?

This is what I found in the kernel log after the secondary came up again
following the network failure/power loss:

drbd0: 0 KB marked out-of-sync by on disk bit-map.
drbd0: Found 6 transactions (10 active extents) in activity log.
drbd0: drbdsetup [6402]: cstate Unconfigured --> StandAlone
drbd0: drbdsetup [6438]: cstate StandAlone --> Unconnected
drbd0: drbd0_receiver [6439]: cstate Unconnected --> WFConnection
drbd0: drbd0_receiver [6439]: cstate WFConnection --> WFReportParams
drbd0: Handshake successful: DRBD Network Protocol version 74
drbd0: Connection established.
drbd0: I am(S): 1:00000002:00000003:00000041:00000002:00
drbd0: Peer(P): 1:00000002:00000003:00000042:00000002:10
drbd0: drbd0_receiver [6439]: cstate WFReportParams --> WFBitMapT
drbd0: Secondary/Unknown --> Secondary/Primary
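To see where the connection stalls, the `cstate Old --> New` lines can be pulled out of the kernel log and chained together. A minimal sketch, assuming only the line format shown in the log above; the helper name is hypothetical. In this excerpt the last connection state reached is WFBitMapT (waiting for the bitmap, as sync target), and no sync state follows it.

```python
import re

# Kernel log lines copied from the message above.
LOG = """\
drbd0: drbdsetup [6402]: cstate Unconfigured --> StandAlone
drbd0: drbdsetup [6438]: cstate StandAlone --> Unconnected
drbd0: drbd0_receiver [6439]: cstate Unconnected --> WFConnection
drbd0: drbd0_receiver [6439]: cstate WFConnection --> WFReportParams
drbd0: Handshake successful: DRBD Network Protocol version 74
drbd0: Connection established.
drbd0: drbd0_receiver [6439]: cstate WFReportParams --> WFBitMapT
drbd0: Secondary/Unknown --> Secondary/Primary
"""

# Matches "... cstate <Old> --> <New>"; role-change lines do not match.
CSTATE_RE = re.compile(r"cstate (\w+) --> (\w+)")


def cstate_path(log: str) -> list[str]:
    """Return the connection states visited, in log order."""
    path: list[str] = []
    for line in log.splitlines():
        m = CSTATE_RE.search(line)
        if m:
            old, new = m.groups()
            if not path:
                path.append(old)  # record the starting state once
            path.append(new)
    return path


path = cstate_path(LOG)
print(" -> ".join(path))
# Unconfigured -> StandAlone -> Unconnected -> WFConnection -> WFReportParams -> WFBitMapT
```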

After the Secondary/Unknown --> Secondary/Primary transition, these
messages repeat over and over:
drbd0: [drbd0_receiver/6439] sock_sendmsg time expired, ko = 4294967295
drbd0: [drbd0_receiver/6439] sock_sendmsg time expired, ko = 4294967294
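The ko values in these messages are 2^32 - 1 and 2^32 - 2, i.e. -1 and -2 in 32-bit two's complement. That pattern is consistent with an unsigned counter that started at 0 and is decremented by one per message; the assumption that the counter started at 0 (for example because no ko-count was configured) is not something the log itself shows. The arithmetic, sketched in Python:

```python
def dec_u32(value: int) -> int:
    """Decrement with unsigned 32-bit wrap-around, like C `--x` on a u32."""
    return (value - 1) & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as two's-complement signed."""
    return value - 2**32 if value >= 2**31 else value


ko = 0  # assumption: the counter started at 0
seen = []
for _ in range(2):
    ko = dec_u32(ko)
    seen.append(ko)

print(seen)                            # [4294967295, 4294967294]
print([to_signed32(v) for v in seen])  # [-1, -2]
```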


> what does /proc/drbd show on both nodes?

The primary shows:
version: 0.7.18 (api:78/proto:74)
SVN Revision: 2176 build by root@nas1, 2006-06-22 22:04:56
 0: cs:WFBitMapS st:Primary/Secondary ld:Consistent
    ns:1052672 nr:0 dw:614491 dr:1063896 al:460 bm:754 lo:0 pe:0 ua:0 ap:0

The secondary shows:
version: 0.7.18 (api:78/proto:74)
SVN Revision: 2176 build by root@nas2, 2006-06-22 22:05:30
 0: cs:WFBitMapT st:Secondary/Primary ld:Inconsistent
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
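Each device entry is a set of key:value tokens: cs is the connection state, st the local/peer roles and ld the local disk state. A small parser (the helper name is hypothetical; the token format is taken from the output above) puts the two nodes side by side: both are parked in the bitmap-exchange states, WFBitMapS on the primary and WFBitMapT on the secondary.

```python
import re

PROC_PRIMARY = """\
 0: cs:WFBitMapS st:Primary/Secondary ld:Consistent
    ns:1052672 nr:0 dw:614491 dr:1063896 al:460 bm:754 lo:0 pe:0 ua:0 ap:0
"""

PROC_SECONDARY = """\
 0: cs:WFBitMapT st:Secondary/Primary ld:Inconsistent
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
"""

# A key, a colon, then a non-blank value. The minor number "0:" does not
# match because only a space follows its colon.
FIELD_RE = re.compile(r"(\w+):(\S+)")


def parse_device(block: str) -> dict[str, str]:
    """Collect every key:value token from one device's /proc/drbd lines."""
    fields: dict[str, str] = {}
    for line in block.splitlines():
        for key, value in FIELD_RE.findall(line):
            fields[key] = value
    return fields


p = parse_device(PROC_PRIMARY)
s = parse_device(PROC_SECONDARY)
print(p["cs"], p["st"], p["ld"])  # WFBitMapS Primary/Secondary Consistent
print(s["cs"], s["st"], s["ld"])  # WFBitMapT Secondary/Primary Inconsistent
```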

> do the figures move? -> watch -n1 cat /proc/drbd

No, nothing changes at all. Both nodes seem to hang.
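`watch -n1 cat /proc/drbd` answers this by eye; the same check can be scripted by diffing the counter line of two readings. A hypothetical helper (not part of DRBD's tooling), assuming the `key:number` format of the counter lines above:

```python
import re

COUNTER_RE = re.compile(r"(\w+):(\d+)")


def counters(line: str) -> dict[str, int]:
    """Parse a /proc/drbd counter line such as 'ns:0 nr:0 dw:0 ...'."""
    return {k: int(v) for k, v in COUNTER_RE.findall(line)}


def moved(before: str, after: str) -> dict[str, int]:
    """Return the counters whose value changed, mapped to their delta."""
    a, b = counters(before), counters(after)
    return {k: b[k] - a[k] for k in a if k in b and b[k] != a[k]}


# Two readings of the primary's counter line; per the report, they were identical.
t0 = "ns:1052672 nr:0 dw:614491 dr:1063896 al:460 bm:754 lo:0 pe:0 ua:0 ap:0"
t1 = "ns:1052672 nr:0 dw:614491 dr:1063896 al:460 bm:754 lo:0 pe:0 ua:0 ap:0"
print(moved(t0, t1) or "no counter moved")  # no counter moved
```

On a live node the two readings would come from reading /proc/drbd twice, a second or so apart.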

> how big are the devices?
> how much ram do you have?
> how much ram is "free"?

The df output for drbd0:
Filesystem  1K-blocks      Used  Available Use% Mounted on
/dev/drbd0  233373575  93897165  139476410  41% /data1
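For a sense of scale, the df figures can be turned into a size and a rough bitmap estimate. Two assumptions: df's Use% is used / (used + available) rounded up, and DRBD tracks one bitmap bit per 4 KiB block. The filesystem size is slightly smaller than the DRBD device, so the bitmap estimate is approximate.

```python
import math

# Figures from the df output above, in 1 KiB blocks.
total_kib = 233_373_575
used_kib = 93_897_165
avail_kib = 139_476_410

# Used + Available covers the whole filesystem in this output.
print(used_kib + avail_kib == total_kib)  # True

# Assumption: Use% is rounded up.
use_pct = math.ceil(100 * used_kib / (used_kib + avail_kib))
print(f"Use%: {use_pct}%")  # Use%: 41%

print(f"size: {total_kib / 2**20:.1f} GiB")  # size: 222.6 GiB

# Assumption: one bitmap bit per 4 KiB block.
bits = math.ceil(total_kib / 4)
bitmap_bytes = math.ceil(bits / 8)
print(f"bitmap: ~{bitmap_bytes / 2**20:.2f} MiB")  # bitmap: ~6.96 MiB
```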

The primary has 2 GB of RAM, the secondary 1 GB.

The output of free:
[root@nas1:~]# free
             total       used       free     shared    buffers     cached
Mem:       2075924    1613984     461940          0      57096    1380424
-/+ buffers/cache:     176464    1899460
Swap:      1951888          0    1951888

[root@nas2:~]# free
             total       used       free     shared    buffers     cached
Mem:       1035760      93348     942412          0       4044      36836
-/+ buffers/cache:      52468     983292
Swap:      1951888          0    1951888
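The -/+ buffers/cache row of this older `free` output is derived from the Mem row: used minus buffers and cache, and free plus buffers and cache. Recomputing it from the figures above shows that applications on nas1 use only about 172 MiB, with most of its RAM in page cache, while nas2 has very little cached. A sketch of the arithmetic (the function name is hypothetical):

```python
def buffers_cache_row(used: int, free: int, buffers: int, cached: int) -> tuple[int, int]:
    """Derive the '-/+ buffers/cache' row of older procps `free` (KiB)."""
    # Memory used by applications, excluding kernel buffers and page cache.
    app_used = used - buffers - cached
    # Memory available to applications if buffers and cache were reclaimed.
    app_free = free + buffers + cached
    return app_used, app_free


nas1 = buffers_cache_row(used=1613984, free=461940, buffers=57096, cached=1380424)
nas2 = buffers_cache_row(used=93348, free=942412, buffers=4044, cached=36836)
print(nas1)  # (176464, 1899460)
print(nas2)  # (52468, 983292)
```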

> which drbd version?
> kernel version?

DRBD 0.7.18 on both nodes; both run Debian etch.
primary: Linux nas1 2.6.16-1-686-smp #2 SMP
secondary: Linux nas2 2.6.15-1-686 #2

The primary runs a newer kernel because of a Marvell Yukon 2 NIC that
older kernels do not support.

> what applications are running on the boxes currently?

Nothing is running besides DRBD and nfs-kernel-server. The cluster is
meant to be just a file server.

> anything "hanging" on the primary / secondary?

As soon as DRBD goes into the Inconsistent state, /data1 is no longer
accessible: every file access hangs.


Best regards,
Andreas


