[DRBD-user] secondary node is inconsistent

Andreas Schader andreas.schader at gmail.com
Mon Jun 26 10:55:46 CEST 2006


On 6/26/06, Lars Ellenberg <Lars.Ellenberg at linbit.com> wrote:
> / 2006-06-24 12:33:38 +0200
> \ Andreas Schader:
> > I found out, that when I reboot both nodes with "shutdown -r now" at
> > the same time the syncing starts after both are up again and soon
> > after that secondary goes back to "Consistent" in /proc/drbd.
>
> is this a dedicated replication link?

Yes, each machine has two NICs: one for the NFS LAN and one connected
to the other node by a 1 Gbit crossover cable.

> as a workaround for whatever the real problem is with your setup,
> try to minimize nfs-activity, and do <...>

I minimised I/O load and NFS traffic by turning off the
nfs-kernel-server and doing only some local testing on the primary
node.

Here is what I did to get drbd to stop working:

# create a file on the primary drbd node
[root@nas1:/data1]# echo hello > /data1/testfile1

# unplug the network cable of the crossover link
# to simulate a network failure

# make changes to the primary file system while the secondary is not syncing
[root@nas1:/data1]# echo hello > /data1/testfile2

# reconnect the network cable

# after just a few bytes have changed on disk, the secondary goes Inconsistent
[root@nas2:~]# cat /proc/drbd
version: 0.7.18 (api:78/proto:74)
SVN Revision: 2176 build by root@nas2, 2006-06-22 22:05:30
 0: cs:WFBitMapT st:Secondary/Primary ld:Inconsistent
    ns:0 nr:1623114 dw:1623114 dr:0 al:0 bm:572 lo:0 pe:0 ua:0 ap:0


# some more disk activity on primary while secondary is Inconsistent
[root@nas1:/data1]# echo hello > /data1/testfile3
[root@nas1:/data1]# echo hello > /data1/testfile4
# the testfile4 echo already hangs and never returns to the prompt

# to get primary working again I disconnect the resources
# this causes primary to finish the testfile4 echo and return to the prompt
[root@nas2:~]# drbdadm disconnect all

# now I try the suggested workaround
[root@nas1:/data1]# perl -e '$x = "X" x (500*1024*1024)'
[root@nas2:~]# perl -e '$x = "X" x (500*1024*1024)'
[root@nas2:~]# drbdadm connect all
[root@nas2:~]# cat /proc/drbd
version: 0.7.18 (api:78/proto:74)
SVN Revision: 2176 build by root@nas2, 2006-06-22 22:05:30
 0: cs:WFBitMapT st:Secondary/Primary ld:Inconsistent
    ns:0 nr:0 dw:1623114 dr:0 al:0 bm:572 lo:0 pe:0 ua:0 ap:0


But the secondary remains Inconsistent and still writes thousands of
lines like
drbd0: [drbd0_receiver/9922] sock_sendmsg time expired, ko = 4294967281
to syslog.
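
A side note on that log line: 4294967281 is just below 2^32. Assuming
"ko" is a 32-bit counter that happens to be printed as unsigned (an
assumption; the message itself does not say so), the value would
correspond to -15, i.e. a counter that has been decremented past zero.
A minimal shell sketch of that conversion, plus a way to count the
messages. The function names are made up for illustration and are not
part of DRBD:

```shell
#!/bin/sh
# Hypothetical helper: reinterpret an unsigned 32-bit number as signed.
# Assumes the shell does 64-bit arithmetic (true for bash and dash on
# current 64-bit systems).
to_signed32() {
    if [ "$1" -ge 2147483648 ]; then
        echo $(( $1 - 4294967296 ))
    else
        echo "$1"
    fi
}

# Hypothetical helper: count the "time expired" messages in a log file.
# The path is passed in because the syslog location differs per system.
count_ko_lines() {
    grep -c 'sock_sendmsg time expired' "$1"
}

# The value from the syslog line quoted above.
to_signed32 4294967281
```

count_ko_lines would be called with whatever file syslog writes kernel
messages to on the node, for example /var/log/syslog (an assumption
about the local setup).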

After I rebooted both nodes, it was working again.
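
To follow the state while repeating a test like this, the cs/st/ld
fields can be pulled out of /proc/drbd. This is only a sketch, assuming
the 0.7-style status line format shown in the output above; drbd_state
is a hypothetical helper name, not a DRBD tool:

```shell
#!/bin/sh
# Hypothetical helper: read /proc/drbd-style text on stdin and print
# "device connection-state roles disk-state" for each device line.
# Lines that do not look like " N: cs:... st:... ld:..." are ignored.
drbd_state() {
    sed -n 's/^ *\([0-9][0-9]*\): cs:\([^ ]*\) st:\([^ ]*\) ld:\([^ ]*\).*$/\1 \2 \3 \4/p'
}

# Demo on the status lines quoted in this mail instead of the live file.
drbd_state <<'EOF'
version: 0.7.18 (api:78/proto:74)
 0: cs:WFBitMapT st:Secondary/Primary ld:Inconsistent
    ns:0 nr:0 dw:1623114 dr:0 al:0 bm:572 lo:0 pe:0 ua:0 ap:0
EOF
```

On a real node the input would be the live file, e.g.
"drbd_state < /proc/drbd", run in a loop with a short sleep while the
cable is unplugged and reconnected.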

> some technical background:

I skipped the I/O system analysis for now. I don't think it is causing
the problems, because they can be reproduced with very small changes to
the filesystem, which shouldn't have an impact on the performance of
the disks. And to be honest, I lack the experience with the tools you
suggested to know what I am looking for anyway ;-)

I will try to get hold of some other hardware and test it with smaller
drbd devices. In the meantime, any more thoughts on this would be
really appreciated.

best regards,
Andreas


