[DRBD-user] Primary hangs after secondary was shut down

Lars Ellenberg lars.ellenberg at linbit.com
Tue Aug 21 21:11:54 CEST 2007


On Tue, Aug 21, 2007 at 07:53:30PM +0200, Stefan Seifert wrote:
> Hi!
> 
> These are my first steps in Linux-HA and DRBD land, so please feel free
> to assume that I did everything wrong.
> 
> I'm using drbd 0.7.23 on kernel 2.6.22 and created a setup with two

do you mean plain 2.6.22, or something like 2.6.22.4?

this is a pretty basic "test",
so I guess someone else should have noticed this, too.

you know, kernel people always point to the external module first,
and module people point to the kernel first...
can you reproduce this with a different kernel?

> machines, one Primary and one Secondary. Sync was completed before I
> created a file system and mounted it on the primary. I successfully
> wrote to a file. Then I used openSUSE's init script to shut down the
> secondary (rcdrbd stop). After that, everything drbd-related on the
> primary hung. I can neither write to nor read from the mounted filesystem,
> and "drbdadm disconnect all" and "sync" hang in D state, too.
> 
> I assume that this is not intended behavior?
> 
> /proc/drbd on the Primary reads:
> version: 0.7.23 (api:79/proto:74)
> SVN Revision: 2686 build by lmb at dale, 2007-01-15 09:41:57
>  0: cs:Unconnected st:Primary/Unknown ld:Consistent
>     ns:2957148 nr:0 dw:2957148 dr:249 al:601 bm:0 lo:2 pe:0 ua:0 ap:0
> 
> dmesg output, after Secondary went down:
> drbd0: sock was shut down by peer
> drbd0: drbd0_receiver [8889]: cstate Connected --> BrokenPipe
> drbd0: short read expecting header on sock: r=0
> drbd0: asender terminated
> drbd0: worker terminated
> drbd0: drbd0_receiver [8889]: cstate BrokenPipe --> Unconnected
> drbd0: Connection lost.
> drbd0: drbd0_receiver [8889]: cstate Unconnected --> WFConnection
> drbd0: drbd0_receiver [8889]: cstate WFConnection --> WFReportParams
> drbd0: Handshake successful: DRBD Network Protocol version 74
> drbd0: Connection established.
> drbd0: I am(P): 1:00000002:00000001:00000006:00000001:10
> drbd0: Peer(S): 1:00000002:00000001:00000005:00000001:01
> drbd0: drbd0_receiver [8889]: cstate WFReportParams --> WFBitMapS
> drbd0: Primary/Unknown --> Primary/Secondary
> drbd0: drbd0_receiver [8889]: cstate WFBitMapS --> SyncSource
> drbd0: Resync started as SyncSource (need to sync 0 KB [0 bits set]).
> drbd0: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
> drbd0: drbd0_receiver [8889]: cstate SyncSource --> Connected
> e1000: eth2: e1000_watchdog: NIC Link is Down
> e1000: eth2: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX

these e1000 link down/up messages may indicate something bad?

> drbd0: meta connection shut down by peer.
> drbd0: drbd0_asender [9407]: cstate Connected --> NetworkFailure
> drbd0: asender terminated
> drbd0: sock was shut down by peer
> drbd0: drbd0_receiver [8889]: cstate NetworkFailure --> BrokenPipe
> drbd0: short read expecting header on sock: r=0
> drbd0: worker terminated
> drbd0: drbd0_receiver [8889]: cstate BrokenPipe --> Unconnected
> drbd0: Connection lost.
> drbd0: drbd0_receiver [8889]: cstate Unconnected --> WFConnection
> drbd0: drbd0_receiver [8889]: cstate WFConnection --> WFReportParams
> drbd0: Handshake successful: DRBD Network Protocol version 74
> drbd0: meta connection shut down by peer.
> drbd0: asender terminated
> drbd0: drbdsetup [11424]: cstate WFReportParams --> Unconnected
> 
> 
> I can probably free my server by just rebooting, but I am leaving it as is
> for now in case you want some more information. My drbd.conf is attached.

does it recover if you say
# sync
or 
# echo 1 > /proc/sys/kernel/sysrq
# echo s > /proc/sysrq-trigger
?
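
to judge whether anything changed after such a sync, it can help to compare
individual counters in /proc/drbd before and after. a minimal sketch, assuming
the 0.7 "key:value" layout quoted above; drbd_counter is a hypothetical helper
name, not part of the DRBD tools:

```shell
# Print the value of one "key:value" field (e.g. cs, lo, pe, ua, ap)
# from /proc/drbd-style text read on stdin.
# drbd_counter is a hypothetical helper, not something shipped with DRBD.
drbd_counter() {
    key=$1
    # Put every whitespace-separated token on its own line,
    # then print the part after the colon for the requested key.
    tr -s ' \t' '\n' | awk -F: -v k="$key" '$1 == k { print $2 }'
}
```

on the affected box this would be called as "drbd_counter lo < /proc/drbd";
with more than one device configured it prints one value per device.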

>     device     /dev/drbd0;
>     disk       /dev/system/shared;

what are the PVs of the VG "system"?

any snapshots involved?

anything else in "D" state on that box?
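
one way to answer that last question is to filter ps output for tasks in
uninterruptible sleep. a rough sketch; filter_d_state is a hypothetical helper
name, and the ps options in the usage line below assume the procps version
of ps:

```shell
# Keep the header line plus every row whose STAT column (2nd field)
# starts with "D" (uninterruptible sleep).
# filter_d_state is a hypothetical helper; it expects ps-style rows on stdin
# with the process state in the second column.
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}
```

usage: "ps -eo pid,stat,wchan:32,comm | filter_d_state". the wchan column
shows where in the kernel each task is sleeping, if the kernel exposes it.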

-- 
: Lars Ellenberg                            Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :
__
please use the "List-Reply" function of your email client.


