[DRBD-user] LVM crash maybe due to a drbd issue

Maxence DUNNEWIND maxence at dunnewind.net
Tue Dec 8 07:51:27 CET 2009


> > 16: cs:NetworkFailure ro:Primary/Unknown ds:UpToDate/DUnknown C r---d
> >     ns:24777340 nr:0 dw:87720268 dr:12029753 al:1082 bm:1582 lo:0 pe:23 ua:0 ap:23 ep:1 wo:b oos:0
> 
> So "it" is probably "hanging" on this one.
> 
> kernel logs of drbd16?
When I do "echo t > /proc/sysrq-trigger", I don't get a trace for drbd16; I can only find it in the "runnable tasks" list:
Dec  8 07:24:17 z2-3 kernel: [8681874.086989] runnable tasks:
Dec  8 07:24:17 z2-3 kernel: [8681874.086989]             task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
Dec  8 07:24:17 z2-3 kernel: [8681874.086990] ----------------------------------------------------------------------------------------------------------
Dec  8 07:24:17 z2-3 kernel: [8681874.086993]              kvm 32275 388725280.270968 90612590587   120               0               0               0.000000               0.000000               0.000000 /
Dec  8 07:24:17 z2-3 kernel: [8681874.086998]  drbd16_receiver 30799 388725240.270539 144240968909   120               0               0               0.000000               0.000000               0.000000 /
Dec  8 07:24:17 z2-3 kernel: [8681874.087003]         lvchange  4969 388725240.272576 18560770061   120               0               0               0.000000               0.000000               0.000000 /
Dec  8 07:24:17 z2-3 kernel: [8681874.087008] R           bash 20328 388725240.274621      2113   120               0               0               0.000000               0.000000               0.000000 /
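
To pick one task out of a long sysrq-t dump, the "runnable tasks" rows can be filtered with a few lines of Python. This is only a minimal sketch: the regular expression is an assumption based on the row layout shown above (task name, PID, tree-key, ...), and the names ROW, SAMPLE and runnable_tasks are made up for the illustration.

```python
import re

# One row of the "runnable tasks" table, e.g.
#   kernel: [8681874.086998]  drbd16_receiver 30799 388725240.270539 ...
# A leading "R" marks the task currently running on that CPU.
# Assumption: the layout matches the excerpt above; other kernels may differ.
ROW = re.compile(
    r"kernel: \[\s*\d+\.\d+\]\s+(?P<running>R\s+)?(?P<task>\S+)\s+(?P<pid>\d+)\s+-?\d+\.\d+\s"
)

SAMPLE = [
    "Dec  8 07:24:17 z2-3 kernel: [8681874.086989] runnable tasks:",
    "Dec  8 07:24:17 z2-3 kernel: [8681874.086989]             task   PID         tree-key  switches  prio",
    "Dec  8 07:24:17 z2-3 kernel: [8681874.086993]              kvm 32275 388725280.270968 90612590587   120",
    "Dec  8 07:24:17 z2-3 kernel: [8681874.086998]  drbd16_receiver 30799 388725240.270539 144240968909   120",
    "Dec  8 07:24:17 z2-3 kernel: [8681874.087008] R           bash 20328 388725240.274621      2113   120",
]


def runnable_tasks(lines):
    """Return (task, pid, is_running) for every table row found in lines."""
    rows = []
    for line in lines:
        m = ROW.search(line)
        if m:
            rows.append((m.group("task"), int(m.group("pid")), m.group("running") is not None))
    return rows


if __name__ == "__main__":
    for task, pid, running in runnable_tasks(SAMPLE):
        if task.startswith("drbd16"):
            print(task, pid, "running" if running else "runnable")
```

In practice the lines would come from the kernel log file rather than from SAMPLE; the header and separator rows are skipped because they do not contain a numeric PID followed by a tree-key.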


> well, what is living on drbd16?
z2-3:~# drbdsetup /dev/drbd16 show
disk {
        size                    20971520s; # bytes
        on-io-error             detach;
        fencing                 dont-care _is_default;
        max-bio-bvecs           0 _is_default;
}
net {
        timeout                 60 _is_default; # 1/10 seconds
        max-epoch-size          2048 _is_default;
        max-buffers             2048 _is_default;
        unplug-watermark        128 _is_default;
        connect-int             10 _is_default; # seconds
        ping-int                10 _is_default; # seconds
        sndbuf-size             131070 _is_default; # bytes
        rcvbuf-size             131070 _is_default; # bytes
        ko-count                0 _is_default;
        cram-hmac-alg           "md5";
        shared-secret           "eae879cc293277b6ac97089d2edf288d2e97f49e";
        after-sb-0pri           discard-zero-changes;
        after-sb-1pri           consensus;
        after-sb-2pri           disconnect _is_default;
        rr-conflict             disconnect _is_default;
        ping-timeout            5 _is_default; # 1/10 seconds
}
syncer {
        rate                    61440k; # bytes/second
        after                   -1 _is_default;
        al-extents              257;
}
protocol C;
_this_host {
        device                  minor 16;
        disk                    "/dev/all/426965fa-291d-4f2b-8aa7-6d990d272376.disk0_data";
        meta-disk               "/dev/all/426965fa-291d-4f2b-8aa7-6d990d272376.disk0_meta" [ 0 ];
        address                 ipv4 10.10.0.3:11221;
}
_remote_host {
        address                 ipv4 10.10.0.1:11221;
> 
> try to get those kernel logs of drbd16.
> 
The last drbd logs I have about drbd16 are below.

This sequence is repeated many times:

Dec  4 11:12:39 z2-3 kernel: [8349976.181008] block drbd16: Restarting receiver thread
Dec  4 11:12:39 z2-3 kernel: [8349976.181011] block drbd16: receiver (re)started
Dec  4 11:12:39 z2-3 kernel: [8349976.181014] block drbd16: conn( Unconnected -> WFConnection ) 
Dec  4 11:13:17 z2-3 kernel: [8350015.044022] block drbd16: Handshake successful: Agreed network protocol version 90
Dec  4 11:13:17 z2-3 kernel: [8350015.045118] block drbd16: Peer authenticated using 16 bytes of 'md5' HMAC
Dec  4 11:13:17 z2-3 kernel: [8350015.045124] block drbd16: conn( WFConnection -> WFReportParams ) 
Dec  4 11:13:17 z2-3 kernel: [8350015.045136] block drbd16: Starting asender thread (from drbd16_receiver [30799])
Dec  4 11:13:17 z2-3 kernel: [8350015.046092] block drbd16: data-integrity-alg: <not-used>
Dec  4 11:13:17 z2-3 kernel: [8350015.046215] block drbd16: drbd_sync_handshake:
Dec  4 11:13:17 z2-3 kernel: [8350015.046217] block drbd16: self 9922706A43335E97:E973E700CD85FF0F:8E697C7BA01FEA03:176B27FC60EE9351 bits:128 flags:0
Dec  4 11:13:17 z2-3 kernel: [8350015.046220] block drbd16: peer E973E700CD85FF0E:0000000000000000:8E697C7BA01FEA02:176B27FC60EE9351 bits:0 flags:0
Dec  4 11:13:17 z2-3 kernel: [8350015.046222] block drbd16: uuid_compare()=1 by rule 7
Dec  4 11:13:17 z2-3 kernel: [8350015.046397] block drbd16: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( DUnknown -> UpToDate ) 
Dec  4 11:13:18 z2-3 kernel: [8350015.091276] block drbd16: conn( WFBitMapS -> SyncSource ) pdsk( UpToDate -> Inconsistent ) 
Dec  4 11:13:18 z2-3 kernel: [8350015.091283] block drbd16: Began resync as SyncSource (will sync 512 KB [128 bits set]).
Dec  4 11:13:18 z2-3 kernel: [8350015.156771] block drbd16: Resync done (total 1 sec; paused 0 sec; 512 K/sec)
Dec  4 11:13:18 z2-3 kernel: [8350015.156777] block drbd16: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate ) 
Dec  4 12:32:39 z2-3 kernel: [8354776.164507] block drbd16: PingAck did not arrive in time.
Dec  4 12:32:39 z2-3 kernel: [8354776.164535] block drbd16: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) 
Dec  4 12:32:39 z2-3 kernel: [8354776.164543] block drbd16: asender terminated
Dec  4 12:32:39 z2-3 kernel: [8354776.164546] block drbd16: short read expecting header on sock: r=-512
Dec  4 12:32:39 z2-3 kernel: [8354776.164548] block drbd16: Terminating asender thread
Dec  4 12:32:39 z2-3 kernel: [8354776.164560] block drbd16: Creating new current UUID
Dec  4 12:32:39 z2-3 kernel: [8354776.165268] block drbd16: Connection closed
Dec  4 12:32:39 z2-3 kernel: [8354776.165272] block drbd16: conn( NetworkFailure -> Unconnected ) 


Then finally:
Dec  4 12:34:19 z2-3 kernel: [8354876.960011] block drbd16: PingAck did not arrive in time.
Dec  4 12:34:19 z2-3 kernel: [8354876.960041] block drbd16: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) 
Dec  4 12:34:19 z2-3 kernel: [8354876.960049] block drbd16: asender terminated
Dec  4 12:34:19 z2-3 kernel: [8354876.960052] block drbd16: Terminating asender thread
Dec  4 12:34:19 z2-3 kernel: [8354876.960063] block drbd16: short read expecting header on sock: r=-512
Dec  4 12:34:25 z2-3 kernel: [8354882.636014] block drbd16: md_sync_timer expired! Worker calls drbd_md_sync().
Dec  4 12:34:25 z2-3 kernel: [8354882.636437] block drbd16: md_sync_timer expired! Worker calls drbd_md_sync().
Dec  4 12:34:25 z2-3 kernel: [8354882.636439] block drbd16: md_sync_timer expired! Worker calls drbd_md_sync().
Dec  4 12:34:25 z2-3 kernel: [8354882.636441] block drbd16: md_sync_timer expired! Worker calls drbd_md_sync().
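
Since the connect/disconnect cycle repeats many times, it can help to reduce the log to its state transitions only. The following is a rough sketch, assuming the "field( Old -> New )" notation seen in the excerpts above; TRANSITION, SAMPLE and transitions are hypothetical names chosen for this example.

```python
import re
from collections import Counter

# Matches fragments such as "conn( Connected -> NetworkFailure )".
# Assumption: every state change is logged in exactly this notation.
TRANSITION = re.compile(r"(\w+)\( (\w+) -> (\w+) \)")

SAMPLE = [
    "Dec  4 11:13:18 z2-3 kernel: [8350015.156777] block drbd16: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate ) ",
    "Dec  4 12:32:39 z2-3 kernel: [8354776.164507] block drbd16: PingAck did not arrive in time.",
    "Dec  4 12:32:39 z2-3 kernel: [8354776.164535] block drbd16: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) ",
    "Dec  4 12:32:39 z2-3 kernel: [8354776.165272] block drbd16: conn( NetworkFailure -> Unconnected ) ",
]


def transitions(lines, device="drbd16"):
    """Return (field, old_state, new_state) tuples logged for one device."""
    found = []
    for line in lines:
        if f"block {device}:" not in line:
            continue  # skip other devices and unrelated kernel messages
        found.extend(TRANSITION.findall(line))
    return found


if __name__ == "__main__":
    conn_changes = [(old, new) for field, old, new in transitions(SAMPLE) if field == "conn"]
    # Count how often each connection-state change occurs.
    for (old, new), count in Counter(conn_changes).items():
        print(f"{old} -> {new}: {count}")
```

Run over the full kernel log, the count for "Connected -> NetworkFailure" gives the number of times the link to the peer dropped.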


Cheers,

Maxence

-- 
Maxence DUNNEWIND
Contact : maxence at dunnewind.net
Site : http://www.dunnewind.net
06 32 39 39 93
GPG : 18AE 61E4 D0B0 1C7C AAC9  E40D 4D39 68DB 0D2E B533