[DRBD-user] drbd-0.7.15 one primary device got locked in BrokenPipe status forbidding all i/o operations.

Diego Liziero diego.liziero at comune.carpi.mo.it
Sat Jan 21 08:20:18 CET 2006



One of our primary DRBD devices (/dev/drbd2) got stuck in BrokenPipe state,
and all processes accessing files on it (mostly ldap and mysql)
froze; they could not be killed even with kill -9.

This prevented us from switching the device to Secondary,
so we had to reboot the primary node.

Any hints?

Kernel: 2.4.21-37.EL (rhel3.02)

Output of cat /proc/drbd (after the reboot, now that it's working):
version: 0.7.15 (api:77/proto:74)
SVN Revision: 2020 build by [..], 2005-12-21 18:55:34
 0: cs:Connected st:Primary/Secondary ld:Consistent
    ns:38904 nr:148 dw:39052 dr:525 al:4 bm:1 lo:0 pe:0 ua:0 ap:0
 1: cs:Connected st:Primary/Secondary ld:Consistent
    ns:328612 nr:86164 dw:414776 dr:128877 al:537 bm:118 lo:0 pe:0 ua:0 ap:0
 2: cs:Connected st:Primary/Secondary ld:Consistent
    ns:88544 nr:21716 dw:110260 dr:50253 al:0 bm:35 lo:0 pe:0 ua:0 ap:0
 3: cs:Connected st:Primary/Secondary ld:Consistent
    ns:31700 nr:8304 dw:40004 dr:4349 al:10 bm:54 lo:0 pe:0 ua:0 ap:0
 4: cs:Connected st:Primary/Secondary ld:Consistent
    ns:228 nr:208 dw:436 dr:1449 al:0 bm:2 lo:0 pe:0 ua:0 ap:0
 5: cs:Connected st:Primary/Secondary ld:Consistent
    ns:1567340 nr:2024004 dw:3591344 dr:4765217 al:11799 bm:955 lo:0 pe:0 ua:0 ap:0
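For anyone who wants to watch for this condition, here is a minimal Python sketch that picks the per-device status lines out of /proc/drbd text and lists the minors whose connection state (cs:) is anything other than Connected. The line layout is assumed from the 0.7.15 output above and may differ in other DRBD versions; `parse_status` and `not_connected` are hypothetical helper names, not part of any DRBD tool.

```python
import re

# Per-device status line of DRBD 0.7-style /proc/drbd output, e.g.
# " 2: cs:Connected st:Primary/Secondary ld:Consistent".
# This layout is assumed from the 0.7.15 output quoted in this message;
# other DRBD versions format /proc/drbd differently.
STATUS_LINE = re.compile(
    r"^\s*(\d+):\s+cs:(\S+)\s+st:([^/\s]+)/(\S+)\s+ld:(\S+)"
)


def parse_status(text):
    """Map each minor number to (cstate, local_role, peer_role, disk_state).

    Counter lines ("ns:... nr:...") and the version header do not start
    with "<digits>:" and are therefore skipped.
    """
    devices = {}
    for line in text.splitlines():
        m = STATUS_LINE.match(line)
        if m:
            minor, cs, local, peer, ld = m.groups()
            devices[int(minor)] = (cs, local, peer, ld)
    return devices


def not_connected(text):
    """Hypothetical helper: sorted minors whose cstate is not Connected."""
    return sorted(
        minor
        for minor, (cs, _local, _peer, _ld) in parse_status(text).items()
        if cs != "Connected"
    )


# Status lines copied from the post-reboot output in this message.
SAMPLE = """\
version: 0.7.15 (api:77/proto:74)
 0: cs:Connected st:Primary/Secondary ld:Consistent
    ns:38904 nr:148 dw:39052 dr:525 al:4 bm:1 lo:0 pe:0 ua:0 ap:0
 1: cs:Connected st:Primary/Secondary ld:Consistent
 2: cs:Connected st:Primary/Secondary ld:Consistent
 3: cs:Connected st:Primary/Secondary ld:Consistent
 4: cs:Connected st:Primary/Secondary ld:Consistent
 5: cs:Connected st:Primary/Secondary ld:Consistent
"""
```

On a live node one would pass `open("/proc/drbd").read()` instead of `SAMPLE`; for the post-reboot output above the result is an empty list, since every device reports cs:Connected.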

Before the reboot, /dev/drbd2 showed FWConnection st:Primary/BrokenPipe.

Attached are our DRBD syslog entries.

Regards,
Diego.

-------------- next part --------------
Jan  9 09:40:29 server kernel: drbd2: [kjournald/1392] sock_sendmsg time expired, ko = 3
Jan  9 09:40:29 server kernel: drbd1: [kjournald/1388] sock_sendmsg time expired, ko = 3
Jan  9 09:40:32 server kernel: drbd2: [kjournald/1392] sock_sendmsg time expired, ko = 2
Jan  9 09:40:32 server kernel: drbd1: [kjournald/1388] sock_sendmsg time expired, ko = 2
Jan  9 09:40:35 server kernel: drbd2: [kjournald/1392] sock_sendmsg time expired, ko = 1
Jan  9 09:40:35 server kernel: drbd1: [kjournald/1388] sock_sendmsg time expired, ko = 1
Jan  9 09:40:38 server kernel: drbd2: drbd_main.c:1088: Connected flags=0x130a
Jan  9 09:40:38 server kernel: drbd2: kjournald [1392]: cstate Connected --> NetworkFailure
Jan  9 09:40:38 server kernel: drbd2: drbd2_receiver [1008]: cstate NetworkFailure --> BrokenPipe
Jan  9 09:40:38 server kernel: drbd2: short read expecting header on sock: r=-512
Jan  9 09:40:38 server kernel: drbd2: asender terminated
Jan  9 09:40:38 server kernel: drbd2: short sent UnplugRemote size=8 sent=-1001
Jan  9 09:40:38 server kernel: drbd2: worker terminated
Jan  9 09:40:38 server kernel: drbd1: drbd_main.c:1088: Connected flags=0x130a
Jan  9 09:40:38 server kernel: drbd1: kjournald [1388]: cstate Connected --> NetworkFailure
Jan  9 09:40:38 server kernel: drbd1: drbd1_receiver [1000]: cstate NetworkFailure --> BrokenPipe
Jan  9 09:40:38 server kernel: drbd1: short read expecting header on sock: r=-512
Jan  9 09:40:38 server kernel: drbd1: asender terminated
Jan  9 09:40:39 server kernel: drbd1: worker terminated
Jan  9 09:40:39 server kernel: drbd1: drbd1_receiver [1000]: cstate BrokenPipe --> Unconnected
Jan  9 09:40:39 server kernel: drbd1: Connection lost.
Jan  9 09:40:39 server kernel: drbd1: drbd1_receiver [1000]: cstate Unconnected --> WFConnection
Jan  9 09:40:39 server kernel: drbd1: drbd1_receiver [1000]: cstate WFConnection --> WFReportParams
Jan  9 09:40:39 server kernel: drbd1: Handshake successful: DRBD Network Protocol version 74
Jan  9 09:40:39 server kernel: drbd1: Connection established.
Jan  9 09:40:39 server kernel: drbd1: I am(P): 1:00000003:00000001:00000016:0000000a:10
Jan  9 09:40:39 server kernel: drbd1: Peer(S): 1:00000003:00000001:00000015:0000000a:01
Jan  9 09:40:39 server kernel: drbd1: drbd1_receiver [1000]: cstate WFReportParams --> WFBitMapS
Jan  9 09:40:39 server kernel: drbd1: Primary/Unknown --> Primary/Secondary
Jan  9 09:40:40 server kernel: drbd1: drbd1_receiver [1000]: cstate WFBitMapS --> SyncSource
Jan  9 09:40:40 server kernel: drbd1: Resync started as SyncSource (need to sync 1548 KB [387 bits set]).
Jan  9 09:40:40 server kernel: drbd1: Resync done (total 1 sec; paused 0 sec; 1548 K/sec)
Jan  9 09:40:40 server kernel: drbd1: drbd1_worker [26548]: cstate SyncSource --> Connected
Jan  9 09:41:01 server kernel: drbd1: [kjournald/1388] sock_sendmsg time expired, ko = 3
Jan  9 09:53:21 server kernel: drbd5: [kjournald/1404] sock_sendmsg time expired, ko = 3
Jan  9 10:01:10 server kernel: drbd4: PingAck did not arrive in time.
Jan  9 10:01:10 server kernel: drbd4: drbd4_asender [32393]: cstate Connected --> NetworkFailure
Jan  9 10:01:10 server kernel: drbd4: asender terminated
Jan  9 10:01:10 server kernel: drbd4: drbd4_receiver [1024]: cstate NetworkFailure --> BrokenPipe
Jan  9 10:01:10 server kernel: drbd4: short read expecting header on sock: r=-512
Jan  9 10:01:10 server kernel: drbd4: worker terminated
Jan  9 10:01:10 server kernel: drbd4: drbd4_receiver [1024]: cstate BrokenPipe --> Unconnected
Jan  9 10:01:10 server kernel: drbd4: Connection lost.
Jan  9 10:01:10 server kernel: drbd4: drbd4_receiver [1024]: cstate Unconnected --> WFConnection
Jan  9 10:01:12 server kernel: drbd5: PingAck did not arrive in time.
Jan  9 10:01:12 server kernel: drbd3: PingAck did not arrive in time.
Jan  9 10:01:12 server kernel: drbd3: drbd3_asender [32392]: cstate Connected --> NetworkFailure
Jan  9 10:01:12 server kernel: drbd3: asender terminated
Jan  9 10:01:12 server kernel: drbd3: drbd3_receiver [1016]: cstate NetworkFailure --> BrokenPipe
Jan  9 10:01:12 server kernel: drbd3: short read expecting header on sock: r=-512
Jan  9 10:01:12 server kernel: drbd3: worker terminated
Jan  9 10:01:12 server kernel: drbd5: drbd5_asender [32394]: cstate Connected --> NetworkFailure
Jan  9 10:01:12 server kernel: drbd3: drbd3_receiver [1016]: cstate BrokenPipe --> Unconnected
Jan  9 10:01:12 server kernel: drbd5: asender terminated
Jan  9 10:01:12 server kernel: drbd5: drbd5_receiver [1032]: cstate NetworkFailure --> BrokenPipe
Jan  9 10:01:12 server kernel: drbd5: short read expecting header on sock: r=-512
Jan  9 10:01:12 server kernel: drbd5: worker terminated
Jan  9 10:01:12 server kernel: drbd5: drbd5_receiver [1032]: cstate BrokenPipe --> Unconnected
Jan  9 10:01:12 server kernel: drbd3: Connection lost.
Jan  9 10:01:12 server kernel: drbd3: drbd3_receiver [1016]: cstate Unconnected --> WFConnection
Jan  9 10:01:12 server kernel: drbd5: Connection lost.
Jan  9 10:01:12 server kernel: drbd5: drbd5_receiver [1032]: cstate Unconnected --> WFConnection
Jan  9 10:01:12 server kernel: drbd1: PingAck did not arrive in time.
Jan  9 10:01:12 server kernel: drbd1: drbd1_asender [26552]: cstate Connected --> NetworkFailure
Jan  9 10:01:12 server kernel: drbd1: asender terminated
Jan  9 10:01:12 server kernel: drbd1: drbd1_receiver [1000]: cstate NetworkFailure --> BrokenPipe
Jan  9 10:01:12 server kernel: drbd1: short read expecting header on sock: r=-512
Jan  9 10:01:12 server kernel: drbd1: _drbd_send_page: size=4096 len=656 sent=-4
Jan  9 10:01:12 server kernel: drbd1: short sent UnplugRemote size=8 sent=-1001
Jan  9 10:01:12 server kernel: drbd1: worker terminated
Jan  9 10:01:12 server kernel: drbd1: drbd1_receiver [1000]: cstate BrokenPipe --> Unconnected
Jan  9 10:01:12 server kernel: drbd0: PingAck did not arrive in time.
Jan  9 10:01:12 server kernel: drbd0: drbd0_asender [32389]: cstate Connected --> NetworkFailure
Jan  9 10:01:13 server kernel: drbd0: asender terminated
Jan  9 10:01:13 server kernel: drbd0: drbd0_receiver [992]: cstate NetworkFailure --> BrokenPipe
Jan  9 10:01:13 server kernel: drbd0: short read expecting header on sock: r=-512
Jan  9 10:01:13 server kernel: drbd0: worker terminated
Jan  9 10:01:13 server kernel: drbd0: drbd0_receiver [992]: cstate BrokenPipe --> Unconnected
Jan  9 10:01:13 server kernel: drbd1: Connection lost.
Jan  9 10:01:13 server kernel: drbd1: drbd1_receiver [1000]: cstate Unconnected --> WFConnection
Jan  9 10:01:13 server kernel: drbd0: Connection lost.
Jan  9 10:01:13 server kernel: drbd0: drbd0_receiver [992]: cstate Unconnected --> WFConnection
Jan  9 10:03:50 server kernel: drbd0: drbd0_receiver [992]: cstate WFConnection --> WFReportParams
Jan  9 10:03:50 server kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
Jan  9 10:03:50 server kernel: drbd0: Connection established.
Jan  9 10:03:50 server kernel: drbd0: I am(P): 1:00000003:00000001:00000017:0000000a:10
Jan  9 10:03:50 server kernel: drbd0: Peer(S): 1:00000003:00000001:00000016:0000000a:00
Jan  9 10:03:50 server kernel: drbd0: drbd0_receiver [992]: cstate WFReportParams --> WFBitMapS
Jan  9 10:03:50 server kernel: drbd1: drbd1_receiver [1000]: cstate WFConnection --> WFReportParams
Jan  9 10:03:50 server kernel: drbd1: Handshake successful: DRBD Network Protocol version 74
Jan  9 10:03:50 server kernel: drbd1: Connection established.
Jan  9 10:03:50 server kernel: drbd1: I am(P): 1:00000003:00000001:00000017:0000000a:10
Jan  9 10:03:50 server kernel: drbd1: Peer(S): 1:00000003:00000001:00000016:0000000a:01
Jan  9 10:03:50 server kernel: drbd1: drbd1_receiver [1000]: cstate WFReportParams --> WFBitMapS
Jan  9 10:03:50 server kernel: drbd1: Primary/Unknown --> Primary/Secondary
Jan  9 10:03:50 server kernel: drbd3: drbd3_receiver [1016]: cstate WFConnection --> WFReportParams
Jan  9 10:03:50 server kernel: drbd3: Handshake successful: DRBD Network Protocol version 74
Jan  9 10:03:50 server kernel: drbd3: Connection established.
Jan  9 10:03:50 server kernel: drbd3: I am(P): 1:00000003:00000001:00000016:0000000a:10
Jan  9 10:03:50 server kernel: drbd3: Peer(S): 1:00000003:00000001:00000015:0000000a:01
Jan  9 10:03:50 server kernel: drbd3: drbd3_receiver [1016]: cstate WFReportParams --> WFBitMapS
Jan  9 10:03:50 server kernel: drbd1: drbd1_receiver [1000]: cstate WFBitMapS --> SyncSource
Jan  9 10:03:50 server kernel: drbd1: Resync started as SyncSource (need to sync 4004 KB [1001 bits set]).
Jan  9 10:03:50 server kernel: drbd4: drbd4_receiver [1024]: cstate WFConnection --> WFReportParams
Jan  9 10:03:50 server kernel: drbd4: Handshake successful: DRBD Network Protocol version 74
Jan  9 10:03:50 server kernel: drbd4: Connection established.
Jan  9 10:03:50 server kernel: drbd4: I am(P): 1:00000003:00000001:00000016:0000000a:10
Jan  9 10:03:50 server kernel: drbd4: Peer(S): 1:00000003:00000001:00000015:0000000a:01
Jan  9 10:03:50 server kernel: drbd4: drbd4_receiver [1024]: cstate WFReportParams --> WFBitMapS
Jan  9 10:03:51 server kernel: drbd4: Primary/Unknown --> Primary/Secondary
Jan  9 10:03:51 server kernel: drbd5: drbd5_receiver [1032]: cstate WFConnection --> WFReportParams
Jan  9 10:03:51 server kernel: drbd5: Handshake successful: DRBD Network Protocol version 74
Jan  9 10:03:51 server kernel: drbd5: Connection established.
Jan  9 10:03:51 server kernel: drbd5: I am(P): 1:00000003:00000001:00000017:0000000a:10
Jan  9 10:03:51 server kernel: drbd5: Peer(S): 1:00000003:00000001:00000016:0000000a:01
Jan  9 10:03:51 server kernel: drbd5: drbd5_receiver [1032]: cstate WFReportParams --> WFBitMapS
Jan  9 10:03:51 server kernel: drbd3: Primary/Unknown --> Primary/Secondary
Jan  9 10:03:51 server kernel: drbd4: drbd4_receiver [1024]: cstate WFBitMapS --> SyncSource
Jan  9 10:03:51 server kernel: drbd4: Resync started as SyncSource (need to sync 0 KB [0 bits set]).
Jan  9 10:03:51 server kernel: drbd4: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
Jan  9 10:03:51 server kernel: drbd4: drbd4_receiver [1024]: cstate SyncSource --> Connected
Jan  9 10:03:51 server kernel: drbd3: drbd3_receiver [1016]: cstate WFBitMapS --> SyncSource
Jan  9 10:03:51 server kernel: drbd3: Resync started as SyncSource (need to sync 720 KB [180 bits set]).
Jan  9 10:03:51 server kernel: drbd3: drbd3_receiver [1016]: cstate SyncSource --> PausedSyncS
Jan  9 10:03:51 server kernel: drbd3: Syncer waits for sync group.
Jan  9 10:03:51 server kernel: drbd0: Primary/Unknown --> Primary/Secondary
Jan  9 10:03:51 server kernel: drbd0: drbd0_receiver [992]: cstate WFBitMapS --> SyncSource
Jan  9 10:03:51 server kernel: drbd0: Resync started as SyncSource (need to sync 1048 KB [262 bits set]).
Jan  9 10:03:51 server kernel: drbd1: drbd0_receiver [992]: cstate SyncSource --> PausedSyncS
Jan  9 10:03:51 server kernel: drbd1: Syncer waits for sync group.
Jan  9 10:03:51 server kernel: drbd5: Primary/Unknown --> Primary/Secondary
Jan  9 10:03:51 server kernel: drbd5: drbd5_receiver [1032]: cstate WFBitMapS --> SyncSource
Jan  9 10:03:51 server kernel: drbd5: Resync started as SyncSource (need to sync 9760 KB [2440 bits set]).
Jan  9 10:03:51 server kernel: drbd5: drbd5_receiver [1032]: cstate SyncSource --> PausedSyncS
Jan  9 10:03:51 server kernel: drbd5: Syncer waits for sync group.
Jan  9 10:03:51 server kernel: drbd0: Resync done (total 1 sec; paused 0 sec; 1048 K/sec)
Jan  9 10:03:51 server kernel: drbd0: drbd0_worker [607]: cstate SyncSource --> Connected
Jan  9 10:03:51 server kernel: drbd1: Syncer continues.
Jan  9 10:03:51 server kernel: drbd1: drbd0_worker [607]: cstate PausedSyncS --> SyncSource
Jan  9 10:03:51 server kernel: drbd1: Resync done (total 1 sec; paused 0 sec; 4004 K/sec)
Jan  9 10:03:51 server kernel: drbd1: drbd1_worker [603]: cstate SyncSource --> Connected
Jan  9 10:03:51 server kernel: drbd3: Syncer continues.
Jan  9 10:03:51 server kernel: drbd3: drbd1_worker [603]: cstate PausedSyncS --> SyncSource
Jan  9 10:03:57 server kernel: drbd3: Resync done (total 6 sec; paused 0 sec; 120 K/sec)
Jan  9 10:03:57 server kernel: drbd3: drbd3_worker [601]: cstate SyncSource --> Connected
Jan  9 10:03:57 server kernel: drbd5: Syncer continues.
Jan  9 10:03:57 server kernel: drbd5: drbd3_worker [601]: cstate PausedSyncS --> SyncSource
Jan  9 10:04:05 server kernel: drbd5: Resync done (total 13 sec; paused 6 sec; 1392 K/sec)
Jan  9 10:04:05 server kernel: drbd5: drbd5_worker [602]: cstate SyncSource --> Connected
Jan  9 10:07:53 server kernel: drbd0: Primary/Secondary --> Secondary/Secondary
Jan  9 10:14:07 server kernel: drbd3: Primary/Secondary --> Secondary/Secondary
Jan  9 10:14:07 server kernel: drbd5: Primary/Secondary --> Secondary/Secondary
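A similar sketch for the attached log: it replays every `cstate X --> Y` transition in order and keeps the last connection state logged for each device. The message format is taken from this syslog rather than from DRBD documentation, and `last_cstates` / `stuck_devices` are hypothetical names.

```python
import re

# Matches DRBD 0.7 connection-state transitions as they appear in the
# attached syslog, e.g.
# "... kernel: drbd2: drbd2_receiver [1008]: cstate NetworkFailure --> BrokenPipe".
# The device key is the "drbdN:" prefix right after "kernel: "; the format
# is inferred from this log, not from a DRBD specification.
CSTATE = re.compile(r"kernel: (drbd\d+): .*cstate (\w+) --> (\w+)")


def last_cstates(lines):
    """Return {device: last logged cstate} by replaying transitions in order."""
    state = {}
    for line in lines:
        m = CSTATE.search(line)
        if m:
            device, _old, new = m.groups()
            state[device] = new
    return state


def stuck_devices(lines, healthy=("Connected",)):
    """Hypothetical helper: devices whose last logged cstate is not healthy."""
    return sorted(
        dev for dev, cs in last_cstates(lines).items() if cs not in healthy
    )
```

Run over the attachment above (for example `stuck_devices(open("drbd.log"))`, where the file name is hypothetical), this should return `['drbd2']`: its last logged transition is NetworkFailure --> BrokenPipe, whereas drbd0, drbd1, drbd3, drbd4 and drbd5 all end in Connected. Role changes such as `Primary/Secondary --> Secondary/Secondary` contain no `cstate` keyword and are ignored.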