One of our primary drbd devices (/dev/drbd2) went into BrokenPipe state,
and all processes accessing files on it (mostly ldap and mysql)
froze and could not be killed, even with kill -9.
Because of this we could not switch /dev/drbd2 to secondary,
and we had to reboot the primary node.
Any hints?
Kernel: 2.4.21-37.EL (rhel3.02)
cat /proc/drbd (now that it's working, after the reboot)
version: 0.7.15 (api:77/proto:74)
SVN Revision: 2020 build by [..], 2005-12-21 18:55:34
0: cs:Connected st:Primary/Secondary ld:Consistent
ns:38904 nr:148 dw:39052 dr:525 al:4 bm:1 lo:0 pe:0 ua:0 ap:0
1: cs:Connected st:Primary/Secondary ld:Consistent
ns:328612 nr:86164 dw:414776 dr:128877 al:537 bm:118 lo:0 pe:0 ua:0 ap:0
2: cs:Connected st:Primary/Secondary ld:Consistent
ns:88544 nr:21716 dw:110260 dr:50253 al:0 bm:35 lo:0 pe:0 ua:0 ap:0
3: cs:Connected st:Primary/Secondary ld:Consistent
ns:31700 nr:8304 dw:40004 dr:4349 al:10 bm:54 lo:0 pe:0 ua:0 ap:0
4: cs:Connected st:Primary/Secondary ld:Consistent
ns:228 nr:208 dw:436 dr:1449 al:0 bm:2 lo:0 pe:0 ua:0 ap:0
5: cs:Connected st:Primary/Secondary ld:Consistent
ns:1567340 nr:2024004 dw:3591344 dr:4765217 al:11799 bm:955 lo:0 pe:0 ua:0 ap:0
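A status listing like the one above can be checked mechanically, which may help catch a device that has left the Connected state before processes pile up on it. The following is only a rough sketch, assuming the 0.7-style /proc/drbd layout shown here ("N: cs:... st:.../... ld:..."); the function names parse_proc_drbd and unhealthy are made up for illustration and are not part of drbd.

```python
import re

# Device lines in the 0.7-style /proc/drbd output quoted above look like:
#   2: cs:Connected st:Primary/Secondary ld:Consistent
# Other DRBD versions may use a different layout (assumption: 0.7 only).
DEVICE_LINE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)"
    r"\s+st:(?P<local>[^/\s]+)/(?P<peer>\S+)"
    r"\s+ld:(?P<ld>\S+)"
)


def parse_proc_drbd(text):
    """Return {minor: {"cs", "local", "peer", "ld"}} for each device line."""
    devices = {}
    for line in text.splitlines():
        m = DEVICE_LINE.match(line)
        if m:
            fields = m.groupdict()
            minor = int(fields.pop("minor"))
            devices[minor] = fields
    return devices


def unhealthy(devices):
    """Minors whose connection state or local disk state looks wrong."""
    return sorted(
        minor
        for minor, d in devices.items()
        if d["cs"] != "Connected" or d["ld"] != "Consistent"
    )


if __name__ == "__main__":
    # Hypothetical usage: read the live status file on a drbd node.
    with open("/proc/drbd") as f:
        status = parse_proc_drbd(f.read())
    for minor in unhealthy(status):
        print("drbd%d needs attention: %s" % (minor, status[minor]))
```

Run against the listing above, this should report nothing, since all six devices are Connected and Consistent after the reboot.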
Before the reboot, /dev/drbd2 was WFConnection st:Primary/BrokenPipe.
Our drbd syslog entries are attached.
Regards,
Diego.
-------------- next part --------------
Jan 9 09:40:29 server kernel: drbd2: [kjournald/1392] sock_sendmsg time expired, ko = 3
Jan 9 09:40:29 server kernel: drbd1: [kjournald/1388] sock_sendmsg time expired, ko = 3
Jan 9 09:40:32 server kernel: drbd2: [kjournald/1392] sock_sendmsg time expired, ko = 2
Jan 9 09:40:32 server kernel: drbd1: [kjournald/1388] sock_sendmsg time expired, ko = 2
Jan 9 09:40:35 server kernel: drbd2: [kjournald/1392] sock_sendmsg time expired, ko = 1
Jan 9 09:40:35 server kernel: drbd1: [kjournald/1388] sock_sendmsg time expired, ko = 1
Jan 9 09:40:38 server kernel: drbd2: drbd_main.c:1088: Connected flags=0x130a
Jan 9 09:40:38 server kernel: drbd2: kjournald [1392]: cstate Connected --> NetworkFailure
Jan 9 09:40:38 server kernel: drbd2: drbd2_receiver [1008]: cstate NetworkFailure --> BrokenPipe
Jan 9 09:40:38 server kernel: drbd2: short read expecting header on sock: r=-512
Jan 9 09:40:38 server kernel: drbd2: asender terminated
Jan 9 09:40:38 server kernel: drbd2: short sent UnplugRemote size=8 sent=-1001
Jan 9 09:40:38 server kernel: drbd2: worker terminated
Jan 9 09:40:38 server kernel: drbd1: drbd_main.c:1088: Connected flags=0x130a
Jan 9 09:40:38 server kernel: drbd1: kjournald [1388]: cstate Connected --> NetworkFailure
Jan 9 09:40:38 server kernel: drbd1: drbd1_receiver [1000]: cstate NetworkFailure --> BrokenPipe
Jan 9 09:40:38 server kernel: drbd1: short read expecting header on sock: r=-512
Jan 9 09:40:38 server kernel: drbd1: asender terminated
Jan 9 09:40:39 server kernel: drbd1: worker terminated
Jan 9 09:40:39 server kernel: drbd1: drbd1_receiver [1000]: cstate BrokenPipe --> Unconnected
Jan 9 09:40:39 server kernel: drbd1: Connection lost.
Jan 9 09:40:39 server kernel: drbd1: drbd1_receiver [1000]: cstate Unconnected --> WFConnection
Jan 9 09:40:39 server kernel: drbd1: drbd1_receiver [1000]: cstate WFConnection --> WFReportParams
Jan 9 09:40:39 server kernel: drbd1: Handshake successful: DRBD Network Protocol version 74
Jan 9 09:40:39 server kernel: drbd1: Connection established.
Jan 9 09:40:39 server kernel: drbd1: I am(P): 1:00000003:00000001:00000016:0000000a:10
Jan 9 09:40:39 server kernel: drbd1: Peer(S): 1:00000003:00000001:00000015:0000000a:01
Jan 9 09:40:39 server kernel: drbd1: drbd1_receiver [1000]: cstate WFReportParams --> WFBitMapS
Jan 9 09:40:39 server kernel: drbd1: Primary/Unknown --> Primary/Secondary
Jan 9 09:40:40 server kernel: drbd1: drbd1_receiver [1000]: cstate WFBitMapS --> SyncSource
Jan 9 09:40:40 server kernel: drbd1: Resync started as SyncSource (need to sync 1548 KB [387 bits set]).
Jan 9 09:40:40 server kernel: drbd1: Resync done (total 1 sec; paused 0 sec; 1548 K/sec)
Jan 9 09:40:40 server kernel: drbd1: drbd1_worker [26548]: cstate SyncSource --> Connected
Jan 9 09:41:01 server kernel: drbd1: [kjournald/1388] sock_sendmsg time expired, ko = 3
Jan 9 09:53:21 server kernel: drbd5: [kjournald/1404] sock_sendmsg time expired, ko = 3
Jan 9 10:01:10 server kernel: drbd4: PingAck did not arrive in time.
Jan 9 10:01:10 server kernel: drbd4: drbd4_asender [32393]: cstate Connected --> NetworkFailure
Jan 9 10:01:10 server kernel: drbd4: asender terminated
Jan 9 10:01:10 server kernel: drbd4: drbd4_receiver [1024]: cstate NetworkFailure --> BrokenPipe
Jan 9 10:01:10 server kernel: drbd4: short read expecting header on sock: r=-512
Jan 9 10:01:10 server kernel: drbd4: worker terminated
Jan 9 10:01:10 server kernel: drbd4: drbd4_receiver [1024]: cstate BrokenPipe --> Unconnected
Jan 9 10:01:10 server kernel: drbd4: Connection lost.
Jan 9 10:01:10 server kernel: drbd4: drbd4_receiver [1024]: cstate Unconnected --> WFConnection
Jan 9 10:01:12 server kernel: drbd5: PingAck did not arrive in time.
Jan 9 10:01:12 server kernel: drbd3: PingAck did not arrive in time.
Jan 9 10:01:12 server kernel: drbd3: drbd3_asender [32392]: cstate Connected --> NetworkFailure
Jan 9 10:01:12 server kernel: drbd3: asender terminated
Jan 9 10:01:12 server kernel: drbd3: drbd3_receiver [1016]: cstate NetworkFailure --> BrokenPipe
Jan 9 10:01:12 server kernel: drbd3: short read expecting header on sock: r=-512
Jan 9 10:01:12 server kernel: drbd3: worker terminated
Jan 9 10:01:12 server kernel: drbd5: drbd5_asender [32394]: cstate Connected --> NetworkFailure
Jan 9 10:01:12 server kernel: drbd3: drbd3_receiver [1016]: cstate BrokenPipe --> Unconnected
Jan 9 10:01:12 server kernel: drbd5: asender terminated
Jan 9 10:01:12 server kernel: drbd5: drbd5_receiver [1032]: cstate NetworkFailure --> BrokenPipe
Jan 9 10:01:12 server kernel: drbd5: short read expecting header on sock: r=-512
Jan 9 10:01:12 server kernel: drbd5: worker terminated
Jan 9 10:01:12 server kernel: drbd5: drbd5_receiver [1032]: cstate BrokenPipe --> Unconnected
Jan 9 10:01:12 server kernel: drbd3: Connection lost.
Jan 9 10:01:12 server kernel: drbd3: drbd3_receiver [1016]: cstate Unconnected --> WFConnection
Jan 9 10:01:12 server kernel: drbd5: Connection lost.
Jan 9 10:01:12 server kernel: drbd5: drbd5_receiver [1032]: cstate Unconnected --> WFConnection
Jan 9 10:01:12 server kernel: drbd1: PingAck did not arrive in time.
Jan 9 10:01:12 server kernel: drbd1: drbd1_asender [26552]: cstate Connected --> NetworkFailure
Jan 9 10:01:12 server kernel: drbd1: asender terminated
Jan 9 10:01:12 server kernel: drbd1: drbd1_receiver [1000]: cstate NetworkFailure --> BrokenPipe
Jan 9 10:01:12 server kernel: drbd1: short read expecting header on sock: r=-512
Jan 9 10:01:12 server kernel: drbd1: _drbd_send_page: size=4096 len=656 sent=-4
Jan 9 10:01:12 server kernel: drbd1: short sent UnplugRemote size=8 sent=-1001
Jan 9 10:01:12 server kernel: drbd1: worker terminated
Jan 9 10:01:12 server kernel: drbd1: drbd1_receiver [1000]: cstate BrokenPipe --> Unconnected
Jan 9 10:01:12 server kernel: drbd0: PingAck did not arrive in time.
Jan 9 10:01:12 server kernel: drbd0: drbd0_asender [32389]: cstate Connected --> NetworkFailure
Jan 9 10:01:13 server kernel: drbd0: asender terminated
Jan 9 10:01:13 server kernel: drbd0: drbd0_receiver [992]: cstate NetworkFailure --> BrokenPipe
Jan 9 10:01:13 server kernel: drbd0: short read expecting header on sock: r=-512
Jan 9 10:01:13 server kernel: drbd0: worker terminated
Jan 9 10:01:13 server kernel: drbd0: drbd0_receiver [992]: cstate BrokenPipe --> Unconnected
Jan 9 10:01:13 server kernel: drbd1: Connection lost.
Jan 9 10:01:13 server kernel: drbd1: drbd1_receiver [1000]: cstate Unconnected --> WFConnection
Jan 9 10:01:13 server kernel: drbd0: Connection lost.
Jan 9 10:01:13 server kernel: drbd0: drbd0_receiver [992]: cstate Unconnected --> WFConnection
Jan 9 10:03:50 server kernel: drbd0: drbd0_receiver [992]: cstate WFConnection --> WFReportParams
Jan 9 10:03:50 server kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
Jan 9 10:03:50 server kernel: drbd0: Connection established.
Jan 9 10:03:50 server kernel: drbd0: I am(P): 1:00000003:00000001:00000017:0000000a:10
Jan 9 10:03:50 server kernel: drbd0: Peer(S): 1:00000003:00000001:00000016:0000000a:00
Jan 9 10:03:50 server kernel: drbd0: drbd0_receiver [992]: cstate WFReportParams --> WFBitMapS
Jan 9 10:03:50 server kernel: drbd1: drbd1_receiver [1000]: cstate WFConnection --> WFReportParams
Jan 9 10:03:50 server kernel: drbd1: Handshake successful: DRBD Network Protocol version 74
Jan 9 10:03:50 server kernel: drbd1: Connection established.
Jan 9 10:03:50 server kernel: drbd1: I am(P): 1:00000003:00000001:00000017:0000000a:10
Jan 9 10:03:50 server kernel: drbd1: Peer(S): 1:00000003:00000001:00000016:0000000a:01
Jan 9 10:03:50 server kernel: drbd1: drbd1_receiver [1000]: cstate WFReportParams --> WFBitMapS
Jan 9 10:03:50 server kernel: drbd1: Primary/Unknown --> Primary/Secondary
Jan 9 10:03:50 server kernel: drbd3: drbd3_receiver [1016]: cstate WFConnection --> WFReportParams
Jan 9 10:03:50 server kernel: drbd3: Handshake successful: DRBD Network Protocol version 74
Jan 9 10:03:50 server kernel: drbd3: Connection established.
Jan 9 10:03:50 server kernel: drbd3: I am(P): 1:00000003:00000001:00000016:0000000a:10
Jan 9 10:03:50 server kernel: drbd3: Peer(S): 1:00000003:00000001:00000015:0000000a:01
Jan 9 10:03:50 server kernel: drbd3: drbd3_receiver [1016]: cstate WFReportParams --> WFBitMapS
Jan 9 10:03:50 server kernel: drbd1: drbd1_receiver [1000]: cstate WFBitMapS --> SyncSource
Jan 9 10:03:50 server kernel: drbd1: Resync started as SyncSource (need to sync 4004 KB [1001 bits set]).
Jan 9 10:03:50 server kernel: drbd4: drbd4_receiver [1024]: cstate WFConnection --> WFReportParams
Jan 9 10:03:50 server kernel: drbd4: Handshake successful: DRBD Network Protocol version 74
Jan 9 10:03:50 server kernel: drbd4: Connection established.
Jan 9 10:03:50 server kernel: drbd4: I am(P): 1:00000003:00000001:00000016:0000000a:10
Jan 9 10:03:50 server kernel: drbd4: Peer(S): 1:00000003:00000001:00000015:0000000a:01
Jan 9 10:03:50 server kernel: drbd4: drbd4_receiver [1024]: cstate WFReportParams --> WFBitMapS
Jan 9 10:03:51 server kernel: drbd4: Primary/Unknown --> Primary/Secondary
Jan 9 10:03:51 server kernel: drbd5: drbd5_receiver [1032]: cstate WFConnection --> WFReportParams
Jan 9 10:03:51 server kernel: drbd5: Handshake successful: DRBD Network Protocol version 74
Jan 9 10:03:51 server kernel: drbd5: Connection established.
Jan 9 10:03:51 server kernel: drbd5: I am(P): 1:00000003:00000001:00000017:0000000a:10
Jan 9 10:03:51 server kernel: drbd5: Peer(S): 1:00000003:00000001:00000016:0000000a:01
Jan 9 10:03:51 server kernel: drbd5: drbd5_receiver [1032]: cstate WFReportParams --> WFBitMapS
Jan 9 10:03:51 server kernel: drbd3: Primary/Unknown --> Primary/Secondary
Jan 9 10:03:51 server kernel: drbd4: drbd4_receiver [1024]: cstate WFBitMapS --> SyncSource
Jan 9 10:03:51 server kernel: drbd4: Resync started as SyncSource (need to sync 0 KB [0 bits set]).
Jan 9 10:03:51 server kernel: drbd4: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
Jan 9 10:03:51 server kernel: drbd4: drbd4_receiver [1024]: cstate SyncSource --> Connected
Jan 9 10:03:51 server kernel: drbd3: drbd3_receiver [1016]: cstate WFBitMapS --> SyncSource
Jan 9 10:03:51 server kernel: drbd3: Resync started as SyncSource (need to sync 720 KB [180 bits set]).
Jan 9 10:03:51 server kernel: drbd3: drbd3_receiver [1016]: cstate SyncSource --> PausedSyncS
Jan 9 10:03:51 server kernel: drbd3: Syncer waits for sync group.
Jan 9 10:03:51 server kernel: drbd0: Primary/Unknown --> Primary/Secondary
Jan 9 10:03:51 server kernel: drbd0: drbd0_receiver [992]: cstate WFBitMapS --> SyncSource
Jan 9 10:03:51 server kernel: drbd0: Resync started as SyncSource (need to sync 1048 KB [262 bits set]).
Jan 9 10:03:51 server kernel: drbd1: drbd0_receiver [992]: cstate SyncSource --> PausedSyncS
Jan 9 10:03:51 server kernel: drbd1: Syncer waits for sync group.
Jan 9 10:03:51 server kernel: drbd5: Primary/Unknown --> Primary/Secondary
Jan 9 10:03:51 server kernel: drbd5: drbd5_receiver [1032]: cstate WFBitMapS --> SyncSource
Jan 9 10:03:51 server kernel: drbd5: Resync started as SyncSource (need to sync 9760 KB [2440 bits set]).
Jan 9 10:03:51 server kernel: drbd5: drbd5_receiver [1032]: cstate SyncSource --> PausedSyncS
Jan 9 10:03:51 server kernel: drbd5: Syncer waits for sync group.
Jan 9 10:03:51 server kernel: drbd0: Resync done (total 1 sec; paused 0 sec; 1048 K/sec)
Jan 9 10:03:51 server kernel: drbd0: drbd0_worker [607]: cstate SyncSource --> Connected
Jan 9 10:03:51 server kernel: drbd1: Syncer continues.
Jan 9 10:03:51 server kernel: drbd1: drbd0_worker [607]: cstate PausedSyncS --> SyncSource
Jan 9 10:03:51 server kernel: drbd1: Resync done (total 1 sec; paused 0 sec; 4004 K/sec)
Jan 9 10:03:51 server kernel: drbd1: drbd1_worker [603]: cstate SyncSource --> Connected
Jan 9 10:03:51 server kernel: drbd3: Syncer continues.
Jan 9 10:03:51 server kernel: drbd3: drbd1_worker [603]: cstate PausedSyncS --> SyncSource
Jan 9 10:03:57 server kernel: drbd3: Resync done (total 6 sec; paused 0 sec; 120 K/sec)
Jan 9 10:03:57 server kernel: drbd3: drbd3_worker [601]: cstate SyncSource --> Connected
Jan 9 10:03:57 server kernel: drbd5: Syncer continues.
Jan 9 10:03:57 server kernel: drbd5: drbd3_worker [601]: cstate PausedSyncS --> SyncSource
Jan 9 10:04:05 server kernel: drbd5: Resync done (total 13 sec; paused 6 sec; 1392 K/sec)
Jan 9 10:04:05 server kernel: drbd5: drbd5_worker [602]: cstate SyncSource --> Connected
Jan 9 10:07:53 server kernel: drbd0: Primary/Secondary --> Secondary/Secondary
Jan 9 10:14:07 server kernel: drbd3: Primary/Secondary --> Secondary/Secondary
Jan 9 10:14:07 server kernel: drbd5: Primary/Secondary --> Secondary/Secondary
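To make the log above easier to read, the cstate transitions can be reduced to the last state each device reported. In this excerpt drbd2 is the only device whose last logged transition is into BrokenPipe; every other device eventually returns to Connected. The snippet below is a minimal sketch of that reduction, assuming the syslog line format shown above; last_cstates is an illustrative name, not a drbd tool, and EXCERPT holds just a few lines copied from the log.

```python
import re

# Matches lines such as:
#   Jan 9 09:40:38 server kernel: drbd2: drbd2_receiver [1008]: cstate NetworkFailure --> BrokenPipe
# The device is taken from the "drbdN:" prefix, not from the thread name,
# because the two do not always agree in this log.
CSTATE = re.compile(r"kernel: (drbd\d+): .*cstate (\S+) --> (\S+)")


def last_cstates(lines):
    """Return {device: last connection state seen in the given log lines}."""
    last = {}
    for line in lines:
        m = CSTATE.search(line)
        if m:
            device, _old, new = m.groups()
            last[device] = new
    return last


# A few lines copied from the log above, for demonstration only.
EXCERPT = """\
Jan 9 09:40:38 server kernel: drbd2: kjournald [1392]: cstate Connected --> NetworkFailure
Jan 9 09:40:38 server kernel: drbd2: drbd2_receiver [1008]: cstate NetworkFailure --> BrokenPipe
Jan 9 09:40:38 server kernel: drbd1: drbd1_receiver [1000]: cstate NetworkFailure --> BrokenPipe
Jan 9 09:40:39 server kernel: drbd1: drbd1_receiver [1000]: cstate BrokenPipe --> Unconnected
Jan 9 09:40:39 server kernel: drbd1: drbd1_receiver [1000]: cstate Unconnected --> WFConnection
""".splitlines()

if __name__ == "__main__":
    for device, state in sorted(last_cstates(EXCERPT).items()):
        print(device, state)
```

To run it over a full log, pass the lines of the syslog file to last_cstates() instead of EXCERPT.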