One of our primary drbd devices (/dev/drbd2) went into BrokenPipe state, and all processes accessing files on it (mostly ldap and mysql) froze and could not be killed, even with -9. This prevented us from switching /dev/drbd to secondary, so we had to reboot the primary node.

Any hint?

Kernel: 2.4.21-37.EL (rhel3.02)

cat /proc/drbd (now that it's working, after the reboot):

version: 0.7.15 (api:77/proto:74)
SVN Revision: 2020 build by [..], 2005-12-21 18:55:34
 0: cs:Connected st:Primary/Secondary ld:Consistent
    ns:38904 nr:148 dw:39052 dr:525 al:4 bm:1 lo:0 pe:0 ua:0 ap:0
 1: cs:Connected st:Primary/Secondary ld:Consistent
    ns:328612 nr:86164 dw:414776 dr:128877 al:537 bm:118 lo:0 pe:0 ua:0 ap:0
 2: cs:Connected st:Primary/Secondary ld:Consistent
    ns:88544 nr:21716 dw:110260 dr:50253 al:0 bm:35 lo:0 pe:0 ua:0 ap:0
 3: cs:Connected st:Primary/Secondary ld:Consistent
    ns:31700 nr:8304 dw:40004 dr:4349 al:10 bm:54 lo:0 pe:0 ua:0 ap:0
 4: cs:Connected st:Primary/Secondary ld:Consistent
    ns:228 nr:208 dw:436 dr:1449 al:0 bm:2 lo:0 pe:0 ua:0 ap:0
 5: cs:Connected st:Primary/Secondary ld:Consistent
    ns:1567340 nr:2024004 dw:3591344 dr:4765217 al:11799 bm:955 lo:0 pe:0 ua:0 ap:0

Before the reboot, /dev/drbd2 was:

FWConnection st:Primary/BrokenPipe

I have attached our drbd syslog entries.

Regards,
Diego.
-------------- next part --------------
Jan 9 09:40:29 server kernel: drbd2: [kjournald/1392] sock_sendmsg time expired, ko = 3
Jan 9 09:40:29 server kernel: drbd1: [kjournald/1388] sock_sendmsg time expired, ko = 3
Jan 9 09:40:32 server kernel: drbd2: [kjournald/1392] sock_sendmsg time expired, ko = 2
Jan 9 09:40:32 server kernel: drbd1: [kjournald/1388] sock_sendmsg time expired, ko = 2
Jan 9 09:40:35 server kernel: drbd2: [kjournald/1392] sock_sendmsg time expired, ko = 1
Jan 9 09:40:35 server kernel: drbd1: [kjournald/1388] sock_sendmsg time expired, ko = 1
Jan 9 09:40:38 server kernel: drbd2: drbd_main.c:1088: Connected flags=0x130a
Jan 9 09:40:38 server kernel: drbd2: kjournald [1392]: cstate Connected --> NetworkFailure
Jan 9 09:40:38 server kernel: drbd2: drbd2_receiver [1008]: cstate NetworkFailure --> BrokenPipe
Jan 9 09:40:38 server kernel: drbd2: short read expecting header on sock: r=-512
Jan 9 09:40:38 server kernel: drbd2: asender terminated
Jan 9 09:40:38 server kernel: drbd2: short sent UnplugRemote size=8 sent=-1001
Jan 9 09:40:38 server kernel: drbd2: worker terminated
Jan 9 09:40:38 server kernel: drbd1: drbd_main.c:1088: Connected flags=0x130a
Jan 9 09:40:38 server kernel: drbd1: kjournald [1388]: cstate Connected --> NetworkFailure
Jan 9 09:40:38 server kernel: drbd1: drbd1_receiver [1000]: cstate NetworkFailure --> BrokenPipe
Jan 9 09:40:38 server kernel: drbd1: short read expecting header on sock: r=-512
Jan 9 09:40:38 server kernel: drbd1: asender terminated
Jan 9 09:40:39 server kernel: drbd1: worker terminated
Jan 9 09:40:39 server kernel: drbd1: drbd1_receiver [1000]: cstate BrokenPipe --> Unconnected
Jan 9 09:40:39 server kernel: drbd1: Connection lost.
Jan 9 09:40:39 server kernel: drbd1: drbd1_receiver [1000]: cstate Unconnected --> WFConnection
Jan 9 09:40:39 server kernel: drbd1: drbd1_receiver [1000]: cstate WFConnection --> WFReportParams
Jan 9 09:40:39 server kernel: drbd1: Handshake successful: DRBD Network Protocol version 74
Jan 9 09:40:39 server kernel: drbd1: Connection established.
Jan 9 09:40:39 server kernel: drbd1: I am(P): 1:00000003:00000001:00000016:0000000a:10
Jan 9 09:40:39 server kernel: drbd1: Peer(S): 1:00000003:00000001:00000015:0000000a:01
Jan 9 09:40:39 server kernel: drbd1: drbd1_receiver [1000]: cstate WFReportParams --> WFBitMapS
Jan 9 09:40:39 server kernel: drbd1: Primary/Unknown --> Primary/Secondary
Jan 9 09:40:40 server kernel: drbd1: drbd1_receiver [1000]: cstate WFBitMapS --> SyncSource
Jan 9 09:40:40 server kernel: drbd1: Resync started as SyncSource (need to sync 1548 KB [387 bits set]).
Jan 9 09:40:40 server kernel: drbd1: Resync done (total 1 sec; paused 0 sec; 1548 K/sec)
Jan 9 09:40:40 server kernel: drbd1: drbd1_worker [26548]: cstate SyncSource --> Connected
Jan 9 09:41:01 server kernel: drbd1: [kjournald/1388] sock_sendmsg time expired, ko = 3
Jan 9 09:53:21 server kernel: drbd5: [kjournald/1404] sock_sendmsg time expired, ko = 3
Jan 9 10:01:10 server kernel: drbd4: PingAck did not arrive in time.
Jan 9 10:01:10 server kernel: drbd4: drbd4_asender [32393]: cstate Connected --> NetworkFailure
Jan 9 10:01:10 server kernel: drbd4: asender terminated
Jan 9 10:01:10 server kernel: drbd4: drbd4_receiver [1024]: cstate NetworkFailure --> BrokenPipe
Jan 9 10:01:10 server kernel: drbd4: short read expecting header on sock: r=-512
Jan 9 10:01:10 server kernel: drbd4: worker terminated
Jan 9 10:01:10 server kernel: drbd4: drbd4_receiver [1024]: cstate BrokenPipe --> Unconnected
Jan 9 10:01:10 server kernel: drbd4: Connection lost.
Jan 9 10:01:10 server kernel: drbd4: drbd4_receiver [1024]: cstate Unconnected --> WFConnection
Jan 9 10:01:12 server kernel: drbd5: PingAck did not arrive in time.
Jan 9 10:01:12 server kernel: drbd3: PingAck did not arrive in time.
Jan 9 10:01:12 server kernel: drbd3: drbd3_asender [32392]: cstate Connected --> NetworkFailure
Jan 9 10:01:12 server kernel: drbd3: asender terminated
Jan 9 10:01:12 server kernel: drbd3: drbd3_receiver [1016]: cstate NetworkFailure --> BrokenPipe
Jan 9 10:01:12 server kernel: drbd3: short read expecting header on sock: r=-512
Jan 9 10:01:12 server kernel: drbd3: worker terminated
Jan 9 10:01:12 server kernel: drbd5: drbd5_asender [32394]: cstate Connected --> NetworkFailure
Jan 9 10:01:12 server kernel: drbd3: drbd3_receiver [1016]: cstate BrokenPipe --> Unconnected
Jan 9 10:01:12 server kernel: drbd5: asender terminated
Jan 9 10:01:12 server kernel: drbd5: drbd5_receiver [1032]: cstate NetworkFailure --> BrokenPipe
Jan 9 10:01:12 server kernel: drbd5: short read expecting header on sock: r=-512
Jan 9 10:01:12 server kernel: drbd5: worker terminated
Jan 9 10:01:12 server kernel: drbd5: drbd5_receiver [1032]: cstate BrokenPipe --> Unconnected
Jan 9 10:01:12 server kernel: drbd3: Connection lost.
Jan 9 10:01:12 server kernel: drbd3: drbd3_receiver [1016]: cstate Unconnected --> WFConnection
Jan 9 10:01:12 server kernel: drbd5: Connection lost.
Jan 9 10:01:12 server kernel: drbd5: drbd5_receiver [1032]: cstate Unconnected --> WFConnection
Jan 9 10:01:12 server kernel: drbd1: PingAck did not arrive in time.
Jan 9 10:01:12 server kernel: drbd1: drbd1_asender [26552]: cstate Connected --> NetworkFailure
Jan 9 10:01:12 server kernel: drbd1: asender terminated
Jan 9 10:01:12 server kernel: drbd1: drbd1_receiver [1000]: cstate NetworkFailure --> BrokenPipe
Jan 9 10:01:12 server kernel: drbd1: short read expecting header on sock: r=-512
Jan 9 10:01:12 server kernel: drbd1: _drbd_send_page: size=4096 len=656 sent=-4
Jan 9 10:01:12 server kernel: drbd1: short sent UnplugRemote size=8 sent=-1001
Jan 9 10:01:12 server kernel: drbd1: worker terminated
Jan 9 10:01:12 server kernel: drbd1: drbd1_receiver [1000]: cstate BrokenPipe --> Unconnected
Jan 9 10:01:12 server kernel: drbd0: PingAck did not arrive in time.
Jan 9 10:01:12 server kernel: drbd0: drbd0_asender [32389]: cstate Connected --> NetworkFailure
Jan 9 10:01:13 server kernel: drbd0: asender terminated
Jan 9 10:01:13 server kernel: drbd0: drbd0_receiver [992]: cstate NetworkFailure --> BrokenPipe
Jan 9 10:01:13 server kernel: drbd0: short read expecting header on sock: r=-512
Jan 9 10:01:13 server kernel: drbd0: worker terminated
Jan 9 10:01:13 server kernel: drbd0: drbd0_receiver [992]: cstate BrokenPipe --> Unconnected
Jan 9 10:01:13 server kernel: drbd1: Connection lost.
Jan 9 10:01:13 server kernel: drbd1: drbd1_receiver [1000]: cstate Unconnected --> WFConnection
Jan 9 10:01:13 server kernel: drbd0: Connection lost.
Jan 9 10:01:13 server kernel: drbd0: drbd0_receiver [992]: cstate Unconnected --> WFConnection
Jan 9 10:03:50 server kernel: drbd0: drbd0_receiver [992]: cstate WFConnection --> WFReportParams
Jan 9 10:03:50 server kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
Jan 9 10:03:50 server kernel: drbd0: Connection established.
Jan 9 10:03:50 server kernel: drbd0: I am(P): 1:00000003:00000001:00000017:0000000a:10
Jan 9 10:03:50 server kernel: drbd0: Peer(S): 1:00000003:00000001:00000016:0000000a:00
Jan 9 10:03:50 server kernel: drbd0: drbd0_receiver [992]: cstate WFReportParams --> WFBitMapS
Jan 9 10:03:50 server kernel: drbd1: drbd1_receiver [1000]: cstate WFConnection --> WFReportParams
Jan 9 10:03:50 server kernel: drbd1: Handshake successful: DRBD Network Protocol version 74
Jan 9 10:03:50 server kernel: drbd1: Connection established.
Jan 9 10:03:50 server kernel: drbd1: I am(P): 1:00000003:00000001:00000017:0000000a:10
Jan 9 10:03:50 server kernel: drbd1: Peer(S): 1:00000003:00000001:00000016:0000000a:01
Jan 9 10:03:50 server kernel: drbd1: drbd1_receiver [1000]: cstate WFReportParams --> WFBitMapS
Jan 9 10:03:50 server kernel: drbd1: Primary/Unknown --> Primary/Secondary
Jan 9 10:03:50 server kernel: drbd3: drbd3_receiver [1016]: cstate WFConnection --> WFReportParams
Jan 9 10:03:50 server kernel: drbd3: Handshake successful: DRBD Network Protocol version 74
Jan 9 10:03:50 server kernel: drbd3: Connection established.
Jan 9 10:03:50 server kernel: drbd3: I am(P): 1:00000003:00000001:00000016:0000000a:10
Jan 9 10:03:50 server kernel: drbd3: Peer(S): 1:00000003:00000001:00000015:0000000a:01
Jan 9 10:03:50 server kernel: drbd3: drbd3_receiver [1016]: cstate WFReportParams --> WFBitMapS
Jan 9 10:03:50 server kernel: drbd1: drbd1_receiver [1000]: cstate WFBitMapS --> SyncSource
Jan 9 10:03:50 server kernel: drbd1: Resync started as SyncSource (need to sync 4004 KB [1001 bits set]).
Jan 9 10:03:50 server kernel: drbd4: drbd4_receiver [1024]: cstate WFConnection --> WFReportParams
Jan 9 10:03:50 server kernel: drbd4: Handshake successful: DRBD Network Protocol version 74
Jan 9 10:03:50 server kernel: drbd4: Connection established.
Jan 9 10:03:50 server kernel: drbd4: I am(P): 1:00000003:00000001:00000016:0000000a:10
Jan 9 10:03:50 server kernel: drbd4: Peer(S): 1:00000003:00000001:00000015:0000000a:01
Jan 9 10:03:50 server kernel: drbd4: drbd4_receiver [1024]: cstate WFReportParams --> WFBitMapS
Jan 9 10:03:51 server kernel: drbd4: Primary/Unknown --> Primary/Secondary
Jan 9 10:03:51 server kernel: drbd5: drbd5_receiver [1032]: cstate WFConnection --> WFReportParams
Jan 9 10:03:51 server kernel: drbd5: Handshake successful: DRBD Network Protocol version 74
Jan 9 10:03:51 server kernel: drbd5: Connection established.
Jan 9 10:03:51 server kernel: drbd5: I am(P): 1:00000003:00000001:00000017:0000000a:10
Jan 9 10:03:51 server kernel: drbd5: Peer(S): 1:00000003:00000001:00000016:0000000a:01
Jan 9 10:03:51 server kernel: drbd5: drbd5_receiver [1032]: cstate WFReportParams --> WFBitMapS
Jan 9 10:03:51 server kernel: drbd3: Primary/Unknown --> Primary/Secondary
Jan 9 10:03:51 server kernel: drbd4: drbd4_receiver [1024]: cstate WFBitMapS --> SyncSource
Jan 9 10:03:51 server kernel: drbd4: Resync started as SyncSource (need to sync 0 KB [0 bits set]).
Jan 9 10:03:51 server kernel: drbd4: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
Jan 9 10:03:51 server kernel: drbd4: drbd4_receiver [1024]: cstate SyncSource --> Connected
Jan 9 10:03:51 server kernel: drbd3: drbd3_receiver [1016]: cstate WFBitMapS --> SyncSource
Jan 9 10:03:51 server kernel: drbd3: Resync started as SyncSource (need to sync 720 KB [180 bits set]).
Jan 9 10:03:51 server kernel: drbd3: drbd3_receiver [1016]: cstate SyncSource --> PausedSyncS
Jan 9 10:03:51 server kernel: drbd3: Syncer waits for sync group.
Jan 9 10:03:51 server kernel: drbd0: Primary/Unknown --> Primary/Secondary
Jan 9 10:03:51 server kernel: drbd0: drbd0_receiver [992]: cstate WFBitMapS --> SyncSource
Jan 9 10:03:51 server kernel: drbd0: Resync started as SyncSource (need to sync 1048 KB [262 bits set]).
Jan 9 10:03:51 server kernel: drbd1: drbd0_receiver [992]: cstate SyncSource --> PausedSyncS
Jan 9 10:03:51 server kernel: drbd1: Syncer waits for sync group.
Jan 9 10:03:51 server kernel: drbd5: Primary/Unknown --> Primary/Secondary
Jan 9 10:03:51 server kernel: drbd5: drbd5_receiver [1032]: cstate WFBitMapS --> SyncSource
Jan 9 10:03:51 server kernel: drbd5: Resync started as SyncSource (need to sync 9760 KB [2440 bits set]).
Jan 9 10:03:51 server kernel: drbd5: drbd5_receiver [1032]: cstate SyncSource --> PausedSyncS
Jan 9 10:03:51 server kernel: drbd5: Syncer waits for sync group.
Jan 9 10:03:51 server kernel: drbd0: Resync done (total 1 sec; paused 0 sec; 1048 K/sec)
Jan 9 10:03:51 server kernel: drbd0: drbd0_worker [607]: cstate SyncSource --> Connected
Jan 9 10:03:51 server kernel: drbd1: Syncer continues.
Jan 9 10:03:51 server kernel: drbd1: drbd0_worker [607]: cstate PausedSyncS --> SyncSource
Jan 9 10:03:51 server kernel: drbd1: Resync done (total 1 sec; paused 0 sec; 4004 K/sec)
Jan 9 10:03:51 server kernel: drbd1: drbd1_worker [603]: cstate SyncSource --> Connected
Jan 9 10:03:51 server kernel: drbd3: Syncer continues.
Jan 9 10:03:51 server kernel: drbd3: drbd1_worker [603]: cstate PausedSyncS --> SyncSource
Jan 9 10:03:57 server kernel: drbd3: Resync done (total 6 sec; paused 0 sec; 120 K/sec)
Jan 9 10:03:57 server kernel: drbd3: drbd3_worker [601]: cstate SyncSource --> Connected
Jan 9 10:03:57 server kernel: drbd5: Syncer continues.
Jan 9 10:03:57 server kernel: drbd5: drbd3_worker [601]: cstate PausedSyncS --> SyncSource
Jan 9 10:04:05 server kernel: drbd5: Resync done (total 13 sec; paused 6 sec; 1392 K/sec)
Jan 9 10:04:05 server kernel: drbd5: drbd5_worker [602]: cstate SyncSource --> Connected
Jan 9 10:07:53 server kernel: drbd0: Primary/Secondary --> Secondary/Secondary
Jan 9 10:14:07 server kernel: drbd3: Primary/Secondary --> Secondary/Secondary
Jan 9 10:14:07 server kernel: drbd5: Primary/Secondary --> Secondary/Secondary
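
For anyone reading a long log like this one, a small script can help spot a device that never left a failure state. The following is only a rough sketch: it assumes one syslog entry per line in the format shown in the attachment, and the names `CSTATE_RE` and `last_cstates` are made up for illustration (they are not part of DRBD or any DRBD tool). It records the last logged `cstate` transition for each device.

```python
import re

# Matches entries such as:
#   Jan 9 09:40:38 server kernel: drbd2: drbd2_receiver [1008]: cstate NetworkFailure --> BrokenPipe
# The pattern is an assumption based on the log lines in this post.
CSTATE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+kernel:\s+"
    r"(?P<dev>drbd\d+):.*\bcstate\s+(?P<old>\w+)\s+-->\s+(?P<new>\w+)"
)


def last_cstates(lines):
    """Return {device: (timestamp, state)} for the last cstate change seen per device."""
    last = {}
    for line in lines:
        m = CSTATE_RE.match(line)
        if m:
            # Later entries overwrite earlier ones, so the final value is the last transition.
            last[m.group("dev")] = (m.group("stamp"), m.group("new"))
    return last


# A few entries copied from the attached log, used here as sample input.
sample = [
    "Jan 9 09:40:38 server kernel: drbd2: kjournald [1392]: cstate Connected --> NetworkFailure",
    "Jan 9 09:40:38 server kernel: drbd2: drbd2_receiver [1008]: cstate NetworkFailure --> BrokenPipe",
    "Jan 9 09:40:39 server kernel: drbd1: drbd1_receiver [1000]: cstate BrokenPipe --> Unconnected",
    "Jan 9 09:40:39 server kernel: drbd1: Connection lost.",
    "Jan 9 09:40:39 server kernel: drbd1: drbd1_receiver [1000]: cstate Unconnected --> WFConnection",
]

for dev, (stamp, state) in sorted(last_cstates(sample).items()):
    print(dev, stamp, state)
```

Run against the full attachment (for example `last_cstates(open("drbd.log"))`, where the file name is hypothetical), this kind of summary should make it easy to see which device's last logged transition was into BrokenPipe rather than back to Connected.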