Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Lars, I can simulate it and I think it's related to getting a: "PingAck did not arrive in time" on the slave, during a 'proper or not so proper' reboot of the master. => both systems are consistent, but the HA software put's the slave into primary (because the master is dead) and when the master commes back, they both think they are consistent although they are different. Maybe the 'longest a-live' should be consisered as real- consistent.... Please enjoy the scenario below. Regards alex PS: let me know if I can help with further testing to make 0.7.x rock-solid. Master =================================== This is the master and no cmd-line parameters are executed. The slave was rebooted ... 10:43:11 SYSLOG err kernel drbd0: meta connection shut down by peer. 10:43:11 SYSLOG info kernel drbd0: drbd0_asender [2173]: cstate Connected --> NetworkFailure 10:43:11 SYSLOG info kernel drbd0: asender terminated 10:43:11 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate NetworkFailure --> BrokenPipe 10:43:11 SYSLOG err kernel drbd0: short read expecting header on sock: r=-512 10:43:11 SYSLOG info kernel drbd0: worker terminated 10:43:11 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate BrokenPipe --> Unconnected 10:43:11 SYSLOG info kernel drbd0: Connection lost. 10:43:11 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate Unconnected --> WFConnection and comes back ... 10:44:45 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate WFConnection --> WFReportParams 10:44:45 SYSLOG info kernel drbd0: Handshake successful: DRBD Network Protocol version 74 10:44:45 SYSLOG info kernel drbd0: Connection established. 10:44:45 SYSLOG info kernel drbd0: I am(P): 1:00000002:00000001:00000009:00000006:10 10:44:45 SYSLOG info kernel drbd0: Peer(S): 1:00000002:00000001:00000008:00000006:01 10:44:45 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate WFReportParams --> WFBitMapS 10:44:45 SYSLOG info kernel drbd0: Primary/Unknown --> Primary/Secondary 10:44:45 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate WFBitMapS --> SyncSource 10:44:45 SYSLOG info kernel drbd0: Resync started as SyncSource (need to sync 228 KB [57 bits set]). but the slave's network driver blocked ... 10:44:58 SYSLOG err kernel drbd0: PingAck did not arrive in time. 10:44:58 SYSLOG info kernel drbd0: drbd0_asender [30515]: cstate SyncSource --> NetworkFailure 10:44:58 SYSLOG info kernel drbd0: asender terminated 10:44:58 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate NetworkFailure --> BrokenPipe 10:44:58 SYSLOG err kernel drbd0: short read expecting header on sock: r=-512 10:44:58 SYSLOG info kernel drbd0: worker terminated 10:44:58 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate BrokenPipe --> Unconnected 10:44:58 SYSLOG info kernel drbd0: Connection lost. 10:44:58 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate Unconnected --> WFConnection and the slave commes back with another network driver .... 11:35:28 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate WFConnection --> WFReportParams 11:35:28 SYSLOG info kernel drbd0: Handshake successful: DRBD Network Protocol version 74 11:35:28 SYSLOG info kernel drbd0: Connection established. 11:35:28 SYSLOG info kernel drbd0: I am(P): 1:00000002:00000001:0000000a:00000006:10 11:35:28 SYSLOG info kernel drbd0: Peer(S): 0:00000002:00000001:00000009:00000006:01 11:35:28 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate WFReportParams --> WFBitMapS 11:35:28 SYSLOG info kernel drbd0: Primary/Unknown --> Primary/Secondary 11:35:29 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate WFBitMapS --> SyncSource 11:35:29 SYSLOG info kernel drbd0: Resync started as SyncSource (need to sync 3456 KB [864 bits set]). 11:35:29 SYSLOG info kernel drbd: sendpage() omitted: 77/1562295 [XFS' broken IO requests?] 11:35:39 SYSLOG info kernel drbd0: Resync done (total 10 sec; paused 0 sec; 344 K/sec) 11:35:39 SYSLOG info kernel drbd0: drbd0_worker [30516]: cstate SyncSource --> Connected slave is rebooted again with another network driver 11:50:56 SYSLOG info kernel drbd0: sock was shut down by peer 11:50:56 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate Connected --> BrokenPipe 11:50:56 SYSLOG err kernel drbd0: short read expecting header on sock: r=0 11:50:56 SYSLOG info kernel drbd0: asender terminated 11:50:56 SYSLOG info kernel drbd0: worker terminated 11:50:56 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate BrokenPipe --> Unconnected 11:50:56 SYSLOG info kernel drbd0: Connection lost. 11:50:56 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate Unconnected --> WFConnection and here he's back 11:52:25 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate WFConnection --> WFReportParams 11:52:25 SYSLOG info kernel drbd0: Handshake successful: DRBD Network Protocol version 74 11:52:25 SYSLOG info kernel drbd0: Connection established. 11:52:25 SYSLOG info kernel drbd0: I am(P): 1:00000002:00000001:0000000b:00000006:10 11:52:25 SYSLOG info kernel drbd0: Peer(S): 1:00000002:00000001:0000000a:00000006:01 11:52:25 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate WFReportParams --> WFBitMapS 11:52:25 SYSLOG info kernel drbd0: Primary/Unknown --> Primary/Secondary 11:52:25 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate WFBitMapS --> SyncSource 11:52:25 SYSLOG info kernel drbd0: Resync started as SyncSource (need to sync 152 KB [38 bits set]). 11:52:26 SYSLOG info kernel drbd0: Resync done (total 1 sec; paused 0 sec; 152 K/sec) 11:52:26 SYSLOG info kernel drbd0: drbd0_worker [2050]: cstate SyncSource --> Connected BUT now, I reboot te master 11:55:52 SYSLOG info init Switching to runlevel: 6 11:57:06 SYSLOG info init Re-reading inittab and follow this and around the same moment in the slave's log 11:57:35 AXSDBG debug boot.pl[327] Service::system(170) run modprobe drbd 11:57:35 SYSLOG info kernel drbd: initialised. Version: 0.7.2 (api:75/proto:74) 11:57:35 SYSLOG info kernel drbd: SVN Revision: 1483:1485 build by ao at build, 2004-08-09 14:00:52 11:57:35 SYSLOG info kernel drbd: registered as block device major 147 11:57:35 AXSDBG debug boot.pl[327] Service::system(170) run drbdsetup /dev/drbd0 disk /dev/lv/data /dev/lv/drbd-meta 0 11:57:35 SYSLOG info kernel drbd0: resync bitmap: bits=1540096 words=48128 11:57:35 SYSLOG info kernel drbd0: size = 6160384 KB 11:57:36 SYSLOG info kernel drbd0: 0 KB marked out-of-sync by on disk bit-map. 11:57:36 SYSLOG info kernel drbd0: Found 4 transactions (112 active extents) in activity log. 11:57:36 SYSLOG info kernel drbd0: Marked additional 30208 KB as out-of-sync based on AL. 11:57:36 SYSLOG info kernel drbd0: drbdsetup [701]: cstate Unconfigured --> StandAlone 11:57:36 AXSDBG debug boot.pl[327] Service::system(170) run drbdsetup /dev/drbd0 syncer -r 10000 11:57:36 AXSDBG debug boot.pl[327] Service::system(170) run drbdsetup /dev/drbd0 net 192.168.5.228 192.168.5.229 C 11:57:36 SYSLOG info kernel drbd0: drbdsetup [714]: cstate StandAlone --> Unconnected 11:57:36 SYSLOG info kernel drbd0: drbd0_receiver [715]: cstate Unconnected --> WFConnection 11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(219) > drbdStatus 11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) cs => WFConnection 11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) st => Secondary/Unknown 11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) ld => Consistent 11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) ns => 0 11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) nr => 0 11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) dw => 0 11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) dr => 0 11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) al => 0 11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) bm => 59 11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) lo => 0 11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) pe => 0 11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) ua => 0 11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) ap => 0 11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(239) < drbdStatus 11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(219) > drbdStatus 11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) cs => WFConnection 11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) st => Secondary/Unknown 11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) ld => Consistent 11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) ns => 0 11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) nr => 0 11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) dw => 0 11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) dr => 0 11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) al => 0 11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) bm => 59 11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) lo => 0 11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) pe => 0 11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) ua => 0 11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) ap => 0 11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(239) < drbdStatus 11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [715]: cstate WFConnection --> WFReportParams 11:57:37 SYSLOG info kernel drbd0: Handshake successful: DRBD Network Protocol version 74 11:57:37 SYSLOG info kernel drbd0: Connection established. 11:57:37 SYSLOG info kernel drbd0: I am(S): 1:00000002:00000001:0000000c:00000006:10 11:57:37 SYSLOG info kernel drbd0: Peer(P): 1:00000002:00000001:0000000b:00000007:10 11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [715]: cstate WFReportParams --> WFBitMapS 11:57:37 SYSLOG err kernel drbd0: sock_sendmsg returned -104 11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [715]: cstate WFBitMapS --> BrokenPipe 11:57:37 SYSLOG err kernel drbd0: short sent ReportBitMap size=4096 sent=240 11:57:37 SYSLOG info kernel drbd0: Secondary/Unknown --> Secondary/Primary 11:57:37 SYSLOG err kernel drbd0: meta connection shut down by peer. 11:57:37 SYSLOG info kernel drbd0: asender terminated 11:57:37 SYSLOG info kernel drbd0: sock was shut down by peer 11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [715]: cstate BrokenPipe --> BrokenPipe 11:57:37 SYSLOG err kernel drbd0: short read expecting header on sock: r=0 11:57:37 SYSLOG info kernel drbd0: worker terminated 11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [715]: cstate BrokenPipe --> Unconnected 11:57:37 SYSLOG info kernel drbd0: Connection lost. 11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [715]: cstate Unconnected --> WFConnection 11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(219) > drbdStatus 11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) cs => WFConnection 11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) st => Secondary/Unknown 11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) ld => Consistent 11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) ns => 0 11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) nr => 0 11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) dw => 0 11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) dr => 0 11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) al => 0 11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) bm => 59 11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) lo => 0 11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) pe => 0 11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) ua => 0 11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) ap => 0 11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(239) < drbdStatus # cat /proc/drbd version: 0.7.2 (api:75/proto:74) SVN Revision: 1483:1485 build by ao at build, 2004-08-09 14:00:52 0: cs:WFConnection st:Primary/Unknown ld:Consistent ns:0 nr:0 dw:18136 dr:1029 al:0 bm:59 lo:0 pe:0 ua:0 ap:0 1: cs:Unconfigured # drbdsetup /dev/drbd0 show Lower device: 58:02 (lvm(58,2)) Meta device: 58:03 (lvm(58,3)) Meta index: 0 Disk options: Local address: 192.168.5.228:7788 Remote address: 192.168.5.229:7788 Wire protocol: C Net options: timeout = 6.0 sec (default) connect-int = 10 sec (default) ping-int = 10 sec (default) max-epoch-size = 2048 (default) max-buffers = 2048 (default) sndbuf-size = 131070 (default) ko-count = 0 (default) Syncer options: rate = 10000 KB/sec group = 0 (default) al-extents = 127 (default) # lsmod | grep drbd drbd 128548 3 Slave =================================== slave is booting here and you can also see the used commands (drbdsetup ...) (/dev/lv/data is LVM1 data partition, /dev/lv/drbd-meta is a LVM1 partition for the meta data) 10:44:45 SYSLOG info kernel drbd: initialised. Version: 0.7.2 (api:75/proto:74) 10:44:45 SYSLOG info kernel drbd: SVN Revision: 1483:1485 build by ao at build, 2004-08-09 14:00:52 10:44:45 SYSLOG info kernel drbd: registered as block device major 147 10:44:45 AXSDBG debug boot.pl[331] Service::system(170) run drbdsetup /dev/drbd0 disk /dev/lv/data /dev/lv/drbd-meta 0 10:44:45 SYSLOG info kernel drbd0: resync bitmap: bits=1540096 words=48128 10:44:45 SYSLOG info kernel drbd0: size = 6160384 KB 10:44:45 SYSLOG info kernel drbd0: 0 KB marked out-of-sync by on disk bit-map. 10:44:45 SYSLOG warning kernel drbd0: No usable activity log found. 10:44:45 SYSLOG info kernel drbd0: drbdsetup [733]: cstate Unconfigured --> StandAlone 10:44:45 AXSDBG debug boot.pl[331] Service::system(170) run drbdsetup /dev/drbd0 syncer -r 10000 10:44:45 AXSDBG debug boot.pl[331] Service::system(170) run drbdsetup /dev/drbd0 net 192.168.5.229 192.168.5.228 C 10:44:45 SYSLOG info kernel drbd0: drbdsetup [746]: cstate StandAlone --> Unconnected 10:44:45 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate Unconnected --> WFConnection 10:44:45 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate WFConnection --> WFReportParams 10:44:45 SYSLOG info kernel drbd0: Handshake successful: DRBD Network Protocol version 74 10:44:45 SYSLOG info kernel drbd0: Connection established. 10:44:45 SYSLOG info kernel drbd0: I am(S): 1:00000002:00000001:00000008:00000006:01 10:44:45 SYSLOG info kernel drbd0: Peer(P): 1:00000002:00000001:00000009:00000006:10 10:44:45 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate WFReportParams --> WFBitMapT 10:44:45 SYSLOG info kernel drbd0: Secondary/Unknown --> Secondary/Primary 10:44:46 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate WFBitMapT --> SyncTarget 10:44:46 SYSLOG info kernel drbd0: Resync started as SyncTarget (need to sync 228 KB [57 bits set]). 10:44:46 AXSDBG debug boot.pl[331] Service::system(170) run drbdsetup /dev/drbd0 wait_sync and here the slave's (this one) network device died. 10:44:59 SYSLOG err kernel drbd0: meta connection shut down by peer. 10:44:59 SYSLOG info kernel drbd0: drbd0_asender [748]: cstate SyncTarget --> NetworkFailure 10:44:59 SYSLOG info kernel drbd0: asender terminated 10:44:59 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate NetworkFailure --> BrokenPipe 10:44:59 SYSLOG warning kernel drbd0: short read receiving data block: read 1040 expected 4096 10:44:59 SYSLOG err kernel drbd0: error receiving RSDataReply, l: 4112! 10:44:59 SYSLOG info kernel drbd0: worker terminated 10:44:59 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate BrokenPipe --> Unconnected 10:44:59 SYSLOG info kernel drbd0: Connection lost. 10:44:59 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate Unconnected --> WFConnection reboot of the slave with another network driver 11:35:29 SYSLOG info kernel drbd: initialised. Version: 0.7.2 (api:75/proto:74) 11:35:29 SYSLOG info kernel drbd: SVN Revision: 1483:1485 build by ao at build, 2004-08-09 14:00:52 11:35:29 SYSLOG info kernel drbd: registered as block device major 147 11:35:29 AXSDBG debug boot.pl[332] Service::system(170) run drbdsetup /dev/drbd0 disk /dev/lv/data /dev/lv/drbd-meta 0 11:35:29 SYSLOG info kernel drbd0: resync bitmap: bits=1540096 words=48128 11:35:29 SYSLOG info kernel drbd0: size = 6160384 KB 11:35:29 SYSLOG info kernel drbd0: 0 KB marked out-of-sync by on disk bit-map. 11:35:29 SYSLOG warning kernel drbd0: No usable activity log found. 11:35:29 SYSLOG info kernel drbd0: drbdsetup [734]: cstate Unconfigured --> StandAlone 11:35:29 AXSDBG debug boot.pl[332] Service::system(170) run drbdsetup /dev/drbd0 syncer -r 10000 11:35:29 AXSDBG debug boot.pl[332] Service::system(170) run drbdsetup /dev/drbd0 net 192.168.5.229 192.168.5.228 C 11:35:29 SYSLOG info kernel drbd0: drbdsetup [747]: cstate StandAlone --> Unconnected 11:35:29 SYSLOG info kernel drbd0: drbd0_receiver [748]: cstate Unconnected --> WFConnection 11:35:29 SYSLOG info kernel drbd0: drbd0_receiver [748]: cstate WFConnection --> WFReportParams 11:35:29 SYSLOG info kernel drbd0: Handshake successful: DRBD Network Protocol version 74 11:35:29 SYSLOG info kernel drbd0: Connection established. 11:35:29 SYSLOG info kernel drbd0: I am(S): 0:00000002:00000001:00000009:00000006:01 11:35:29 SYSLOG info kernel drbd0: Peer(P): 1:00000002:00000001:0000000a:00000006:10 11:35:29 SYSLOG info kernel drbd0: drbd0_receiver [748]: cstate WFReportParams --> WFBitMapT 11:35:29 SYSLOG info kernel drbd0: Secondary/Unknown --> Secondary/Primary 11:35:29 SYSLOG info kernel drbd0: drbd0_receiver [748]: cstate WFBitMapT --> SyncTarget 11:35:29 SYSLOG info kernel drbd0: Resync started as SyncTarget (need to sync 3456 KB [864 bits set]). 11:35:30 AXSDBG debug boot.pl[332] Service::system(170) run drbdsetup /dev/drbd0 wait_sync 11:35:40 SYSLOG info kernel drbd0: Resync done (total 10 sec; paused 0 sec; 344 K/sec) 11:35:40 SYSLOG info kernel drbd0: drbd0_worker [735]: cstate SyncTarget --> Connected another reboot with yet another network driver 11:52:25 SYSLOG info kernel drbd: initialised. Version: 0.7.2 (api:75/proto:74) 11:52:25 SYSLOG info kernel drbd: SVN Revision: 1483:1485 build by ao at build, 2004-08-09 14:00:52 11:52:25 SYSLOG info kernel drbd: registered as block device major 147 11:52:25 AXSDBG debug boot.pl[331] Service::system(170) run drbdsetup /dev/drbd0 disk /dev/lv/data /dev/lv/drbd-meta 0 11:52:25 SYSLOG info kernel drbd0: resync bitmap: bits=1540096 words=48128 11:52:25 SYSLOG info kernel drbd0: size = 6160384 KB 11:52:25 SYSLOG info kernel drbd0: 0 KB marked out-of-sync by on disk bit-map. 11:52:25 SYSLOG warning kernel drbd0: No usable activity log found. 11:52:25 SYSLOG info kernel drbd0: drbdsetup [733]: cstate Unconfigured --> StandAlone 11:52:25 AXSDBG debug boot.pl[331] Service::system(170) run drbdsetup /dev/drbd0 syncer -r 10000 11:52:25 AXSDBG debug boot.pl[331] Service::system(170) run drbdsetup /dev/drbd0 net 192.168.5.229 192.168.5.228 C 11:52:25 SYSLOG info kernel drbd0: drbdsetup [746]: cstate StandAlone --> Unconnected 11:52:25 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate Unconnected --> WFConnection 11:52:26 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate WFConnection --> WFReportParams 11:52:26 SYSLOG info kernel drbd0: Handshake successful: DRBD Network Protocol version 74 11:52:26 SYSLOG info kernel drbd0: Connection established. 11:52:26 SYSLOG info kernel drbd0: I am(S): 1:00000002:00000001:0000000a:00000006:01 11:52:26 SYSLOG info kernel drbd0: Peer(P): 1:00000002:00000001:0000000b:00000006:10 11:52:26 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate WFReportParams --> WFBitMapT 11:52:26 SYSLOG info kernel drbd0: Secondary/Unknown --> Secondary/Primary 11:52:26 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate WFBitMapT --> SyncTarget 11:52:26 SYSLOG info kernel drbd0: Resync started as SyncTarget (need to sync 152 KB [38 bits set]). 11:52:27 SYSLOG info kernel drbd0: Resync done (total 1 sec; paused 0 sec; 152 K/sec) 11:52:27 SYSLOG info kernel drbd0: drbd0_worker [734]: cstate SyncTarget --> Connected 11:52:27 AXSDBG debug boot.pl[331] Service::system(170) run drbdsetup /dev/drbd0 wait_sync from other logging I know that wait_sync return and /dev/drbd reports that the ld is consistent. but then suddenly...(this is in the middle of the master's reboot) 11:56:15 SYSLOG err kernel drbd0: PingAck did not arrive in time. 11:56:15 SYSLOG info kernel drbd0: drbd0_asender [748]: cstate Connected --> NetworkFailure 11:56:15 SYSLOG info kernel drbd0: asender terminated 11:56:15 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate NetworkFailure --> BrokenPipe 11:56:15 SYSLOG err kernel drbd0: short read expecting header on sock: r=-512 11:56:15 SYSLOG info kernel drbd0: worker terminated 11:56:15 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate BrokenPipe --> Unconnected 11:56:15 SYSLOG info kernel drbd0: Connection lost. 11:56:15 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate Unconnected --> WFConnection 11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(219) > drbdStatus 11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232) cs => WFConnection 11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232) st => Secondary/Unknown 11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232) ld => Consistent 11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232) ns => 0 11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232) nr => 756 11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232) dw => 756 11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232) dr => 0 11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232) al => 0 11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232) bm => 6 11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232) lo => 0 11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232) pe => 0 11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232) ua => 0 11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232) ap => 0 11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(239) < drbdStatus my HA software detects a 'Consistent slave' and a 'Master gone', so he decides to put this slave in 'primary' 11:56:15 AXSDBG debug tsrv-request[2171] Service::system(170) run drbdsetup /dev/drbd0 primary 11:56:15 SYSLOG info kernel drbd0: Secondary/Unknown --> Primary/Unknown 11:56:15 AXSDBG debug tsrv-request[2171] Service::system(170) run mount /dev/drbd0 /data 11:56:15 SYSLOG info kernel EXT3 FS 2.4-0.9.19, 19 August 2002 on drbd(147,0), internal journal but here the other is back and look what's happening.... 11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate WFConnection --> WFReportParams 11:57:37 SYSLOG info kernel drbd0: Handshake successful: DRBD Network Protocol version 74 11:57:37 SYSLOG info kernel drbd0: Connection established. 11:57:37 SYSLOG info kernel drbd0: I am(P): 1:00000002:00000001:0000000b:00000007:10 11:57:37 SYSLOG info kernel drbd0: Peer(S): 1:00000002:00000001:0000000c:00000006:10 11:57:37 SYSLOG err kernel drbd0: Current Primary shall become sync TARGET! Aborting to prevent data corruption. 11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate WFReportParams --> StandAlone 11:57:37 SYSLOG err kernel drbd0: error receiving ReportParams, l: 72! 11:57:37 SYSLOG info kernel drbd0: asender terminated 11:57:37 SYSLOG info kernel drbd0: worker terminated 11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate StandAlone --> StandAlone 11:57:37 SYSLOG info kernel drbd0: Connection lost. 11:57:37 SYSLOG info kernel drbd0: receiver terminated 11:57:56 AXSDBG debug tsrv-request[3257] Service::system(170) run drbdsetup /dev/drbd0 secondary 11:57:56 SYSLOG info kernel drbd0: Primary/Unknown --> Secondary/Unknown # cat /proc/drbd version: 0.7.2 (api:75/proto:74) SVN Revision: 1483:1485 build by ao at build, 2004-08-09 14:00:52 0: cs:StandAlone st:Secondary/Unknown ld:Consistent ns:0 nr:756 dw:15180 dr:941 al:49 bm:55 lo:0 pe:0 ua:0 ap:0 1: cs:Unconfigured # drbdsetup /dev/drbd0 show Lower device: 58:00 (lvm(58,0)) Meta device: 58:03 (lvm(58,3)) Meta index: 0 Disk options: # lsmod | grep drbd drbd 128548 1 On Wed, 2004-08-18 at 11:53, Alex Ongena wrote: > Lars, > > I have exactly the same problem again, I will try to figure > out how it happend and will gather all info for you. > > alex > > -- > aXs GUARD has completed security and anti-virus checks on this e-mail > (http://www.axsguard.com) > _______________________________________________ > drbd-user mailing list > drbd-user at lists.linbit.com > http://lists.linbit.com/mailman/listinfo/drbd-user -- aXs GUARD has completed security and anti-virus checks on this e-mail (http://www.axsguard.com) -------------- next part -------------- This is the master and no cmd-line parameters are executed. The slave was rebooted ... 10:43:11 SYSLOG err kernel drbd0: meta connection shut down by peer. 10:43:11 SYSLOG info kernel drbd0: drbd0_asender [2173]: cstate Connected --> NetworkFailure 10:43:11 SYSLOG info kernel drbd0: asender terminated 10:43:11 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate NetworkFailure --> BrokenPipe 10:43:11 SYSLOG err kernel drbd0: short read expecting header on sock: r=-512 10:43:11 SYSLOG info kernel drbd0: worker terminated 10:43:11 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate BrokenPipe --> Unconnected 10:43:11 SYSLOG info kernel drbd0: Connection lost. 10:43:11 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate Unconnected --> WFConnection and comes back ... 10:44:45 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate WFConnection --> WFReportParams 10:44:45 SYSLOG info kernel drbd0: Handshake successful: DRBD Network Protocol version 74 10:44:45 SYSLOG info kernel drbd0: Connection established. 10:44:45 SYSLOG info kernel drbd0: I am(P): 1:00000002:00000001:00000009:00000006:10 10:44:45 SYSLOG info kernel drbd0: Peer(S): 1:00000002:00000001:00000008:00000006:01 10:44:45 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate WFReportParams --> WFBitMapS 10:44:45 SYSLOG info kernel drbd0: Primary/Unknown --> Primary/Secondary 10:44:45 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate WFBitMapS --> SyncSource 10:44:45 SYSLOG info kernel drbd0: Resync started as SyncSource (need to sync 228 KB [57 bits set]). but the slave's network driver blocked ... 10:44:58 SYSLOG err kernel drbd0: PingAck did not arrive in time. 10:44:58 SYSLOG info kernel drbd0: drbd0_asender [30515]: cstate SyncSource --> NetworkFailure 10:44:58 SYSLOG info kernel drbd0: asender terminated 10:44:58 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate NetworkFailure --> BrokenPipe 10:44:58 SYSLOG err kernel drbd0: short read expecting header on sock: r=-512 10:44:58 SYSLOG info kernel drbd0: worker terminated 10:44:58 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate BrokenPipe --> Unconnected 10:44:58 SYSLOG info kernel drbd0: Connection lost. 10:44:58 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate Unconnected --> WFConnection and the slave commes back with another network driver .... 11:35:28 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate WFConnection --> WFReportParams 11:35:28 SYSLOG info kernel drbd0: Handshake successful: DRBD Network Protocol version 74 11:35:28 SYSLOG info kernel drbd0: Connection established. 11:35:28 SYSLOG info kernel drbd0: I am(P): 1:00000002:00000001:0000000a:00000006:10 11:35:28 SYSLOG info kernel drbd0: Peer(S): 0:00000002:00000001:00000009:00000006:01 11:35:28 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate WFReportParams --> WFBitMapS 11:35:28 SYSLOG info kernel drbd0: Primary/Unknown --> Primary/Secondary 11:35:29 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate WFBitMapS --> SyncSource 11:35:29 SYSLOG info kernel drbd0: Resync started as SyncSource (need to sync 3456 KB [864 bits set]). 11:35:29 SYSLOG info kernel drbd: sendpage() omitted: 77/1562295 [XFS' broken IO requests?] 11:35:39 SYSLOG info kernel drbd0: Resync done (total 10 sec; paused 0 sec; 344 K/sec) 11:35:39 SYSLOG info kernel drbd0: drbd0_worker [30516]: cstate SyncSource --> Connected slave is rebooted again with another network driver 11:50:56 SYSLOG info kernel drbd0: sock was shut down by peer 11:50:56 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate Connected --> BrokenPipe 11:50:56 SYSLOG err kernel drbd0: short read expecting header on sock: r=0 11:50:56 SYSLOG info kernel drbd0: asender terminated 11:50:56 SYSLOG info kernel drbd0: worker terminated 11:50:56 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate BrokenPipe --> Unconnected 11:50:56 SYSLOG info kernel drbd0: Connection lost. 11:50:56 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate Unconnected --> WFConnection and here he's back 11:52:25 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate WFConnection --> WFReportParams 11:52:25 SYSLOG info kernel drbd0: Handshake successful: DRBD Network Protocol version 74 11:52:25 SYSLOG info kernel drbd0: Connection established. 11:52:25 SYSLOG info kernel drbd0: I am(P): 1:00000002:00000001:0000000b:00000006:10 11:52:25 SYSLOG info kernel drbd0: Peer(S): 1:00000002:00000001:0000000a:00000006:01 11:52:25 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate WFReportParams --> WFBitMapS 11:52:25 SYSLOG info kernel drbd0: Primary/Unknown --> Primary/Secondary 11:52:25 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate WFBitMapS --> SyncSource 11:52:25 SYSLOG info kernel drbd0: Resync started as SyncSource (need to sync 152 KB [38 bits set]). 11:52:26 SYSLOG info kernel drbd0: Resync done (total 1 sec; paused 0 sec; 152 K/sec) 11:52:26 SYSLOG info kernel drbd0: drbd0_worker [2050]: cstate SyncSource --> Connected BUT now, I reboot te master 11:55:52 SYSLOG info init Switching to runlevel: 6 11:57:06 SYSLOG info init Re-reading inittab and follow this and around the same moment in the slave's log 11:57:35 AXSDBG debug boot.pl[327] Service::system(170) run modprobe drbd 11:57:35 SYSLOG info kernel drbd: initialised. Version: 0.7.2 (api:75/proto:74) 11:57:35 SYSLOG info kernel drbd: SVN Revision: 1483:1485 build by ao at build, 2004-08-09 14:00:52 11:57:35 SYSLOG info kernel drbd: registered as block device major 147 11:57:35 AXSDBG debug boot.pl[327] Service::system(170) run drbdsetup /dev/drbd0 disk /dev/lv/data /dev/lv/drbd-meta 0 11:57:35 SYSLOG info kernel drbd0: resync bitmap: bits=1540096 words=48128 11:57:35 SYSLOG info kernel drbd0: size = 6160384 KB 11:57:36 SYSLOG info kernel drbd0: 0 KB marked out-of-sync by on disk bit-map. 11:57:36 SYSLOG info kernel drbd0: Found 4 transactions (112 active extents) in activity log. 11:57:36 SYSLOG info kernel drbd0: Marked additional 30208 KB as out-of-sync based on AL. 11:57:36 SYSLOG info kernel drbd0: drbdsetup [701]: cstate Unconfigured --> StandAlone 11:57:36 AXSDBG debug boot.pl[327] Service::system(170) run drbdsetup /dev/drbd0 syncer -r 10000 11:57:36 AXSDBG debug boot.pl[327] Service::system(170) run drbdsetup /dev/drbd0 net 192.168.5.228 192.168.5.229 C 11:57:36 SYSLOG info kernel drbd0: drbdsetup [714]: cstate StandAlone --> Unconnected 11:57:36 SYSLOG info kernel drbd0: drbd0_receiver [715]: cstate Unconnected --> WFConnection 11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(219) > drbdStatus 11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) cs => WFConnection 11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) st => Secondary/Unknown 11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) ld => Consistent 11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) ns => 0 11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) nr => 0 11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) dw => 0 11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) dr => 0 11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) al => 0 11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) bm => 59 11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) lo => 0 11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) pe => 0 11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) ua => 0 11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) ap => 0 11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(239) < drbdStatus 11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(219) > drbdStatus 11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) cs => WFConnection 11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) st => Secondary/Unknown 11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) ld => Consistent 11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) ns => 0 11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) nr => 0 11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) dw => 0 11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) dr => 0 11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) al => 0 11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) bm => 59 11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) lo => 0 11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) pe => 0 11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) ua => 0 11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) ap => 0 11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(239) < drbdStatus 11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [715]: cstate WFConnection --> WFReportParams 11:57:37 SYSLOG info kernel drbd0: Handshake successful: DRBD Network Protocol version 74 11:57:37 SYSLOG info kernel drbd0: Connection established. 11:57:37 SYSLOG info kernel drbd0: I am(S): 1:00000002:00000001:0000000c:00000006:10 11:57:37 SYSLOG info kernel drbd0: Peer(P): 1:00000002:00000001:0000000b:00000007:10 11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [715]: cstate WFReportParams --> WFBitMapS 11:57:37 SYSLOG err kernel drbd0: sock_sendmsg returned -104 11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [715]: cstate WFBitMapS --> BrokenPipe 11:57:37 SYSLOG err kernel drbd0: short sent ReportBitMap size=4096 sent=240 11:57:37 SYSLOG info kernel drbd0: Secondary/Unknown --> Secondary/Primary 11:57:37 SYSLOG err kernel drbd0: meta connection shut down by peer. 11:57:37 SYSLOG info kernel drbd0: asender terminated 11:57:37 SYSLOG info kernel drbd0: sock was shut down by peer 11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [715]: cstate BrokenPipe --> BrokenPipe 11:57:37 SYSLOG err kernel drbd0: short read expecting header on sock: r=0 11:57:37 SYSLOG info kernel drbd0: worker terminated 11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [715]: cstate BrokenPipe --> Unconnected 11:57:37 SYSLOG info kernel drbd0: Connection lost. 11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [715]: cstate Unconnected --> WFConnection 11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(219) > drbdStatus 11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) cs => WFConnection 11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) st => Secondary/Unknown 11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) ld => Consistent 11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) ns => 0 11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) nr => 0 11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) dw => 0 11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) dr => 0 11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) al => 0 11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) bm => 59 11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) lo => 0 11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) pe => 0 11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) ua => 0 11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232) ap => 0 11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(239) < drbdStatus # cat /proc/drbd version: 0.7.2 (api:75/proto:74) SVN Revision: 1483:1485 build by ao at build, 2004-08-09 14:00:52 0: cs:WFConnection st:Primary/Unknown ld:Consistent ns:0 nr:0 dw:18136 dr:1029 al:0 bm:59 lo:0 pe:0 ua:0 ap:0 1: cs:Unconfigured # drbdsetup /dev/drbd0 show Lower device: 58:02 (lvm(58,2)) Meta device: 58:03 (lvm(58,3)) Meta index: 0 Disk options: Local address: 192.168.5.228:7788 Remote address: 192.168.5.229:7788 Wire protocol: C Net options: timeout = 6.0 sec (default) connect-int = 10 sec (default) ping-int = 10 sec (default) max-epoch-size = 2048 (default) max-buffers = 2048 (default) sndbuf-size = 131070 (default) ko-count = 0 (default) Syncer options: rate = 10000 KB/sec group = 0 (default) al-extents = 127 (default) # lsmod | grep drbd drbd 128548 3 -------------- next part -------------- slave is booting here and you can also see the used commands (drbdsetup ...) (/dev/lv/data is LVM1 data partition, /dev/lv/drbd-meta is a LVM1 partition for the meta data) 10:44:45 SYSLOG info kernel drbd: initialised. Version: 0.7.2 (api:75/proto:74) 10:44:45 SYSLOG info kernel drbd: SVN Revision: 1483:1485 build by ao at build, 2004-08-09 14:00:52 10:44:45 SYSLOG info kernel drbd: registered as block device major 147 10:44:45 AXSDBG debug boot.pl[331] Service::system(170) run drbdsetup /dev/drbd0 disk /dev/lv/data /dev/lv/drbd-meta 0 10:44:45 SYSLOG info kernel drbd0: resync bitmap: bits=1540096 words=48128 10:44:45 SYSLOG info kernel drbd0: size = 6160384 KB 10:44:45 SYSLOG info kernel drbd0: 0 KB marked out-of-sync by on disk bit-map. 10:44:45 SYSLOG warning kernel drbd0: No usable activity log found. 10:44:45 SYSLOG info kernel drbd0: drbdsetup [733]: cstate Unconfigured --> StandAlone 10:44:45 AXSDBG debug boot.pl[331] Service::system(170) run drbdsetup /dev/drbd0 syncer -r 10000 10:44:45 AXSDBG debug boot.pl[331] Service::system(170) run drbdsetup /dev/drbd0 net 192.168.5.229 192.168.5.228 C 10:44:45 SYSLOG info kernel drbd0: drbdsetup [746]: cstate StandAlone --> Unconnected 10:44:45 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate Unconnected --> WFConnection 10:44:45 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate WFConnection --> WFReportParams 10:44:45 SYSLOG info kernel drbd0: Handshake successful: DRBD Network Protocol version 74 10:44:45 SYSLOG info kernel drbd0: Connection established. 10:44:45 SYSLOG info kernel drbd0: I am(S): 1:00000002:00000001:00000008:00000006:01 10:44:45 SYSLOG info kernel drbd0: Peer(P): 1:00000002:00000001:00000009:00000006:10 10:44:45 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate WFReportParams --> WFBitMapT 10:44:45 SYSLOG info kernel drbd0: Secondary/Unknown --> Secondary/Primary 10:44:46 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate WFBitMapT --> SyncTarget 10:44:46 SYSLOG info kernel drbd0: Resync started as SyncTarget (need to sync 228 KB [57 bits set]). 10:44:46 AXSDBG debug boot.pl[331] Service::system(170) run drbdsetup /dev/drbd0 wait_sync and here the slave's (this one) network device died. 10:44:59 SYSLOG err kernel drbd0: meta connection shut down by peer. 10:44:59 SYSLOG info kernel drbd0: drbd0_asender [748]: cstate SyncTarget --> NetworkFailure 10:44:59 SYSLOG info kernel drbd0: asender terminated 10:44:59 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate NetworkFailure --> BrokenPipe 10:44:59 SYSLOG warning kernel drbd0: short read receiving data block: read 1040 expected 4096 10:44:59 SYSLOG err kernel drbd0: error receiving RSDataReply, l: 4112! 10:44:59 SYSLOG info kernel drbd0: worker terminated 10:44:59 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate BrokenPipe --> Unconnected 10:44:59 SYSLOG info kernel drbd0: Connection lost. 10:44:59 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate Unconnected --> WFConnection reboot of the slave with another network driver 11:35:29 SYSLOG info kernel drbd: initialised. Version: 0.7.2 (api:75/proto:74) 11:35:29 SYSLOG info kernel drbd: SVN Revision: 1483:1485 build by ao at build, 2004-08-09 14:00:52 11:35:29 SYSLOG info kernel drbd: registered as block device major 147 11:35:29 AXSDBG debug boot.pl[332] Service::system(170) run drbdsetup /dev/drbd0 disk /dev/lv/data /dev/lv/drbd-meta 0 11:35:29 SYSLOG info kernel drbd0: resync bitmap: bits=1540096 words=48128 11:35:29 SYSLOG info kernel drbd0: size = 6160384 KB 11:35:29 SYSLOG info kernel drbd0: 0 KB marked out-of-sync by on disk bit-map. 11:35:29 SYSLOG warning kernel drbd0: No usable activity log found. 11:35:29 SYSLOG info kernel drbd0: drbdsetup [734]: cstate Unconfigured --> StandAlone 11:35:29 AXSDBG debug boot.pl[332] Service::system(170) run drbdsetup /dev/drbd0 syncer -r 10000 11:35:29 AXSDBG debug boot.pl[332] Service::system(170) run drbdsetup /dev/drbd0 net 192.168.5.229 192.168.5.228 C 11:35:29 SYSLOG info kernel drbd0: drbdsetup [747]: cstate StandAlone --> Unconnected 11:35:29 SYSLOG info kernel drbd0: drbd0_receiver [748]: cstate Unconnected --> WFConnection 11:35:29 SYSLOG info kernel drbd0: drbd0_receiver [748]: cstate WFConnection --> WFReportParams 11:35:29 SYSLOG info kernel drbd0: Handshake successful: DRBD Network Protocol version 74 11:35:29 SYSLOG info kernel drbd0: Connection established. 11:35:29 SYSLOG info kernel drbd0: I am(S): 0:00000002:00000001:00000009:00000006:01 11:35:29 SYSLOG info kernel drbd0: Peer(P): 1:00000002:00000001:0000000a:00000006:10 11:35:29 SYSLOG info kernel drbd0: drbd0_receiver [748]: cstate WFReportParams --> WFBitMapT 11:35:29 SYSLOG info kernel drbd0: Secondary/Unknown --> Secondary/Primary 11:35:29 SYSLOG info kernel drbd0: drbd0_receiver [748]: cstate WFBitMapT --> SyncTarget 11:35:29 SYSLOG info kernel drbd0: Resync started as SyncTarget (need to sync 3456 KB [864 bits set]). 11:35:30 AXSDBG debug boot.pl[332] Service::system(170) run drbdsetup /dev/drbd0 wait_sync 11:35:40 SYSLOG info kernel drbd0: Resync done (total 10 sec; paused 0 sec; 344 K/sec) 11:35:40 SYSLOG info kernel drbd0: drbd0_worker [735]: cstate SyncTarget --> Connected another reboot with yet another network driver 11:52:25 SYSLOG info kernel drbd: initialised. Version: 0.7.2 (api:75/proto:74) 11:52:25 SYSLOG info kernel drbd: SVN Revision: 1483:1485 build by ao at build, 2004-08-09 14:00:52 11:52:25 SYSLOG info kernel drbd: registered as block device major 147 11:52:25 AXSDBG debug boot.pl[331] Service::system(170) run drbdsetup /dev/drbd0 disk /dev/lv/data /dev/lv/drbd-meta 0 11:52:25 SYSLOG info kernel drbd0: resync bitmap: bits=1540096 words=48128 11:52:25 SYSLOG info kernel drbd0: size = 6160384 KB 11:52:25 SYSLOG info kernel drbd0: 0 KB marked out-of-sync by on disk bit-map. 11:52:25 SYSLOG warning kernel drbd0: No usable activity log found. 11:52:25 SYSLOG info kernel drbd0: drbdsetup [733]: cstate Unconfigured --> StandAlone 11:52:25 AXSDBG debug boot.pl[331] Service::system(170) run drbdsetup /dev/drbd0 syncer -r 10000 11:52:25 AXSDBG debug boot.pl[331] Service::system(170) run drbdsetup /dev/drbd0 net 192.168.5.229 192.168.5.228 C 11:52:25 SYSLOG info kernel drbd0: drbdsetup [746]: cstate StandAlone --> Unconnected 11:52:25 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate Unconnected --> WFConnection 11:52:26 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate WFConnection --> WFReportParams 11:52:26 SYSLOG info kernel drbd0: Handshake successful: DRBD Network Protocol version 74 11:52:26 SYSLOG info kernel drbd0: Connection established. 11:52:26 SYSLOG info kernel drbd0: I am(S): 1:00000002:00000001:0000000a:00000006:01 11:52:26 SYSLOG info kernel drbd0: Peer(P): 1:00000002:00000001:0000000b:00000006:10 11:52:26 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate WFReportParams --> WFBitMapT 11:52:26 SYSLOG info kernel drbd0: Secondary/Unknown --> Secondary/Primary 11:52:26 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate WFBitMapT --> SyncTarget 11:52:26 SYSLOG info kernel drbd0: Resync started as SyncTarget (need to sync 152 KB [38 bits set]). 11:52:27 SYSLOG info kernel drbd0: Resync done (total 1 sec; paused 0 sec; 152 K/sec) 11:52:27 SYSLOG info kernel drbd0: drbd0_worker [734]: cstate SyncTarget --> Connected 11:52:27 AXSDBG debug boot.pl[331] Service::system(170) run drbdsetup /dev/drbd0 wait_sync from other logging I know that wait_sync return and /dev/drbd reports that the ld is consistent. but then suddenly...(this is in the middle of the master's reboot) 11:56:15 SYSLOG err kernel drbd0: PingAck did not arrive in time. 11:56:15 SYSLOG info kernel drbd0: drbd0_asender [748]: cstate Connected --> NetworkFailure 11:56:15 SYSLOG info kernel drbd0: asender terminated 11:56:15 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate NetworkFailure --> BrokenPipe 11:56:15 SYSLOG err kernel drbd0: short read expecting header on sock: r=-512 11:56:15 SYSLOG info kernel drbd0: worker terminated 11:56:15 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate BrokenPipe --> Unconnected 11:56:15 SYSLOG info kernel drbd0: Connection lost. 11:56:15 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate Unconnected --> WFConnection 11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(219) > drbdStatus 11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232) cs => WFConnection 11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232) st => Secondary/Unknown 11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232) ld => Consistent 11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232) ns => 0 11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232) nr => 756 11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232) dw => 756 11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232) dr => 0 11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232) al => 0 11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232) bm => 6 11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232) lo => 0 11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232) pe => 0 11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232) ua => 0 11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232) ap => 0 11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(239) < drbdStatus my HA software detects a 'Consistent slave' and a 'Master gone', so he decides to put this slave in 'primary' 11:56:15 AXSDBG debug tsrv-request[2171] Service::system(170) run drbdsetup /dev/drbd0 primary 11:56:15 SYSLOG info kernel drbd0: Secondary/Unknown --> Primary/Unknown 11:56:15 AXSDBG debug tsrv-request[2171] Service::system(170) run mount /dev/drbd0 /data 11:56:15 SYSLOG info kernel EXT3 FS 2.4-0.9.19, 19 August 2002 on drbd(147,0), internal journal but here the other is back and look what's happening.... 11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate WFConnection --> WFReportParams 11:57:37 SYSLOG info kernel drbd0: Handshake successful: DRBD Network Protocol version 74 11:57:37 SYSLOG info kernel drbd0: Connection established. 11:57:37 SYSLOG info kernel drbd0: I am(P): 1:00000002:00000001:0000000b:00000007:10 11:57:37 SYSLOG info kernel drbd0: Peer(S): 1:00000002:00000001:0000000c:00000006:10 11:57:37 SYSLOG err kernel drbd0: Current Primary shall become sync TARGET! Aborting to prevent data corruption. 11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate WFReportParams --> StandAlone 11:57:37 SYSLOG err kernel drbd0: error receiving ReportParams, l: 72! 11:57:37 SYSLOG info kernel drbd0: asender terminated 11:57:37 SYSLOG info kernel drbd0: worker terminated 11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate StandAlone --> StandAlone 11:57:37 SYSLOG info kernel drbd0: Connection lost. 11:57:37 SYSLOG info kernel drbd0: receiver terminated 11:57:56 AXSDBG debug tsrv-request[3257] Service::system(170) run drbdsetup /dev/drbd0 secondary 11:57:56 SYSLOG info kernel drbd0: Primary/Unknown --> Secondary/Unknown # cat /proc/drbd version: 0.7.2 (api:75/proto:74) SVN Revision: 1483:1485 build by ao at build, 2004-08-09 14:00:52 0: cs:StandAlone st:Secondary/Unknown ld:Consistent ns:0 nr:756 dw:15180 dr:941 al:49 bm:55 lo:0 pe:0 ua:0 ap:0 1: cs:Unconfigured # drbdsetup /dev/drbd0 show Lower device: 58:00 (lvm(58,0)) Meta device: 58:03 (lvm(58,3)) Meta index: 0 Disk options: # lsmod | grep drbd drbd 128548 1