[DRBD-user] 0.7.2 Split-Brain, unrecoverable - update -reconstruction

Alex Ongena Alex.Ongena at able.be
Wed Aug 18 12:38:01 CEST 2004


Lars,

I can simulate it and I think it's related to getting a:
"PingAck did not arrive in time" on the slave, during
a 'proper or not so proper' reboot of the master.
=> both systems are consistent, but the HA software
put's the slave into primary (because the master is dead)
and when the master commes back, they both think they
are consistent although they are different.
Maybe the 'longest a-live' should be consisered as real-
consistent....

Please enjoy the scenario below.

Regards
alex
PS: let me know if I can help with further testing to make
0.7.x rock-solid.

Master
===================================
This is the master and no cmd-line parameters are executed.

The slave was rebooted ...

10:43:11 SYSLOG err kernel drbd0: meta connection shut down by peer.
10:43:11 SYSLOG info kernel drbd0: drbd0_asender [2173]: cstate
Connected --> NetworkFailure
10:43:11 SYSLOG info kernel drbd0: asender terminated
10:43:11 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate
NetworkFailure --> BrokenPipe
10:43:11 SYSLOG err kernel drbd0: short read expecting header on sock:
r=-512
10:43:11 SYSLOG info kernel drbd0: worker terminated
10:43:11 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate
BrokenPipe --> Unconnected
10:43:11 SYSLOG info kernel drbd0: Connection lost.
10:43:11 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate
Unconnected --> WFConnection

and comes back ...

10:44:45 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate
WFConnection --> WFReportParams
10:44:45 SYSLOG info kernel drbd0: Handshake successful: DRBD Network
Protocol version 74
10:44:45 SYSLOG info kernel drbd0: Connection established.
10:44:45 SYSLOG info kernel drbd0: I am(P):
1:00000002:00000001:00000009:00000006:10
10:44:45 SYSLOG info kernel drbd0: Peer(S):
1:00000002:00000001:00000008:00000006:01
10:44:45 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate
WFReportParams --> WFBitMapS
10:44:45 SYSLOG info kernel drbd0: Primary/Unknown --> Primary/Secondary
10:44:45 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate
WFBitMapS --> SyncSource
10:44:45 SYSLOG info kernel drbd0: Resync started as SyncSource (need to
sync 228 KB [57 bits set]).

but the slave's network driver blocked ...

10:44:58 SYSLOG err kernel drbd0: PingAck did not arrive in time.
10:44:58 SYSLOG info kernel drbd0: drbd0_asender [30515]: cstate
SyncSource --> NetworkFailure
10:44:58 SYSLOG info kernel drbd0: asender terminated
10:44:58 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate
NetworkFailure --> BrokenPipe
10:44:58 SYSLOG err kernel drbd0: short read expecting header on sock:
r=-512
10:44:58 SYSLOG info kernel drbd0: worker terminated
10:44:58 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate
BrokenPipe --> Unconnected
10:44:58 SYSLOG info kernel drbd0: Connection lost.
10:44:58 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate
Unconnected --> WFConnection

and the slave commes back with another network driver ....

11:35:28 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate
WFConnection --> WFReportParams
11:35:28 SYSLOG info kernel drbd0: Handshake successful: DRBD Network
Protocol version 74
11:35:28 SYSLOG info kernel drbd0: Connection established.
11:35:28 SYSLOG info kernel drbd0: I am(P):
1:00000002:00000001:0000000a:00000006:10
11:35:28 SYSLOG info kernel drbd0: Peer(S):
0:00000002:00000001:00000009:00000006:01
11:35:28 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate
WFReportParams --> WFBitMapS
11:35:28 SYSLOG info kernel drbd0: Primary/Unknown --> Primary/Secondary
11:35:29 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate
WFBitMapS --> SyncSource
11:35:29 SYSLOG info kernel drbd0: Resync started as SyncSource (need to
sync 3456 KB [864 bits set]).
11:35:29 SYSLOG info kernel drbd: sendpage() omitted: 77/1562295 [XFS'
broken IO requests?]
11:35:39 SYSLOG info kernel drbd0: Resync done (total 10 sec; paused 0
sec; 344 K/sec)
11:35:39 SYSLOG info kernel drbd0: drbd0_worker [30516]: cstate
SyncSource --> Connected

slave is rebooted again with another network driver

11:50:56 SYSLOG info kernel drbd0: sock was shut down by peer
11:50:56 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate
Connected --> BrokenPipe
11:50:56 SYSLOG err kernel drbd0: short read expecting header on sock:
r=0
11:50:56 SYSLOG info kernel drbd0: asender terminated
11:50:56 SYSLOG info kernel drbd0: worker terminated
11:50:56 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate
BrokenPipe --> Unconnected
11:50:56 SYSLOG info kernel drbd0: Connection lost.
11:50:56 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate
Unconnected --> WFConnection

and here he's back

11:52:25 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate
WFConnection --> WFReportParams
11:52:25 SYSLOG info kernel drbd0: Handshake successful: DRBD Network
Protocol version 74
11:52:25 SYSLOG info kernel drbd0: Connection established.
11:52:25 SYSLOG info kernel drbd0: I am(P):
1:00000002:00000001:0000000b:00000006:10
11:52:25 SYSLOG info kernel drbd0: Peer(S):
1:00000002:00000001:0000000a:00000006:01
11:52:25 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate
WFReportParams --> WFBitMapS
11:52:25 SYSLOG info kernel drbd0: Primary/Unknown --> Primary/Secondary
11:52:25 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate
WFBitMapS --> SyncSource
11:52:25 SYSLOG info kernel drbd0: Resync started as SyncSource (need to
sync 152 KB [38 bits set]).
11:52:26 SYSLOG info kernel drbd0: Resync done (total 1 sec; paused 0
sec; 152 K/sec)
11:52:26 SYSLOG info kernel drbd0: drbd0_worker [2050]: cstate
SyncSource --> Connected

BUT now, I reboot te master
11:55:52 SYSLOG info init Switching to runlevel: 6

11:57:06 SYSLOG info init Re-reading inittab

and follow this and around the same moment in the slave's log

11:57:35 AXSDBG debug boot.pl[327] Service::system(170) run modprobe
drbd
11:57:35 SYSLOG info kernel drbd: initialised. Version: 0.7.2
(api:75/proto:74)
11:57:35 SYSLOG info kernel drbd: SVN Revision: 1483:1485 build by
ao at build, 2004-08-09 14:00:52
11:57:35 SYSLOG info kernel drbd: registered as block device major 147
11:57:35 AXSDBG debug boot.pl[327] Service::system(170) run drbdsetup
/dev/drbd0 disk /dev/lv/data /dev/lv/drbd-meta 0
11:57:35 SYSLOG info kernel drbd0: resync bitmap: bits=1540096
words=48128
11:57:35 SYSLOG info kernel drbd0: size = 6160384 KB
11:57:36 SYSLOG info kernel drbd0: 0 KB marked out-of-sync by on disk
bit-map.
11:57:36 SYSLOG info kernel drbd0: Found 4 transactions (112 active
extents) in activity log.
11:57:36 SYSLOG info kernel drbd0: Marked additional 30208 KB as
out-of-sync based on AL.
11:57:36 SYSLOG info kernel drbd0: drbdsetup [701]: cstate Unconfigured
--> StandAlone
11:57:36 AXSDBG debug boot.pl[327] Service::system(170) run drbdsetup
/dev/drbd0 syncer -r 10000
11:57:36 AXSDBG debug boot.pl[327] Service::system(170) run drbdsetup
/dev/drbd0 net 192.168.5.228 192.168.5.229 C
11:57:36 SYSLOG info kernel drbd0: drbdsetup [714]: cstate StandAlone
--> Unconnected
11:57:36 SYSLOG info kernel drbd0: drbd0_receiver [715]: cstate
Unconnected --> WFConnection
11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(219) >
drbdStatus
11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   cs =>
WFConnection
11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   st =>
Secondary/Unknown
11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   ld =>
Consistent
11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   ns =>
0
11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   nr =>
0
11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   dw =>
0
11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   dr =>
0
11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   al =>
0
11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   bm =>
59
11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   lo =>
0
11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   pe =>
0
11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   ua =>
0
11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   ap =>
0
11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(239) <
drbdStatus
11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(219) >
drbdStatus
11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   cs =>
WFConnection
11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   st =>
Secondary/Unknown
11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   ld =>
Consistent
11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   ns =>
0
11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   nr =>
0
11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   dw =>
0
11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   dr =>
0
11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   al =>
0
11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   bm =>
59
11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   lo =>
0
11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   pe =>
0
11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   ua =>
0
11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   ap =>
0
11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(239) <
drbdStatus
11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [715]: cstate
WFConnection --> WFReportParams
11:57:37 SYSLOG info kernel drbd0: Handshake successful: DRBD Network
Protocol version 74
11:57:37 SYSLOG info kernel drbd0: Connection established.
11:57:37 SYSLOG info kernel drbd0: I am(S):
1:00000002:00000001:0000000c:00000006:10
11:57:37 SYSLOG info kernel drbd0: Peer(P):
1:00000002:00000001:0000000b:00000007:10
11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [715]: cstate
WFReportParams --> WFBitMapS
11:57:37 SYSLOG err kernel drbd0: sock_sendmsg returned -104
11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [715]: cstate
WFBitMapS --> BrokenPipe
11:57:37 SYSLOG err kernel drbd0: short sent ReportBitMap size=4096
sent=240
11:57:37 SYSLOG info kernel drbd0: Secondary/Unknown -->
Secondary/Primary
11:57:37 SYSLOG err kernel drbd0: meta connection shut down by peer.
11:57:37 SYSLOG info kernel drbd0: asender terminated
11:57:37 SYSLOG info kernel drbd0: sock was shut down by peer
11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [715]: cstate
BrokenPipe --> BrokenPipe
11:57:37 SYSLOG err kernel drbd0: short read expecting header on sock:
r=0
11:57:37 SYSLOG info kernel drbd0: worker terminated
11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [715]: cstate
BrokenPipe --> Unconnected
11:57:37 SYSLOG info kernel drbd0: Connection lost.
11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [715]: cstate
Unconnected --> WFConnection
11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(219) >
drbdStatus
11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   cs =>
WFConnection
11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   st =>
Secondary/Unknown
11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   ld =>
Consistent
11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   ns =>
0
11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   nr =>
0
11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   dw =>
0
11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   dr =>
0
11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   al =>
0
11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   bm =>
59
11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   lo =>
0
11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   pe =>
0
11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   ua =>
0
11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   ap =>
0
11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(239) <
drbdStatus


# cat /proc/drbd 
version: 0.7.2 (api:75/proto:74)
SVN Revision: 1483:1485 build by ao at build, 2004-08-09 14:00:52
 0: cs:WFConnection st:Primary/Unknown ld:Consistent
    ns:0 nr:0 dw:18136 dr:1029 al:0 bm:59 lo:0 pe:0 ua:0 ap:0
 1: cs:Unconfigured

# drbdsetup /dev/drbd0 show
Lower device: 58:02   (lvm(58,2))
Meta device: 58:03   (lvm(58,3))
Meta index: 0
Disk options:
Local address: 192.168.5.228:7788
Remote address: 192.168.5.229:7788
Wire protocol: C
Net options:
 timeout = 6.0 sec (default)
 connect-int = 10 sec (default)
 ping-int = 10 sec (default)
 max-epoch-size = 2048  (default)
 max-buffers = 2048  (default)
 sndbuf-size = 131070  (default)
 ko-count = 0  (default)
Syncer options:
 rate = 10000 KB/sec 
 group = 0  (default)
 al-extents = 127  (default)

# lsmod | grep drbd
drbd                  128548   3 

Slave
===================================
slave is booting here and you can also see the used commands (drbdsetup
...)

(/dev/lv/data is LVM1 data partition, /dev/lv/drbd-meta is a LVM1
partition for the meta data)

10:44:45 SYSLOG info kernel drbd: initialised. Version: 0.7.2
(api:75/proto:74)
10:44:45 SYSLOG info kernel drbd: SVN Revision: 1483:1485 build by
ao at build, 2004-08-09 14:00:52
10:44:45 SYSLOG info kernel drbd: registered as block device major 147
10:44:45 AXSDBG debug boot.pl[331] Service::system(170) run drbdsetup
/dev/drbd0 disk /dev/lv/data /dev/lv/drbd-meta 0
10:44:45 SYSLOG info kernel drbd0: resync bitmap: bits=1540096
words=48128
10:44:45 SYSLOG info kernel drbd0: size = 6160384 KB
10:44:45 SYSLOG info kernel drbd0: 0 KB marked out-of-sync by on disk
bit-map.
10:44:45 SYSLOG warning kernel drbd0: No usable activity log found.
10:44:45 SYSLOG info kernel drbd0: drbdsetup [733]: cstate Unconfigured
--> StandAlone
10:44:45 AXSDBG debug boot.pl[331] Service::system(170) run drbdsetup
/dev/drbd0 syncer -r 10000
10:44:45 AXSDBG debug boot.pl[331] Service::system(170) run drbdsetup
/dev/drbd0 net 192.168.5.229 192.168.5.228 C
10:44:45 SYSLOG info kernel drbd0: drbdsetup [746]: cstate StandAlone
--> Unconnected
10:44:45 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate
Unconnected --> WFConnection
10:44:45 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate
WFConnection --> WFReportParams
10:44:45 SYSLOG info kernel drbd0: Handshake successful: DRBD Network
Protocol version 74
10:44:45 SYSLOG info kernel drbd0: Connection established.
10:44:45 SYSLOG info kernel drbd0: I am(S):
1:00000002:00000001:00000008:00000006:01
10:44:45 SYSLOG info kernel drbd0: Peer(P):
1:00000002:00000001:00000009:00000006:10
10:44:45 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate
WFReportParams --> WFBitMapT
10:44:45 SYSLOG info kernel drbd0: Secondary/Unknown -->
Secondary/Primary
10:44:46 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate
WFBitMapT --> SyncTarget
10:44:46 SYSLOG info kernel drbd0: Resync started as SyncTarget (need to
sync 228 KB [57 bits set]).
10:44:46 AXSDBG debug boot.pl[331] Service::system(170) run drbdsetup
/dev/drbd0 wait_sync

and here the slave's (this one) network device died.

10:44:59 SYSLOG err kernel drbd0: meta connection shut down by peer.
10:44:59 SYSLOG info kernel drbd0: drbd0_asender [748]: cstate
SyncTarget --> NetworkFailure
10:44:59 SYSLOG info kernel drbd0: asender terminated
10:44:59 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate
NetworkFailure --> BrokenPipe
10:44:59 SYSLOG warning kernel drbd0: short read receiving data block:
read 1040 expected 4096
10:44:59 SYSLOG err kernel drbd0: error receiving RSDataReply, l: 4112!
10:44:59 SYSLOG info kernel drbd0: worker terminated
10:44:59 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate
BrokenPipe --> Unconnected
10:44:59 SYSLOG info kernel drbd0: Connection lost.
10:44:59 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate
Unconnected --> WFConnection

reboot of the slave with another network driver

11:35:29 SYSLOG info kernel drbd: initialised. Version: 0.7.2
(api:75/proto:74)
11:35:29 SYSLOG info kernel drbd: SVN Revision: 1483:1485 build by
ao at build, 2004-08-09 14:00:52
11:35:29 SYSLOG info kernel drbd: registered as block device major 147
11:35:29 AXSDBG debug boot.pl[332] Service::system(170) run drbdsetup
/dev/drbd0 disk /dev/lv/data /dev/lv/drbd-meta 0
11:35:29 SYSLOG info kernel drbd0: resync bitmap: bits=1540096
words=48128
11:35:29 SYSLOG info kernel drbd0: size = 6160384 KB
11:35:29 SYSLOG info kernel drbd0: 0 KB marked out-of-sync by on disk
bit-map.
11:35:29 SYSLOG warning kernel drbd0: No usable activity log found.
11:35:29 SYSLOG info kernel drbd0: drbdsetup [734]: cstate Unconfigured
--> StandAlone
11:35:29 AXSDBG debug boot.pl[332] Service::system(170) run drbdsetup
/dev/drbd0 syncer -r 10000
11:35:29 AXSDBG debug boot.pl[332] Service::system(170) run drbdsetup
/dev/drbd0 net 192.168.5.229 192.168.5.228 C
11:35:29 SYSLOG info kernel drbd0: drbdsetup [747]: cstate StandAlone
--> Unconnected
11:35:29 SYSLOG info kernel drbd0: drbd0_receiver [748]: cstate
Unconnected --> WFConnection
11:35:29 SYSLOG info kernel drbd0: drbd0_receiver [748]: cstate
WFConnection --> WFReportParams
11:35:29 SYSLOG info kernel drbd0: Handshake successful: DRBD Network
Protocol version 74
11:35:29 SYSLOG info kernel drbd0: Connection established.
11:35:29 SYSLOG info kernel drbd0: I am(S):
0:00000002:00000001:00000009:00000006:01
11:35:29 SYSLOG info kernel drbd0: Peer(P):
1:00000002:00000001:0000000a:00000006:10
11:35:29 SYSLOG info kernel drbd0: drbd0_receiver [748]: cstate
WFReportParams --> WFBitMapT
11:35:29 SYSLOG info kernel drbd0: Secondary/Unknown -->
Secondary/Primary
11:35:29 SYSLOG info kernel drbd0: drbd0_receiver [748]: cstate
WFBitMapT --> SyncTarget
11:35:29 SYSLOG info kernel drbd0: Resync started as SyncTarget (need to
sync 3456 KB [864 bits set]).
11:35:30 AXSDBG debug boot.pl[332] Service::system(170) run drbdsetup
/dev/drbd0 wait_sync
11:35:40 SYSLOG info kernel drbd0: Resync done (total 10 sec; paused 0
sec; 344 K/sec)
11:35:40 SYSLOG info kernel drbd0: drbd0_worker [735]: cstate SyncTarget
--> Connected

another reboot with yet another network driver

11:52:25 SYSLOG info kernel drbd: initialised. Version: 0.7.2
(api:75/proto:74)
11:52:25 SYSLOG info kernel drbd: SVN Revision: 1483:1485 build by
ao at build, 2004-08-09 14:00:52
11:52:25 SYSLOG info kernel drbd: registered as block device major 147
11:52:25 AXSDBG debug boot.pl[331] Service::system(170) run drbdsetup
/dev/drbd0 disk /dev/lv/data /dev/lv/drbd-meta 0
11:52:25 SYSLOG info kernel drbd0: resync bitmap: bits=1540096
words=48128
11:52:25 SYSLOG info kernel drbd0: size = 6160384 KB
11:52:25 SYSLOG info kernel drbd0: 0 KB marked out-of-sync by on disk
bit-map.
11:52:25 SYSLOG warning kernel drbd0: No usable activity log found.
11:52:25 SYSLOG info kernel drbd0: drbdsetup [733]: cstate Unconfigured
--> StandAlone
11:52:25 AXSDBG debug boot.pl[331] Service::system(170) run drbdsetup
/dev/drbd0 syncer -r 10000
11:52:25 AXSDBG debug boot.pl[331] Service::system(170) run drbdsetup
/dev/drbd0 net 192.168.5.229 192.168.5.228 C
11:52:25 SYSLOG info kernel drbd0: drbdsetup [746]: cstate StandAlone
--> Unconnected
11:52:25 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate
Unconnected --> WFConnection
11:52:26 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate
WFConnection --> WFReportParams
11:52:26 SYSLOG info kernel drbd0: Handshake successful: DRBD Network
Protocol version 74
11:52:26 SYSLOG info kernel drbd0: Connection established.
11:52:26 SYSLOG info kernel drbd0: I am(S):
1:00000002:00000001:0000000a:00000006:01
11:52:26 SYSLOG info kernel drbd0: Peer(P):
1:00000002:00000001:0000000b:00000006:10
11:52:26 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate
WFReportParams --> WFBitMapT
11:52:26 SYSLOG info kernel drbd0: Secondary/Unknown -->
Secondary/Primary
11:52:26 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate
WFBitMapT --> SyncTarget
11:52:26 SYSLOG info kernel drbd0: Resync started as SyncTarget (need to
sync 152 KB [38 bits set]).
11:52:27 SYSLOG info kernel drbd0: Resync done (total 1 sec; paused 0
sec; 152 K/sec)
11:52:27 SYSLOG info kernel drbd0: drbd0_worker [734]: cstate SyncTarget
--> Connected
11:52:27 AXSDBG debug boot.pl[331] Service::system(170) run drbdsetup
/dev/drbd0 wait_sync

from other logging I know that wait_sync return and /dev/drbd reports
that the ld is
consistent.

but then suddenly...(this is in the middle of the master's reboot)

11:56:15 SYSLOG err kernel drbd0: PingAck did not arrive in time.
11:56:15 SYSLOG info kernel drbd0: drbd0_asender [748]: cstate Connected
--> NetworkFailure
11:56:15 SYSLOG info kernel drbd0: asender terminated
11:56:15 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate
NetworkFailure --> BrokenPipe
11:56:15 SYSLOG err kernel drbd0: short read expecting header on sock:
r=-512
11:56:15 SYSLOG info kernel drbd0: worker terminated
11:56:15 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate
BrokenPipe --> Unconnected
11:56:15 SYSLOG info kernel drbd0: Connection lost.
11:56:15 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate
Unconnected --> WFConnection
11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(219) >
drbdStatus
11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232)  
cs => WFConnection
11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232)  
st => Secondary/Unknown
11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232)  
ld => Consistent
11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232)  
ns => 0
11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232)  
nr => 756
11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232)  
dw => 756
11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232)  
dr => 0
11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232)  
al => 0
11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232)  
bm => 6
11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232)  
lo => 0
11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232)  
pe => 0
11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232)  
ua => 0
11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232)  
ap => 0
11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(239) <
drbdStatus

my HA software detects a 'Consistent slave' and a 'Master gone', so he
decides
to put this slave in 'primary'

11:56:15 AXSDBG debug tsrv-request[2171] Service::system(170) run
drbdsetup /dev/drbd0 primary
11:56:15 SYSLOG info kernel drbd0: Secondary/Unknown --> Primary/Unknown
11:56:15 AXSDBG debug tsrv-request[2171] Service::system(170) run mount
/dev/drbd0 /data
11:56:15 SYSLOG info kernel EXT3 FS 2.4-0.9.19, 19 August 2002 on
drbd(147,0), internal journal

but here the other is back and look what's happening....

11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate
WFConnection --> WFReportParams
11:57:37 SYSLOG info kernel drbd0: Handshake successful: DRBD Network
Protocol version 74
11:57:37 SYSLOG info kernel drbd0: Connection established.
11:57:37 SYSLOG info kernel drbd0: I am(P):
1:00000002:00000001:0000000b:00000007:10
11:57:37 SYSLOG info kernel drbd0: Peer(S):
1:00000002:00000001:0000000c:00000006:10
11:57:37 SYSLOG err kernel drbd0: Current Primary shall become sync
TARGET! Aborting to prevent data corruption.
11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate
WFReportParams --> StandAlone
11:57:37 SYSLOG err kernel drbd0: error receiving ReportParams, l: 72!
11:57:37 SYSLOG info kernel drbd0: asender terminated
11:57:37 SYSLOG info kernel drbd0: worker terminated
11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate
StandAlone --> StandAlone
11:57:37 SYSLOG info kernel drbd0: Connection lost.
11:57:37 SYSLOG info kernel drbd0: receiver terminated
11:57:56 AXSDBG debug tsrv-request[3257] Service::system(170) run
drbdsetup /dev/drbd0 secondary
11:57:56 SYSLOG info kernel drbd0: Primary/Unknown --> Secondary/Unknown


# cat /proc/drbd 
version: 0.7.2 (api:75/proto:74)
SVN Revision: 1483:1485 build by ao at build, 2004-08-09 14:00:52
 0: cs:StandAlone st:Secondary/Unknown ld:Consistent
    ns:0 nr:756 dw:15180 dr:941 al:49 bm:55 lo:0 pe:0 ua:0 ap:0
 1: cs:Unconfigured

# drbdsetup /dev/drbd0 show
Lower device: 58:00   (lvm(58,0))
Meta device: 58:03   (lvm(58,3))
Meta index: 0
Disk options:

# lsmod | grep drbd
drbd                  128548   1 


On Wed, 2004-08-18 at 11:53, Alex Ongena wrote:
> Lars,
> 
> I have exactly the same problem again, I will try to figure
> out how it happend and will gather all info for you.
> 
> alex
> 
> --
> aXs GUARD has completed security and anti-virus checks on this e-mail
> (http://www.axsguard.com)
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user

--
aXs GUARD has completed security and anti-virus checks on this e-mail
(http://www.axsguard.com)
-------------- next part --------------
This is the master and no cmd-line parameters are executed.

The slave was rebooted ...

10:43:11 SYSLOG err kernel drbd0: meta connection shut down by peer.
10:43:11 SYSLOG info kernel drbd0: drbd0_asender [2173]: cstate Connected --> NetworkFailure
10:43:11 SYSLOG info kernel drbd0: asender terminated
10:43:11 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate NetworkFailure --> BrokenPipe
10:43:11 SYSLOG err kernel drbd0: short read expecting header on sock: r=-512
10:43:11 SYSLOG info kernel drbd0: worker terminated
10:43:11 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate BrokenPipe --> Unconnected
10:43:11 SYSLOG info kernel drbd0: Connection lost.
10:43:11 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate Unconnected --> WFConnection

and comes back ...

10:44:45 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate WFConnection --> WFReportParams
10:44:45 SYSLOG info kernel drbd0: Handshake successful: DRBD Network Protocol version 74
10:44:45 SYSLOG info kernel drbd0: Connection established.
10:44:45 SYSLOG info kernel drbd0: I am(P): 1:00000002:00000001:00000009:00000006:10
10:44:45 SYSLOG info kernel drbd0: Peer(S): 1:00000002:00000001:00000008:00000006:01
10:44:45 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate WFReportParams --> WFBitMapS
10:44:45 SYSLOG info kernel drbd0: Primary/Unknown --> Primary/Secondary
10:44:45 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate WFBitMapS --> SyncSource
10:44:45 SYSLOG info kernel drbd0: Resync started as SyncSource (need to sync 228 KB [57 bits set]).

but the slave's network driver blocked ...

10:44:58 SYSLOG err kernel drbd0: PingAck did not arrive in time.
10:44:58 SYSLOG info kernel drbd0: drbd0_asender [30515]: cstate SyncSource --> NetworkFailure
10:44:58 SYSLOG info kernel drbd0: asender terminated
10:44:58 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate NetworkFailure --> BrokenPipe
10:44:58 SYSLOG err kernel drbd0: short read expecting header on sock: r=-512
10:44:58 SYSLOG info kernel drbd0: worker terminated
10:44:58 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate BrokenPipe --> Unconnected
10:44:58 SYSLOG info kernel drbd0: Connection lost.
10:44:58 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate Unconnected --> WFConnection

and the slave commes back with another network driver ....

11:35:28 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate WFConnection --> WFReportParams
11:35:28 SYSLOG info kernel drbd0: Handshake successful: DRBD Network Protocol version 74
11:35:28 SYSLOG info kernel drbd0: Connection established.
11:35:28 SYSLOG info kernel drbd0: I am(P): 1:00000002:00000001:0000000a:00000006:10
11:35:28 SYSLOG info kernel drbd0: Peer(S): 0:00000002:00000001:00000009:00000006:01
11:35:28 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate WFReportParams --> WFBitMapS
11:35:28 SYSLOG info kernel drbd0: Primary/Unknown --> Primary/Secondary
11:35:29 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate WFBitMapS --> SyncSource
11:35:29 SYSLOG info kernel drbd0: Resync started as SyncSource (need to sync 3456 KB [864 bits set]).
11:35:29 SYSLOG info kernel drbd: sendpage() omitted: 77/1562295 [XFS' broken IO requests?]
11:35:39 SYSLOG info kernel drbd0: Resync done (total 10 sec; paused 0 sec; 344 K/sec)
11:35:39 SYSLOG info kernel drbd0: drbd0_worker [30516]: cstate SyncSource --> Connected

slave is rebooted again with another network driver

11:50:56 SYSLOG info kernel drbd0: sock was shut down by peer
11:50:56 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate Connected --> BrokenPipe
11:50:56 SYSLOG err kernel drbd0: short read expecting header on sock: r=0
11:50:56 SYSLOG info kernel drbd0: asender terminated
11:50:56 SYSLOG info kernel drbd0: worker terminated
11:50:56 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate BrokenPipe --> Unconnected
11:50:56 SYSLOG info kernel drbd0: Connection lost.
11:50:56 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate Unconnected --> WFConnection

and here he's back

11:52:25 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate WFConnection --> WFReportParams
11:52:25 SYSLOG info kernel drbd0: Handshake successful: DRBD Network Protocol version 74
11:52:25 SYSLOG info kernel drbd0: Connection established.
11:52:25 SYSLOG info kernel drbd0: I am(P): 1:00000002:00000001:0000000b:00000006:10
11:52:25 SYSLOG info kernel drbd0: Peer(S): 1:00000002:00000001:0000000a:00000006:01
11:52:25 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate WFReportParams --> WFBitMapS
11:52:25 SYSLOG info kernel drbd0: Primary/Unknown --> Primary/Secondary
11:52:25 SYSLOG info kernel drbd0: drbd0_receiver [714]: cstate WFBitMapS --> SyncSource
11:52:25 SYSLOG info kernel drbd0: Resync started as SyncSource (need to sync 152 KB [38 bits set]).
11:52:26 SYSLOG info kernel drbd0: Resync done (total 1 sec; paused 0 sec; 152 K/sec)
11:52:26 SYSLOG info kernel drbd0: drbd0_worker [2050]: cstate SyncSource --> Connected

BUT now, I reboot te master
11:55:52 SYSLOG info init Switching to runlevel: 6

11:57:06 SYSLOG info init Re-reading inittab

and follow this and around the same moment in the slave's log

11:57:35 AXSDBG debug boot.pl[327] Service::system(170) run modprobe drbd
11:57:35 SYSLOG info kernel drbd: initialised. Version: 0.7.2 (api:75/proto:74)
11:57:35 SYSLOG info kernel drbd: SVN Revision: 1483:1485 build by ao at build, 2004-08-09 14:00:52
11:57:35 SYSLOG info kernel drbd: registered as block device major 147
11:57:35 AXSDBG debug boot.pl[327] Service::system(170) run drbdsetup /dev/drbd0 disk /dev/lv/data /dev/lv/drbd-meta 0
11:57:35 SYSLOG info kernel drbd0: resync bitmap: bits=1540096 words=48128
11:57:35 SYSLOG info kernel drbd0: size = 6160384 KB
11:57:36 SYSLOG info kernel drbd0: 0 KB marked out-of-sync by on disk bit-map.
11:57:36 SYSLOG info kernel drbd0: Found 4 transactions (112 active extents) in activity log.
11:57:36 SYSLOG info kernel drbd0: Marked additional 30208 KB as out-of-sync based on AL.
11:57:36 SYSLOG info kernel drbd0: drbdsetup [701]: cstate Unconfigured --> StandAlone
11:57:36 AXSDBG debug boot.pl[327] Service::system(170) run drbdsetup /dev/drbd0 syncer -r 10000
11:57:36 AXSDBG debug boot.pl[327] Service::system(170) run drbdsetup /dev/drbd0 net 192.168.5.228 192.168.5.229 C
11:57:36 SYSLOG info kernel drbd0: drbdsetup [714]: cstate StandAlone --> Unconnected
11:57:36 SYSLOG info kernel drbd0: drbd0_receiver [715]: cstate Unconnected --> WFConnection
11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(219) > drbdStatus
11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   cs => WFConnection
11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   st => Secondary/Unknown
11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   ld => Consistent
11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   ns => 0
11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   nr => 0
11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   dw => 0
11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   dr => 0
11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   al => 0
11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   bm => 59
11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   lo => 0
11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   pe => 0
11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   ua => 0
11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   ap => 0
11:57:36 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(239) < drbdStatus
11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(219) > drbdStatus
11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   cs => WFConnection
11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   st => Secondary/Unknown
11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   ld => Consistent
11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   ns => 0
11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   nr => 0
11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   dw => 0
11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   dr => 0
11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   al => 0
11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   bm => 59
11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   lo => 0
11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   pe => 0
11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   ua => 0
11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   ap => 0
11:57:37 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(239) < drbdStatus
11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [715]: cstate WFConnection --> WFReportParams
11:57:37 SYSLOG info kernel drbd0: Handshake successful: DRBD Network Protocol version 74
11:57:37 SYSLOG info kernel drbd0: Connection established.
11:57:37 SYSLOG info kernel drbd0: I am(S): 1:00000002:00000001:0000000c:00000006:10
11:57:37 SYSLOG info kernel drbd0: Peer(P): 1:00000002:00000001:0000000b:00000007:10
11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [715]: cstate WFReportParams --> WFBitMapS
11:57:37 SYSLOG err kernel drbd0: sock_sendmsg returned -104
11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [715]: cstate WFBitMapS --> BrokenPipe
11:57:37 SYSLOG err kernel drbd0: short sent ReportBitMap size=4096 sent=240
11:57:37 SYSLOG info kernel drbd0: Secondary/Unknown --> Secondary/Primary
11:57:37 SYSLOG err kernel drbd0: meta connection shut down by peer.
11:57:37 SYSLOG info kernel drbd0: asender terminated
11:57:37 SYSLOG info kernel drbd0: sock was shut down by peer
11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [715]: cstate BrokenPipe --> BrokenPipe
11:57:37 SYSLOG err kernel drbd0: short read expecting header on sock: r=0
11:57:37 SYSLOG info kernel drbd0: worker terminated
11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [715]: cstate BrokenPipe --> Unconnected
11:57:37 SYSLOG info kernel drbd0: Connection lost.
11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [715]: cstate Unconnected --> WFConnection
11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(219) > drbdStatus
11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   cs => WFConnection
11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   st => Secondary/Unknown
11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   ld => Consistent
11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   ns => 0
11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   nr => 0
11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   dw => 0
11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   dr => 0
11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   al => 0
11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   bm => 59
11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   lo => 0
11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   pe => 0
11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   ua => 0
11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(232)   ap => 0
11:57:38 AXSDBG debug boot.pl[327] Ha::Service::drbdStatus(239) < drbdStatus


# cat /proc/drbd 
version: 0.7.2 (api:75/proto:74)
SVN Revision: 1483:1485 build by ao at build, 2004-08-09 14:00:52
 0: cs:WFConnection st:Primary/Unknown ld:Consistent
    ns:0 nr:0 dw:18136 dr:1029 al:0 bm:59 lo:0 pe:0 ua:0 ap:0
 1: cs:Unconfigured

# drbdsetup /dev/drbd0 show
Lower device: 58:02   (lvm(58,2))
Meta device: 58:03   (lvm(58,3))
Meta index: 0
Disk options:
Local address: 192.168.5.228:7788
Remote address: 192.168.5.229:7788
Wire protocol: C
Net options:
 timeout = 6.0 sec (default)
 connect-int = 10 sec (default)
 ping-int = 10 sec (default)
 max-epoch-size = 2048  (default)
 max-buffers = 2048  (default)
 sndbuf-size = 131070  (default)
 ko-count = 0  (default)
Syncer options:
 rate = 10000 KB/sec 
 group = 0  (default)
 al-extents = 127  (default)

# lsmod | grep drbd
drbd                  128548   3 
-------------- next part --------------
slave is booting here and you can also see the used commands (drbdsetup ...)

(/dev/lv/data is LVM1 data partition, /dev/lv/drbd-meta is a LVM1 partition for the meta data)

10:44:45 SYSLOG info kernel drbd: initialised. Version: 0.7.2 (api:75/proto:74)
10:44:45 SYSLOG info kernel drbd: SVN Revision: 1483:1485 build by ao at build, 2004-08-09 14:00:52
10:44:45 SYSLOG info kernel drbd: registered as block device major 147
10:44:45 AXSDBG debug boot.pl[331] Service::system(170) run drbdsetup /dev/drbd0 disk /dev/lv/data /dev/lv/drbd-meta 0
10:44:45 SYSLOG info kernel drbd0: resync bitmap: bits=1540096 words=48128
10:44:45 SYSLOG info kernel drbd0: size = 6160384 KB
10:44:45 SYSLOG info kernel drbd0: 0 KB marked out-of-sync by on disk bit-map.
10:44:45 SYSLOG warning kernel drbd0: No usable activity log found.
10:44:45 SYSLOG info kernel drbd0: drbdsetup [733]: cstate Unconfigured --> StandAlone
10:44:45 AXSDBG debug boot.pl[331] Service::system(170) run drbdsetup /dev/drbd0 syncer -r 10000
10:44:45 AXSDBG debug boot.pl[331] Service::system(170) run drbdsetup /dev/drbd0 net 192.168.5.229 192.168.5.228 C
10:44:45 SYSLOG info kernel drbd0: drbdsetup [746]: cstate StandAlone --> Unconnected
10:44:45 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate Unconnected --> WFConnection
10:44:45 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate WFConnection --> WFReportParams
10:44:45 SYSLOG info kernel drbd0: Handshake successful: DRBD Network Protocol version 74
10:44:45 SYSLOG info kernel drbd0: Connection established.
10:44:45 SYSLOG info kernel drbd0: I am(S): 1:00000002:00000001:00000008:00000006:01
10:44:45 SYSLOG info kernel drbd0: Peer(P): 1:00000002:00000001:00000009:00000006:10
10:44:45 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate WFReportParams --> WFBitMapT
10:44:45 SYSLOG info kernel drbd0: Secondary/Unknown --> Secondary/Primary
10:44:46 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate WFBitMapT --> SyncTarget
10:44:46 SYSLOG info kernel drbd0: Resync started as SyncTarget (need to sync 228 KB [57 bits set]).
10:44:46 AXSDBG debug boot.pl[331] Service::system(170) run drbdsetup /dev/drbd0 wait_sync

and here the slave's (this one) network device died.

10:44:59 SYSLOG err kernel drbd0: meta connection shut down by peer.
10:44:59 SYSLOG info kernel drbd0: drbd0_asender [748]: cstate SyncTarget --> NetworkFailure
10:44:59 SYSLOG info kernel drbd0: asender terminated
10:44:59 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate NetworkFailure --> BrokenPipe
10:44:59 SYSLOG warning kernel drbd0: short read receiving data block: read 1040 expected 4096
10:44:59 SYSLOG err kernel drbd0: error receiving RSDataReply, l: 4112!
10:44:59 SYSLOG info kernel drbd0: worker terminated
10:44:59 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate BrokenPipe --> Unconnected
10:44:59 SYSLOG info kernel drbd0: Connection lost.
10:44:59 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate Unconnected --> WFConnection

reboot of the slave with another network driver

11:35:29 SYSLOG info kernel drbd: initialised. Version: 0.7.2 (api:75/proto:74)
11:35:29 SYSLOG info kernel drbd: SVN Revision: 1483:1485 build by ao at build, 2004-08-09 14:00:52
11:35:29 SYSLOG info kernel drbd: registered as block device major 147
11:35:29 AXSDBG debug boot.pl[332] Service::system(170) run drbdsetup /dev/drbd0 disk /dev/lv/data /dev/lv/drbd-meta 0
11:35:29 SYSLOG info kernel drbd0: resync bitmap: bits=1540096 words=48128
11:35:29 SYSLOG info kernel drbd0: size = 6160384 KB
11:35:29 SYSLOG info kernel drbd0: 0 KB marked out-of-sync by on disk bit-map.
11:35:29 SYSLOG warning kernel drbd0: No usable activity log found.
11:35:29 SYSLOG info kernel drbd0: drbdsetup [734]: cstate Unconfigured --> StandAlone
11:35:29 AXSDBG debug boot.pl[332] Service::system(170) run drbdsetup /dev/drbd0 syncer -r 10000
11:35:29 AXSDBG debug boot.pl[332] Service::system(170) run drbdsetup /dev/drbd0 net 192.168.5.229 192.168.5.228 C
11:35:29 SYSLOG info kernel drbd0: drbdsetup [747]: cstate StandAlone --> Unconnected
11:35:29 SYSLOG info kernel drbd0: drbd0_receiver [748]: cstate Unconnected --> WFConnection
11:35:29 SYSLOG info kernel drbd0: drbd0_receiver [748]: cstate WFConnection --> WFReportParams
11:35:29 SYSLOG info kernel drbd0: Handshake successful: DRBD Network Protocol version 74
11:35:29 SYSLOG info kernel drbd0: Connection established.
11:35:29 SYSLOG info kernel drbd0: I am(S): 0:00000002:00000001:00000009:00000006:01
11:35:29 SYSLOG info kernel drbd0: Peer(P): 1:00000002:00000001:0000000a:00000006:10
11:35:29 SYSLOG info kernel drbd0: drbd0_receiver [748]: cstate WFReportParams --> WFBitMapT
11:35:29 SYSLOG info kernel drbd0: Secondary/Unknown --> Secondary/Primary
11:35:29 SYSLOG info kernel drbd0: drbd0_receiver [748]: cstate WFBitMapT --> SyncTarget
11:35:29 SYSLOG info kernel drbd0: Resync started as SyncTarget (need to sync 3456 KB [864 bits set]).
11:35:30 AXSDBG debug boot.pl[332] Service::system(170) run drbdsetup /dev/drbd0 wait_sync
11:35:40 SYSLOG info kernel drbd0: Resync done (total 10 sec; paused 0 sec; 344 K/sec)
11:35:40 SYSLOG info kernel drbd0: drbd0_worker [735]: cstate SyncTarget --> Connected

another reboot with yet another network driver

11:52:25 SYSLOG info kernel drbd: initialised. Version: 0.7.2 (api:75/proto:74)
11:52:25 SYSLOG info kernel drbd: SVN Revision: 1483:1485 build by ao at build, 2004-08-09 14:00:52
11:52:25 SYSLOG info kernel drbd: registered as block device major 147
11:52:25 AXSDBG debug boot.pl[331] Service::system(170) run drbdsetup /dev/drbd0 disk /dev/lv/data /dev/lv/drbd-meta 0
11:52:25 SYSLOG info kernel drbd0: resync bitmap: bits=1540096 words=48128
11:52:25 SYSLOG info kernel drbd0: size = 6160384 KB
11:52:25 SYSLOG info kernel drbd0: 0 KB marked out-of-sync by on disk bit-map.
11:52:25 SYSLOG warning kernel drbd0: No usable activity log found.
11:52:25 SYSLOG info kernel drbd0: drbdsetup [733]: cstate Unconfigured --> StandAlone
11:52:25 AXSDBG debug boot.pl[331] Service::system(170) run drbdsetup /dev/drbd0 syncer -r 10000
11:52:25 AXSDBG debug boot.pl[331] Service::system(170) run drbdsetup /dev/drbd0 net 192.168.5.229 192.168.5.228 C
11:52:25 SYSLOG info kernel drbd0: drbdsetup [746]: cstate StandAlone --> Unconnected
11:52:25 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate Unconnected --> WFConnection
11:52:26 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate WFConnection --> WFReportParams
11:52:26 SYSLOG info kernel drbd0: Handshake successful: DRBD Network Protocol version 74
11:52:26 SYSLOG info kernel drbd0: Connection established.
11:52:26 SYSLOG info kernel drbd0: I am(S): 1:00000002:00000001:0000000a:00000006:01
11:52:26 SYSLOG info kernel drbd0: Peer(P): 1:00000002:00000001:0000000b:00000006:10
11:52:26 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate WFReportParams --> WFBitMapT
11:52:26 SYSLOG info kernel drbd0: Secondary/Unknown --> Secondary/Primary
11:52:26 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate WFBitMapT --> SyncTarget
11:52:26 SYSLOG info kernel drbd0: Resync started as SyncTarget (need to sync 152 KB [38 bits set]).
11:52:27 SYSLOG info kernel drbd0: Resync done (total 1 sec; paused 0 sec; 152 K/sec)
11:52:27 SYSLOG info kernel drbd0: drbd0_worker [734]: cstate SyncTarget --> Connected
11:52:27 AXSDBG debug boot.pl[331] Service::system(170) run drbdsetup /dev/drbd0 wait_sync

from other logging I know that wait_sync return and /dev/drbd reports that the ld is
consistent.

but then suddenly...(this is in the middle of the master's reboot)

11:56:15 SYSLOG err kernel drbd0: PingAck did not arrive in time.
11:56:15 SYSLOG info kernel drbd0: drbd0_asender [748]: cstate Connected --> NetworkFailure
11:56:15 SYSLOG info kernel drbd0: asender terminated
11:56:15 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate NetworkFailure --> BrokenPipe
11:56:15 SYSLOG err kernel drbd0: short read expecting header on sock: r=-512
11:56:15 SYSLOG info kernel drbd0: worker terminated
11:56:15 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate BrokenPipe --> Unconnected
11:56:15 SYSLOG info kernel drbd0: Connection lost.
11:56:15 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate Unconnected --> WFConnection
11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(219) > drbdStatus
11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232)   cs => WFConnection
11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232)   st => Secondary/Unknown
11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232)   ld => Consistent
11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232)   ns => 0
11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232)   nr => 756
11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232)   dw => 756
11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232)   dr => 0
11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232)   al => 0
11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232)   bm => 6
11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232)   lo => 0
11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232)   pe => 0
11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232)   ua => 0
11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(232)   ap => 0
11:56:15 AXSDBG debug tsrv-request[2171] Ha::Service::drbdStatus(239) < drbdStatus

my HA software detects a 'Consistent slave' and a 'Master gone', so he decides
to put this slave in 'primary'

11:56:15 AXSDBG debug tsrv-request[2171] Service::system(170) run drbdsetup /dev/drbd0 primary
11:56:15 SYSLOG info kernel drbd0: Secondary/Unknown --> Primary/Unknown
11:56:15 AXSDBG debug tsrv-request[2171] Service::system(170) run mount /dev/drbd0 /data
11:56:15 SYSLOG info kernel EXT3 FS 2.4-0.9.19, 19 August 2002 on drbd(147,0), internal journal

but here the other is back and look what's happening....

11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate WFConnection --> WFReportParams
11:57:37 SYSLOG info kernel drbd0: Handshake successful: DRBD Network Protocol version 74
11:57:37 SYSLOG info kernel drbd0: Connection established.
11:57:37 SYSLOG info kernel drbd0: I am(P): 1:00000002:00000001:0000000b:00000007:10
11:57:37 SYSLOG info kernel drbd0: Peer(S): 1:00000002:00000001:0000000c:00000006:10
11:57:37 SYSLOG err kernel drbd0: Current Primary shall become sync TARGET! Aborting to prevent data corruption.
11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate WFReportParams --> StandAlone
11:57:37 SYSLOG err kernel drbd0: error receiving ReportParams, l: 72!
11:57:37 SYSLOG info kernel drbd0: asender terminated
11:57:37 SYSLOG info kernel drbd0: worker terminated
11:57:37 SYSLOG info kernel drbd0: drbd0_receiver [747]: cstate StandAlone --> StandAlone
11:57:37 SYSLOG info kernel drbd0: Connection lost.
11:57:37 SYSLOG info kernel drbd0: receiver terminated
11:57:56 AXSDBG debug tsrv-request[3257] Service::system(170) run drbdsetup /dev/drbd0 secondary
11:57:56 SYSLOG info kernel drbd0: Primary/Unknown --> Secondary/Unknown


# cat /proc/drbd 
version: 0.7.2 (api:75/proto:74)
SVN Revision: 1483:1485 build by ao at build, 2004-08-09 14:00:52
 0: cs:StandAlone st:Secondary/Unknown ld:Consistent
    ns:0 nr:756 dw:15180 dr:941 al:49 bm:55 lo:0 pe:0 ua:0 ap:0
 1: cs:Unconfigured

# drbdsetup /dev/drbd0 show
Lower device: 58:00   (lvm(58,0))
Meta device: 58:03   (lvm(58,3))
Meta index: 0
Disk options:

# lsmod | grep drbd
drbd                  128548   1 


More information about the drbd-user mailing list