[DRBD-user] Restarting IPtables caused split-brain and OCFS2 corruption?

Herman herman6x9 at ymail.com
Tue May 17 18:19:01 CEST 2011


Hi, I've been trying to google for this, but haven't found anything that
quite matches.  Sorry if this is covered elsewhere.

Using:  RHEL 6 / DRBD 8.3.10-2 kmod from ELRepo / OCFS2 compiled from
Red Hat's kernel source.

I have a Primary/Primary OCFS2 setup, which I built following these
instructions:

http://wiki.virtastic.com/display/howto/Clustered+Filesystem+with+DRBD+and+OCFS2+on+CentOS+5.5

Anyway, I have a pair of bonded GigE network ports hooked up to a
switch to link the two servers.  I wanted the link to be on a dedicated
network, but the networking team just gave me regular IPs on the same
subnet as the primary interfaces.  SELinux is permissive.  Everything
was working, but then...

I made a change to iptables and did a "service iptables restart", and
the next thing I knew, I had a split-brain.

Worse, the OCFS2 filesystem somehow started giving errors.  I don't
know whether it really was corruption, but the error messages came in
pretty fast.  I recovered from the split-brain manually, but that didn't
stop the messages.  They didn't clear up even after fsck.ocfs2.  I finally
had to find out what that inode was pointing to and remove it before the
messages stopped.

In later testing, restarting iptables sometimes gives me a split-brain,
but I haven't reproduced the OCFS2 corruption.  I would have thought the
short time an iptables restart takes wouldn't cause a split-brain, but
I guess it sometimes does.  I'm not sure why it splits on some restarts
and not on others.  Is this normal?  Should I use "iptables -A" to add
rules instead of doing a restart?

Would posting the /etc/drbd.conf and /etc/sysconfig/iptables help?  Any
other info?

I got the following /var/log/messages after restarting iptables:

Apr 18 07:52:27 server-2 kernel: block drbd1: asender terminated
Apr 18 07:52:27 server-2 kernel: block drbd1: Terminating asender thread
Apr 18 07:52:27 server-2 kernel: block drbd1: sock_sendmsg returned -32
Apr 18 07:52:27 server-2 kernel: block drbd1: short sent ReportUUIDs size=56 sent=0
Apr 18 07:52:27 server-2 kernel: block drbd1: Connection closed
Apr 18 07:52:28 server-2 kernel: block drbd1: conn( NetworkFailure -> Unconnected )
Apr 18 07:52:28 server-2 kernel: block drbd1: receiver terminated
Apr 18 07:52:28 server-2 kernel: block drbd1: Restarting receiver thread
Apr 18 07:52:28 server-2 kernel: block drbd1: receiver (re)started
Apr 18 07:52:28 server-2 kernel: block drbd1: conn( Unconnected -> WFConnection )
Apr 18 07:52:28 server-2 kernel: block drbd1: Handshake successful: Agreed network protocol version 96
Apr 18 07:52:28 server-2 kernel: block drbd1: conn( WFConnection -> WFReportParams )
Apr 18 07:52:28 server-2 kernel: block drbd1: Starting asender thread (from drbd1_receiver [5944])
Apr 18 07:52:28 server-2 kernel: block drbd1: data-integrity-alg: <not-used>
Apr 18 07:52:28 server-2 kernel: block drbd1: drbd_sync_handshake:
Apr 18 07:52:28 server-2 kernel: block drbd1: self 7891B6FC1469AE31:F7F25E6B00607741:571973CB1489F5B9:571873CB1489F5B9 bits:1 flags:0
Apr 18 07:52:28 server-2 kernel: block drbd1: peer AA6330CFB23C2663:F7F25E6B00607741:571973CB1489F5B9:571873CB1489F5B9 bits:73 flags:0
Apr 18 07:52:28 server-2 kernel: block drbd1: uuid_compare()=100 by rule 90
Apr 18 07:52:28 server-2 kernel: block drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1
Apr 18 07:52:29 server-2 kernel: block drbd1: helper command: /sbin/drbdadm initial-split-brain minor-1 exit code 0 (0x0)
Apr 18 07:52:29 server-2 kernel: block drbd1: Split-Brain detected but unresolved, dropping connection!
Apr 18 07:52:29 server-2 kernel: block drbd1: helper command: /sbin/drbdadm split-brain minor-1
Apr 18 07:52:29 server-2 notify-split-brain.sh[8606]: invoked for res0
Apr 18 07:52:29 server-2 kernel: block drbd1: helper command: /sbin/drbdadm split-brain minor-1 exit code 0 (0x0)
Apr 18 07:52:29 server-2 kernel: block drbd1: conn( WFReportParams -> Disconnecting )
Apr 18 07:52:29 server-2 kernel: block drbd1: error receiving ReportState, l: 4!
Apr 18 07:52:29 server-2 kernel: block drbd1: meta connection shut down by peer.
Apr 18 07:52:29 server-2 kernel: block drbd1: asender terminated
Apr 18 07:52:29 server-2 kernel: block drbd1: Terminating asender thread
Apr 18 07:52:29 server-2 kernel: block drbd1: Connection closed
Apr 18 07:52:29 server-2 kernel: block drbd1: conn( Disconnecting -> StandAlone )
Apr 18 07:52:29 server-2 kernel: block drbd1: receiver terminated
Apr 18 07:52:29 server-2 kernel: block drbd1: Terminating receiver thread
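
For anyone trying to read the log above, here is a minimal Python sketch
that pulls out the connection-state transitions and the UUID sets from the
handshake.  The function names and the LOG excerpt are only illustrative.
It assumes the four colon-separated UUIDs are printed in the order
current:bitmap:history:history, which is my understanding of DRBD 8.3 but
is not stated in the log itself.

```python
import re

# Excerpt of the log above, with the mail-wrapped lines rejoined.
LOG = """\
Apr 18 07:52:28 server-2 kernel: block drbd1: conn( NetworkFailure -> Unconnected )
Apr 18 07:52:28 server-2 kernel: block drbd1: conn( Unconnected -> WFConnection )
Apr 18 07:52:28 server-2 kernel: block drbd1: conn( WFConnection -> WFReportParams )
Apr 18 07:52:28 server-2 kernel: block drbd1: self 7891B6FC1469AE31:F7F25E6B00607741:571973CB1489F5B9:571873CB1489F5B9 bits:1 flags:0
Apr 18 07:52:28 server-2 kernel: block drbd1: peer AA6330CFB23C2663:F7F25E6B00607741:571973CB1489F5B9:571873CB1489F5B9 bits:73 flags:0
Apr 18 07:52:29 server-2 kernel: block drbd1: conn( WFReportParams -> Disconnecting )
Apr 18 07:52:29 server-2 kernel: block drbd1: conn( Disconnecting -> StandAlone )
"""

CONN_RE = re.compile(r"conn\( (\w+) -> (\w+) \)")
UUID_RE = re.compile(r"\b(self|peer) ((?:[0-9A-F]{16}:){3}[0-9A-F]{16}) bits:(\d+)")


def conn_transitions(text):
    """Return (old_state, new_state) pairs in the order they were logged."""
    return CONN_RE.findall(text)


def handshake_uuids(text):
    """Return the UUID set each side reported in drbd_sync_handshake."""
    result = {}
    for side, uuids, bits in UUID_RE.findall(text):
        # Assumed field order: current, bitmap, then the history UUIDs.
        current, bitmap, *history = uuids.split(":")
        result[side] = {
            "current": current,
            "bitmap": bitmap,
            "history": history,
            "bits": int(bits),
        }
    return result


transitions = conn_transitions(LOG)
uuids = handshake_uuids(LOG)

# Print the state path as one chain (the transitions here are consecutive).
print(" -> ".join([transitions[0][0]] + [new for _, new in transitions]))
for field in ("current", "bitmap"):
    same = uuids["self"][field] == uuids["peer"][field]
    print(f"{field} UUID {'matches' if same else 'differs'} between self and peer")
```

Under that assumed field order, the two nodes report different current
UUIDs but the same bitmap UUID, and both sides have out-of-sync bits set.
That would fit both nodes having written independently while
disconnected, which is presumably what "uuid_compare()=100 by rule 90"
is reporting.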

After that, the following messages kept coming in really fast.
Rebooting, switching primary nodes, and running fsck all failed to stop
them.  Only finding the actual owner of the inode and removing it worked:

Apr 18 07:53:07 server-2 kernel: (8163,0):ocfs2_read_virt_blocks:853 ERROR: Inode #5377026 contains a hole at offset 466944
Apr 18 07:53:07 server-2 kernel: (8163,0):ocfs2_read_dir_block:533 ERROR: status = -5
Apr 18 07:53:08 server-2 kernel: (8163,12):ocfs2_read_virt_blocks:853 ERROR: Inode #5377026 contains a hole at offset 466944
Apr 18 07:53:08 server-2 kernel: (8163,12):ocfs2_read_dir_block:533 ERROR: status = -5
Apr 18 07:53:08 server-2 kernel: (8508,0):ocfs2_read_virt_blocks:853 ERROR: Inode #5377026 contains a hole at offset 466944
Apr 18 07:53:08 server-2 kernel: (8508,0):ocfs2_read_dir_block:533 ERROR: status = -5
Apr 18 07:53:08 server-2 kernel: (8508,0):ocfs2_read_virt_blocks:853 ERROR: Inode #5377026 contains a hole at offset 466944
Apr 18 07:53:08 server-2 kernel: (8508,0):ocfs2_read_dir_block:533 ERROR: status = -5
Apr 18 07:53:08 server-2 kernel: (8163,16):ocfs2_read_virt_blocks:853 ERROR: Inode #5377026 contains a hole at offset 466944
Apr 18 07:53:08 server-2 kernel: (8163,16):ocfs2_read_dir_block:533 ERROR: status = -5
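
Since these messages repeat so quickly, a small summary can help confirm
that they all point at the same place.  The following is a minimal Python
sketch, not a diagnostic tool: the function names and the LOG excerpt are
only illustrative.  It counts the "contains a hole" errors per inode and
offset, and translates the status values using the standard errno table
(5 is EIO on Linux).

```python
import errno
import re
from collections import Counter

# Excerpt of the messages above, with the mail-wrapped lines rejoined.
LOG = """\
Apr 18 07:53:07 server-2 kernel: (8163,0):ocfs2_read_virt_blocks:853 ERROR: Inode #5377026 contains a hole at offset 466944
Apr 18 07:53:07 server-2 kernel: (8163,0):ocfs2_read_dir_block:533 ERROR: status = -5
Apr 18 07:53:08 server-2 kernel: (8163,12):ocfs2_read_virt_blocks:853 ERROR: Inode #5377026 contains a hole at offset 466944
Apr 18 07:53:08 server-2 kernel: (8508,0):ocfs2_read_virt_blocks:853 ERROR: Inode #5377026 contains a hole at offset 466944
"""

HOLE_RE = re.compile(r"ERROR: Inode #(\d+) contains a hole at offset (\d+)")
STATUS_RE = re.compile(r"ERROR: status = -(\d+)")


def summarize_holes(text):
    """Count 'contains a hole' errors per (inode, offset) pair."""
    counts = Counter()
    for inode, offset in HOLE_RE.findall(text):
        counts[(int(inode), int(offset))] += 1
    return counts


def status_names(text):
    """Map the negative status values to errno names, e.g. -5 to EIO."""
    codes = {int(n) for n in STATUS_RE.findall(text)}
    return sorted(errno.errorcode.get(code, "unknown") for code in codes)


for (inode, offset), n in summarize_holes(LOG).items():
    print(f"inode {inode}, offset {offset}: {n} message(s)")
print("status codes:", ", ".join(status_names(LOG)))
```

In the excerpt above, every message refers to the same inode and offset,
which matches the messages stopping once that one inode was removed.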

Thanks!
Herman
