[DRBD-user] [Ocfs2-users] Unexplained reboots in DRBD82 + OCFS2 setup

Sunil Mushran sunil.mushran at oracle.com
Wed Jun 24 21:02:12 CEST 2009



Do you have a separate network path for drbd traffic? If you do
not, then you are probably overloading the network. In this case,
I believe drbd is unable to replicate the I/Os fast enough and thus
is blocking the o2cb disk heartbeat; a node that cannot write its
heartbeat in time fences itself, which would explain the reboots.
One workaround is to raise O2CB_HEARTBEAT_THRESHOLD so that the
heartbeat timeout is longer than the default of 60 secs. Refer to
the ocfs2 faq or ocfs2 1.4 user's guide for more on this.
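
For reference, the threshold is counted in heartbeat iterations rather
than seconds. A minimal sketch of the conversion, assuming the 2-second
disk heartbeat interval and the formula given in the ocfs2 faq
(threshold = timeout / 2 + 1). The function names are made up for
illustration and are not part of any ocfs2 tool:

```python
# Assumption: o2cb writes a disk heartbeat every 2 seconds and declares a
# node dead after (threshold - 1) missed iterations, as described in the
# ocfs2 faq. Verify against the documentation for your ocfs2 version.
HEARTBEAT_INTERVAL_SECS = 2


def threshold_for_timeout(timeout_secs: int) -> int:
    """Return the O2CB_HEARTBEAT_THRESHOLD value for a timeout in seconds."""
    if timeout_secs <= 0 or timeout_secs % HEARTBEAT_INTERVAL_SECS != 0:
        raise ValueError("timeout must be a positive multiple of 2 seconds")
    return timeout_secs // HEARTBEAT_INTERVAL_SECS + 1


def timeout_for_threshold(threshold: int) -> int:
    """Return the heartbeat timeout in seconds for a given threshold."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECS


if __name__ == "__main__":
    for secs in (60, 120):
        print(f"{secs}s timeout: O2CB_HEARTBEAT_THRESHOLD={threshold_for_timeout(secs)}")
```

Under these assumptions the 60 sec default corresponds to a threshold
of 31, and a 120 sec timeout to 61. The value is normally set in the
o2cb configuration (typically /etc/sysconfig/o2cb on CentOS) and should
be identical on all nodes.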

And if you want to capture the logs, set up netconsole.

Kris Buytaert wrote:
> We're trying to set up a dual-primary DRBD environment, with a shared
> disk with either OCFS2 or GFS. The environment is CentOS 5.3 with
> DRBD82 (but we also tried DRBD83 from testing).
>
> Setting up a single-primary disk and running bonnie++ on it works.
> Setting up a dual-primary disk, only mounting it on one node (ext3), and
> running bonnie++ works.
>
> When setting up ocfs2 on the /dev/drbd0 disk and mounting it on both
> nodes, basic functionality seems to be in place, but usually less than
> 5-10 minutes after I start bonnie++ as a test on one of the nodes, both
> nodes power cycle with no errors in the logfiles, just a crash.
>
> When at the console at the time of the crash, it looks like disk I/O
> blocks (you can type, but no actions happen), then a reboot: no panics,
> no oops, nothing. (sysctl panic values are set to timeouts etc.)
> Setting up a dual-primary disk with ocfs2, only mounting it on one node,
> and starting bonnie++ causes only that node to crash.
>
> On the DRBD level I get the following error when that node disappears:
>
> drbd0: PingAck did not arrive in time.
> drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure )
> pdsk(UpToDate -> DUnknown )
> drbd0: asender terminated
> drbd0: Terminating asender thread
>
> That however is an expected error because of the reboot.
>
> At first I assumed OCFS2 to be the root of this problem, so I moved
> forward and set up an iSCSI target on a 3rd node, and used that device
> with the same OCFS2 setup. There no crashes occurred and bonnie++
> flawlessly completed its test run.
>
> So my attention went back to the combination of DRBD and OCFS2.
>
> I tried both DRBD 8.2 (drbd82-8.2.6-1.el5.centos, kmod-drbd82-8.2.6-2) and
> the 83 variant from CentOS Testing.
>
> At first I was trying with the ocfs2 1.4.1-1.el5.i386.rpm version, but
> upgrading to 1.4.2-1.el5.i386.rpm didn't change the behaviour.
>
>
> Does anyone have an idea about this?
> How can we get more debug info from OCFS2, apart from heartbeat
> tracing, which hasn't taught me anything yet, in order to potentially
> file a valuable bug report?
>   



