[DRBD-user] tips for debugging random reboots

Jon Duggan jon at host-it.co.uk
Thu Dec 18 14:31:13 CET 2008

Hi Guys,

I've recently been testing DRBD and OCFS2 in dual-primary mode. Setup went
as expected and I had no problems until I tried to transfer a larger
amount of data onto the filesystem. The data is about 25 GB across
approximately 400k files. I'm using rsync to copy the files from the
original host.

I've tried three times to copy the data, and each time the node I'm copying
to reboots at a random point during the rsync, with absolutely nothing
written to the logs.

I've done the same copy to a local ext3 filesystem on the cluster node, which
ran successfully. The DRBD device is a partition on the same physical disk
as the local / ext3 filesystem, so unless the issue is specific to the
sectors in that partition, I believe this test rules out the hardware.

DRBD sits on top of an md device in RAID 1. I'm using Debian etch with the
backported `etchnhalf` 2.6.24 kernel. DRBD version:

node-2:~# cat /proc/drbd
version: 8.2.7 (api:88/proto:86-88)
GIT-hash: 61b7f4c2fc34fe3d2acf7be6bcc1fc2684708a7d build by root@node-2, 2008-11-18 18:29:12


Below is the drbd.conf (identical on both machines). We're not currently
using any handlers to reboot the machine when split brain or other issues
are detected, and the config is simple at this stage.

The two machines are in geographically different locations, with 3 router
hops and about 2 ms latency between them (dedicated fibre backhauls between
the two). We'll be experimenting with the protocol types to speed things up,
but at this stage the reboots are the main concern.
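
For a rough sense of scale, here is a back-of-envelope sketch in Python using
the figures quoted above. It is not a measurement. It assumes decimal
gigabytes, treats the 2 ms figure as a round-trip time, and assumes at least
one fully serialized replicated write per file. Protocol C only acknowledges a
write once the peer has it on disk, so each such write waits at least one
round trip. In practice writes can overlap and each file needs more than one
write, so the second figure is only illustrative. The function and constant
names are made up for this example.

```python
# Back-of-envelope figures from the numbers in this message.
# Assumptions (not measured): decimal gigabytes, 2 ms is the round-trip
# time, and one fully serialized synchronous write per file.

TOTAL_BYTES = 25 * 10**9   # "about 25 GB"
FILE_COUNT = 400_000       # "approximately 400k files"
ROUND_TRIP_MS = 2          # "about 2 ms latency", assumed to be round trip


def average_file_size_bytes(total_bytes, file_count):
    """Mean file size if the data were spread evenly across all files."""
    return total_bytes / file_count


def serialized_round_trip_seconds(file_count, round_trip_ms):
    """Network wait if every file cost exactly one non-overlapping round trip."""
    return file_count * round_trip_ms / 1000


avg = average_file_size_bytes(TOTAL_BYTES, FILE_COUNT)
wait = serialized_round_trip_seconds(FILE_COUNT, ROUND_TRIP_MS)

print(f"average file size: {avg / 1000:.1f} kB")
print(f"serialized round-trip wait: {wait:.0f} s (~{wait / 60:.0f} min)")
```

So the workload is mostly small files (around 62.5 kB on average), and under
these assumptions the link latency alone adds on the order of 800 seconds to
the copy.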

My question: since there's nothing in the logs prior to the reboot, what can
I do to start debugging the issue? I understand the issue is possibly
OCFS2-specific, but I have to start somewhere. Any pointers or suggestions
are greatly appreciated.

Regards

Jon Duggan

# cat /etc/drbd.conf
global {
  usage-count no;
  disable-ip-verification;
}
common {
  protocol C;
}
resource r0 {
  startup {
    become-primary-on both;
  }

  syncer {
    rate 50M;
  }

  net {
    allow-two-primaries;
  }

  on node-2 {
    device    /dev/drbd1;
    disk      /dev/md2;
    address   192.168.1.2:7789;
    meta-disk internal;
  }

  on node-1 {
    device    /dev/drbd1;
    disk      /dev/md2;
    address   194.150.x.x:7789;
    meta-disk internal;
  }
}







