[DRBD-user] network startup failure

Thu Mar 8 17:18:20 CET 2012

Hi Felix,

Unfortunately there aren't any logs from the failure.  When I restarted the server, it dropped into recovery mode because it couldn't mount the DRBD device, and at that point the root filesystem was mounted read-only.  I had to remount the / partition read-write, remove the DRBD mount from fstab, and reboot again to get the machine back to a working state.  The only logs I can offer are from trying to start it now.

Mar  8 09:11:54 retv3130 kernel: events: mcg drbd: 2
Mar  8 09:11:54 retv3130 kernel: drbd: initialized. Version: 8.4.1 (api:1/proto:86-100)
Mar  8 09:11:54 retv3130 kernel: drbd: GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by root@retv3130.na.lzb.hq, 2012-03-07 10:30:49
Mar  8 09:11:54 retv3130 kernel: drbd: registered as block device major 147
Mar  8 09:11:54 retv3130 kernel: d-con drbd0: Starting worker thread (from drbdsetup [2070])
Mar  8 09:11:54 retv3130 kernel: block drbd1: disk( Diskless -> Attaching )
Mar  8 09:11:54 retv3130 kernel: d-con drbd0: Method to ensure write ordering: barrier
Mar  8 09:11:54 retv3130 kernel: block drbd1: max BIO size = 4096
Mar  8 09:11:54 retv3130 kernel: block drbd1: drbd_bm_resize called with capacity == 314554936
Mar  8 09:11:54 retv3130 kernel: block drbd1: resync bitmap: bits=39319367 words=614366 pages=1200
Mar  8 09:11:54 retv3130 kernel: block drbd1: size = 150 GB (157277468 KB)
Mar  8 09:11:54 retv3130 kernel: block drbd1: bitmap READ of 1200 pages took 18 jiffies
Mar  8 09:11:54 retv3130 kernel: block drbd1: recounting of set bits took additional 4 jiffies
Mar  8 09:11:54 retv3130 kernel: block drbd1: 150 GB (39319367 bits) marked out-of-sync by on disk bit-map.
Mar  8 09:11:54 retv3130 kernel: block drbd1: Suspended AL updates
Mar  8 09:11:54 retv3130 kernel: block drbd1: disk( Attaching -> UpToDate )
Mar  8 09:11:54 retv3130 kernel: block drbd1: attached to UUIDs 1C6D2A54B80C1533:0000000000000004:0000000000000000:0000000000000000
Mar  8 09:11:54 retv3130 kernel: block drbd1: role( Secondary -> Primary )
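As a sanity check on the attach messages above, the size and bitmap figures can be reproduced from the capacity that drbd_bm_resize reports. This is only a sketch: the constants (512-byte sectors, one bitmap bit per 4 KiB, 64-bit words, 4 KiB pages) are assumptions chosen because they reproduce the logged numbers, and `bitmap_geometry` is a hypothetical helper name, not a DRBD function.

```python
SECTOR_BYTES = 512     # assumption: capacity is counted in 512-byte sectors
BYTES_PER_BIT = 4096   # assumption: one bitmap bit tracks 4 KiB of storage
WORD_BITS = 64         # assumption: bitmap words are 64 bits wide
PAGE_BYTES = 4096      # assumption: 4 KiB memory pages


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def bitmap_geometry(capacity_sectors):
    """Derive the size and bitmap figures from a sector count."""
    total_bytes = capacity_sectors * SECTOR_BYTES
    bits = ceil_div(total_bytes, BYTES_PER_BIT)
    words = ceil_div(bits, WORD_BITS)
    pages = ceil_div(words * (WORD_BITS // 8), PAGE_BYTES)
    return {
        "size_kb": total_bytes // 1024,
        "bits": bits,
        "words": words,
        "pages": pages,
    }


# Capacity taken from the "drbd_bm_resize called with capacity" log line.
print(bitmap_geometry(314554936))
```

Under those assumptions the result matches the log: 157277468 KB, 39319367 bits, 614366 words, 1200 pages. Since all 39319367 bits are reported as out-of-sync, the on-disk bitmap marks the whole 150 GB device as needing a resync.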

The addresses are definitely present; I can ssh to them from my workstation.  That same configuration was also working right up until the reboot.

[root@retv3130 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:50:56:A5:01:1C
          inet addr:10.170.1.221  Bcast:10.170.15.255  Mask:255.255.240.0

[root@retv3131 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:50:56:A5:01:1D
          inet addr:10.170.1.222  Bcast:10.170.15.255  Mask:255.255.240.0
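Reaching a node by ssh shows the address answers from outside; a bind test run on the node itself checks whether that address is actually assigned locally. The sketch below is one possible way to do that, assuming Python is available on the nodes. `address_is_local` is a hypothetical helper, not part of DRBD, and it does not reproduce what drbdadm does internally.

```python
import errno
import socket


def address_is_local(ip, family=socket.AF_INET):
    """Return True if this host can bind a TCP socket to `ip`."""
    with socket.socket(family, socket.SOCK_STREAM) as sock:
        try:
            # Port 0 lets the kernel pick any free port, so the test does
            # not collide with a service already listening on 7789.
            sock.bind((ip, 0))
        except OSError as exc:
            if exc.errno == errno.EADDRNOTAVAIL:
                return False  # address is not assigned to this host
            raise
        return True


# Addresses taken from the ifconfig output above.
for ip in ("10.170.1.221", "10.170.1.222"):
    print(ip, "local" if address_is_local(ip) else "not local")
```

Run on retv3130, this would be expected to report 10.170.1.221 as local and 10.170.1.222 as not local, and the reverse on retv3131.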

Scot Kreienkamp
skreien at la-z-boy.com

-----Original Message-----
From: Felix Frank [mailto:ff at mpexnet.de]
Sent: Thursday, March 08, 2012 10:47 AM
To: Scot Kreienkamp
Cc: drbd-user at lists.linbit.com
Subject: Re: [DRBD-user] network startup failure

Hi,

On 03/08/2012 03:30 PM, Scot Kreienkamp wrote:
> Question 1: I set DRBD on each node to start automatically and to mount
> /dev/drbd1 from fstab.  Is that the correct way to do things?  This is
> in a lab instance just to learn about DRBD, I don't want to mess with
> Pacemaker, Heartbeat, or anything like that right now.  Manual failover
> is ok for now.

that's perfectly fine.

>      adjust net: drbd0:failed(connect:20)
> ]
>
> Which makes things seem like a network problem.

Indeed. The info you've provided so far is pretty good; what would be even
more helpful are kernel logs from the time of the failure.

One question to rule out the obvious:

>   on retv3130.na.lzb.hq {
>     device    /dev/drbd1;
>     disk      /dev/mapper/vg_linuxtemplate-NFS;
>     address   10.170.1.221:7789;
>     meta-disk internal;
>   }
>
>   on retv3131.na.lzb.hq {
>     device    /dev/drbd1;
>     disk      /dev/mapper/vg_linuxtemplate-NFS;
>     address   10.170.1.222:7789;
>     meta-disk internal;
>   }

Are these addresses indeed present on the respective machines?
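To compare the configured addresses against what each machine reports, the host-to-address mapping can be pulled out of the resource definition. The following is a rough sketch that handles only the simple `on <host> { ... address ip:port; ... }` layout quoted above; it is not a full DRBD configuration parser, and `host_addresses` is a hypothetical helper name.

```python
import re

# Excerpt of the quoted configuration (device, disk and meta-disk omitted).
CONFIG = """
on retv3130.na.lzb.hq {
    address   10.170.1.221:7789;
}
on retv3131.na.lzb.hq {
    address   10.170.1.222:7789;
}
"""

# Matches "on <hostname> { ... }" sections without nested braces.
ON_BLOCK = re.compile(r"\bon\s+(\S+)\s*\{(.*?)\}", re.DOTALL)
# Matches an IPv4 "address a.b.c.d:port;" statement.
ADDRESS = re.compile(r"address\s+(\d{1,3}(?:\.\d{1,3}){3}):(\d+)\s*;")


def host_addresses(text):
    """Map each hostname to its configured (ip, port) tuple."""
    result = {}
    for host, body in ON_BLOCK.findall(text):
        match = ADDRESS.search(body)
        if match:
            result[host] = (match.group(1), int(match.group(2)))
    return result


for host, (ip, port) in host_addresses(CONFIG).items():
    print(host, ip, port)
```

Each printed address can then be checked by hand against the `inet addr` shown by ifconfig on the matching host.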

Cheers,
Felix
