Hi Felix,

Unfortunately there aren't any logs from the failure. When I restarted the server it went into recovery mode because it couldn't mount the drbd disk. At that point the filesystem is mounted read-only. I had to remount the / partition read-write and remove the drbd mount from fstab, then reboot again to get the machine back to a working state.

The only logs I can offer are from trying to start it now.

Mar 8 09:11:54 retv3130 kernel: events: mcg drbd: 2
Mar 8 09:11:54 retv3130 kernel: drbd: initialized. Version: 8.4.1 (api:1/proto:86-100)
Mar 8 09:11:54 retv3130 kernel: drbd: GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by root@retv3130.na.lzb.hq, 2012-03-07 10:30:49
Mar 8 09:11:54 retv3130 kernel: drbd: registered as block device major 147
Mar 8 09:11:54 retv3130 kernel: d-con drbd0: Starting worker thread (from drbdsetup [2070])
Mar 8 09:11:54 retv3130 kernel: block drbd1: disk( Diskless -> Attaching )
Mar 8 09:11:54 retv3130 kernel: d-con drbd0: Method to ensure write ordering: barrier
Mar 8 09:11:54 retv3130 kernel: block drbd1: max BIO size = 4096
Mar 8 09:11:54 retv3130 kernel: block drbd1: drbd_bm_resize called with capacity == 314554936
Mar 8 09:11:54 retv3130 kernel: block drbd1: resync bitmap: bits=39319367 words=614366 pages=1200
Mar 8 09:11:54 retv3130 kernel: block drbd1: size = 150 GB (157277468 KB)
Mar 8 09:11:54 retv3130 kernel: block drbd1: bitmap READ of 1200 pages took 18 jiffies
Mar 8 09:11:54 retv3130 kernel: block drbd1: recounting of set bits took additional 4 jiffies
Mar 8 09:11:54 retv3130 kernel: block drbd1: 150 GB (39319367 bits) marked out-of-sync by on disk bit-map.
Mar 8 09:11:54 retv3130 kernel: block drbd1: Suspended AL updates
Mar 8 09:11:54 retv3130 kernel: block drbd1: disk( Attaching -> UpToDate )
Mar 8 09:11:54 retv3130 kernel: block drbd1: attached to UUIDs 1C6D2A54B80C1533:0000000000000004:0000000000000000:0000000000000000
Mar 8 09:11:54 retv3130 kernel: block drbd1: role( Secondary -> Primary )

The addresses are definitely present. I can ssh to them from my workstation. That same configuration was working right up to the reboot also.

[root@retv3130 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:50:56:A5:01:1C
          inet addr:10.170.1.221  Bcast:10.170.15.255  Mask:255.255.240.0

[root@retv3131 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:50:56:A5:01:1D
          inet addr:10.170.1.222  Bcast:10.170.15.255  Mask:255.255.240.0

Scot Kreienkamp
skreien at la-z-boy.com

-----Original Message-----
From: Felix Frank [mailto:ff at mpexnet.de]
Sent: Thursday, March 08, 2012 10:47 AM
To: Scot Kreienkamp
Cc: drbd-user at lists.linbit.com
Subject: Re: [DRBD-user] network startup failure

Hi,

On 03/08/2012 03:30 PM, Scot Kreienkamp wrote:
> Question 1: I set DRBD on each node to start automatically and to mount
> /dev/drbd1 from fstab. Is that the correct way to do things? This is
> in a lab instance just to learn about DRBD, I don't want to mess with
> Pacemaker, Heartbeat, or anything like that right now. Manual failover
> is ok for now.

that's perfectly fine.

> adjust net: drbd0:failed(connect:20)
>
> ]
>
> Which makes things seem like a network problem.

Indeed.

Your provided info so far is pretty good, what would be even more
helpful are kernel logs at the time of failure.
One question to rule out the obvious:

> on retv3130.na.lzb.hq {
>     device /dev/drbd1;
>     disk /dev/mapper/vg_linuxtemplate-NFS;
>     address 10.170.1.221:7789;
>     meta-disk internal;
> }
>
> on retv3131.na.lzb.hq {
>     device /dev/drbd1;
>     disk /dev/mapper/vg_linuxtemplate-NFS;
>     address 10.170.1.222:7789;
>     meta-disk internal;
> }

Are these addresses indeed present on the respective machines?

Cheers,
Felix
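
The "are these addresses present" check above can also be scripted. What follows is only a rough sketch, not anything taken from DRBD itself: it pulls the IPv4 `address` statements out of the resource section quoted in this thread and tests whether the host it runs on can bind a TCP socket to each one. The names `parse_addresses`, `is_local_address`, and `SAMPLE` are made up for illustration, and the bind test assumes the `net.ipv4.ip_nonlocal_bind` sysctl is left at its default of 0 (otherwise binding succeeds even for addresses the host does not own).

```python
import re
import socket

# Matches IPv4 statements such as "address 10.170.1.221:7789;".
ADDRESS_RE = re.compile(r"address\s+(\d{1,3}(?:\.\d{1,3}){3}):(\d+)\s*;")


def parse_addresses(config_text):
    """Return (ip, port) pairs for every IPv4 'address' statement found."""
    return [(ip, int(port)) for ip, port in ADDRESS_RE.findall(config_text)]


def is_local_address(ip):
    """Return True if this host can bind a TCP socket to ip.

    Port 0 lets the kernel pick any free port, so the check does not
    collide with a DRBD listener that may already hold the real port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((ip, 0))
        except OSError:
            return False
    return True


# Hypothetical excerpt modelled on the resource section quoted in this thread.
SAMPLE = """
on retv3130.na.lzb.hq {
    address 10.170.1.221:7789;
}
on retv3131.na.lzb.hq {
    address 10.170.1.222:7789;
}
"""

if __name__ == "__main__":
    for ip, port in parse_addresses(SAMPLE):
        state = "present on this host" if is_local_address(ip) else "not on this host"
        print(f"{ip}:{port} {state}")
```

Run on each node, exactly one of the two addresses should be reported as present. If a node reports neither, the interface probably was not configured yet when the check ran.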