[DRBD-user] drbd turned on me, and bit my hand this morn

Tue Dec 6 17:56:52 CET 2005

Just to add to this.

When drbdadm is in this "locked" state, I cannot kill it, not even with
kill -9.  This means that I must reboot the box, instead of simply
removing the module with rmmod and making my configuration changes.

Because of this, I do not think this is the expected behaviour.
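
A process that ignores kill -9 is typically in uninterruptible sleep
(state "D"), blocked inside a kernel call, so the signal stays pending
until that call returns.  That would fit drbdadm (or the drbdsetup
helper it spawns) waiting on the module.  Below is a minimal Python
sketch for checking this, assuming a Linux /proc filesystem.  The names
parse_state and find_states are made up for illustration and are not
part of drbd.

```python
import os


def parse_state(stat_line):
    """Return the one-letter process state from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')', so split on the LAST closing parenthesis.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    return rest[0]


def find_states(name):
    """Yield (pid, state) for every process whose command name is `name`."""
    if not os.path.isdir("/proc"):
        return  # no procfs here, nothing to scan
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as fh:
                line = fh.read()
        except OSError:
            continue  # process exited while we were scanning
        comm = line[line.index("(") + 1:line.rindex(")")]
        if comm == name:
            yield int(entry), parse_state(line)


if __name__ == "__main__":
    for pid, state in find_states("drbdadm"):
        note = "uninterruptible sleep" if state == "D" else ""
        print(pid, state, note)
```

If the state shown is D, the hang is inside the kernel rather than in
drbdadm itself, which would explain why no signal gets through.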

I'm using 0.7.10-3 (the latest Debian stable version), and found the same
behaviour with 0.7.14 freshly installed from tarballs.

On Tue, 6 Dec 2005 10:46:30 -0500
Brad Barnett <lists at l8r.net> wrote:

> 
> Hey all,
> 
> drbd has been working fairly well for me until now, but something
> bizarre happened this morning.
> 
> My secondary was taken offline for regular maintenance.  Out of the
> blue, the primary stopped serving files via NFS.  I could ping the box,
> but before I had a chance to log in, the box was rebooted locally
> (since it was not responding...)
> 
> When our primary drbd box came back, drbd "locked" on boot.  The
> Debian /etc/rc2.d/S70drbd startup script would run, spawning drbdadm.
> This would run and then just sit there indefinitely, as if waiting
> for the secondary (or something else).
> 
> I moved the initial rc2.d startup script out of the way, and tried a
> variety of drbd commands after fresh reboots, in order to get a
> response of some sort.  I tried "drbdadm primary all" on this node,
> and so on, but drbdadm would never return or print any sort of error
> message.  Typically, I would get something like:
> 
> "drbd starting [d0 "
> 
> At which point the process would lock and just sit there (for 15
> minutes or more at one point).  Commands issued after this happened
> would time out.
> 
> Eventually, I tried to remove the secondary from the /etc/drbd.conf
> config file, but this resulted in drbd failing to run at all.
> Restoring the lines resulted in _two_ unconfigured lines appearing in
> my /proc/drbd file when trying to restart drbd.
> 
> Prior to this, I only had one.
> 
> Eventually I had to move to the raw ext3 partition, to restore access
> for my users.
> 
> So, how could this have been avoided?  Could anyone clue me in on what
> I did wrong?  At one point, I did edit the drbd config file and set
> every timeout value in it to 10 seconds.  However, this was after I
> edited the drbd.conf file as mentioned above, and I had two nodes at
> this point.
> 
> Still, there must be a way to make a drbd partition primary, no matter
> what, regardless of any other circumstances.  Otherwise, drbd seems
> very risky to me. :(
> 
> I'm going to use this downtime to move from raid5 to raid10.  However,
> I'm concerned about this conversion process, now.  I am worried that I
> will not be able to take my node, and get it to run in standalone mode,
> so I can do my initial raid format, drbd prep, etc.  That is, I want to:
> 
> - take my secondary, prep the raid
> - set up the raid as a drbd partition with a secondary configured, but
> never connected
> - prep the drbd partition, copy data from my ext3 (old drbd primary)
> partition via rsync, setting up my new drbd partition
> - make this live
> - set up my old primary as secondary, do a full sync, and be on my way
> with a new drbd setup
> 
> However, I am very worried as to what will happen without a secondary
> now!
> 
> Any help / guidance is greatly appreciated, here!
> 
> Thanks
> 
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user