[DRBD-user] drbd turned on me, and bit my hand this morn

Brad Barnett lists at l8r.net
Tue Dec 6 16:46:30 CET 2005

Hey all,

drbd has been working fairly well for me until now, but something bizarre
happened this morning.

My secondary was taken offline, for regular maintenance.  Out of the blue,
the primary stopped serving files via NFS.  I could ping the box, but
before I had a chance to login, the box was rebooted locally (since it was
not responding...)

When our primary drbd box came back, drbd "locked" on boot.  The
Debian /etc/rc2.d/S70drbd startup script would run, spawning drbdadm. 
This would run, and just sit there indefinitely, as
if waiting for the secondary (or something else).

I moved the initial rc2.d startup script, and tried a variety of drbd
commands after fresh reboots, in order to get a response of some sort.  I
tried "drbdadm primary all" on this node, and so on, but drbdadm would
never return or print any sort of error message.  Typically,
I would get something like:

"drbd starting [d0 "

At that point the process would lock up and just sit there (for 15
minutes or more at one point).  Any commands given after this happened
would time out.

Eventually, I tried to remove the secondary from the /etc/drbd.conf config
file, but this resulted in drbd failing to run at all.  Restoring the
lines resulted in _two_ unconfigured lines appearing in my /proc/drbd
file when trying to restart drbd.

Prior to this, I only had one.

Eventually I had to move to the raw ext3 partition, to restore access for
my users.

So, how could this have been avoided?  Could anyone clue me in to what I
did wrong?  At one point, I did edit the drbd config file and set
every timeout value in it to 10 seconds.  However, this was after I
edited the drbd.conf file, as mentioned above, and I had two nodes at this

Still, there must be a way to make a drbd partition primary, no matter
what, regardless of any other circumstances.  Otherwise, drbd seems very
risky to me. :(

I'm going to use this downtime to move from raid5 to raid10.  However, I'm
now concerned about this conversion process.  I am worried that I will
not be able to take my node and get it to run in standalone mode, so I
can do my initial raid format, drbd prep, etc.  That is, I want to:

- take my secondary, prep the raid
- set up the raid as a drbd partition with a secondary configured, but
never connected
- prep the drbd partition, copy data from my ext3 (old drbd primary)
partition via rsync, setting up my new drbd partition
- make this live
- set up my old primary as secondary, do a full sync, and be on my way
with a new drbd setup
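
Roughly, the first few steps would look like the sketch below.  It is a
dry run that only prints the commands by default.  The resource name
(r0), the device names, the disk list and the mount points are all
placeholders I made up for illustration, not my real setup, and I don't
know yet whether "drbdadm primary" will even return on a node whose peer
has never connected (that is exactly what I am asking about):

```shell
#!/bin/sh
# Dry-run sketch of the planned migration on the (old) secondary.
# Every name below is a hypothetical placeholder, not a real setup.
DRY_RUN="${DRY_RUN:-1}"          # 1 = only print the commands
RES="${RES:-r0}"                 # assumed drbd resource name
MD_DEV="${MD_DEV:-/dev/md0}"     # assumed new raid10 array
DRBD_DEV="${DRBD_DEV:-/dev/drbd0}"
RAID_DISKS="${RAID_DISKS:-/dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1}"
OLD_MNT="${OLD_MNT:-/mnt/old}"   # ext3 partition from the old primary
NEW_MNT="${NEW_MNT:-/mnt/new}"   # where the new drbd device gets mounted

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

plan_migration() {
    # RAID_DISKS is left unquoted on purpose so it splits into one
    # argument per disk.
    run mdadm --create "$MD_DEV" --level=10 --raid-devices=4 $RAID_DISKS &&
    # Bring the resource up with the peer configured but unreachable.
    run drbdadm up "$RES" &&
    # Open question: this may block, or may need forcing, without a peer.
    run drbdadm primary "$RES" &&
    run mkfs.ext3 "$DRBD_DEV" &&
    run mount "$DRBD_DEV" "$NEW_MNT" &&
    run rsync -aH "$OLD_MNT/" "$NEW_MNT/"
}

plan_migration
```

Running it with DRY_RUN=0 would execute the commands for real, stopping
at the first one that fails; I would only do that once the "primary
without a peer" question above is settled.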

However, I am very worried as to what will happen without a secondary

Any help / guidance is greatly appreciated, here!


