[DRBD-user] problem with drbd reconnection

nick nick at mobilia.it
Wed Jan 11 11:15:43 CET 2006


Christoph Mitasch wrote:
> Hi Nick!
> Have you tried
> drbdadm connect all
> on the Primary?
> Christoph
> On Wed, 2006-01-11 at 10:47 +0100, nick wrote:
>>I have two identical systems running drbd 0.7.10 (api:77) on CentOS 4, 
>>kernel 2.6.9-5.0.3. They share one resource, hadisk, which is a 
>>software RAID partition used for mail storage.
>>Last night, something went funky on the secondary, and it had to be 
>>restarted (completely ran out of memory, it's not the first time it has 
>>happened). When it came up, instead of connecting like it normally 
>>does, it just sat there waiting for a connection.
>>Running cat /proc/drbd on the primary gives me this:
>>  0: cs:WFReportParams st:Primary/Unknown ld:Consistent
>>     ns:75710352 nr:0 dw:80996564 dr:25603549 al:156060 bm:15221 lo:0 pe:0 ua:0 ap:0
>>On the secondary, I get this:
>>0: cs:WFConnection st:Secondary/Unknown ld:Consistent
>>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
>>As it says, it is waiting for a connection, which is the same state it 
>>reports if I run drbdadm cstate all.
>>However, if I run drbdadm (anything) on the primary, it gives me this:
>>Child process does not terminate!
>>Which is not good.
>>The problem is, the primary is still up, and accepting data, so I really 
>>don't want to do anything rash, as I can't afford to lose mail (the 
>>secondary is out of date, by how much I'm not sure). What options do I have?
>>I'll gladly give you any information you need, please help me get these 
>>two back and talking, possibly with as little data loss as possible.
>>drbd-user mailing list
>>drbd-user at lists.linbit.com
Yes, I have, and it gives me:

Child process does not terminate!

If I do this:

  ps ax | grep drbd
21789 pts/0    D      0:00 /sbin/drbdsetup /dev/drbd0 net C --sndbuf-size=512k --timeout=60 --connect-int=10 --ping-int=10 --ko-count=4 --on-disconnect=reconnect
22101 pts/0    S      0:00 /sbin/drbdsetup /dev/drbd0 cstate
22426 pts/0    S      0:00 /sbin/drbdsetup /dev/drbd0 state
25502 pts/0    S      0:00 /sbin/drbdsetup /dev/drbd0 net C --sndbuf-size=512k --timeout=60 --connect-int=10 --ping-int=10 --ko-count=4 --on-disconnect=reconnect
25665 pts/0    S+     0:00 grep drbd
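In ps ax output the leading letter of the STAT column is the process state, and "D" means uninterruptible sleep: the process is blocked inside a kernel call and generally does not act on signals, SIGKILL included, until that call returns. The minimal sketch below picks such rows out of a listing. The helper name uninterruptible is hypothetical, and the sample is a trimmed copy of the listing above, not live output.

```python
from __future__ import annotations


def uninterruptible(ps_output: str) -> list[tuple[int, str, str]]:
    """Return (pid, stat, command) for rows whose STAT starts with 'D'."""
    rows = []
    for line in ps_output.splitlines():
        # `ps ax` columns: PID TTY STAT TIME COMMAND; COMMAND may contain spaces,
        # so split at most 4 times and keep the rest as one field.
        parts = line.split(None, 4)
        if len(parts) < 5 or not parts[0].isdigit():
            continue  # skip the header line and anything unparsable
        pid, _tty, stat, _time, command = parts
        if stat.startswith("D"):
            rows.append((int(pid), stat, command))
    return rows


# Trimmed sample taken from the listing above (long options shortened).
sample = """\
21789 pts/0    D      0:00 /sbin/drbdsetup /dev/drbd0 net C --sndbuf-size=512k
22101 pts/0    S      0:00 /sbin/drbdsetup /dev/drbd0 cstate
25665 pts/0    S+     0:00 grep drbd
"""

for pid, stat, cmd in uninterruptible(sample):
    print(pid, stat, cmd)
```

To check a live system instead, the same function could be fed subprocess.run(["ps", "ax"], capture_output=True, text=True).stdout, assuming a procps-style ps with the default ps ax columns.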

It seems like process 21789 is the culprit. Do you think I can kill it 
without risking corruption or data loss?
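The /proc/drbd device lines quoted earlier are whitespace-separated key:value pairs. As I read the 0.7 format, cs is the connection state, st is the local/peer role pair and ld is the local data state. The minimal sketch below splits such a line into a dict so the two nodes can be compared side by side. The helper parse_drbd_status is hypothetical and not part of the drbd tools.

```python
from __future__ import annotations

import re


def parse_drbd_status(line: str) -> dict[str, str]:
    """Split a /proc/drbd line such as
    '0: cs:WFConnection st:Secondary/Unknown ld:Consistent' into a dict.
    The leading '0:' (device minor) has no value and is skipped by the regex."""
    return dict(re.findall(r"(\w+):(\S+)", line))


primary = parse_drbd_status("0: cs:WFReportParams st:Primary/Unknown ld:Consistent")
secondary = parse_drbd_status("0: cs:WFConnection st:Secondary/Unknown ld:Consistent")

for name, s in (("primary", primary), ("secondary", secondary)):
    # st holds "local/peer"; "Unknown" means the peer's role is not known.
    local_role, peer_role = s["st"].split("/")
    print(f"{name}: cs={s['cs']} local={local_role} peer={peer_role} disk={s['ld']}")
```

Laid out this way, the mismatch is easier to see: the primary sits in WFReportParams while the secondary sits in WFConnection, and neither knows the other's role.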
