[DRBD-user] problem with drbd reconnection

nick nick at mobilia.it
Wed Jan 11 10:47:40 CET 2006


I have two identical systems running CentOS 4 (kernel 2.6.9-5.0.3) with 
drbd 0.7.10 (api:77). They have one resource in common, hadisk, which is 
a software RAID partition used for mail storage.

Last night something went wrong on the secondary and it had to be 
restarted (it completely ran out of memory, and this is not the first time 
that has happened). When it came back up, instead of connecting like it 
normally does, it just sat there waiting for a connection.

Running cat /proc/drbd on the primary gives me this:

  0: cs:WFReportParams st:Primary/Unknown ld:Consistent
     ns:75710352 nr:0 dw:80996564 dr:25603549 al:156060 bm:15221 lo:0 pe:0 ua:0 ap:0

On the secondary, I get this:

0: cs:WFConnection st:Secondary/Unknown ld:Consistent
     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0



As it says, it is waiting for a connection, which is the same state I get 
if I run drbdadm cstate all.
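
For anyone wanting to compare the two nodes from a script, the status lines 
above could be parsed roughly as follows. This is only a minimal sketch, 
assuming the 0.7-style line format quoted in this message (cs:, st:, ld:); 
the names parse_drbd_status and is_connected are hypothetical helpers, not 
part of DRBD, and the sample strings are copied from the output above.

```python
import re

# Matches a device line in the 0.7-style format quoted above, e.g.
#   0: cs:WFConnection st:Secondary/Unknown ld:Consistent
# Assumption: other DRBD versions may lay this line out differently.
STATUS_RE = re.compile(
    r"^\s*(\d+):\s+cs:(\S+)\s+st:([^/\s]+)/(\S+)\s+ld:(\S+)"
)


def parse_drbd_status(text):
    """Return one dict per device line found in /proc/drbd-style text."""
    devices = []
    for line in text.splitlines():
        m = STATUS_RE.match(line)
        if m:
            minor, cs, local, peer, ld = m.groups()
            devices.append({
                "minor": int(minor),   # device minor number
                "cs": cs,              # connection state
                "local": local,        # role of this node
                "peer": peer,          # role of the peer as seen from here
                "ld": ld,              # local data state
            })
    return devices


def is_connected(dev):
    """True only when the connection state is exactly 'Connected'."""
    return dev["cs"] == "Connected"


# Sample input copied from the two nodes in this message.
PRIMARY = " 0: cs:WFReportParams st:Primary/Unknown ld:Consistent\n"
SECONDARY = "0: cs:WFConnection st:Secondary/Unknown ld:Consistent\n"

for name, text in (("primary", PRIMARY), ("secondary", SECONDARY)):
    for dev in parse_drbd_status(text):
        print(name, dev["minor"], dev["cs"], dev["local"], dev["peer"],
              "connected" if is_connected(dev) else "not connected")
```

On a live node the text would come from reading /proc/drbd instead of the 
sample strings. Here neither side counts as connected: the primary is in 
WFReportParams and the secondary is in WFConnection.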

However, if I run drbdadm (anything) on the primary, it gives me this 
message:

Child process does not terminate!
Exiting.

Which is not good.

The problem is that the primary is still up and accepting data, so I 
really don't want to do anything rash: I can't afford to lose mail (the 
secondary is out of date, though by how much I'm not sure). What options 
do I have?

I'll gladly provide any information you need. Please help me get these 
two talking to each other again, with as little data loss as possible.


Nick



