Christoph Mitasch wrote:
> Hi Nick!
>
> Have you tried
> drbdadm connect all
> on the Primary?
>
> Christoph
>
> On Wed, 2006-01-11 at 10:47 +0100, nick wrote:
>
>>I have two identical systems with drbd 0.7.10 (api:77) on two systems
>>running CentOS 4 kernel 2.6.9-5.0.3, they have one resource in common
>>hadisk, which is a software raid partition used for mail storage.
>>
>>Last night, something went funky on the secondary, and it had to be
>>restarted (completely ran out of memory, it's not the first time it has
>>happened), and when it came up, instead of connecting like it normally
>>does, it just sat there waiting for a connection
>>
>>a cat /proc/drbd on the primary gives me this:
>>
>> 0: cs:WFReportParams st:Primary/Unknown ld:Consistent
>> ns:75710352 nr:0 dw:80996564 dr:25603549 al:156060 bm:15221 lo:0
>>pe:0 ua:0 ap:0
>>
>>on the secondary, I get this:
>>
>>0: cs:WFConnection st:Secondary/Unknown ld:Consistent
>> ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
>>
>>as it says, waiting for connection, the same message it gives me if I
>>run drbdadm cstate all.
>>
>>However, if I run drbdadm (anything) on the primary, it gives me this
>>message:
>>
>>Child process does not terminate!
>>Exiting.
>>
>>Which is not good.
>>
>>The problem is, the primary is still up, and accepting data, so I really
>>don't want to do anything rash, as I can't afford to lose mail (the
>>secondary is out of date, by how much I'm not sure). What options do I have?
>>
>>I'll gladly give you any information you need, please help me get these
>>two back and talking, possibly with as little data loss as possible.
>>
>>
>>Nick
>>
>>_______________________________________________
>>drbd-user mailing list
>>drbd-user at lists.linbit.com
>>http://lists.linbit.com/mailman/listinfo/drbd-user
>
>

Yes, I have:

Child process does not terminate!
Exiting.
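When comparing the two nodes, it can help to pull the connection state (cs), roles (st) and local disk state (ld) out of /proc/drbd programmatically. The sketch below is a hypothetical helper, not part of DRBD's tooling; it assumes the DRBD 0.7-style layout quoted above (a "N: key:value ..." device line followed by counter lines), and the names `parse_proc_drbd` and `peer_unreachable` are made up for illustration.

```python
import re
from typing import Dict

# Matches the first line of a device entry, e.g. " 0: cs:WFConnection ..."
# (assumes the DRBD 0.7-style /proc/drbd layout shown in the post)
DEVICE_LINE = re.compile(r"^\s*(\d+):\s+(.*)$")


def parse_proc_drbd(text: str) -> Dict[int, Dict[str, str]]:
    """Parse /proc/drbd status text into {minor: {field: value}}.

    Continuation lines (counters such as ns:, nr:, dw:) are attached to the
    most recent device line; lines before the first device are skipped.
    """
    devices: Dict[int, Dict[str, str]] = {}
    current = None
    for line in text.splitlines():
        match = DEVICE_LINE.match(line)
        if match:
            current = devices.setdefault(int(match.group(1)), {})
            rest = match.group(2)
        elif current is not None:
            rest = line
        else:
            continue  # e.g. a "version: ..." header line
        for token in rest.split():
            key, sep, value = token.partition(":")
            if sep:  # ignore tokens without a colon
                current[key] = value
    return devices


def peer_unreachable(fields: Dict[str, str]) -> bool:
    """Hypothetical heuristic: peer role reads Unknown and cs is not Connected."""
    return fields.get("st", "").endswith("/Unknown") and fields.get("cs") != "Connected"


if __name__ == "__main__":
    primary = """ 0: cs:WFReportParams st:Primary/Unknown ld:Consistent
 ns:75710352 nr:0 dw:80996564 dr:25603549 al:156060 bm:15221 lo:0
pe:0 ua:0 ap:0
"""
    for minor, fields in parse_proc_drbd(primary).items():
        print(minor, fields["cs"], fields["st"], peer_unreachable(fields))
```

Feeding it both nodes' output shows at a glance that each side reports its peer as Unknown while sitting in a waiting (WF...) state.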
If I do this:

ps ax | grep drbd
21789 pts/0    D      0:00 /sbin/drbdsetup /dev/drbd0 net 10.0.0.1:7788 10.0.0.2:7788 C --sndbuf-size=512k --timeout=60 --connect-int=10 --ping-int=10 --ko-count=4 --on-disconnect=reconnect
22101 pts/0    S      0:00 /sbin/drbdsetup /dev/drbd0 cstate
22426 pts/0    S      0:00 /sbin/drbdsetup /dev/drbd0 state
25502 pts/0    S      0:00 /sbin/drbdsetup /dev/drbd0 net 10.0.0.1:7788 10.0.0.2:7788 C --sndbuf-size=512k --timeout=60 --connect-int=10 --ping-int=10 --ko-count=4 --on-disconnect=reconnect
25665 pts/0    S+     0:00 grep drbd

it seems like process 21789 is the culprit. Do you think I can kill it without risking corruption/data loss?
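The STAT column is what singles out PID 21789: "D" marks uninterruptible sleep (the process is blocked inside a kernel call), which is why signals usually have no effect on it until that call returns, while the other drbdsetup processes are in ordinary "S" sleep. The sketch below is a hypothetical filter for spotting such processes; it assumes the default `ps ax` column order (PID, TTY, STAT, TIME, COMMAND), and `parse_ps_ax` and `stuck_drbdsetup` are illustrative names only.

```python
from typing import List, NamedTuple


class Proc(NamedTuple):
    pid: int
    stat: str
    command: str


def parse_ps_ax(text: str) -> List[Proc]:
    """Parse `ps ax` output, assuming columns PID TTY STAT TIME COMMAND."""
    procs: List[Proc] = []
    for line in text.splitlines():
        parts = line.split(None, 4)  # keep the full command line as one field
        if len(parts) < 5 or not parts[0].isdigit():
            continue  # skip the header row and blank lines
        pid, _tty, stat, _time, command = parts
        procs.append(Proc(int(pid), stat, command))
    return procs


def stuck_drbdsetup(procs: List[Proc]) -> List[Proc]:
    """Return drbdsetup processes whose state is D (uninterruptible sleep)."""
    return [p for p in procs if "drbdsetup" in p.command and p.stat.startswith("D")]


if __name__ == "__main__":
    sample = """21789 pts/0 D 0:00 /sbin/drbdsetup /dev/drbd0 net 10.0.0.1:7788 10.0.0.2:7788 C
22101 pts/0 S 0:00 /sbin/drbdsetup /dev/drbd0 cstate
25665 pts/0 S+ 0:00 grep drbd
"""
    for proc in stuck_drbdsetup(parse_ps_ax(sample)):
        print(proc.pid, proc.stat, proc.command)
```

Note that the `grep drbd` line is ignored because the filter looks for "drbdsetup" in the command, not just "drbd".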