[DRBD-user] Split brain problem.

Ivan Pavlenko i.pavlenko at unsw.edu.au
Mon Dec 5 04:39:38 CET 2011

It's not that easy.

# lsof | grep drbd
drbd1_wor  3414        root  cwd       DIR              253,0      4096          2 /
drbd1_wor  3414        root  rtd       DIR              253,0      4096          2 /
drbd1_wor  3414        root  txt   unknown                                         /proc/3414/exe

# ps aux  |grep 3414
root      3414  0.0  0.0      0     0 ?        S    Dec02   0:00 [drbd1_worker]
root      4690  0.0  0.0  61232   744 pts/1    R+   14:00   0:00 grep 3414
# lsof -p 3414
COMMAND    PID USER   FD      TYPE DEVICE SIZE NODE NAME
drbd1_wor 3414 root  cwd       DIR  253,0 4096    2 /
drbd1_wor 3414 root  rtd       DIR  253,0 4096    2 /
drbd1_wor 3414 root  txt   unknown                  /proc/3414/exe

kill -9 3414 doesn't do anything. I even tried restarting both nodes
simultaneously, with no luck.
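The process found above, drbd1_worker, is DRBD's own kernel thread (bracketed name in ps, zero VSZ/RSS, "txt unknown" in lsof), so kill -9 has no effect on it and it is not what blocks the demotion. Since `lsof | grep drbd` would also have matched any process with /dev/drbd1 open, the holder is more likely inside the kernel: a filesystem mounted on the device, or an LVM/device-mapper volume stacked on top of it. A minimal sketch that checks those places, assuming resource r0 is minor 1 (/dev/drbd1, as the "1: State change failed" and "drbdsetup 1" messages suggest); the function names `is_kthread` and `find_drbd_holders` are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: look for whatever keeps a DRBD device open.
# Assumption: resource r0 is minor 1 (/dev/drbd1).

# Kernel threads such as drbd1_worker have an empty /proc/PID/cmdline and
# ignore SIGKILL. (Zombie processes also have an empty cmdline.)
is_kthread() {
    [ -r "/proc/$1/cmdline" ] && [ -z "$(tr -d '\000' < "/proc/$1/cmdline")" ]
}

find_drbd_holders() {
    local dev="${1:-drbd1}"

    # Block devices stacked on top (LVM/device-mapper volumes) hold the
    # device open from inside the kernel; lsof does not list them.
    echo "== stacked holders of /dev/$dev =="
    ls "/sys/block/$dev/holders" 2>/dev/null || true

    # A filesystem mounted directly on the device.
    echo "== mounts of /dev/$dev =="
    grep "^/dev/$dev " /proc/mounts || true

    # User-space processes that have the device node itself open.
    echo "== processes with /dev/$dev open =="
    lsof "/dev/$dev" 2>/dev/null || true
}

# Run the checks only when executed, not when sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
    find_drbd_holders "$@"
fi
```

Run as root with `bash find-drbd-holders.sh drbd1` (file name hypothetical). After sourcing the file, `is_kthread 3414 && echo "kernel thread"` should confirm that the worker PID is not a user process.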

Ivan.

On 12/05/2011 01:50 PM, Digimer wrote:
> On 12/04/2011 09:25 PM, Ivan Pavlenko wrote:
>> Hi ALL,
>>
>> Digimer, thank you again for your answer; I really appreciate it!
>> Unfortunately, I've tried to fix the split brain manually several times.
>> It doesn't work.
>>
>> # drbdadm disconnect r0
>> [root@infplsm017 ~]# drbdadm secondary r0
>> 1: State change failed: (-12) Device is held open by someone
>> Command 'drbdsetup 1 secondary' terminated with exit code 11
>> # drbdadm -- --discard-my-data connect r0
>> 1: Failure: (123) --discard-my-data not allowed when primary.
>> Command 'drbdsetup 1 net 10.10.24.10:7789 10.10.24.11:7789 C
>> --set-defaults --create-device --ping-timeout=20
>> --after-sb-2pri=disconnect --after-sb-1pri=discard-secondary
>> --after-sb-0pri=discard-zero-changes --allow-two-primaries
>> --discard-my-data' terminated with exit code 10
>> #
>>
>> I guess I need to stop the cluster daemons, don't I?
>>
>> Thank you again,
>> Ivan
> Something is, as the error indicates, still using the DRBD
> resource. Find it, stop it, and then you can demote the resource. Look
> at the 'lsof' command; it will probably help you find the program
> still using it.
>
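Once nothing holds the device any more, the recovery attempted earlier in the thread can be completed. The sketch below follows the DRBD 8.3 manual split-brain recovery sequence with the drbdadm syntax used above: demote and reconnect with --discard-my-data on the node whose changes are thrown away, then reconnect the other node if it is also StandAlone. Choosing the victim node is the administrator's decision. In a dual-primary setup like this one (--allow-two-primaries), the users to stop first are likely cluster services such as a GFS2 mount or clustered LVM, but that is an assumption about this configuration. The RES and DRY_RUN variables and the victim/survivor arguments are hypothetical conveniences:

```shell
#!/usr/bin/env bash
# Sketch of DRBD 8.3 manual split-brain recovery, using the resource name
# and drbdadm syntax from this thread. Set DRY_RUN=1 to print the commands
# instead of running them.
res="${RES:-r0}"

run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# On the node whose changes will be discarded (the split-brain "victim").
recover_victim() {
    run drbdadm disconnect "$res"
    # This step fails with "(-12) Device is held open by someone"
    # while anything still uses the device; stop those users first.
    run drbdadm secondary "$res" || return 1
    run drbdadm -- --discard-my-data connect "$res"
}

# On the node whose data is kept, if it is also StandAlone.
recover_survivor() {
    run drbdadm connect "$res"
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
    case "${1:-}" in
        victim)   recover_victim ;;
        survivor) recover_survivor ;;
        *)        echo "usage: $0 victim|survivor" >&2 ;;
    esac
fi
```

For example, `DRY_RUN=1 bash drbd-sb-recover.sh victim` (file name hypothetical) prints the three commands for the victim node without touching DRBD.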


