On Tue, Feb 24, 2009 at 04:51:44PM +0100, GAUTIER Hervé wrote:
>
> Hi,
>
> Well, I will try to be as clear as possible.
> Sorry if it is not the case.
> It is a bit long.
>
> On a system (RH 4.2U6), I had a DRBD 8.2.5 built from source, installed
> as follows:
> - Sources were in /usr/local/drbd-8.2.5/source/drbd-8.2.5
> - Symbolic link /usr/local/drbd -> /usr/local/drbd-8.2.5
> - Configuration file was /usr/local/drbd-8.2.5/etc/drbd.conf
> - DRBD module was /lib/modules/`uname -r`/kernel/drivers/block/drbd.ko
> - Userland tools were in /sbin
>
>
> I was happy with this configuration, but a problem appeared with one of
> my tests. I can reproduce this problem in our labs, not every time but
> often enough.
> So, I said to myself, let's install the latest version (8.3.0) in order to
> test whether the problem is fixed or not.
> At the same time, I wanted to do a proper installation to be sure to use
> my DRBD, that is:
> - Sources are in /usr/local/drbd-8.3.0/source/drbd-8.3.0
> - Symbolic link /usr/local/drbd -> /usr/local/drbd-8.3.0
> - Build with "make PREFIX=/usr/local/drbd all
>   && make PREFIX=/usr/local/drbd install".
> - Configuration file is /usr/local/drbd-8.3.0/etc/drbd.conf
>   (same as the 8.2.5 version)
> - DRBD module is now in /usr/local/drbd-8.3.0/\
>   lib/modules/`uname -r`/kernel/drivers/block/drbd.ko
> - Userland tools are now in /usr/local/drbd-8.3.0/sbin
>
> My first problem was that the drbdadm tool looked for the new
> /var/lib/drbd//drbd-minor-0.conf file, but this file was in the
> /usr/local/drbd-8.3.0/var/lib/drbd directory.
> So, I modified the sources to take the make PREFIX=... setting into
> account for this directory. I posted a patch on the dev mailing list.
>
> In order to be able to roll back to the 8.2.5 version, I left the
> userland tools in /sbin and the module in
> /lib/modules/`uname -r`/kernel/drivers/block/drbd.ko
> I modified my PATH so that it picks up
> /usr/local/drbd-8.3.0/sbin first

we never did that. it probably won't work.
it may have worked in the old days where we did not have any
kernel->userland callbacks. but it possibly can be made to work, anyways.

> I am positive that the loaded kernel module is the right one, and that
> the userland tools in use are the right ones.
>
> All was OK, but after some tests, a new problem appeared, on a "drbd
> connect resource" command:
> -8<---------------------------------------------------------------------------
> Feb 23 10:55:42 rh4-2_1 kernel: drbd0: conn( StandAlone -> Unconnected )
> Feb 23 10:55:42 rh4-2_1 kernel: drbd0: Starting receiver thread (from drbd1_worker [31777])
> Feb 23 10:55:42 rh4-2_1 kernel: drbd0: receiver (re)started
> Feb 23 10:55:42 rh4-2_1 kernel: drbd0: conn( Unconnected -> WFConnection )
> Feb 23 10:55:42 rh4-2_1 kernel: drbd0: Handshake successful: Agreed network protocol version 89
> Feb 23 10:55:42 rh4-2_1 kernel: drbd0: conn( WFConnection -> WFReportParams )
> Feb 23 10:55:42 rh4-2_1 kernel: drbd0: Starting asender thread (from drbd1_receiver [713])
> Feb 23 10:55:42 rh4-2_1 kernel: drbd0: data-integrity-alg: <not-used>
> Feb 23 10:55:42 rh4-2_1 kernel: drbd0: drbd_sync_handshake:
> Feb 23 10:55:42 rh4-2_1 kernel: drbd0: self C0B886A078EDF8CE:0000000000000000:1BA32C69B29346D5:3191C4C20BDF4701
> Feb 23 10:55:42 rh4-2_1 kernel: drbd0: peer EEE9BF7DEF3EEA38:C0B886A078EDF8CE:9B8B7DCBC6A7F5D2:6889940B860C7E9A
> Feb 23 10:55:42 rh4-2_1 kernel: drbd0: uuid_compare()=-1 by rule 5
> Feb 23 10:55:42 rh4-2_1 kernel: drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
> Feb 23 10:55:42 rh4-2_1 kernel: drbd0: conn( WFBitMapT -> WFSyncUUID )
> Feb 23 10:55:42 rh4-2_1 kernel: drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
> Feb 23 10:55:42 rh4-2_1 kernel: drbd0: helper command: */sbin/drbdadm before-resync-target minor-0 exit code 3 (0x300)*
> Feb 23 10:55:42 rh4-2_1 kernel: drbd0: before-resync-target handler returned 3, dropping connection.
> Feb 23 10:55:42 rh4-2_1 kernel: drbd0: peer( Secondary -> Unknown ) conn( WFSyncUUID -> Disconnecting ) pdsk( UpToDate -> DUnknown )
> Feb 23 10:55:42 rh4-2_1 kernel: drbd0: asender terminated
> Feb 23 10:55:42 rh4-2_1 kernel: drbd0: Terminating asender thread
> Feb 23 10:55:42 rh4-2_1 kernel: drbd0: Connection closed
> Feb 23 10:55:42 rh4-2_1 kernel: drbd0: conn( Disconnecting -> StandAlone )
> Feb 23 10:55:42 rh4-2_1 kernel: drbd0: receiver terminated
> Feb 23 10:55:42 rh4-2_1 kernel: drbd0: Terminating receiver thread
> -8<---------------------------------------------------------------------------
>
> Seeing these messages, I thought that the DRBD module was calling the
> wrong drbdadm userland command.

the drbd module calls the usermode_helper program, which is a module
parameter, defaulting to the hardcoded /sbin/drbdadm.
you can change that at runtime by

  echo /usr/local/whatever > /sys/module/drbd/parameters/usermode_helper

> So I renamed the previous userland tools and kernel module:
> mv -i /sbin/drbdadm /sbin/drbdadm-8.5.2
> mv -i /sbin/drbdsetup /sbin/drbdsetup-8.5.2
> mv -i /sbin/drbdmeta /sbin/drbdmeta-8.5.2
> mv -i /lib/modules/`uname -r`/kernel/drivers/block/drbd.ko \
>       /lib/modules/`uname -r`/kernel/drivers/block/drbd-8.2.5.ko
>
> So now, there is no way to call the old installation.
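The failing helper line above reads "exit code 3 (0x300)", pairing a decoded exit code with a hex value. A minimal sketch for reading such pairs, assuming the hex value is the raw wait(2)-style status word of the helper (exit code in bits 8-15, terminating signal in the low 7 bits); decode_wait_status is a hypothetical name, not part of DRBD:

```shell
#!/bin/sh
# Decode a raw wait(2)-style status word (given in hex or decimal).
# Assumption: the value DRBD prints in parentheses is this raw status.
# Stopped/continued encodings are deliberately not handled in this sketch.
decode_wait_status() {
    status=$(( $1 ))
    sig=$(( status & 0x7f ))        # low 7 bits: terminating signal, 0 if none
    if [ "$sig" -eq 0 ]; then
        # normal exit: the exit code lives in the second byte
        echo "exited with code $(( (status >> 8) & 0xff ))"
    else
        echo "killed by signal $sig"
    fi
}

decode_wait_status 0x300   # exited with code 3
decode_wait_status 0x0     # exited with code 0
```

Under that assumption, 0x300 means the before-resync-target handler exited normally with status 3, which DRBD then treated as a reason to drop the connection.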
> I issued a new "drbd connect resource" command:
> -8<---------------------------------------------------------------------------
> Feb 23 10:59:41 rh4-2_1 kernel: drbd0: conn( StandAlone -> Unconnected )
> Feb 23 10:59:41 rh4-2_1 kernel: drbd0: Starting receiver thread (from drbd1_worker [31777])
> Feb 23 10:59:41 rh4-2_1 kernel: drbd0: receiver (re)started
> Feb 23 10:59:41 rh4-2_1 kernel: drbd0: conn( Unconnected -> WFConnection )
> Feb 23 10:59:42 rh4-2_1 kernel: drbd0: Handshake successful: Agreed network protocol version 89
> Feb 23 10:59:42 rh4-2_1 kernel: drbd0: conn( WFConnection -> WFReportParams )
> Feb 23 10:59:42 rh4-2_1 kernel: drbd0: Starting asender thread (from drbd1_receiver [4357])
> Feb 23 10:59:42 rh4-2_1 kernel: drbd0: data-integrity-alg: <not-used>
> Feb 23 10:59:42 rh4-2_1 kernel: drbd0: drbd_sync_handshake:
> Feb 23 10:59:42 rh4-2_1 kernel: drbd0: self 5B1E45C971132AAA:0000000000000000:1BA32C69B29346D5:3191C4C20BDF4701
> Feb 23 10:59:42 rh4-2_1 kernel: drbd0: peer EEE9BF7DEF3EEA38:5B1E45C971132AAB:C0B886A078EDF8CE:9B8B7DCBC6A7F5D2
> Feb 23 10:59:42 rh4-2_1 kernel: drbd0: uuid_compare()=-1 by rule 5
> Feb 23 10:59:42 rh4-2_1 kernel: drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
> Feb 23 10:59:42 rh4-2_1 kernel: drbd0: conn( WFBitMapT -> WFSyncUUID )
> Feb 23 10:59:42 rh4-2_1 kernel: drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
> Feb 23 10:59:42 rh4-2_1 kernel: drbd0: helper command: */sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)*
> Feb 23 10:59:42 rh4-2_1 kernel: drbd0: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
> Feb 23 10:59:42 rh4-2_1 kernel: drbd0: Began resync as SyncTarget (will sync 7177172 KB [1794293 bits set]).
> -8<---------------------------------------------------------------------------
>
> It worked, the synchronization was OK, but the message again shows
> /sbin/drbdadm, and that is not possible!
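The figures in the "Began resync" line can be cross-checked by hand. A small sketch, assuming DRBD's bitmap granularity of one bit per 4 KiB block; bits_to_kb and kb_per_sec are illustrative names only:

```shell
#!/bin/sh
# Cross-check DRBD resync log numbers.
# Assumption: one bitmap bit covers 4 KiB of the device.

# Convert "bits set" into KiB to be resynced.
bits_to_kb() {
    echo $(( $1 * 4 ))
}

# Average rate in KiB/s, using shell integer division.
kb_per_sec() {
    echo $(( $1 / $2 ))
}

bits_to_kb 1794293        # 7177172
kb_per_sec 7177172 241    # 29780
```

1794293 bits times 4 KiB gives exactly the 7177172 KB in the log, and dividing by the 241 seconds later reported in "Resync done" gives 29780, which agrees with the reported 29780 K/sec.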
> I think that the right /usr/local/drbd-8.3.0/sbin/drbdadm was used,
> otherwise, I don't know how it could find another drbdadm binary.

apparently, the call_usermodehelper in your kernel version silently
ignores exec failures. this may correspond to upstream (kernel.org)

commit 111dbe0c8a21dffa473239861be47ebc87f593b3
Author: Björn Steinbrink <B.Steinbrink at gmx.de>
Date:   Fri Sep 29 02:00:46 2006 -0700

    [PATCH] Fix ____call_usermodehelper errors being silently ignored

    If ____call_usermodehelper fails, we're not interested in the child
    process' exit value, but the real error, so let's stop wait_for_helper
    from overwriting it in that case.

    Issue discovered by Benedikt Böhm while working on a Linux-VServer
    usermode helper.

    Signed-off-by: Björn Steinbrink <B.Steinbrink at gmx.de>
    Cc: Rusty Russell <rusty at rustcorp.com.au>
    Signed-off-by: Andrew Morton <akpm at osdl.org>
    Signed-off-by: Linus Torvalds <torvalds at osdl.org>

which happened sometime after 2.6.18, and may or may not have been
backported to vendor kernels.
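Because a kernel without that fix may report exit code 0 even when the helper could not be executed at all, it can be worth checking that the usermode_helper parameter really points at an executable before trusting the log. A sketch, assuming the drbd module is loaded and the script runs as root; the PARAM override and both function names are hypothetical, and the 8.3.0 path is just this thread's PREFIX layout:

```shell
#!/bin/sh
# Sketch only: inspect and change the DRBD usermode_helper module parameter.
# Assumes the drbd module is loaded; writing the parameter requires root.
# PARAM may be overridden (hypothetical convenience, e.g. for testing).
PARAM=${PARAM:-/sys/module/drbd/parameters/usermode_helper}

# Print the configured helper; fail if it is missing or not executable,
# since some kernels do not report a failed exec of the helper.
check_drbd_helper() {
    if [ ! -r "$PARAM" ]; then
        echo "cannot read $PARAM (is the drbd module loaded?)" >&2
        return 2
    fi
    helper=$(cat "$PARAM")
    if [ -x "$helper" ]; then
        echo "ok: $helper"
    else
        echo "missing or not executable: $helper" >&2
        return 1
    fi
}

# Point the module at a different drbdadm, like the echo into the sysfs
# file suggested in the thread; written without a trailing newline.
set_drbd_helper() {
    if [ ! -x "$1" ]; then
        echo "refusing to set non-executable helper: $1" >&2
        return 1
    fi
    printf '%s' "$1" > "$PARAM"
}
```

For example, `check_drbd_helper || set_drbd_helper /usr/local/drbd-8.3.0/sbin/drbdadm` would repoint the module only when the current helper is unusable. The setting is a runtime value and does not survive a module reload.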
> -8<---------------------------------------------------------------------------
> Feb 23 11:03:43 rh4-2_1 kernel: drbd0: Resync done (total 241 sec; paused 0 sec; 29780 K/sec)
> Feb 23 11:03:43 rh4-2_1 kernel: drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
> Feb 23 11:03:43 rh4-2_1 kernel: drbd0: helper command: /sbin/drbdadm after-resync-target minor-0
> Feb 23 11:03:43 rh4-2_1 kernel: drbd0: helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)
> -8<---------------------------------------------------------------------------
>
> And since then, I get a strange message on drbdadm
> {role|cstate|dstate}:
>
> -8<---------------------------------------------------------------------------
> # drbdadm -c ../etc/drbd.conf cstate drbd_res0;
> Secondary/Secondary
> # (43) sync_progress = (integer) 83 [len: 4]
> -8<---------------------------------------------------------------------------

ask your bash which drbdadm it uses.
maybe you need to "hash -r"?

as I wrote earlier, this is an expected symptom of using the older
drbdadm/drbdsetup against the newer module.

  which drbdadm
  type drbdadm

maybe do an

  strace -e execve -f drbdadm state drbd_res0

> Is there anything I can do that could help me understand why I
> get this message,

don't install into some PREFIX.
we never did that, therefore it is likely to break in funny ways.

> because I have checked the source a bit, but it is not so easy to
> understand (NL_PACKET(...)...).
> My first feeling is that it is the kernel module which prints the
> message. Is it possible?

no.

> Hope that was clear.

perfectly. ;)

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed