Hi,

I will try to be as clear as possible; sorry if I do not manage it. This is a bit long.

On a system (RH 4.2U6), I had DRBD 8.2.5 built from source and installed as follows:

- Sources were in /usr/local/drbd-8.2.5/source/drbd-8.2.5
- Symbolic link /usr/local/drbd -> /usr/local/drbd-8.2.5
- Configuration file was /usr/local/drbd-8.2.5/etc/drbd.conf
- DRBD module was /lib/modules/`uname -r`/kernel/drivers/block/drbd.ko
- Userland tools were in /sbin

I was happy with this configuration, but a problem appeared in one of my tests. I can reproduce this problem in our labs, not every time but often enough. So I decided to install the latest version (8.3.0) to test whether the problem is fixed. At the same time, I wanted to do a proper installation, to be sure that I am using my own DRBD build, that is:

- Sources are in /usr/local/drbd-8.3.0/source/drbd-8.3.0
- Symbolic link /usr/local/drbd -> /usr/local/drbd-8.3.0
- Build with "make PREFIX=/usr/local/drbd all && make PREFIX=/usr/local/drbd install".
- Configuration file is /usr/local/drbd-8.3.0/etc/drbd.conf (same as for version 8.2.5)
- DRBD module is now in /usr/local/drbd-8.3.0/lib/modules/`uname -r`/kernel/drivers/block/drbd.ko
- Userland tools are now in /usr/local/drbd-8.3.0/sbin

My first problem was that the drbdadm tool searched for the new /var/lib/drbd//drbd-minor-0.conf file, but this file was in the /usr/local/drbd-8.3.0/var/lib/drbd directory. So I modified the sources to take the make PREFIX=... setting into account for this directory. I posted a patch on the dev mailing list.

In order to be able to roll back to version 8.2.5, I left the old userland tools in /sbin and the old module in /lib/modules/`uname -r`/kernel/drivers/block/drbd.ko. I modified my PATH so that /usr/local/drbd-8.3.0/sbin comes first. I am positive that the loaded kernel module is the right one, and that the userland tools in use are the right ones.

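For anyone reproducing a side-by-side layout like this one, here is a minimal sketch of how the PATH ordering could be checked. The helper name first_in_path is hypothetical (it is not part of DRBD); it only reports which copy of a tool an interactive shell would pick up first, and says nothing about what the kernel module itself executes.

```shell
#!/bin/sh
# Hypothetical helper, not part of DRBD: print the first executable named $1
# found in the colon-separated directory list $2 (defaults to $PATH).
# Returns 1 and prints nothing if no match is found.
first_in_path() {
    tool=$1
    rest=${2:-$PATH}:
    while [ -n "$rest" ]; do
        dir=${rest%%:*}      # first directory of the remaining list
        rest=${rest#*:}      # drop it from the list
        dir=${dir:-.}        # an empty PATH entry means the current directory
        if [ -x "$dir/$tool" ] && [ ! -d "$dir/$tool" ]; then
            printf '%s\n' "$dir/$tool"
            return 0
        fi
    done
    return 1
}

# Report which copy of each userland tool would win with the current PATH.
for tool in drbdadm drbdsetup drbdmeta; do
    first_in_path "$tool" || echo "$tool: not found in PATH"
done
```

With /usr/local/drbd-8.3.0/sbin placed first in PATH, each line printed should start with that directory. For the module side, the first line of /proc/drbd reports the version of the module that is currently loaded.
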
All was OK, but after some tests a new problem appeared on a "drbd connect resource" command:

-8<---------------------------------------------------------------------------
Feb 23 10:55:42 rh4-2_1 kernel: drbd0: conn( StandAlone -> Unconnected )
Feb 23 10:55:42 rh4-2_1 kernel: drbd0: Starting receiver thread (from drbd1_worker [31777])
Feb 23 10:55:42 rh4-2_1 kernel: drbd0: receiver (re)started
Feb 23 10:55:42 rh4-2_1 kernel: drbd0: conn( Unconnected -> WFConnection )
Feb 23 10:55:42 rh4-2_1 kernel: drbd0: Handshake successful: Agreed network protocol version 89
Feb 23 10:55:42 rh4-2_1 kernel: drbd0: conn( WFConnection -> WFReportParams )
Feb 23 10:55:42 rh4-2_1 kernel: drbd0: Starting asender thread (from drbd1_receiver [713])
Feb 23 10:55:42 rh4-2_1 kernel: drbd0: data-integrity-alg: <not-used>
Feb 23 10:55:42 rh4-2_1 kernel: drbd0: drbd_sync_handshake:
Feb 23 10:55:42 rh4-2_1 kernel: drbd0: self C0B886A078EDF8CE:0000000000000000:1BA32C69B29346D5:3191C4C20BDF4701
Feb 23 10:55:42 rh4-2_1 kernel: drbd0: peer EEE9BF7DEF3EEA38:C0B886A078EDF8CE:9B8B7DCBC6A7F5D2:6889940B860C7E9A
Feb 23 10:55:42 rh4-2_1 kernel: drbd0: uuid_compare()=-1 by rule 5
Feb 23 10:55:42 rh4-2_1 kernel: drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Feb 23 10:55:42 rh4-2_1 kernel: drbd0: conn( WFBitMapT -> WFSyncUUID )
Feb 23 10:55:42 rh4-2_1 kernel: drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
Feb 23 10:55:42 rh4-2_1 kernel: drbd0: helper command: */sbin/drbdadm before-resync-target minor-0 exit code 3 (0x300)*
Feb 23 10:55:42 rh4-2_1 kernel: drbd0: before-resync-target handler returned 3, dropping connection.
Feb 23 10:55:42 rh4-2_1 kernel: drbd0: peer( Secondary -> Unknown ) conn( WFSyncUUID -> Disconnecting ) pdsk( UpToDate -> DUnknown )
Feb 23 10:55:42 rh4-2_1 kernel: drbd0: asender terminated
Feb 23 10:55:42 rh4-2_1 kernel: drbd0: Terminating asender thread
Feb 23 10:55:42 rh4-2_1 kernel: drbd0: Connection closed
Feb 23 10:55:42 rh4-2_1 kernel: drbd0: conn( Disconnecting -> StandAlone )
Feb 23 10:55:42 rh4-2_1 kernel: drbd0: receiver terminated
Feb 23 10:55:42 rh4-2_1 kernel: drbd0: Terminating receiver thread
-8<---------------------------------------------------------------------------

Seeing these messages, I thought that the DRBD module was calling the wrong drbdadm userland command. So I renamed the previous userland tools and kernel module:

mv -i /sbin/drbdadm /sbin/drbdadm-8.5.2
mv -i /sbin/drbdsetup /sbin/drbdsetup-8.5.2
mv -i /sbin/drbdmeta /sbin/drbdmeta-8.5.2
mv -i /lib/modules/`uname -r`/kernel/drivers/block/drbd.ko /lib/modules/`uname -r`/kernel/drivers/block/drbd-8.2.5.ko

So now, there is no way to call the old installation.

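As a reading aid for the "exit code 3 (0x300)" entry in the log above: the value in parentheses looks like a raw wait(2)-style status word, where the exit code sits in bits 8-15 and the low 7 bits hold a terminating signal number. The sketch below decodes such a value under that assumption; decode_status is a hypothetical helper name, not a DRBD tool.

```shell
#!/bin/sh
# Hypothetical helper: decode a wait(2)-style status word such as 0x300.
# Assumption: the hexadecimal value in the log is the raw wait status,
# so the exit code is (status >> 8) & 255 and the signal is status & 127.
decode_status() {
    status=$(( $1 ))
    echo "exit=$(( (status >> 8) & 255 )) signal=$(( status & 127 ))"
}

decode_status 0x300   # prints: exit=3 signal=0
decode_status 0x0     # prints: exit=0 signal=0
```

Under this reading, 0x300 means the helper ran to completion and returned 3 by itself, rather than being killed by a signal.
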
I issued a new "drbd connect resource" command:

-8<---------------------------------------------------------------------------
Feb 23 10:59:41 rh4-2_1 kernel: drbd0: conn( StandAlone -> Unconnected )
Feb 23 10:59:41 rh4-2_1 kernel: drbd0: Starting receiver thread (from drbd1_worker [31777])
Feb 23 10:59:41 rh4-2_1 kernel: drbd0: receiver (re)started
Feb 23 10:59:41 rh4-2_1 kernel: drbd0: conn( Unconnected -> WFConnection )
Feb 23 10:59:42 rh4-2_1 kernel: drbd0: Handshake successful: Agreed network protocol version 89
Feb 23 10:59:42 rh4-2_1 kernel: drbd0: conn( WFConnection -> WFReportParams )
Feb 23 10:59:42 rh4-2_1 kernel: drbd0: Starting asender thread (from drbd1_receiver [4357])
Feb 23 10:59:42 rh4-2_1 kernel: drbd0: data-integrity-alg: <not-used>
Feb 23 10:59:42 rh4-2_1 kernel: drbd0: drbd_sync_handshake:
Feb 23 10:59:42 rh4-2_1 kernel: drbd0: self 5B1E45C971132AAA:0000000000000000:1BA32C69B29346D5:3191C4C20BDF4701
Feb 23 10:59:42 rh4-2_1 kernel: drbd0: peer EEE9BF7DEF3EEA38:5B1E45C971132AAB:C0B886A078EDF8CE:9B8B7DCBC6A7F5D2
Feb 23 10:59:42 rh4-2_1 kernel: drbd0: uuid_compare()=-1 by rule 5
Feb 23 10:59:42 rh4-2_1 kernel: drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Feb 23 10:59:42 rh4-2_1 kernel: drbd0: conn( WFBitMapT -> WFSyncUUID )
Feb 23 10:59:42 rh4-2_1 kernel: drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
Feb 23 10:59:42 rh4-2_1 kernel: drbd0: helper command: */sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)*
Feb 23 10:59:42 rh4-2_1 kernel: drbd0: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent )
Feb 23 10:59:42 rh4-2_1 kernel: drbd0: Began resync as SyncTarget (will sync 7177172 KB [1794293 bits set]).
-8<---------------------------------------------------------------------------

It worked and the synchronization was OK, but the message again shows /sbin/drbdadm, which should not be possible!

I think that the right /usr/local/drbd-8.3.0/sbin/drbdadm was used; otherwise, I do not know how it could have found another drbdadm binary.

-8<---------------------------------------------------------------------------
Feb 23 11:03:43 rh4-2_1 kernel: drbd0: Resync done (total 241 sec; paused 0 sec; 29780 K/sec)
Feb 23 11:03:43 rh4-2_1 kernel: drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate )
Feb 23 11:03:43 rh4-2_1 kernel: drbd0: helper command: /sbin/drbdadm after-resync-target minor-0
Feb 23 11:03:43 rh4-2_1 kernel: drbd0: helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)
-8<---------------------------------------------------------------------------

And since then, I get this strange message from drbdadm {role|cstate|dstate}:

-8<---------------------------------------------------------------------------
# drbdadm -c ../etc/drbd.conf cstate drbd_res0;
Secondary/Secondary
# (43) sync_progress = (integer) 83 [len: 4]
-8<---------------------------------------------------------------------------

Is there anything I can do to understand why I get this message? I have looked at the source a bit, but it is not easy to follow (NL_PACKET(...)...). My first feeling is that it is the kernel module that prints the message. Is that possible?

I hope this was clear. Many thanks in advance.

Best regards.
--
Hervé GAUTIER