Hi Philipp,

I have spent a bit more time looking into this. I am now working from the latest version of the git repository, which seems to have had your changes applied.

I have gone back to basics, since I was concerned I was feeding you misinformation. I have built the userland binaries and the kernel module from the latest source (git HEAD, no patches). I have copied them onto both machines, which are identical.

My DRBD config is as follows:

***********************
global {
    usage-count yes;
}
common {
    syncer {
        rate 10M;
    }
}
resource r0 {
    protocol C;
    device /dev/drbd0;
    meta-disk internal;
    net {
        allow-two-primaries;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
    }
    startup {
        # become-primary-on both;
    }
    on nasty {
        address XXX.XXX.XXX.XXX:7788;
        disk /dev/sda7;
    }
    on coral {
        address XXX.XXX.XXX.XXX:7788;
        disk /dev/sda7;
    }
}
*****************************

I am running the following commands on both machines:

dd if=/dev/zero of=/dev/sda7 bs=10M

nasty:~# drbdadm create-md r0
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
success

nasty:~# modprobe drbd
</var/log/kern.log
Jun 16 22:32:14 nasty kernel: drbd: initialised. Version: 8.3.2rc1 (api:88/proto:86-90)
Jun 16 22:32:14 nasty kernel: drbd: GIT-hash: df0abe3ea8a2beb412e094fbc76c8f4f0ba91858 build by nick at nasty, 2009-06-16 22:04:24
Jun 16 22:32:14 nasty kernel: drbd: registered as block device major 147
Jun 16 22:32:14 nasty kernel: drbd: minor_table @ 0xc6f80620

nasty:~# drbdadm up r0
0: Failure: (126) UnknownMandatoryTag
Command 'drbdsetup 0 net XXX.XXX.XXX.XXX:7788 XXX.XXX.XXX.XXX:7788 C --set-defaults --create-device --allow-two-primaries --after-sb-0pri=discard-zero-changes --after-sb-1pri=discard-secondary --after-sb-2pri=disconnect' terminated with exit code 10
</var/log/kern.log
Jun 16 22:34:19 nasty kernel: block drbd0: Starting worker thread (from cqueue [8513])
Jun 16 22:34:19 nasty kernel: block drbd0: disk( Diskless -> Attaching )
Jun 16 22:34:19 nasty kernel: block drbd0: No usable activity log found.
Jun 16 22:34:19 nasty kernel: block drbd0: Method to ensure write ordering: barrier
Jun 16 22:34:19 nasty kernel: block drbd0: max_segment_size ( = BIO size ) = 32768
Jun 16 22:34:19 nasty kernel: block drbd0: drbd_bm_resize called with capacity == 208696
Jun 16 22:34:19 nasty kernel: block drbd0: resync bitmap: bits=26087 words=816
Jun 16 22:34:19 nasty kernel: block drbd0: size = 102 MB (104348 KB)
Jun 16 22:34:19 nasty kernel: block drbd0: Writing the whole bitmap, size changed
Jun 16 22:34:20 nasty kernel: block drbd0: 102 MB (26087 bits) marked out-of-sync by on disk bit-map.
Jun 16 22:34:20 nasty kernel: block drbd0: recounting of set bits took additional 0 jiffies
Jun 16 22:34:20 nasty kernel: block drbd0: 102 MB (26087 bits) marked out-of-sync by on disk bit-map.
Jun 16 22:34:20 nasty kernel: block drbd0: disk( Attaching -> Inconsistent )
Jun 16 22:34:20 nasty kernel: block drbd0: Unknown tag: 3526

As you can see, this is not good!
If I try the verbose method:

nasty:~# drbdadm attach r0
</var/log/kern.log
Jun 16 22:38:05 nasty kernel: block drbd0: Starting worker thread (from cqueue [8513])
Jun 16 22:38:05 nasty kernel: block drbd0: disk( Diskless -> Attaching )
Jun 16 22:38:05 nasty kernel: block drbd0: No usable activity log found.
Jun 16 22:38:05 nasty kernel: block drbd0: Method to ensure write ordering: barrier
Jun 16 22:38:05 nasty kernel: block drbd0: max_segment_size ( = BIO size ) = 32768
Jun 16 22:38:05 nasty kernel: block drbd0: drbd_bm_resize called with capacity == 208696
Jun 16 22:38:05 nasty kernel: block drbd0: resync bitmap: bits=26087 words=816
Jun 16 22:38:05 nasty kernel: block drbd0: size = 102 MB (104348 KB)
Jun 16 22:38:05 nasty kernel: block drbd0: recounting of set bits took additional 0 jiffies
Jun 16 22:38:05 nasty kernel: block drbd0: 102 MB (26087 bits) marked out-of-sync by on disk bit-map.
Jun 16 22:38:05 nasty kernel: block drbd0: disk( Attaching -> Inconsistent )

nasty:~# cat /proc/drbd
version: 8.3.2rc1 (api:88/proto:86-90)
GIT-hash: df0abe3ea8a2beb412e094fbc76c8f4f0ba91858 build by nick at nasty, 2009-06-16 22:04:24
 0: cs:StandAlone ro:Secondary/Unknown ds:Inconsistent/DUnknown   r----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:104348

nasty:~# drbdadm syncer r0
nasty:~# drbdadm connect r0
</var/log/kern.log
Jun 16 22:40:35 nasty kernel: block drbd0: conn( StandAlone -> Unconnected )
Jun 16 22:40:35 nasty kernel: block drbd0: Starting receiver thread (from drbd0_worker [1370])
Jun 16 22:40:35 nasty kernel: block drbd0: receiver (re)started
Jun 16 22:40:35 nasty kernel: block drbd0: conn( Unconnected -> WFConnection )
Jun 16 22:40:36 nasty kernel: block drbd0: Handshake successful: Agreed network protocol version 90
Jun 16 22:40:36 nasty kernel: block drbd0: conn( WFConnection -> WFReportParams )
Jun 16 22:40:36 nasty kernel: block drbd0: Starting asender thread (from drbd0_receiver [1424])
Jun 16 22:40:36 nasty kernel: block drbd0: data-integrity-alg: <not-used>
Jun 16 22:40:36 nasty kernel: block drbd0: drbd_sync_handshake:
Jun 16 22:40:36 nasty kernel: block drbd0: self 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:26087 flags:0
Jun 16 22:40:36 nasty kernel: block drbd0: peer 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:26087 flags:0
Jun 16 22:40:36 nasty kernel: block drbd0: uuid_compare()=0 by rule 1
Jun 16 22:40:36 nasty kernel: block drbd0: No resync, but 26087 bits in bitmap!
Jun 16 22:40:36 nasty kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> Inconsistent )

nasty:~# cat /proc/drbd
version: 8.3.2rc1 (api:88/proto:86-90)
GIT-hash: df0abe3ea8a2beb412e094fbc76c8f4f0ba91858 build by nick at nasty, 2009-06-16 22:04:24
 0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:104348

nasty:~# drbdadm -- --overwrite-data-of-peer primary r0
0: State change failed: (-2) Refusing to be Primary without at least one UpToDate disk
Command 'drbdsetup 0 primary --overwrite-data-of-peer' terminated with exit code 17
Jun 16 22:43:11 nasty kernel: block drbd0: State change failed: Refusing to be Primary without at least one UpToDate disk
Jun 16 22:43:11 nasty kernel: block drbd0:   state = { cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent r--- }
Jun 16 22:43:11 nasty kernel: block drbd0:  wanted = { cs:Connected ro:Primary/Secondary ds:Inconsistent/Inconsistent r--- }

The behaviour is the same whichever end I run the last command from, and the logs are essentially the same at both ends. I am confident my network connection (including the firewall) is fine: it was working with the previous, patched version this morning.

Is there any more information you would like from me? I have some time for more debugging tomorrow, if you could give me a clue where I should start looking.
Regards
Nick

Philipp Reisner wrote:
> On Monday 15 June 2009 23:21:51 you wrote:
>
>> Hi,
>>
>> With the latest patch, things are not working very well at all!
>>
>> I think the problem may lie here:
>>
>> #define put_unaligned(val, ptr) ({ \
>> +       typeof(val) v; \
>> +       switch (sizeof(*(ptr))) { \
>> +       case 1: \
>> +               *(uint8_t *)(ptr) = (uint8_t)(val); \
>> +               break; \
>> +       case 2: \
>> +       case 4: \
>> +       case 8: \
>> +               v = val; \
>> +               memcpy(ptr, &v, sizeof(*(ptr))); \
>> +               break; \
>> +       default: \
>> +               __bad_unaligned_access_size(); \
>> +               break; \
>> +       } \
>> +       (void)0; })
>> +
>>
>> When called with
>>
>> put_unaligned(tag, tl->tag_list_cpos++);
>>
>> ptr is evaluated in the switch statement and twice in the memcpy call,
>> so the post-increment will be done three times... not what we want!
>>
>> Maybe an inline function, or taking a copy of ptr, is safer here.
>
> Hi Nick,
>
> No, typeof() and sizeof() do not evaluate the expression at runtime. They
> just deliver the type or the size at compile time. But in that macro
> there was an error in the type conversion.
>
> I have attached another patch that should fix the issue. Please verify.