Note: "permalinks" may not be as permanent as we would like;
direct links to old sources may well be a few messages off.
Hi Philipp,

Just a little follow-up on this, with some additional ideas.

It strikes me that the new put/get unaligned macros defined in
unaligned.h may not be preserving the endianness of the data being
copied. The equivalent calls in kernel space do not use memcpy or
similar; they use byte access and bit shifting. I could not find any
userland equivalents, however, that always 'do the right thing' based
on the platform.

I have not yet had time to test this theory, but will try something
later in the day.

Another issue that I wondered about: if DRBD is sending this data
across the wire (peer to peer), it may be communicating between
machines of different architectures (big-endian <-> little-endian), so
DRBD should 'standardize' on the endianness of these data structures
to enable this.

Apologies if I have opened a 'can of worms', and thank you for your
help on this.

Regards
Nick

Nick Liebmann wrote:
> Hi Philipp,
>
> I have spent a bit more time looking into this. I am now working from
> the latest version of the git repository, which seems to have had your
> changes applied.
>
> I have gone back to basics, since I was concerned I was feeding you
> mis-information.
>
>
> I have built the userland binaries, and the kernel module from the
> latest source (git HEAD, no patches)
>
> I have copied them onto both machines (both exactly the same machines)
>
> My DRBD config is as follows:
>
> ***********************
> global {
>   usage-count yes;
> }
>
> common {
>   syncer {
>     rate 10M;
>   }
> }
>
> resource r0 {
>   protocol C;
>   device /dev/drbd0;
>   meta-disk internal;
>
>   net {
>     allow-two-primaries;
>     after-sb-0pri discard-zero-changes;
>     after-sb-1pri discard-secondary;
>     after-sb-2pri disconnect;
>   }
>
>   startup {
>     # become-primary-on both;
>   }
>
>   on nasty {
>     address XXX.XXX.XXX.XXX:7788;
>     disk /dev/sda7;
>   }
>
>   on coral {
>     address XXX.XXX.XXX.XXX:7788;
>     disk /dev/sda7;
>   }
> }
> *****************************
>
> I am running the following commands on both machines
>
> dd if=/dev/zero of=/dev/sda7 bs=10M
>
> nasty:~# drbdadm create-md r0
> Writing meta data...
> initializing activity log
> NOT initialized bitmap
> New drbd meta data block successfully created.
> success
>
> nasty:~# modprobe drbd
> </var/log/kern.log
> Jun 16 22:32:14 nasty kernel: drbd: initialised. Version: 8.3.2rc1
> (api:88/proto:86-90)
> Jun 16 22:32:14 nasty kernel: drbd: GIT-hash:
> df0abe3ea8a2beb412e094fbc76c8f4f0ba91858 build by nick at nasty, 2009-06-16
> 22:04:24
> Jun 16 22:32:14 nasty kernel: drbd: registered as block device major 147
> Jun 16 22:32:14 nasty kernel: drbd: minor_table @ 0xc6f80620
>
> nasty:~# drbdadm up r0
> 0: Failure: (126) UnknownMandatoryTag
> Command 'drbdsetup 0 net XXX.XXX.XXX.XXX:7788 XXX.XXX.XXX.XXX:7788 C
> --set-defaults --create-device --allow-two-primaries
> --after-sb-0pri=discard-zero-changes --after-sb-1pri=discard-secondary
> --after-sb-2pri=disconnect' terminated with exit code 10
>
> </var/log/kern.log
> Jun 16 22:34:19 nasty kernel: block drbd0: Starting worker thread (from
> cqueue [8513])
> Jun 16 22:34:19 nasty kernel: block drbd0: disk( Diskless -> Attaching )
> Jun 16 22:34:19 nasty kernel: block drbd0: No usable activity log found.
> Jun 16 22:34:19 nasty kernel: block drbd0: Method to ensure write
> ordering: barrier
> Jun 16 22:34:19 nasty kernel: block drbd0: max_segment_size ( = BIO size
> ) = 32768
> Jun 16 22:34:19 nasty kernel: block drbd0: drbd_bm_resize called with
> capacity == 208696
> Jun 16 22:34:19 nasty kernel: block drbd0: resync bitmap: bits=26087
> words=816
> Jun 16 22:34:19 nasty kernel: block drbd0: size = 102 MB (104348 KB)
> Jun 16 22:34:19 nasty kernel: block drbd0: Writing the whole bitmap,
> size changed
> Jun 16 22:34:20 nasty kernel: block drbd0: 102 MB (26087 bits) marked
> out-of-sync by on disk bit-map.
> Jun 16 22:34:20 nasty kernel: block drbd0: recounting of set bits took
> additional 0 jiffies
> Jun 16 22:34:20 nasty kernel: block drbd0: 102 MB (26087 bits) marked
> out-of-sync by on disk bit-map.
> Jun 16 22:34:20 nasty kernel: block drbd0: disk( Attaching ->
> Inconsistent )
> Jun 16 22:34:20 nasty kernel: block drbd0: Unknown tag: 3526
>
> As you can see, this is not good!
>
> If I try the verbose method:
>
> nasty:~# drbdadm attach r0
> </var/log/kern.log
> Jun 16 22:38:05 nasty kernel: block drbd0: Starting worker thread (from
> cqueue [8513])
> Jun 16 22:38:05 nasty kernel: block drbd0: disk( Diskless -> Attaching )
> Jun 16 22:38:05 nasty kernel: block drbd0: No usable activity log found.
> Jun 16 22:38:05 nasty kernel: block drbd0: Method to ensure write
> ordering: barrier
> Jun 16 22:38:05 nasty kernel: block drbd0: max_segment_size ( = BIO size
> ) = 32768
> Jun 16 22:38:05 nasty kernel: block drbd0: drbd_bm_resize called with
> capacity == 208696
> Jun 16 22:38:05 nasty kernel: block drbd0: resync bitmap: bits=26087
> words=816
> Jun 16 22:38:05 nasty kernel: block drbd0: size = 102 MB (104348 KB)
> Jun 16 22:38:05 nasty kernel: block drbd0: recounting of set bits took
> additional 0 jiffies
> Jun 16 22:38:05 nasty kernel: block drbd0: 102 MB (26087 bits) marked
> out-of-sync by on disk bit-map.
> Jun 16 22:38:05 nasty kernel: block drbd0: disk( Attaching ->
> Inconsistent )
>
> nasty:~# cat /proc/drbd
> version: 8.3.2rc1 (api:88/proto:86-90)
> GIT-hash: df0abe3ea8a2beb412e094fbc76c8f4f0ba91858 build by nick at nasty,
> 2009-06-16 22:04:24
> 0: cs:StandAlone ro:Secondary/Unknown ds:Inconsistent/DUnknown r----
> ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:104348
>
> nasty:~# drbdadm syncer r0
>
> nasty:~# drbdadm connect r0
> </var/log/kern.log
> Jun 16 22:40:35 nasty kernel: block drbd0: conn( StandAlone ->
> Unconnected )
> Jun 16 22:40:35 nasty kernel: block drbd0: Starting receiver thread
> (from drbd0_worker [1370])
> Jun 16 22:40:35 nasty kernel: block drbd0: receiver (re)started
> Jun 16 22:40:35 nasty kernel: block drbd0: conn( Unconnected ->
> WFConnection )
> Jun 16 22:40:36 nasty kernel: block drbd0: Handshake successful: Agreed
> network protocol version 90
> Jun 16 22:40:36 nasty kernel: block drbd0: conn( WFConnection ->
> WFReportParams )
> Jun 16 22:40:36 nasty kernel: block drbd0: Starting asender thread (from
> drbd0_receiver [1424])
> Jun 16 22:40:36 nasty kernel: block drbd0: data-integrity-alg: <not-used>
> Jun 16 22:40:36 nasty kernel: block drbd0: drbd_sync_handshake:
> Jun 16 22:40:36 nasty kernel: block drbd0: self
> 0000000000000004:0000000000000000:0000000000000000:0000000000000000
> bits:26087 flags:0
> Jun 16 22:40:36 nasty kernel: block drbd0: peer
> 0000000000000004:0000000000000000:0000000000000000:0000000000000000
> bits:26087 flags:0
> Jun 16 22:40:36 nasty kernel: block drbd0: uuid_compare()=0 by rule 1
> Jun 16 22:40:36 nasty kernel: block drbd0: No resync, but 26087 bits in
> bitmap!
> Jun 16 22:40:36 nasty kernel: block drbd0: peer( Unknown -> Secondary )
> conn( WFReportParams -> Connected ) pdsk( DUnknown -> Inconsistent )
>
> nasty:~# cat /proc/drbd
> version: 8.3.2rc1 (api:88/proto:86-90)
> GIT-hash: df0abe3ea8a2beb412e094fbc76c8f4f0ba91858 build by nick at nasty,
> 2009-06-16 22:04:24
> 0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r----
> ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:104348
>
>
> nasty:~# drbdadm -- --overwrite-data-of-peer primary r0
> 0: State change failed: (-2) Refusing to be Primary without at least one
> UpToDate disk
> Command 'drbdsetup 0 primary --overwrite-data-of-peer' terminated with
> exit code 17
>
> Jun 16 22:43:11 nasty kernel: block drbd0: State change failed: Refusing
> to be Primary without at least one UpToDate disk
> Jun 16 22:43:11 nasty kernel: block drbd0: state = { cs:Connected
> ro:Secondary/Secondary ds:Inconsistent/Inconsistent r--- }
> Jun 16 22:43:11 nasty kernel: block drbd0: wanted = { cs:Connected
> ro:Primary/Secondary ds:Inconsistent/Inconsistent r--- }
>
> The behaviour is the same from whichever end I run the last command, and
> the logs are essentially the same at both ends.
>
> I am confident my network (firewall) connection is valid, it was working
> with the previous (patched) version, this morning.
>
> Is there any more information you would like from me?
>
> I have some time to do some more debugging tomorrow, if you could give
> me a clue where I should start looking?
>
> Regards
>
> Nick
>
>
> Philipp Reisner wrote:
>
>> On Monday 15 June 2009 23:21:51 you wrote:
>>
>>
>>> Hi,
>>>
>>> With the latest patch, things are not working very well at all!
>>>
>>>
>>> I think the problem may lie here:
>>>
>>>
>>> #define put_unaligned(val, ptr) ({ \
>>> +	typeof(val) v; \
>>> +	switch (sizeof(*(ptr))) { \
>>> +	case 1: \
>>> +		*(uint8_t *)(ptr) = (uint8_t)(val); \
>>> +		break; \
>>> +	case 2: \
>>> +	case 4: \
>>> +	case 8: \
>>> +		v = val; \
>>> +		memcpy(ptr, &v, sizeof(*(ptr))); \
>>> +		break; \
>>> +	default: \
>>> +		__bad_unaligned_access_size(); \
>>> +		break; \
>>> +	} \
>>> +	(void)0; })
>>> +
>>>
>>>
>>> When called with
>>>
>>> put_unaligned(tag, tl->tag_list_cpos++);
>>>
>>> ptr is evaluated in the switch statement, and twice in the memcpy call,
>>> and hence the post-increment will be done three times ....not what we want!
>>>
>>> Maybe an inline function, or taking a copy of ptr is safer here
>>>
>>>
>>>
>> Hi Nick,
>>
>> No, typeof(), sizeof() do not evaluate the expression at runtime. They
>> just deliver the type or the size at compile time. But in that macro
>> there was an error with the type conversion.
>>
>> I have again attached a patch that should fix the issue. Please verify.
>>
>>
>>
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
>
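
For reference, here is a minimal sketch of the two points discussed in this thread. It is not the DRBD code and not the attached patch; every function name below (sketch_put_host32, sketch_put_le32 and so on) is hypothetical and made up for illustration. It assumes 32-bit values only. The first pair of helpers copies with memcpy, as the quoted macro does, so bytes land in whatever order the host uses. The second pair uses byte access and bit shifting, so the layout is always little-endian regardless of the host, which is roughly what would be needed if the layout had to be identical on both peers. The last function shows that sizeof does not evaluate its operand, so a post-increment inside sizeof does not advance the pointer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store in host byte order via memcpy,
 * the same approach as the quoted put_unaligned macro. */
static void sketch_put_host32(void *ptr, uint32_t val)
{
	memcpy(ptr, &val, sizeof(val));
}

/* Hypothetical helper: load in host byte order via memcpy. */
static uint32_t sketch_get_host32(const void *ptr)
{
	uint32_t val;

	memcpy(&val, ptr, sizeof(val));
	return val;
}

/* Hypothetical helper: store with byte access and shifts.
 * The byte layout is little-endian on every host. */
static void sketch_put_le32(void *ptr, uint32_t val)
{
	uint8_t *p = ptr;

	p[0] = (uint8_t)(val);
	p[1] = (uint8_t)(val >> 8);
	p[2] = (uint8_t)(val >> 16);
	p[3] = (uint8_t)(val >> 24);
}

/* Hypothetical helper: load a little-endian value byte by byte. */
static uint32_t sketch_get_le32(const void *ptr)
{
	const uint8_t *p = ptr;

	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}

/* sizeof does not evaluate its operand (no variable-length array is
 * involved here), so cpos is still at the start of buf afterwards.
 * Returns 1 if that holds. */
static int sketch_sizeof_leaves_pointer_alone(void)
{
	uint16_t buf[4] = { 0 };
	uint16_t *cpos = buf;
	size_t n = sizeof(*(cpos++));

	return cpos == buf && n == sizeof(uint16_t);
}
```

Both pairs round-trip a value on the same machine. They only differ in what ends up in the buffer, and that difference would matter only if the buffer is read by a machine with a different byte order.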