Hi Philipp,
Just a little follow-up on this, with some additional ideas.
It strikes me that the new put/get unaligned macros defined in
unaligned.h may not be preserving the endianness of the data being
copied. The equivalent calls in kernel space do not use memcpy() or
similar; they use byte access and bit shifting. However, I could not find
any userland equivalents that always 'do the right thing' based on the
platform.
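A minimal sketch of the difference follows. The helper names (store_native32, store_le32 and their load counterparts) are hypothetical and do not come from DRBD's unaligned.h. A memcpy-based store copies the value's in-memory representation, so it writes whatever byte order the host uses, exactly like an ordinary aligned store. The byte-and-shift version, in the style of the kernel's put_unaligned_le32(), writes the same byte sequence on every platform.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: unaligned 32-bit store done with memcpy(), in the
 * spirit of a memcpy-based put_unaligned(). It copies the value's
 * in-memory representation, i.e. host byte order. */
static void store_native32(void *dst, uint32_t v)
{
    memcpy(dst, &v, sizeof v);
}

/* Hypothetical helper: unaligned 32-bit store that always writes
 * little-endian, using byte access and shifts. */
static void store_le32(void *dst, uint32_t v)
{
    uint8_t *p = dst;
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Matching loads, so round trips can be checked. */
static uint32_t load_native32(const void *src)
{
    uint32_t v;
    memcpy(&v, src, sizeof v);
    return v;
}

static uint32_t load_le32(const void *src)
{
    const uint8_t *p = src;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

On a little-endian host both stores leave identical bytes in the buffer; on a big-endian host they differ. This suggests memcpy() by itself does not reorder bytes; the byte-shifting form matters when a specific byte order, rather than the host's, is required.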
I have not yet had time to test this theory, but will try something
later in the day.
Another issue I wondered about: if DRBD is sending this data across
the wire (peer to peer), it may be communicating between machines of
different architectures (big-endian <-> little-endian), so DRBD should
'standardize' on the endianness of these data structures to allow this.
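A common way to 'standardize' is to serialize every multi-byte field in one agreed byte order, typically big-endian (network byte order), so both peers decode the same values regardless of architecture. The sketch below assumes a hypothetical header (wire_hdr, with a 16-bit command and a 32-bit length) and hypothetical helper names; it is not DRBD's actual packet format.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical on-the-wire header; NOT DRBD's real packet layout. */
struct wire_hdr {
    uint16_t command;
    uint32_t length;
};

#define WIRE_HDR_SIZE 6 /* 2 + 4 bytes, no padding on the wire */

/* Encode in big-endian (network) order, byte by byte, so the result is
 * identical on little- and big-endian hosts and needs no alignment. */
static void wire_hdr_encode(uint8_t out[WIRE_HDR_SIZE], const struct wire_hdr *h)
{
    out[0] = (uint8_t)(h->command >> 8);
    out[1] = (uint8_t)h->command;
    out[2] = (uint8_t)(h->length >> 24);
    out[3] = (uint8_t)(h->length >> 16);
    out[4] = (uint8_t)(h->length >> 8);
    out[5] = (uint8_t)h->length;
}

/* Decode the same fixed byte order back into host values. */
static void wire_hdr_decode(struct wire_hdr *h, const uint8_t in[WIRE_HDR_SIZE])
{
    h->command = (uint16_t)((in[0] << 8) | in[1]);
    h->length = ((uint32_t)in[2] << 24) | ((uint32_t)in[3] << 16) |
                ((uint32_t)in[4] << 8) | (uint32_t)in[5];
}
```

For naturally aligned fields, htons()/ntohs() and htonl()/ntohl() from <arpa/inet.h> perform the same conversion; the byte-wise form also avoids any alignment requirement on the buffer.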
Apologies if I have opened a 'can of worms', and thank you for your help
on this.
Regards
Nick
Nick Liebmann wrote:
> Hi Philipp,
>
> I have spent a bit more time looking into this. I am now working from
> the latest version of the git repository, which seems to have had your
> changes applied.
>
> I have gone back to basics, since I was concerned I was feeding you
> misinformation.
>
>
> I have built the userland binaries and the kernel module from the
> latest source (git HEAD, no patches).
>
> I have copied them onto both machines (the two machines are identical).
>
> My DRBD config is as follows:
>
> ***********************
> global {
>   usage-count yes;
> }
>
> common {
>   syncer {
>     rate 10M;
>   }
> }
>
> resource r0 {
>   protocol C;
>   device /dev/drbd0;
>   meta-disk internal;
>
>   net {
>     allow-two-primaries;
>     after-sb-0pri discard-zero-changes;
>     after-sb-1pri discard-secondary;
>     after-sb-2pri disconnect;
>   }
>
>   startup {
>     # become-primary-on both;
>   }
>
>   on nasty {
>     address XXX.XXX.XXX.XXX:7788;
>     disk /dev/sda7;
>   }
>
>   on coral {
>     address XXX.XXX.XXX.XXX:7788;
>     disk /dev/sda7;
>   }
> }
> *****************************
>
> I am running the following commands on both machines:
>
> dd if=/dev/zero of=/dev/sda7 bs=10M
>
> nasty:~# drbdadm create-md r0
> Writing meta data...
> initializing activity log
> NOT initialized bitmap
> New drbd meta data block successfully created.
> success
>
> nasty:~# modprobe drbd
> </var/log/kern.log
> Jun 16 22:32:14 nasty kernel: drbd: initialised. Version: 8.3.2rc1 (api:88/proto:86-90)
> Jun 16 22:32:14 nasty kernel: drbd: GIT-hash: df0abe3ea8a2beb412e094fbc76c8f4f0ba91858 build by nick at nasty, 2009-06-16 22:04:24
> Jun 16 22:32:14 nasty kernel: drbd: registered as block device major 147
> Jun 16 22:32:14 nasty kernel: drbd: minor_table @ 0xc6f80620
>
> nasty:~# drbdadm up r0
> 0: Failure: (126) UnknownMandatoryTag
> Command 'drbdsetup 0 net XXX.XXX.XXX.XXX:7788 XXX.XXX.XXX.XXX:7788 C --set-defaults --create-device --allow-two-primaries --after-sb-0pri=discard-zero-changes --after-sb-1pri=discard-secondary --after-sb-2pri=disconnect' terminated with exit code 10
>
> </var/log/kern.log
> Jun 16 22:34:19 nasty kernel: block drbd0: Starting worker thread (from cqueue [8513])
> Jun 16 22:34:19 nasty kernel: block drbd0: disk( Diskless -> Attaching )
> Jun 16 22:34:19 nasty kernel: block drbd0: No usable activity log found.
> Jun 16 22:34:19 nasty kernel: block drbd0: Method to ensure write ordering: barrier
> Jun 16 22:34:19 nasty kernel: block drbd0: max_segment_size ( = BIO size ) = 32768
> Jun 16 22:34:19 nasty kernel: block drbd0: drbd_bm_resize called with capacity == 208696
> Jun 16 22:34:19 nasty kernel: block drbd0: resync bitmap: bits=26087 words=816
> Jun 16 22:34:19 nasty kernel: block drbd0: size = 102 MB (104348 KB)
> Jun 16 22:34:19 nasty kernel: block drbd0: Writing the whole bitmap, size changed
> Jun 16 22:34:20 nasty kernel: block drbd0: 102 MB (26087 bits) marked out-of-sync by on disk bit-map.
> Jun 16 22:34:20 nasty kernel: block drbd0: recounting of set bits took additional 0 jiffies
> Jun 16 22:34:20 nasty kernel: block drbd0: 102 MB (26087 bits) marked out-of-sync by on disk bit-map.
> Jun 16 22:34:20 nasty kernel: block drbd0: disk( Attaching -> Inconsistent )
> Jun 16 22:34:20 nasty kernel: block drbd0: Unknown tag: 3526
>
> As you can see, this is not good: the net command fails with
> UnknownMandatoryTag, and the kernel logs 'Unknown tag: 3526'.
>
> If I try the verbose method instead (attach, syncer and connect run
> separately):
>
> nasty:~# drbdadm attach r0
> </var/log/kern.log
> Jun 16 22:38:05 nasty kernel: block drbd0: Starting worker thread (from cqueue [8513])
> Jun 16 22:38:05 nasty kernel: block drbd0: disk( Diskless -> Attaching )
> Jun 16 22:38:05 nasty kernel: block drbd0: No usable activity log found.
> Jun 16 22:38:05 nasty kernel: block drbd0: Method to ensure write ordering: barrier
> Jun 16 22:38:05 nasty kernel: block drbd0: max_segment_size ( = BIO size ) = 32768
> Jun 16 22:38:05 nasty kernel: block drbd0: drbd_bm_resize called with capacity == 208696
> Jun 16 22:38:05 nasty kernel: block drbd0: resync bitmap: bits=26087 words=816
> Jun 16 22:38:05 nasty kernel: block drbd0: size = 102 MB (104348 KB)
> Jun 16 22:38:05 nasty kernel: block drbd0: recounting of set bits took additional 0 jiffies
> Jun 16 22:38:05 nasty kernel: block drbd0: 102 MB (26087 bits) marked out-of-sync by on disk bit-map.
> Jun 16 22:38:05 nasty kernel: block drbd0: disk( Attaching -> Inconsistent )
>
> nasty:~# cat /proc/drbd
> version: 8.3.2rc1 (api:88/proto:86-90)
> GIT-hash: df0abe3ea8a2beb412e094fbc76c8f4f0ba91858 build by nick at nasty, 2009-06-16 22:04:24
> 0: cs:StandAlone ro:Secondary/Unknown ds:Inconsistent/DUnknown r----
> ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:104348
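The sizes in these logs can be cross-checked with a little arithmetic, assuming (as DRBD's bitmap does) that one bit covers a 4 KiB block, that capacity is counted in 512-byte sectors and, as a further assumption, that the bitmap is stored in 32-bit words. The helpers below are hypothetical and only reproduce the logged figures from capacity == 208696.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: capacity in 512-byte sectors, one bitmap bit per 4 KiB,
 * 32-bit bitmap words. */
#define SECTOR_SIZE    512u
#define BYTES_PER_BIT  4096u
#define KB_PER_BIT     (BYTES_PER_BIT / 1024u)
#define BITS_PER_WORD  32u

/* Device capacity (sectors) -> size in KB, as in "size = ... KB". */
static uint64_t capacity_to_kb(uint64_t sectors)
{
    return sectors * SECTOR_SIZE / 1024u;
}

/* One bit per 4 KiB block, rounded up for a partial last block. */
static uint64_t kb_to_bitmap_bits(uint64_t kb)
{
    return (kb + KB_PER_BIT - 1) / KB_PER_BIT;
}

/* Number of bitmap words needed to hold all bits, rounded up. */
static uint64_t bits_to_words(uint64_t bits)
{
    return (bits + BITS_PER_WORD - 1) / BITS_PER_WORD;
}

/* Out-of-sync amount in KB when every bit is set (the "oos:" field). */
static uint64_t bits_to_oos_kb(uint64_t bits)
{
    return bits * KB_PER_BIT;
}
```

Under these assumptions the figures line up with the log: 104348 KB, bits=26087, words=816, and oos:104348, i.e. every bit set and the whole device marked out of sync.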
>
> nasty:~# drbdadm syncer r0
>
> nasty:~# drbdadm connect r0
> </var/log/kern.log
> Jun 16 22:40:35 nasty kernel: block drbd0: conn( StandAlone -> Unconnected )
> Jun 16 22:40:35 nasty kernel: block drbd0: Starting receiver thread (from drbd0_worker [1370])
> Jun 16 22:40:35 nasty kernel: block drbd0: receiver (re)started
> Jun 16 22:40:35 nasty kernel: block drbd0: conn( Unconnected -> WFConnection )
> Jun 16 22:40:36 nasty kernel: block drbd0: Handshake successful: Agreed network protocol version 90
> Jun 16 22:40:36 nasty kernel: block drbd0: conn( WFConnection -> WFReportParams )
> Jun 16 22:40:36 nasty kernel: block drbd0: Starting asender thread (from drbd0_receiver [1424])
> Jun 16 22:40:36 nasty kernel: block drbd0: data-integrity-alg: <not-used>
> Jun 16 22:40:36 nasty kernel: block drbd0: drbd_sync_handshake:
> Jun 16 22:40:36 nasty kernel: block drbd0: self 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:26087 flags:0
> Jun 16 22:40:36 nasty kernel: block drbd0: peer 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:26087 flags:0
> Jun 16 22:40:36 nasty kernel: block drbd0: uuid_compare()=0 by rule 1
> Jun 16 22:40:36 nasty kernel: block drbd0: No resync, but 26087 bits in bitmap!
> Jun 16 22:40:36 nasty kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> Inconsistent )
>
> nasty:~# cat /proc/drbd
> version: 8.3.2rc1 (api:88/proto:86-90)
> GIT-hash: df0abe3ea8a2beb412e094fbc76c8f4f0ba91858 build by nick at nasty, 2009-06-16 22:04:24
> 0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r----
> ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:104348
>
>
> nasty:~# drbdadm -- --overwrite-data-of-peer primary r0
> 0: State change failed: (-2) Refusing to be Primary without at least one UpToDate disk
> Command 'drbdsetup 0 primary --overwrite-data-of-peer' terminated with exit code 17
>
> Jun 16 22:43:11 nasty kernel: block drbd0: State change failed: Refusing to be Primary without at least one UpToDate disk
> Jun 16 22:43:11 nasty kernel: block drbd0: state = { cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent r--- }
> Jun 16 22:43:11 nasty kernel: block drbd0: wanted = { cs:Connected ro:Primary/Secondary ds:Inconsistent/Inconsistent r--- }
>
> The behaviour is the same whichever end I run the last command from,
> and the logs are essentially the same at both ends.
>
> I am confident my network (firewall) setup is fine; it was working
> with the previous (patched) version this morning.
>
> Is there any more information you would like from me?
>
> I have some time for more debugging tomorrow; could you give me a
> clue where I should start looking?
>
> Regards
>
> Nick
>
> Philipp Reisner wrote:
>
>> On Monday 15 June 2009 23:21:51 you wrote:
>>
>>
>>> Hi,
>>>
>>> With the latest patch, things are not working very well at all!
>>>
>>>
>>> I think the problem may lie here:
>>>
>>>
>>> #define put_unaligned(val, ptr) ({ \
>>> +        typeof(val) v; \
>>> +        switch (sizeof(*(ptr))) { \
>>> +        case 1: \
>>> +                *(uint8_t *)(ptr) = (uint8_t)(val); \
>>> +                break; \
>>> +        case 2: \
>>> +        case 4: \
>>> +        case 8: \
>>> +                v = val; \
>>> +                memcpy(ptr, &v, sizeof(*(ptr))); \
>>> +                break; \
>>> +        default: \
>>> +                __bad_unaligned_access_size(); \
>>> +                break; \
>>> +        } \
>>> +        (void)0; })
>>> +
>>>
>>>
>>> When called with
>>>
>>> put_unaligned(tag, tl->tag_list_cpos++);
>>>
>>> ptr is evaluated in the switch statement and twice in the memcpy call,
>>> so the post-increment will be done three times... not what we want!
>>>
>>> Maybe an inline function, or taking a copy of ptr, is safer here.
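Whether the pointer argument is evaluated once or three times can be checked directly. The sketch below uses a hypothetical macro, STORE_U16, shaped like the quoted put_unaligned(): the pointer appears in the sizeof() controlling the switch, as the memcpy() destination, and inside the memcpy() size argument. It counts how far a post-incremented pointer advances. Like the original, it relies on GCC extensions (__typeof__, statement expressions).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical macro with the same shape as the quoted put_unaligned().
 * `ptr` is written three times, but the operands of sizeof() and
 * __typeof__() are not evaluated (for non-VLA types), so at run time
 * `ptr` is evaluated exactly once: as the memcpy() destination.
 * Callers here pass `val` already as uint16_t, so this only exercises
 * evaluation count, not type conversion. */
#define STORE_U16(val, ptr) ({                          \
        __typeof__(val) v_;                             \
        switch (sizeof(*(ptr))) {                       \
        case 2:                                         \
                v_ = (val);                             \
                memcpy((ptr), &v_, sizeof(*(ptr)));     \
                break;                                  \
        }                                               \
        (void)0; })
```

With two calls such as `STORE_U16((uint16_t)0x1234, p++);` the pointer advances by exactly two elements, so the post-increment runs once per call.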
>>>
>>>
>>>
>> Hi Nick,
>>
>> No, typeof() and sizeof() do not evaluate the expression at runtime; they
>> just yield the type or the size at compile time. But there was an error
>> with the type conversion in that macro.
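Philipp does not say which conversion was wrong. One plausible candidate, offered here only as an assumption: the temporary is declared with typeof(val), so when an int is stored through a 16-bit pointer, memcpy() copies the first two bytes of the int. Those are the low-order bytes on a little-endian host, where the result happens to be correct, but the high-order bytes on a big-endian host. Declaring the temporary with the destination type converts the value before copying. PUT_UNALIGNED_BUGGY, PUT_UNALIGNED_FIXED and host_is_little_endian are hypothetical names; neither macro is the attached patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of the suspected bug: the temporary has the type of
 * the value, so memcpy() copies the first sizeof(*(ptr)) bytes of it,
 * which are the wrong bytes on a big-endian host. */
#define PUT_UNALIGNED_BUGGY(val, ptr) ({                \
        __typeof__(val) v_ = (val);                     \
        memcpy((ptr), &v_, sizeof(*(ptr)));             \
        (void)0; })

/* Hypothetical fix: convert to the destination type first, then copy
 * exactly that many bytes. */
#define PUT_UNALIGNED_FIXED(val, ptr) ({                \
        __typeof__(*(ptr)) v_ = (val);                  \
        memcpy((ptr), &v_, sizeof(*(ptr)));             \
        (void)0; })

/* Returns 1 if the lowest-addressed byte of a uint16_t holds its
 * least significant byte. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

Storing `int tag = 0x1234` through a `uint16_t *` gives 0x1234 with the fixed macro on any host, while the buggy one gives 0x1234 only on little-endian hosts (and 0 on big-endian ones).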
>>
>> I have again attached a patch that should fix the issue. Please verify.
>>
>>
>>
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
>