[DRBD-user] DRBD - on ARM (armel)

Nick Liebmann nick at ukmail.org
Wed Jun 17 07:55:24 CEST 2009


Hi Philipp,

Just a little follow-up on this, with some additional ideas.

It strikes me that the new put/get unaligned macros defined in 
unaligned.h may not be preserving the endianness of the data being 
copied. The equivalent calls in kernel space do not use memcpy or 
similar; they use byte access and bit shifting. However, I could not 
find any userland equivalents that always 'do the right thing' based on 
the platform.

I have not yet had time to test this theory, but will try something 
later in the day.
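
To make the distinction concrete, here is a minimal userland sketch (not 
DRBD's code; all function names are hypothetical) contrasting a 
memcpy-based accessor, which stores bytes in the host's native order, 
with a shift-based accessor, which fixes the byte order regardless of 
the host:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers for illustration; not taken from unaligned.h. */

/* memcpy-based: safe at any alignment, bytes land in host byte order. */
static void put_native32(void *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}

static uint32_t get_native32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Shift-based: safe at any alignment, always little-endian in memory. */
static void put_le32(void *p, uint32_t v)
{
    uint8_t *b = p;
    b[0] = (uint8_t)(v);
    b[1] = (uint8_t)(v >> 8);
    b[2] = (uint8_t)(v >> 16);
    b[3] = (uint8_t)(v >> 24);
}

static uint32_t get_le32(const void *p)
{
    const uint8_t *b = p;
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Round-trips a value through an odd (unaligned) offset. */
static void unaligned_selfcheck(void)
{
    unsigned char buf[8] = {0};

    put_native32(buf + 1, 0x11223344u);
    assert(get_native32(buf + 1) == 0x11223344u);

    put_le32(buf + 1, 0x11223344u);
    assert(buf[1] == 0x44 && buf[4] == 0x11); /* fixed layout */
    assert(get_le32(buf + 1) == 0x11223344u);
}
```

Both pairs round-trip correctly on a single machine. Only the 
shift-based pair gives the same byte layout on every host, which matters 
only if the buffer is read by a machine of different endianness.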

Another issue I wondered about: if DRBD is sending this data across the 
wire (peer to peer), it may be communicating between machines of 
different architectures (big-endian <-> little-endian), in which case 
DRBD should 'standardize' on the endianness of these data structures to 
make that possible.
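
As a rough illustration of what standardizing could look like, the 
sketch below packs a made-up header into network (big-endian) byte 
order using the POSIX htons/htonl family. The struct, its field names, 
and the function names are assumptions for this example only and do not 
describe DRBD's actual packet layout:

```c
#include <arpa/inet.h>  /* htons, ntohs, htonl, ntohl (POSIX) */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header; not a real DRBD structure. */
struct demo_hdr {
    uint16_t tag;
    uint16_t len;
    uint32_t seq;
};

#define DEMO_HDR_WIRE_SIZE 8

/* Serialize field by field in network byte order. */
static void demo_hdr_pack(unsigned char *out, const struct demo_hdr *h)
{
    uint16_t tag = htons(h->tag);
    uint16_t len = htons(h->len);
    uint32_t seq = htonl(h->seq);

    memcpy(out, &tag, sizeof tag);
    memcpy(out + 2, &len, sizeof len);
    memcpy(out + 4, &seq, sizeof seq);
}

/* Deserialize, converting back to host byte order. */
static void demo_hdr_unpack(struct demo_hdr *h, const unsigned char *in)
{
    uint16_t tag, len;
    uint32_t seq;

    memcpy(&tag, in, sizeof tag);
    memcpy(&len, in + 2, sizeof len);
    memcpy(&seq, in + 4, sizeof seq);

    h->tag = ntohs(tag);
    h->len = ntohs(len);
    h->seq = ntohl(seq);
}

/* The wire bytes are the same whatever the host's endianness. */
static void demo_hdr_selfcheck(void)
{
    struct demo_hdr in = { 0x0102, 4, 0x0A0B0C0Du }, out;
    unsigned char wire[DEMO_HDR_WIRE_SIZE];

    demo_hdr_pack(wire, &in);
    assert(wire[0] == 0x01 && wire[1] == 0x02);
    assert(wire[4] == 0x0A && wire[7] == 0x0D);

    demo_hdr_unpack(&out, wire);
    assert(out.tag == in.tag && out.len == in.len && out.seq == in.seq);
}
```

Packing field by field, rather than copying the whole struct, also 
avoids depending on compiler padding.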

Apologies if I have opened a 'can of worms', and thank you for your help 
on this.

Regards

Nick

Nick Liebmann wrote:
> Hi Philipp,
>
> I have spent a bit more time looking into this. I am now working from 
> the latest version of the git repository, which seems to have had your 
> changes applied.
>
> I have gone back to basics, since I was concerned I was feeding you 
> misinformation.
>
>
> I have built the userland binaries, and the kernel module from the 
> latest source (git HEAD, no patches)
>
> I have copied them onto both machines (both exactly the same machines)
>
> My DRBD config is as follows:
>
> ***********************
> global {
>     usage-count yes;
> }
>
> common {
>     syncer {
>         rate 10M;
>     }
> }
>
> resource r0 {
>   protocol C;
>   device     /dev/drbd0;
>   meta-disk  internal;
>
>   net {
>     allow-two-primaries;
>     after-sb-0pri discard-zero-changes;
>     after-sb-1pri discard-secondary;
>     after-sb-2pri disconnect;
>   }
>
>   startup {
> #    become-primary-on both;
>   }
>
>   on nasty {
>     address   XXX.XXX.XXX.XXX:7788;
>     disk       /dev/sda7;
>   }
>
>   on coral {
>     address   XXX.XXX.XXX.XXX:7788;
>     disk       /dev/sda7;
>   }
> }
> *****************************
>
> I am running the following commands on both machines:
>
> dd if=/dev/zero of=/dev/sda7 bs=10M
>
> nasty:~# drbdadm create-md r0
> Writing meta data...
> initializing activity log
> NOT initialized bitmap
> New drbd meta data block successfully created.
> success
>
> nasty:~# modprobe drbd
> </var/log/kern.log
> Jun 16 22:32:14 nasty kernel: drbd: initialised. Version: 8.3.2rc1 
> (api:88/proto:86-90)
> Jun 16 22:32:14 nasty kernel: drbd: GIT-hash: 
> df0abe3ea8a2beb412e094fbc76c8f4f0ba91858 build by nick at nasty, 2009-06-16 
> 22:04:24
> Jun 16 22:32:14 nasty kernel: drbd: registered as block device major 147
> Jun 16 22:32:14 nasty kernel: drbd: minor_table @ 0xc6f80620
>
> nasty:~# drbdadm up r0
> 0: Failure: (126) UnknownMandatoryTag
> Command 'drbdsetup 0 net XXX.XXX.XXX.XXX:7788 XXX.XXX.XXX.XXX:7788 C 
> --set-defaults --create-device --allow-two-primaries 
> --after-sb-0pri=discard-zero-changes --after-sb-1pri=discard-secondary 
> --after-sb-2pri=disconnect' terminated with exit code 10
>
> </var/log/kern.log
> Jun 16 22:34:19 nasty kernel: block drbd0: Starting worker thread (from 
> cqueue [8513])
> Jun 16 22:34:19 nasty kernel: block drbd0: disk( Diskless -> Attaching )
> Jun 16 22:34:19 nasty kernel: block drbd0: No usable activity log found.
> Jun 16 22:34:19 nasty kernel: block drbd0: Method to ensure write 
> ordering: barrier
> Jun 16 22:34:19 nasty kernel: block drbd0: max_segment_size ( = BIO size 
> ) = 32768
> Jun 16 22:34:19 nasty kernel: block drbd0: drbd_bm_resize called with 
> capacity == 208696
> Jun 16 22:34:19 nasty kernel: block drbd0: resync bitmap: bits=26087 
> words=816
> Jun 16 22:34:19 nasty kernel: block drbd0: size = 102 MB (104348 KB)
> Jun 16 22:34:19 nasty kernel: block drbd0: Writing the whole bitmap, 
> size changed
> Jun 16 22:34:20 nasty kernel: block drbd0: 102 MB (26087 bits) marked 
> out-of-sync by on disk bit-map.
> Jun 16 22:34:20 nasty kernel: block drbd0: recounting of set bits took 
> additional 0 jiffies
> Jun 16 22:34:20 nasty kernel: block drbd0: 102 MB (26087 bits) marked 
> out-of-sync by on disk bit-map.
> Jun 16 22:34:20 nasty kernel: block drbd0: disk( Attaching -> 
> Inconsistent )
> Jun 16 22:34:20 nasty kernel: block drbd0: Unknown tag: 3526
>
> As you can see, this is not good!
>
> If I try the verbose method:
>
> nasty:~# drbdadm attach r0
> </var/log/kern.log
> Jun 16 22:38:05 nasty kernel: block drbd0: Starting worker thread (from 
> cqueue [8513])
> Jun 16 22:38:05 nasty kernel: block drbd0: disk( Diskless -> Attaching )
> Jun 16 22:38:05 nasty kernel: block drbd0: No usable activity log found.
> Jun 16 22:38:05 nasty kernel: block drbd0: Method to ensure write 
> ordering: barrier
> Jun 16 22:38:05 nasty kernel: block drbd0: max_segment_size ( = BIO size 
> ) = 32768
> Jun 16 22:38:05 nasty kernel: block drbd0: drbd_bm_resize called with 
> capacity == 208696
> Jun 16 22:38:05 nasty kernel: block drbd0: resync bitmap: bits=26087 
> words=816
> Jun 16 22:38:05 nasty kernel: block drbd0: size = 102 MB (104348 KB)
> Jun 16 22:38:05 nasty kernel: block drbd0: recounting of set bits took 
> additional 0 jiffies
> Jun 16 22:38:05 nasty kernel: block drbd0: 102 MB (26087 bits) marked 
> out-of-sync by on disk bit-map.
> Jun 16 22:38:05 nasty kernel: block drbd0: disk( Attaching -> 
> Inconsistent )
>
> nasty:~# cat /proc/drbd
> version: 8.3.2rc1 (api:88/proto:86-90)
> GIT-hash: df0abe3ea8a2beb412e094fbc76c8f4f0ba91858 build by nick at nasty, 
> 2009-06-16 22:04:24
>  0: cs:StandAlone ro:Secondary/Unknown ds:Inconsistent/DUnknown   r----
>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:104348
>
> nasty:~# drbdadm syncer r0
>
> nasty:~# drbdadm connect r0
> </var/log/kern.log
> Jun 16 22:40:35 nasty kernel: block drbd0: conn( StandAlone -> 
> Unconnected )
> Jun 16 22:40:35 nasty kernel: block drbd0: Starting receiver thread 
> (from drbd0_worker [1370])
> Jun 16 22:40:35 nasty kernel: block drbd0: receiver (re)started
> Jun 16 22:40:35 nasty kernel: block drbd0: conn( Unconnected -> 
> WFConnection )
> Jun 16 22:40:36 nasty kernel: block drbd0: Handshake successful: Agreed 
> network protocol version 90
> Jun 16 22:40:36 nasty kernel: block drbd0: conn( WFConnection -> 
> WFReportParams )
> Jun 16 22:40:36 nasty kernel: block drbd0: Starting asender thread (from 
> drbd0_receiver [1424])
> Jun 16 22:40:36 nasty kernel: block drbd0: data-integrity-alg: <not-used>
> Jun 16 22:40:36 nasty kernel: block drbd0: drbd_sync_handshake:
> Jun 16 22:40:36 nasty kernel: block drbd0: self 
> 0000000000000004:0000000000000000:0000000000000000:0000000000000000 
> bits:26087 flags:0
> Jun 16 22:40:36 nasty kernel: block drbd0: peer 
> 0000000000000004:0000000000000000:0000000000000000:0000000000000000 
> bits:26087 flags:0
> Jun 16 22:40:36 nasty kernel: block drbd0: uuid_compare()=0 by rule 1
> Jun 16 22:40:36 nasty kernel: block drbd0: No resync, but 26087 bits in 
> bitmap!
> Jun 16 22:40:36 nasty kernel: block drbd0: peer( Unknown -> Secondary ) 
> conn( WFReportParams -> Connected ) pdsk( DUnknown -> Inconsistent )
>
> nasty:~# cat /proc/drbd
> version: 8.3.2rc1 (api:88/proto:86-90)
> GIT-hash: df0abe3ea8a2beb412e094fbc76c8f4f0ba91858 build by nick at nasty, 
> 2009-06-16 22:04:24
>  0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r----
>     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:104348
>
>
> nasty:~# drbdadm -- --overwrite-data-of-peer primary r0
> 0: State change failed: (-2) Refusing to be Primary without at least one 
> UpToDate disk
> Command 'drbdsetup 0 primary --overwrite-data-of-peer' terminated with 
> exit code 17
>
> Jun 16 22:43:11 nasty kernel: block drbd0: State change failed: Refusing 
> to be Primary without at least one UpToDate disk
> Jun 16 22:43:11 nasty kernel: block drbd0:   state = { cs:Connected 
> ro:Secondary/Secondary ds:Inconsistent/Inconsistent r--- }
> Jun 16 22:43:11 nasty kernel: block drbd0:  wanted = { cs:Connected 
> ro:Primary/Secondary ds:Inconsistent/Inconsistent r--- }
>
> The behaviour is the same from whichever end I run the last command, and 
> the logs are essentially the same at both ends.
>
> I am confident my network (firewall) connection is valid, it was working 
> with the previous (patched) version, this morning.
>
> Is there any more information you would like from me?
>
> I have some time to do some more debugging tomorrow, if you could give 
> me a clue as to where I should start looking.
>
> Regards
>
> Nick
>
> Philipp Reisner wrote:
>   
>> On Monday 15 June 2009 23:21:51 you wrote:
>>   
>>     
>>> Hi,
>>>
>>> With the latest patch, things are not working very well at all!
>>>
>>>
>>> I think the problem may lie here:
>>>
>>>
>>> #define put_unaligned(val, ptr) ({                    \
>>> +    typeof(val) v;                            \
>>> +    switch (sizeof(*(ptr))) {                    \
>>> +    case 1:                                \
>>> +        *(uint8_t *)(ptr) = (uint8_t)(val);            \
>>> +        break;                            \
>>> +    case 2:                                \
>>> +    case 4:                                \
>>> +    case 8:                                \
>>> +        v = val;                        \
>>> +        memcpy(ptr, &v, sizeof(*(ptr)));            \
>>> +        break;                            \
>>> +    default:                            \
>>> +        __bad_unaligned_access_size();                \
>>> +        break;                            \
>>> +    }                                \
>>> +    (void)0; })
>>> +
>>>
>>>
>>> When called with
>>>
>>> put_unaligned(tag, tl->tag_list_cpos++);
>>>
>>> ptr is evaluated in the switch statement, and twice in the memcpy call,
>>> and hence the post-increment will be done three times ....not what we want!
>>>
>>> Maybe an inline function, or taking a copy of ptr, is safer here.
>>>
>>>     
>>>       
>> Hi Nick,
>>
>> No, typeof() and sizeof() do not evaluate the expression at runtime. They
>> just deliver the type or the size at compile time. But in that macro 
>> there was an error with the type conversion.
>>
>> I have again attached a patch that should fix the issue. Please verify.
>>
>>   
>>     
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
>   
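
On the quoted point that sizeof() does not evaluate its operand, a small 
standalone check (not DRBD code; the function name is hypothetical) 
shows that a post-increment inside sizeof never runs, so a memcpy shaped 
like the one in the macro advances the pointer once, not three times:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical demo: returns how many elements `pos` advanced after a
 * memcpy call shaped like the one in the quoted macro. */
static ptrdiff_t advances_after_copy(void)
{
    uint16_t buf[4] = {0};
    uint16_t *pos = buf;
    uint16_t v = 0xBEEF;

    /* The operand of sizeof is unevaluated here (it is not a
     * variable-length array), so only the first argument's pos++
     * actually executes. */
    memcpy(pos++, &v, sizeof(*(pos++)));

    assert(buf[0] == 0xBEEF);   /* the value landed in the first slot */
    return pos - buf;
}
```

Calling advances_after_copy() returns 1. Some compilers may warn about 
the side effect inside sizeof; that warning is expected for this 
deliberately odd expression.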


