[DRBD-user] DRBD - on ARM (armel)

Nick Liebmann nick at ukmail.org
Wed Jun 17 20:12:32 CEST 2009



Hello again Philipp,

I must be going mad........

I have now constructed a simple test case which highlights the error in 
the code; however, I cannot actually see what is wrong with the code. I 
can make the code work, but the fix should not be necessary as far as I 
can tell, and is actually incorrect!



My test scenario is as follows.

I have DRBD running and waiting for a connection on a peer that is known 
to work (the previous, working, patched DRBD).

I start DRBD at the local end. The resource has never been synced, so 
if everything connects OK, I expect

cat /proc/drbd to show
......                                      ds:Inconsistent/Inconsistent

however when things are not working it shows

.......                                    ds:Inconsistent/DUnknown


I have reverted the source of drbdsetup.c to the state before any of the 
put_unaligned and get_unaligned calls were added, and am concentrating on 
the add_tag functions, since this is where I found that reverting made it 
work again.

I have modified add_tag as follows, with some debug code

-------------
    //Store the current position in the data stream
    unsigned short int *startPos = tl->tag_list_cpos;
#define USE_MEMCPY
#ifdef USE_MEMCPY
    //This is the memcpy used in the previous patch...
    //note the size will resolve to 4 (which is incorrect really, since
    //the data is a short)
    memcpy(tl->tag_list_cpos++, &tag, sizeof(tl->tag_list_cpos));
    memcpy(tl->tag_list_cpos++, &data_len, sizeof(tl->tag_list_cpos));
#else
    put_unaligned(tag, tl->tag_list_cpos++);
    put_unaligned(data_len, tl->tag_list_cpos++);
#endif

    //Print the bytes that we have just written into the stream
    {
        int nPos = 0;
        //Format is TAG (XXXX) LEN (YYYY) [UN]ALIGNED
        fprintf(stderr, "TAG (%4.4x) LEN(%4.4x) %sALIGNED ",
                tag & 0x0000ffff, data_len,
                ((int)startPos % 2) == 1 ? "UN" : "");
        for(nPos = 0;
            nPos < (char*)tl->tag_list_cpos - (char*)startPos;
            nPos++)
        {
            fprintf(stderr, "%2.2x", ((char*)startPos)[nPos]);
        }
        fprintf(stderr, "\n");
    }

-----------------

This allows me to check whether the bytes are written into the data 
stream correctly.
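
A slightly more defensive form of the dump used in the debug block above, as a minimal sketch. The names hex_bytes and is_unaligned16 are hypothetical helpers and not part of drbdsetup.c. It reads the stream through unsigned char, so a byte such as 0xe0 cannot be sign-extended by "%x" on a platform where plain char is signed, and it tests alignment through uintptr_t rather than a cast to int.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from DRBD): format n bytes starting at p as
 * lowercase hex into out. out must have room for 2 * n + 1 characters. */
static void hex_bytes(const void *p, size_t n, char *out)
{
    const unsigned char *b = p;
    size_t i;

    for (i = 0; i < n; i++)
        sprintf(out + 2 * i, "%02x", b[i]);
    out[2 * n] = '\0';
}

/* Hypothetical helper: non-zero when p is not on a 2-byte boundary. */
static int is_unaligned16(const void *p)
{
    return ((uintptr_t)p & 1u) != 0;
}
```

In the debug block, hex_bytes(startPos, 4, text) followed by a single fprintf of text would replace the per-byte loop.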


The output is as follows:

nasty:~# sdiff /tmp/out.broken /tmp/out.working
TAG (e003) LEN(000a) ALIGNED 03e00a00        TAG (e003) LEN(000a) ALIGNED 03e00a00
TAG (e004) LEN(000a) ALIGNED 04e00a00        TAG (e004) LEN(000a) ALIGNED 04e00a00
TAG (2005) LEN(0004) ALIGNED 05200400        TAG (2005) LEN(0004) ALIGNED 05200400
TAG (0000) LEN(0000) ALIGNED 00000000        TAG (0000) LEN(0000) ALIGNED 00000000
TAG (001e) LEN(0004) ALIGNED 1e000400        TAG (001e) LEN(0004) ALIGNED 1e000400
TAG (0000) LEN(0000) ALIGNED 00000000        TAG (0000) LEN(0000) ALIGNED 00000000
TAG (e008) LEN(0010) ALIGNED 08e01000        TAG (e008) LEN(0010) ALIGNED 08e01000
TAG (e009) LEN(0010) ALIGNED 09e01000        TAG (e009) LEN(0010) ALIGNED 09e01000
TAG (200f) LEN(0004) ALIGNED 0f200400        TAG (200f) LEN(0004) ALIGNED 0f200400
TAG (801c) LEN(0001) ALIGNED 1c800100        TAG (801c) LEN(0001) ALIGNED 1c800100
TAG (0018) LEN(0004) UNALIGNED 00040000   |  TAG (0018) LEN(0004) UNALIGNED 18000400
TAG (0019) LEN(0004) UNALIGNED 00040000   |  TAG (0019) LEN(0004) UNALIGNED 19000400
TAG (001a) LEN(0004) UNALIGNED 00040004   |  TAG (001a) LEN(0004) UNALIGNED 1a000400
TAG (0000) LEN(0000) UNALIGNED 00000000      TAG (0000) LEN(0000) UNALIGNED 00000000
TAG (0000) LEN(0000) ALIGNED 00000000        TAG (0000) LEN(0000) ALIGNED 00000000

As you can see, the 3 unaligned writes are different; in fact we are 
back to the same problem I had originally.

Since put_unaligned resolves to a memcpy, the only difference is the 
size that is passed to memcpy: you pass sizeof(val), which is 2 (a short) 
and is, I believe, absolutely the correct value to pass, whereas in my 
version I mistakenly passed sizeof(tl->tag_list_cpos), which is 4 (the 
size of a short *) and totally incorrect.
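
The two sizes being compared can be checked in isolation with a small sketch. The function names are made up for illustration, and the cursor is assumed to be an unsigned short * as in the debug code above. sizeof does not evaluate its operand, so the NULL pointer is never dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: size of the cursor itself, i.e. of a pointer.
 * This is 4 on a 32-bit ARM build and typically 8 on a 64-bit host. */
static size_t size_of_cursor(void)
{
    unsigned short *cpos = NULL;
    return sizeof(cpos);
}

/* Hypothetical: size of the slot the cursor points at, i.e. of a short. */
static size_t size_of_slot(void)
{
    unsigned short *cpos = NULL;
    return sizeof(*cpos);   /* operand is not evaluated, no dereference */
}
```

With the pointer-sized length, each memcpy is asked to copy more bytes than the 2-byte source variable holds, and the second memcpy then overwrites the upper part of the first.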

Anyway, if I modify put_unaligned accordingly, it starts working.

I am afraid my brain is too addled to actually work out what on earth is 
going on here....I am sure I used to understand this stuff.

Can you offer me any pointers (excuse the pun)? If you feel you would 
like to look into this further, and it would help, I can offer you a 
(non-root) shell account on one of these machines, good for testing 
code. I will probably knock up a standalone test to make this a bit 
simpler to verify.
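
A rough sketch of what such a standalone test could look like. Everything here (struct tag_buf, put_u16, put_tag) is hypothetical and is not the drbdsetup code. It assumes the write position is kept as a byte offset into an unsigned char array rather than as an unsigned short *, so that memcpy is never handed a pointer type from which the compiler could infer 2-byte alignment. Whether that inference is what actually goes wrong on armel is an untested guess.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical byte-oriented tag buffer; pos is an offset in bytes. */
struct tag_buf {
    unsigned char bytes[64];
    size_t pos;
};

/* Append one 16-bit value at the current (possibly odd) offset.
 * Returns 0 on success, -1 if the buffer is full. */
static int put_u16(struct tag_buf *tb, unsigned short v)
{
    if (tb->pos + sizeof(v) > sizeof(tb->bytes))
        return -1;
    memcpy(tb->bytes + tb->pos, &v, sizeof(v));  /* size of the value */
    tb->pos += sizeof(v);
    return 0;
}

/* Append a tag followed by its data length. */
static int put_tag(struct tag_buf *tb, unsigned short tag,
                   unsigned short data_len)
{
    if (put_u16(tb, tag) != 0)
        return -1;
    return put_u16(tb, data_len);
}
```

Setting pos to 1 before calling put_tag(&tb, 0x0018, 0x0004) forces the unaligned case. On a little-endian machine the four bytes from bytes[1] onward should then match the 18000400 seen in the working column above.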

Regards

Nick






Nick Liebmann wrote:
> Philipp Reisner wrote:
>   
>> On Tuesday 16 June 2009 23:49:10 Nick Liebmann wrote:
>>   
>>     
>>> Hi Philipp,
>>>
>>> I have spent a bit more time looking into this. I am now working from
>>> the latest version of the git repository, which seems to have had your
>>> changes applied.
>>>
>>> I have gone back to basics, since I was concerned I was feeding you
>>> misinformation.
>>>     
>>>       
>> [...]
>>
>> Please correct me if I got something wrong:
>> * You used the latest sources from GIT
>> * If you use the "drbdadm up" command you get 
>>
>> 0: Failure: (126) UnknownMandatoryTag
>>  Command 'drbdsetup 0 net XXX.XXX.XXX.XXX:7788 XXX.XXX.XXX.XXX:7788 C
>>  --set-defaults --create-device --allow-two-primaries
>>  --after-sb-0pri=discard-zero-changes --after-sb-1pri=discard-secondary
>>  --after-sb-2pri=disconnect' terminated with exit code 10
>>
>> Jun 16 22:34:20 nasty kernel: block drbd0: Unknown tag: 3526
>>
>>
>> * If you use the drbdsetup commands one by one, it works.
>>   
>>     
>
> It works better.....
>
> However I am unable to issue this command
>
> nasty:~# drbdadm -- --overwrite-data-of-peer primary r0
> 0: State change failed: (-2) Refusing to be Primary without at least one 
> UpToDate disk
> Command 'drbdsetup 0 primary --overwrite-data-of-peer' terminated with 
> exit code 17
>
> Which worked on the previous version, and obviously is required in order 
> to do my first time sync.
> I wondered if the two issues were connected, and perhaps symptomatic of 
> another, underlying issue.
>
>
>   
>> That does not feel like an alignment issue, that feels more like a
>> race condition.
>>
>> -Phil
>>   
>>     
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
>   


