Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
On Wed, Jun 17, 2009 at 07:12:32PM +0100, Nick Liebmann wrote: > Hello again Philipp, > > I must be going mad........ :) > I have now constructed a simple test-case, which highlights the error in > the code, however I cannot actually see what is wrong with the code... I > can make the code work, however the fix, should not be necessary, as far > as I can tell, and is actually incorrect! The first thread in a long time that may be qualified for drbd-dev ;) but anyways, lets annoy the drbd-user crowd a bit longer, still. > My test scenario is as follows. > > I have DRBD running and waiting for a connection on a known to be > working peer. (Previous working patched DRBD) > > I start DRBD at the local end...the resource has never been synced, so > if everything connected OK, I expect > > cat /proc/drbd to show > ...... ds:Inconsistent/Inconsistent > > however when things are not working it shows > > ....... ds:Inconsistent/DUnknown > > > I have reverted the source of drbdsetup.c back to before any of the > put_unaligned and get_unaligned calls, and am concentrating on the > add_tag functions, since this is where I found reverting back to made it > work again. > > I have modified add_tag as follows, with some debug code > > ------------- > //Store the current position in the data stream > unsigned short int *startPos = tl->tag_list_cpos; > #define USE_MEMCPY > #ifdef USE_MEMCPY > //This is the memcpy used in the previous patch... > //note the size will resolve to 4 (which is incorrect really, since > the data is a short) > memcpy(tl->tag_list_cpos++, &tag, sizeof(tl->tag_list_cpos)); > memcpy(tl->tag_list_cpos++, &data_len, sizeof(tl->tag_list_cpos)); > #else > put_unaligned(tag, tl->tag_list_cpos++); > put_unaligned(data_len, tl->tag_list_cpos++); > #endif did you know that according to "the internet", http://forums.arm.com/index.php?showtopic=12717 ARM cores from architecture ARMv4 or ARMv5 convert an unaligned load into an aligned load with rotate (so LDR from 0x1 loads 0x1 0x2 0x3 0x0 into a register). (read that paragraph again. is that broken, or what.) (does apply to stores as well.) maybe look at the assembler code generated. hmmm. we definetly should move to drbd-dev, if we start throwing assembler listings around. there is also "echo 2 > /proc/cpu/alignment" (or some such). though I'm not sure whether that works on your box. what exactly is your compiler? > The output is as follows: > > nasty:~# sdiff /tmp/out.broken /tmp/out.working > TAG (e003) LEN(000a) ALIGNED 03e00a00 TAG (e003) LEN(000a) ALIGNED 03e00a00 > TAG (e004) LEN(000a) ALIGNED 04e00a00 TAG (e004) LEN(000a) ALIGNED 04e00a00 > TAG (2005) LEN(0004) ALIGNED 05200400 TAG (2005) LEN(0004) ALIGNED 05200400 > TAG (0000) LEN(0000) ALIGNED 00000000 TAG (0000) LEN(0000) ALIGNED 00000000 > TAG (001e) LEN(0004) ALIGNED 1e000400 TAG (001e) LEN(0004) ALIGNED 1e000400 > TAG (0000) LEN(0000) ALIGNED 00000000 TAG (0000) LEN(0000) ALIGNED 00000000 > TAG (e008) LEN(0010) ALIGNED 08e01000 TAG (e008) LEN(0010) ALIGNED 08e01000 > TAG (e009) LEN(0010) ALIGNED 09e01000 TAG (e009) LEN(0010) ALIGNED 09e01000 > TAG (200f) LEN(0004) ALIGNED 0f200400 TAG (200f) LEN(0004) ALIGNED 0f200400 > TAG (801c) LEN(0001) ALIGNED 1c800100 TAG (801c) LEN(0001) ALIGNED 1c800100 > TAG (0018) LEN(0004) UNALIGNED 00040000 | TAG (0018) LEN(0004) UNALIGNED 18000400 > TAG (0019) LEN(0004) UNALIGNED 00040000 | TAG (0019) LEN(0004) UNALIGNED 19000400 > TAG (001a) LEN(0004) UNALIGNED 00040004 | TAG (001a) LEN(0004) UNALIGNED 1a000400 > TAG (0000) LEN(0000) UNALIGNED 00000000 TAG (0000) LEN(0000) UNALIGNED 00000000 > TAG (0000) LEN(0000) ALIGNED 00000000 TAG (0000) LEN(0000) ALIGNED 00000000 > > As you can see, the 3 unaligned writes are different, in fact we are > back to the same problem I had originally. > > Since put_unaligned resolves to a memcpy ,the only difference is the > size that is passed to memcpy, you pass sizeof(val) (which is 2 (short) > and is absolutely the correct value to pass I believe), > in my version I mistakenly passed sizeof(tl->tag_list_cpos) which is 4 > (a short *), totally incorrect. and what happens to the other two bytes? they just get overwritten by the following memcpy. my current guess is that that somehow your compiler generates from the memcpy for size=2 one of these rotating store assembler instructions. > Anyway, If I modify put_aligned accordingly, it starts working. but may copy garbage for the last tag, so the tag list could become incorrectly closed, and when reading it, we could walk off the end? > I am afraid my brain is too addled to actually work out what on earth is > going on here....I am sure I used to understand this stuff. > > Can you offer me any pointers (excuse the pun). If you feel you would > like to look into this futher, and it would help, I can offer you a > (non-root) shell account on one of these machines, good for testing > code. I will probably knock up a standalone test to make this a bit > simpler to verify > > Regards > > Nick -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria. __ please don't Cc me, but send to list -- I'm subscribed