Hello again Philipp,

I must be going mad... I have now constructed a simple test case which
highlights the error in the code; however, I cannot actually see what is
wrong with the code. I can make the code work, but the fix should not be
necessary, as far as I can tell, and is actually incorrect!

My test scenario is as follows. I have DRBD running and waiting for a
connection on a known-to-be-working peer (the previous working patched
DRBD). I start DRBD at the local end. The resource has never been synced,
so if everything connected OK, I expect cat /proc/drbd to show

  ...... ds:Inconsistent/Inconsistent

however, when things are not working it shows

  ....... ds:Inconsistent/DUnknown

I have reverted the source of drbdsetup.c back to before any of the
put_unaligned and get_unaligned calls, and am concentrating on the add_tag
functions, since that is where I found that reverting made it work again.

I have modified add_tag as follows, with some debug code:

-------------
//Store the current position in the data stream
unsigned short int *startPos = tl->tag_list_cpos;

#define USE_MEMCPY
#ifdef USE_MEMCPY
//This is the memcpy used in the previous patch...
//note the size will resolve to 4 (which is incorrect really, since the data is a short)
memcpy(tl->tag_list_cpos++, &tag, sizeof(tl->tag_list_cpos));
memcpy(tl->tag_list_cpos++, &data_len, sizeof(tl->tag_list_cpos));
#else
put_unaligned(tag, tl->tag_list_cpos++);
put_unaligned(data_len, tl->tag_list_cpos++);
#endif

//Print the bytes that we have just written into the stream
{
    int nPos = 0;
    //Format is TAG (XXXX) LEN (YYYY) [UN]ALIGNED
    fprintf(stderr, "TAG (%4.4x) LEN(%4.4x) %sALIGNED ", tag & 0x0000ffff, data_len, ((int)startPos % 2) == 1 ? "UN" : "");
    for(nPos = 0; nPos < (char*)tl->tag_list_cpos - (char*)startPos; nPos++)
    {
        fprintf(stderr, "%2.2x", ((char*)startPos)[nPos]);
    }
    fprintf(stderr, "\n");
}
-----------------

This allows me to check whether the bytes are written into the data stream
correctly.

The output is as follows:

nasty:~# sdiff /tmp/out.broken /tmp/out.working
TAG (e003) LEN(000a) ALIGNED 03e00a00     TAG (e003) LEN(000a) ALIGNED 03e00a00
TAG (e004) LEN(000a) ALIGNED 04e00a00     TAG (e004) LEN(000a) ALIGNED 04e00a00
TAG (2005) LEN(0004) ALIGNED 05200400     TAG (2005) LEN(0004) ALIGNED 05200400
TAG (0000) LEN(0000) ALIGNED 00000000     TAG (0000) LEN(0000) ALIGNED 00000000
TAG (001e) LEN(0004) ALIGNED 1e000400     TAG (001e) LEN(0004) ALIGNED 1e000400
TAG (0000) LEN(0000) ALIGNED 00000000     TAG (0000) LEN(0000) ALIGNED 00000000
TAG (e008) LEN(0010) ALIGNED 08e01000     TAG (e008) LEN(0010) ALIGNED 08e01000
TAG (e009) LEN(0010) ALIGNED 09e01000     TAG (e009) LEN(0010) ALIGNED 09e01000
TAG (200f) LEN(0004) ALIGNED 0f200400     TAG (200f) LEN(0004) ALIGNED 0f200400
TAG (801c) LEN(0001) ALIGNED 1c800100     TAG (801c) LEN(0001) ALIGNED 1c800100
TAG (0018) LEN(0004) UNALIGNED 00040000 | TAG (0018) LEN(0004) UNALIGNED 18000400
TAG (0019) LEN(0004) UNALIGNED 00040000 | TAG (0019) LEN(0004) UNALIGNED 19000400
TAG (001a) LEN(0004) UNALIGNED 00040004 | TAG (001a) LEN(0004) UNALIGNED 1a000400
TAG (0000) LEN(0000) UNALIGNED 00000000   TAG (0000) LEN(0000) UNALIGNED 00000000
TAG (0000) LEN(0000) ALIGNED 00000000     TAG (0000) LEN(0000) ALIGNED 00000000

As you can see, the 3 unaligned writes are different; in fact, we are back
to the same problem I had originally. Since put_unaligned resolves to a
memcpy, the only difference is the size that is passed to memcpy: you pass
sizeof(val), which is 2 (a short) and is absolutely the correct value to
pass, I believe; in my version I mistakenly passed
sizeof(tl->tag_list_cpos), which is 4 (the size of a short *) and totally
incorrect. Anyway, if I modify put_unaligned accordingly, it starts working.

I am afraid my brain is too addled to actually work out what on earth is
going on here... I am sure I used to understand this stuff. Can you offer me
any pointers (excuse the pun)?

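For reference, a standalone check of the header bytes could be sketched
roughly as below. This is only a minimal sketch under stated assumptions,
not code from drbdsetup.c: the names put_u16, get_u16, add_header and
dump_header are hypothetical, and unsigned short is assumed to be 16 bits.
It writes through an unsigned char pointer and always copies
sizeof(unsigned short) bytes, so it gives a reference byte pattern to
compare against; it does not by itself explain why the put_unaligned
variant misbehaves.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from drbdsetup.c): store a 16-bit value at a
 * possibly unaligned address. The destination is an unsigned char *, so
 * no alignment is implied by its type, and exactly sizeof(val) bytes
 * (the size of the value, not of a pointer) are copied. */
static unsigned char *put_u16(unsigned char *pos, unsigned short val)
{
    assert(pos != NULL);
    memcpy(pos, &val, sizeof(val));
    return pos + sizeof(val);
}

/* Hypothetical helper: read a 16-bit value back the same way. */
static unsigned short get_u16(const unsigned char *pos)
{
    unsigned short val;

    assert(pos != NULL);
    memcpy(&val, pos, sizeof(val));
    return val;
}

/* Write a tag/length header and return the position just past it. */
static unsigned char *add_header(unsigned char *pos, unsigned short tag,
                                 unsigned short data_len)
{
    pos = put_u16(pos, tag);
    return put_u16(pos, data_len);
}

/* Print a header in the same format as the debug output in this mail:
 * TAG (XXXX) LEN(YYYY) [UN]ALIGNED followed by the raw header bytes. */
static void dump_header(FILE *out, const unsigned char *start,
                        unsigned short tag, unsigned short data_len)
{
    size_t i;

    fprintf(out, "TAG (%4.4x) LEN(%4.4x) %sALIGNED ", tag, data_len,
            ((uintptr_t)start % 2) == 1 ? "UN" : "");
    for (i = 0; i < 2 * sizeof(unsigned short); i++)
        fprintf(out, "%2.2x", start[i]);
    fprintf(out, "\n");
}
```

Calling add_header(buf + 1, 0x0018, 0x0004) on a zeroed unsigned char
buffer and then dump_header(stderr, buf + 1, 0x0018, 0x0004) prints one
line in the debug format used above. On a little-endian machine the four
bytes should read 18000400, as in the out.working output. Whether the line
is labelled UNALIGNED depends on where the compiler places buf.
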
If you feel you would like to look into this further, and it would help, I
can offer you a (non-root) shell account on one of these machines, good for
testing code. I will probably knock up a standalone test to make this a bit
simpler to verify.

Regards

Nick

Nick Liebmann wrote:
> Philipp Reisner wrote:
>
>> On Tuesday 16 June 2009 23:49:10 Nick Liebmann wrote:
>>
>>> Hi Philipp,
>>>
>>> I have spent a bit more time looking into this. I am now working from
>>> the latest version of the git repository, which seems to have had your
>>> changes applied.
>>>
>>> I have gone back to basics, since I was concerned I was feeding you
>>> mis-information.
>>>
>> [...]
>>
>> Please correct me if I got something wrong:
>> * You used the latest sources from GIT
>> * If you use the "drbdadm up" command you get
>>
>> 0: Failure: (126) UnknownMandatoryTag
>> Command 'drbdsetup 0 net XXX.XXX.XXX.XXX:7788 XXX.XXX.XXX.XXX:7788 C
>> --set-defaults --create-device --allow-two-primaries
>> --after-sb-0pri=discard-zero-changes --after-sb-1pri=discard-secondary
>> --after-sb-2pri=disconnect' terminated with exit code 10
>>
>> Jun 16 22:34:20 nasty kernel: block drbd0: Unknown tag: 3526
>>
>> * If you use the drbdsetup commands one by one, it works.
>>
>
> It works better.....
>
> However I am unable to issue this command
>
> nasty:~# drbdadm -- --overwrite-data-of-peer primary r0
> 0: State change failed: (-2) Refusing to be Primary without at least one
> UpToDate disk
> Command 'drbdsetup 0 primary --overwrite-data-of-peer' terminated with
> exit code 17
>
> Which worked on the previous version, and obviously is required in order
> to do my first time sync.
> I wondered if the two issues were connected, and perhaps symptomatic of
> another, underlying issue.
>
>> That does not feel like an alignment issue, that feels more like a
>> race condition.
>>
>> -Phil
>>
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user