Hi Philipp,
I have spent a bit more time looking into this. I am now working from
the latest version of the git repository, which seems to have had your
changes applied.
I have gone back to basics, since I was concerned I was feeding you
misinformation.
I have built the userland binaries and the kernel module from the
latest source (git HEAD, no patches).
I have copied them onto both machines (the two machines are identical).
My DRBD config is as follows:
***********************
global {
usage-count yes;
}
common {
syncer {
rate 10M;
}
}
resource r0 {
protocol C;
device /dev/drbd0;
meta-disk internal;
net {
allow-two-primaries;
after-sb-0pri discard-zero-changes;
after-sb-1pri discard-secondary;
after-sb-2pri disconnect;
}
startup {
# become-primary-on both;
}
on nasty {
address XXX.XXX.XXX.XXX:7788;
disk /dev/sda7;
}
on coral {
address XXX.XXX.XXX.XXX:7788;
disk /dev/sda7;
}
}
*****************************
I am running the following commands on both machines:
dd if=/dev/zero of=/dev/sda7 bs=10M
nasty:~# drbdadm create-md r0
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
success
nasty:~# modprobe drbd
/var/log/kern.log:
Jun 16 22:32:14 nasty kernel: drbd: initialised. Version: 8.3.2rc1
(api:88/proto:86-90)
Jun 16 22:32:14 nasty kernel: drbd: GIT-hash:
df0abe3ea8a2beb412e094fbc76c8f4f0ba91858 build by nick at nasty, 2009-06-16
22:04:24
Jun 16 22:32:14 nasty kernel: drbd: registered as block device major 147
Jun 16 22:32:14 nasty kernel: drbd: minor_table @ 0xc6f80620
nasty:~# drbdadm up r0
0: Failure: (126) UnknownMandatoryTag
Command 'drbdsetup 0 net XXX.XXX.XXX.XXX:7788 XXX.XXX.XXX.XXX:7788 C
--set-defaults --create-device --allow-two-primaries
--after-sb-0pri=discard-zero-changes --after-sb-1pri=discard-secondary
--after-sb-2pri=disconnect' terminated with exit code 10
/var/log/kern.log:
Jun 16 22:34:19 nasty kernel: block drbd0: Starting worker thread (from
cqueue [8513])
Jun 16 22:34:19 nasty kernel: block drbd0: disk( Diskless -> Attaching )
Jun 16 22:34:19 nasty kernel: block drbd0: No usable activity log found.
Jun 16 22:34:19 nasty kernel: block drbd0: Method to ensure write
ordering: barrier
Jun 16 22:34:19 nasty kernel: block drbd0: max_segment_size ( = BIO size
) = 32768
Jun 16 22:34:19 nasty kernel: block drbd0: drbd_bm_resize called with
capacity == 208696
Jun 16 22:34:19 nasty kernel: block drbd0: resync bitmap: bits=26087
words=816
Jun 16 22:34:19 nasty kernel: block drbd0: size = 102 MB (104348 KB)
Jun 16 22:34:19 nasty kernel: block drbd0: Writing the whole bitmap,
size changed
Jun 16 22:34:20 nasty kernel: block drbd0: 102 MB (26087 bits) marked
out-of-sync by on disk bit-map.
Jun 16 22:34:20 nasty kernel: block drbd0: recounting of set bits took
additional 0 jiffies
Jun 16 22:34:20 nasty kernel: block drbd0: 102 MB (26087 bits) marked
out-of-sync by on disk bit-map.
Jun 16 22:34:20 nasty kernel: block drbd0: disk( Attaching ->
Inconsistent )
Jun 16 22:34:20 nasty kernel: block drbd0: Unknown tag: 3526
As you can see, this is not good!
If I run the individual steps instead:
nasty:~# drbdadm attach r0
/var/log/kern.log:
Jun 16 22:38:05 nasty kernel: block drbd0: Starting worker thread (from
cqueue [8513])
Jun 16 22:38:05 nasty kernel: block drbd0: disk( Diskless -> Attaching )
Jun 16 22:38:05 nasty kernel: block drbd0: No usable activity log found.
Jun 16 22:38:05 nasty kernel: block drbd0: Method to ensure write
ordering: barrier
Jun 16 22:38:05 nasty kernel: block drbd0: max_segment_size ( = BIO size
) = 32768
Jun 16 22:38:05 nasty kernel: block drbd0: drbd_bm_resize called with
capacity == 208696
Jun 16 22:38:05 nasty kernel: block drbd0: resync bitmap: bits=26087
words=816
Jun 16 22:38:05 nasty kernel: block drbd0: size = 102 MB (104348 KB)
Jun 16 22:38:05 nasty kernel: block drbd0: recounting of set bits took
additional 0 jiffies
Jun 16 22:38:05 nasty kernel: block drbd0: 102 MB (26087 bits) marked
out-of-sync by on disk bit-map.
Jun 16 22:38:05 nasty kernel: block drbd0: disk( Attaching ->
Inconsistent )
nasty:~# cat /proc/drbd
version: 8.3.2rc1 (api:88/proto:86-90)
GIT-hash: df0abe3ea8a2beb412e094fbc76c8f4f0ba91858 build by nick at nasty,
2009-06-16 22:04:24
0: cs:StandAlone ro:Secondary/Unknown ds:Inconsistent/DUnknown r----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:104348
nasty:~# drbdadm syncer r0
nasty:~# drbdadm connect r0
/var/log/kern.log:
Jun 16 22:40:35 nasty kernel: block drbd0: conn( StandAlone ->
Unconnected )
Jun 16 22:40:35 nasty kernel: block drbd0: Starting receiver thread
(from drbd0_worker [1370])
Jun 16 22:40:35 nasty kernel: block drbd0: receiver (re)started
Jun 16 22:40:35 nasty kernel: block drbd0: conn( Unconnected ->
WFConnection )
Jun 16 22:40:36 nasty kernel: block drbd0: Handshake successful: Agreed
network protocol version 90
Jun 16 22:40:36 nasty kernel: block drbd0: conn( WFConnection ->
WFReportParams )
Jun 16 22:40:36 nasty kernel: block drbd0: Starting asender thread (from
drbd0_receiver [1424])
Jun 16 22:40:36 nasty kernel: block drbd0: data-integrity-alg: <not-used>
Jun 16 22:40:36 nasty kernel: block drbd0: drbd_sync_handshake:
Jun 16 22:40:36 nasty kernel: block drbd0: self
0000000000000004:0000000000000000:0000000000000000:0000000000000000
bits:26087 flags:0
Jun 16 22:40:36 nasty kernel: block drbd0: peer
0000000000000004:0000000000000000:0000000000000000:0000000000000000
bits:26087 flags:0
Jun 16 22:40:36 nasty kernel: block drbd0: uuid_compare()=0 by rule 1
Jun 16 22:40:36 nasty kernel: block drbd0: No resync, but 26087 bits in
bitmap!
Jun 16 22:40:36 nasty kernel: block drbd0: peer( Unknown -> Secondary )
conn( WFReportParams -> Connected ) pdsk( DUnknown -> Inconsistent )
nasty:~# cat /proc/drbd
version: 8.3.2rc1 (api:88/proto:86-90)
GIT-hash: df0abe3ea8a2beb412e094fbc76c8f4f0ba91858 build by nick at nasty,
2009-06-16 22:04:24
0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:104348
nasty:~# drbdadm -- --overwrite-data-of-peer primary r0
0: State change failed: (-2) Refusing to be Primary without at least one
UpToDate disk
Command 'drbdsetup 0 primary --overwrite-data-of-peer' terminated with
exit code 17
Jun 16 22:43:11 nasty kernel: block drbd0: State change failed: Refusing
to be Primary without at least one UpToDate disk
Jun 16 22:43:11 nasty kernel: block drbd0: state = { cs:Connected
ro:Secondary/Secondary ds:Inconsistent/Inconsistent r--- }
Jun 16 22:43:11 nasty kernel: block drbd0: wanted = { cs:Connected
ro:Primary/Secondary ds:Inconsistent/Inconsistent r--- }
The behaviour is the same from whichever end I run the last command, and
the logs are essentially the same at both ends.
I am confident my network (firewall) connection is valid; it was working
with the previous (patched) version this morning.
Is there any more information you would like from me?
I have some time to do some more debugging tomorrow, if you could give
me a clue where I should start looking.
Regards
Nick
Philipp Reisner wrote:
> On Monday 15 June 2009 23:21:51 you wrote:
>
>> Hi,
>>
>> With the latest patch, things are not working very well at all!
>>
>>
>> I think the problem may lie here:
>>
>>
>> #define put_unaligned(val, ptr) ({ \
>> + typeof(val) v; \
>> + switch (sizeof(*(ptr))) { \
>> + case 1: \
>> + *(uint8_t *)(ptr) = (uint8_t)(val); \
>> + break; \
>> + case 2: \
>> + case 4: \
>> + case 8: \
>> + v = val; \
>> + memcpy(ptr, &v, sizeof(*(ptr))); \
>> + break; \
>> + default: \
>> + __bad_unaligned_access_size(); \
>> + break; \
>> + } \
>> + (void)0; })
>> +
>>
>>
>> When called with
>>
>> put_unaligned(tag, tl->tag_list_cpos++);
>>
>> ptr is evaluated in the switch statement, and twice in the memcpy call,
>> and hence the post-increment will be done three times ....not what we want!
>>
>> Maybe an inline function, or taking a copy of ptr, is safer here
>>
>>
>
> Hi Nick,
>
> No, typeof() and sizeof() do not evaluate the expression at runtime. They
> just deliver the type or the size at compile time. But in that macro
> there was an error with the type conversion.
>
> I have again attached a patch that should fix the issue. Please verify.
>
>