Hey all,

I am currently running into an issue using DRBD in a Xen cluster (managed by Ganeti). When adding DRBD instances, I can add up to 17 without issue, but the 18th instance stalls on the initial sync:

block drbd17: Starting worker thread (from cqueue/2 [261])
block drbd17: disk( Diskless -> Attaching )
block drbd17: No usable activity log found.
block drbd17: Method to ensure write ordering: barrier
block drbd17: max_segment_size ( = BIO size ) = 32768
block drbd17: drbd_bm_resize called with capacity == 419430400
block drbd17: resync bitmap: bits=52428800 words=819200
block drbd17: size = 200 GB (209715200 KB)
block drbd17: Writing the whole bitmap, size changed
block drbd17: 200 GB (52428800 bits) marked out-of-sync by on disk bit-map.
block drbd17: recounting of set bits took additional 2 jiffies
block drbd17: 200 GB (52428800 bits) marked out-of-sync by on disk bit-map.
block drbd17: disk( Attaching -> Inconsistent )
block drbd17: Barriers not supported on meta data device - disabling
block drbd17: conn( StandAlone -> Unconnected )
block drbd17: Starting receiver thread (from drbd17_worker [21794])
block drbd17: receiver (re)started
block drbd17: conn( Unconnected -> WFConnection )
block drbd17: Handshake successful: Agreed network protocol version 94
block drbd17: Peer authenticated using 16 bytes of 'md5' HMAC
block drbd17: conn( WFConnection -> WFReportParams )
block drbd17: Starting asender thread (from drbd17_receiver [21799])
block drbd17: data-integrity-alg: <not-used>
block drbd17: drbd_sync_handshake:
block drbd17: self 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:52428800 flags:0
block drbd17: peer 4829B58EB3A8FE8D:0000000000000004:0000000000000000:0000000000000000 bits:52428800 flags:0
block drbd17: uuid_compare()=-2 by rule 20
block drbd17: Becoming sync target due to disk states.
block drbd17: Writing the whole bitmap, full sync required after drbd_sync_handshake.
block drbd17: 200 GB (52428800 bits) marked out-of-sync by on disk bit-map.
block drbd17: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
block drbd17: conn( WFBitMapT -> WFSyncUUID )
block drbd17: helper command: /bin/true before-resync-target minor-17
block drbd17: helper command: /bin/true before-resync-target minor-17 exit code 0 (0x0)
block drbd17: conn( WFSyncUUID -> SyncTarget )
block drbd17: Began resync as SyncTarget (will sync 209715200 KB [52428800 bits set]).
block drbd17: peer( Primary -> Unknown ) conn( SyncTarget -> Disconnecting ) pdsk( UpToDate -> DUnknown )
block drbd17: short read expecting header on sock: r=-512
block drbd17: meta connection shut down by peer.
block drbd17: asender terminated
block drbd17: Terminating asender thread
block drbd17: Connection closed
block drbd17: conn( Disconnecting -> StandAlone )
block drbd17: receiver terminated
block drbd17: Terminating receiver thread
block drbd17: disk( Inconsistent -> Diskless )
block drbd17: drbd_bm_resize called with capacity == 0
block drbd17: worker terminated
block drbd17: Terminating worker thread

I am running CentOS 5 with Xen and DRBD 8.3.8. I have tried multiple combinations of kernel, DRBD (8.3.2 and 8.3.8), and BIOS versions, to no avail. The behavior is the same on all nodes (currently 5). I have even swapped out the switch that carries the DRBD replication traffic.

Xen is currently running with 4 GB of RAM allocated to dom0, with over 2 GB free on each node. Do I simply not have enough RAM allocated to dom0, or am I missing something else?

Any thoughts or assistance would be appreciated.

-- 
Andrew Maldonado
Systems Administrator
Pictage, Inc.
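For anyone reading the drbd17 log: its size figures are internally consistent, which suggests the 200 GB volume and its resync bitmap were sized as expected before the connection dropped. Below is a minimal Python sketch that checks this. It assumes three things: the `capacity` value is counted in 512-byte sectors, each bitmap bit covers a 4 KiB block, and bitmap `words` are 64-bit longs (the log's `words=819200` is consistent with that). The helper name `bitmap_geometry` is hypothetical.

```python
# Sanity-check the size figures reported in the drbd17 log.
# Assumptions (not stated in the log itself): capacity is in 512-byte
# sectors, one bitmap bit covers 4 KiB, and bitmap words are 64 bits.

SECTOR_BYTES = 512   # assumed unit of "capacity =="
BLOCK_BYTES = 4096   # assumed bytes tracked per bitmap bit
WORD_BITS = 64       # assumed bitmap word size on a 64-bit dom0


def bitmap_geometry(capacity_sectors: int) -> tuple[int, int, int]:
    """Hypothetical helper: return (size in KiB, bitmap bits, bitmap words)."""
    size_bytes = capacity_sectors * SECTOR_BYTES
    size_kib = size_bytes // 1024
    bits = size_bytes // BLOCK_BYTES
    words = bits // WORD_BITS
    return size_kib, bits, words


if __name__ == "__main__":
    # "drbd_bm_resize called with capacity == 419430400"
    size_kib, bits, words = bitmap_geometry(419430400)
    print(f"size = {size_kib // (1024 * 1024)} GB ({size_kib} KB)")
    print(f"resync bitmap: bits={bits} words={words}")
```

Run as a script, this prints `size = 200 GB (209715200 KB)` and `resync bitmap: bits=52428800 words=819200`, matching the log lines above, including the `52428800 bits set` reported when the resync began.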