Note: "permalinks" may not be as permanent as we would like;
direct links from old sources may well be a few messages off.
I'm having a problem with a DRBD 0.7.10 volume. After performing the initial sync, the master seems to block. Load average goes through the roof. Since this is a production mail server I've had to turn off the slave and run single-legged, which works just fine. The box performs just like it did before I added DRBD.

I have another server which is running similar hardware and similar setup and it's got no such problems. Nothing like this popped up during testing, although those were different servers.

Here is the dmesg output of the process:

drbd0: Handshake successful: DRBD Network Protocol version 74
drbd0: Connection established.
drbd0: I am(P): 1:00000002:00000001:00000007:00000001:10
drbd0: Peer(S): 0:00000002:00000001:00000006:00000001:01
drbd0: drbd0_receiver [412]: cstate WFReportParams --> WFBitMapS
drbd0: Primary/Unknown --> Primary/Secondary
drbd0: drbd0_receiver [412]: cstate WFBitMapS --> SyncSource
drbd0: Resync started as SyncSource (need to sync 4453844 KB [1113461 bits set]).
drbd0: Resync done (total 1296 sec; paused 0 sec; 3436 K/sec)
drbd0: drbd0_worker [22324]: cstate SyncSource --> Connected
drbd0: sock was shut down by peer
drbd0: meta connection shut down by peer.
drbd0: drbd0_asender [15301]: cstate Connected --> NetworkFailure
drbd0: asender terminated
drbd0: drbd0_receiver [412]: cstate NetworkFailure --> BrokenPipe
drbd0: short read expecting header on sock: r=0
drbd0: worker terminated
drbd0: drbd0_receiver [412]: cstate BrokenPipe --> Unconnected
drbd0: Connection lost.
drbd0: drbd0_receiver [412]: cstate Unconnected --> WFConnection

Nothing seems unusual there. I also traced the /proc/drbd info during that time. Here it is.

# while true; do cat /proc/drbd |grep bm; sleep 1; done
ns:84416914 nr:0 dw:76543740 dr:166308862 al:2136886 bm:2136347 lo:2 pe:869 ua:1631 ap:28
ns:84419154 nr:0 dw:76543984 dr:166313806 al:2136892 bm:2136347 lo:3 pe:319 ua:1132 ap:67
ns:84425174 nr:0 dw:76545548 dr:166313806 al:2136902 bm:2136349 lo:2 pe:585 ua:18 ap:13
ns:84425690 nr:0 dw:76545992 dr:166313866 al:2136939 bm:2136353 lo:8 pe:0 ua:0 ap:5
ns:84427234 nr:0 dw:76547536 dr:166315054 al:2137039 bm:2136353 lo:1 pe:16 ua:0 ap:17
ns:84427234 nr:0 dw:76547536 dr:166316054 al:2137039 bm:2136353 lo:1 pe:16 ua:0 ap:17
ns:84427234 nr:0 dw:76547536 dr:166319686 al:2137039 bm:2136353 lo:32 pe:16 ua:0 ap:48
-- This is about when the sync finished
ns:84427234 nr:0 dw:76547536 dr:166324046 al:2137039 bm:2136353 lo:3 pe:16 ua:0 ap:19
ns:84427234 nr:0 dw:76547536 dr:166325930 al:2137039 bm:2136353 lo:0 pe:16 ua:0 ap:16
ns:84427234 nr:0 dw:76547536 dr:166327798 al:2137039 bm:2136353 lo:0 pe:16 ua:0 ap:16
ns:84427234 nr:0 dw:76547536 dr:166330402 al:2137039 bm:2136353 lo:1 pe:16 ua:0 ap:17
ns:84427234 nr:0 dw:76547536 dr:166336218 al:2137039 bm:2136353 lo:1 pe:16 ua:0 ap:17
ns:84427234 nr:0 dw:76547536 dr:166338454 al:2137039 bm:2136353 lo:2 pe:16 ua:0 ap:18
ns:84427234 nr:0 dw:76547536 dr:166340118 al:2137039 bm:2136353 lo:0 pe:16 ua:0 ap:16
ns:84429598 nr:0 dw:76549900 dr:166340218 al:2137069 bm:2136353 lo:0 pe:15 ua:0 ap:15
ns:84429598 nr:0 dw:76549900 dr:166340218 al:2137069 bm:2136353 lo:0 pe:15 ua:0 ap:15
ns:84429598 nr:0 dw:76549900 dr:166340218 al:2137069 bm:2136353 lo:0 pe:15 ua:0 ap:15
ns:84429598 nr:0 dw:76549900 dr:166340218 al:2137069 bm:2136353 lo:0 pe:15 ua:0 ap:15
ns:84429598 nr:0 dw:76549900 dr:166340218 al:2137069 bm:2136353 lo:0 pe:15 ua:0 ap:15
ns:84429598 nr:0 dw:76549900 dr:166340234 al:2137069 bm:2136353 lo:0 pe:15 ua:0 ap:15
ns:84429598 nr:0 dw:76549900 dr:166340234 al:2137069 bm:2136353 lo:0 pe:15 ua:0 ap:15
ns:84429598 nr:0 dw:76549900 dr:166340234 al:2137069 bm:2136353 lo:0 pe:15 ua:0 ap:15
ns:84429598 nr:0 dw:76549900 dr:166340234 al:2137069 bm:2136353 lo:0 pe:15 ua:0 ap:15
ns:84429598 nr:0 dw:76549900 dr:166340238 al:2137069 bm:2136353 lo:0 pe:15 ua:0 ap:15
ns:84429598 nr:0 dw:76549900 dr:166340238 al:2137069 bm:2136353 lo:0 pe:15 ua:0 ap:15
ns:84429598 nr:0 dw:76549900 dr:166340238 al:2137069 bm:2136353 lo:0 pe:15 ua:0 ap:15
ns:84429598 nr:0 dw:76549900 dr:166340250 al:2137069 bm:2136353 lo:0 pe:15 ua:0 ap:15
ns:84429598 nr:0 dw:76549900 dr:166340250 al:2137069 bm:2136353 lo:0 pe:15 ua:0 ap:15
ns:84429606 nr:0 dw:76549908 dr:166340274 al:2137069 bm:2136353 lo:0 pe:1 ua:0 ap:1
ns:84429606 nr:0 dw:76549908 dr:166340282 al:2137069 bm:2136353 lo:0 pe:1 ua:0 ap:1
ns:84429606 nr:0 dw:76549908 dr:166340282 al:2137069 bm:2136353 lo:0 pe:1 ua:0 ap:1
ns:84429606 nr:0 dw:76549908 dr:166340282 al:2137069 bm:2136353 lo:0 pe:1 ua:0 ap:1
ns:84429606 nr:0 dw:76549908 dr:166340282 al:2137069 bm:2136353 lo:0 pe:1 ua:0 ap:1
ns:84429606 nr:0 dw:76549908 dr:166340282 al:2137069 bm:2136353 lo:0 pe:1 ua:0 ap:1
ns:84429606 nr:0 dw:76549908 dr:166340282 al:2137069 bm:2136353 lo:0 pe:1 ua:0 ap:1
ns:84429606 nr:0 dw:76549908 dr:166340282 al:2137069 bm:2136353 lo:0 pe:1 ua:0 ap:1
ns:84429606 nr:0 dw:76549908 dr:166340322 al:2137069 bm:2136353 lo:0 pe:1 ua:0 ap:1
ns:84429606 nr:0 dw:76549908 dr:166340322 al:2137069 bm:2136353 lo:0 pe:1 ua:0 ap:1
ns:84429606 nr:0 dw:76549908 dr:166340334 al:2137069 bm:2136353 lo:0 pe:1 ua:0 ap:1
ns:84429606 nr:0 dw:76549908 dr:166340334 al:2137069 bm:2136353 lo:0 pe:1 ua:0 ap:1
ns:84429606 nr:0 dw:76549908 dr:166340338 al:2137069 bm:2136353 lo:0 pe:1 ua:0 ap:1
ns:84429606 nr:0 dw:76549908 dr:166340338 al:2137069 bm:2136353 lo:0 pe:1 ua:0 ap:1
ns:84429606 nr:0 dw:76549908 dr:166340346 al:2137069 bm:2136353 lo:0 pe:1 ua:0 ap:1
ns:84429606 nr:0 dw:76549908 dr:166340346 al:2137069 bm:2136353 lo:0 pe:1 ua:0 ap:1
ns:84429606 nr:0 dw:76549908 dr:166340346 al:2137069 bm:2136353 lo:0 pe:1 ua:0 ap:1
ns:84429606 nr:0 dw:76551308 dr:166341006 al:2137073 bm:2136357 lo:0 pe:0 ua:0 ap:0
-- this is about when I stopped the secondary
ns:84429606 nr:0 dw:76555764 dr:166343742 al:2137093 bm:2136377 lo:47 pe:0 ua:0 ap:46
ns:84429606 nr:0 dw:76557484 dr:166346994 al:2137107 bm:2136391 lo:6 pe:0 ua:0 ap:6
ns:84429606 nr:0 dw:76559868 dr:166347738 al:2137136 bm:2136420 lo:1 pe:0 ua:0 ap:1
ns:84429606 nr:0 dw:76562492 dr:166348186 al:2137158 bm:2136442 lo:5 pe:0 ua:0 ap:5
ns:84429606 nr:0 dw:76563768 dr:166348634 al:2137172 bm:2136456 lo:3 pe:0 ua:0 ap:0
ns:84429606 nr:0 dw:76566732 dr:166349098 al:2137363 bm:2136648 lo:6 pe:0 ua:0 ap:3

It almost seems like the secondary isn't writing changes to the disk, but only when it's consistent. Everything works great during a sync.

Corey
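
One way to pick the stalled stretches out of a /proc/drbd trace like the one above is to parse each sample and flag runs where pe stays above zero while ns does not move. What follows is only a minimal Python sketch under stated assumptions and is not part of DRBD. The names parse_sample and find_stalls and the min_len threshold are made up for illustration. It also assumes the usual reading of the counters: ns as KB sent over the network, and pe as requests sent to the peer that have not been answered yet.

```python
import re

# Matches one "name:number" counter such as "pe:16".
FIELD_RE = re.compile(r"(\w+):(\d+)")


def parse_sample(line):
    """Turn one 'ns:... nr:... ap:...' line into a dict of integer counters."""
    return {key: int(value) for key, value in FIELD_RE.findall(line)}


def find_stalls(samples, min_len=3):
    """Return (start_index, length) for each run of at least min_len samples
    in which pe is above zero and ns is unchanged from the previous sample."""
    stalls = []
    run_start = None
    for i in range(1, len(samples)):
        stuck = (samples[i]["pe"] > 0
                 and samples[i]["ns"] == samples[i - 1]["ns"])
        if stuck:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                stalls.append((run_start, i - run_start))
            run_start = None
    # Close a run that lasts until the end of the trace.
    if run_start is not None and len(samples) - run_start >= min_len:
        stalls.append((run_start, len(samples) - run_start))
    return stalls


# A few samples copied from the trace above.
trace = [
    "ns:84427234 nr:0 dw:76547536 dr:166315054 al:2137039 bm:2136353 lo:1 pe:16 ua:0 ap:17",
    "ns:84427234 nr:0 dw:76547536 dr:166316054 al:2137039 bm:2136353 lo:1 pe:16 ua:0 ap:17",
    "ns:84427234 nr:0 dw:76547536 dr:166319686 al:2137039 bm:2136353 lo:32 pe:16 ua:0 ap:48",
    "ns:84427234 nr:0 dw:76547536 dr:166324046 al:2137039 bm:2136353 lo:3 pe:16 ua:0 ap:19",
    "ns:84429598 nr:0 dw:76549900 dr:166340218 al:2137069 bm:2136353 lo:0 pe:15 ua:0 ap:15",
]
print(find_stalls([parse_sample(line) for line in trace]))  # prints [(1, 3)]
```

Run against the full capture, this would report the long pe:16, pe:15 and pe:1 plateaus as separate stalls, since ns changes between them. It says nothing about why the peer stopped answering.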