[DRBD-user] kernel: drbd0: Got NegAck packet. Peer is in troubles?
guohuai_li at hotmail.com
Mon Nov 16 12:34:12 CET 2009
In this case:
Normally DRBD runs well on both the primary node and the secondary node.
(SUSE Linux.)
(DRBD 8.3.0, using Protocol C.)
On the primary node, one process uses the Linux "msync" API (sync mode) to sync content to disk.
(This disk is the block device that DRBD protects.)
Then the disk on the secondary node has a problem (for example, the disk is pulled out of the machine).
At that point, the thread blocks while calling the "msync" API; a call to "fsync", however, does not block.
(I reproduced this issue in the lab: "UpToDate -> Diskless" appeared in /var/log/messages at the same time
as the "msync" call was blocked.)
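
A minimal Python sketch for timing the two calls side by side. It assumes CPython on Linux, where mmap.flush() is, as far as I know, implemented with msync() in sync mode. The function names and the example path are hypothetical, not part of the original test program; to repeat the test above, the file would have to live on the DRBD-backed filesystem.

```python
import mmap
import os
import time


def timed(label, fn):
    """Run fn() and return (label, elapsed seconds)."""
    start = time.monotonic()
    fn()
    return label, time.monotonic() - start


def compare_flush_calls(path, size=4096):
    """Dirty one mapped page and time flush(), then write and time fsync()."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, size)
        with mmap.mmap(fd, size) as mapping:
            mapping[0:5] = b"hello"          # dirty the shared mapping
            results = [timed("msync", mapping.flush)]
        os.write(fd, b"world")               # dirty the file through write()
        results.append(timed("fsync", lambda: os.fsync(fd)))
    finally:
        os.close(fd)
    return results


# Hypothetical usage; the mount point is an assumption:
# for label, seconds in compare_flush_calls("/mnt/drbd0/probe.bin"):
#     print(f"{label}: {seconds:.3f}s")
```

If the "msync" row takes minutes while the peer disk is failing and the "fsync" row returns quickly, that matches the behaviour described above.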
About 3 minutes later I started another process, which also calls the "msync" API, but this time the call did not block.
So I think the "msync" call (sync mode) waits for the OS to report that the data has been flushed to disk.
But the local write is not marked as flushed until the remote disk has flushed it too (because DRBD uses Protocol C).
At that very moment the remote disk has a problem (I reproduced this by pulling the disk out),
so the "msync" call blocks.
But once DRBD has found that the remote disk is "Diskless", the "msync" call no longer blocks.
DRBD then knows that the remote disk is "Diskless", so it does not wait for the remote disk to flush the data.
When the local disk has flushed, DRBD marks the data as flushed, so at this point the "msync" call does not block.
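
The hypothesis above can be written down as a toy model. This is only a sketch of my reading of it, not DRBD's actual logic; the function name and the three arguments are made up for illustration.

```python
def write_completes(local_flushed, peer_disk_state, peer_acked):
    """Toy model of the hypothesis above, NOT DRBD's real implementation.

    A write is reported as complete only when the local disk has flushed
    and either the peer has acknowledged it or the peer is already known
    to be Diskless (so nothing is waited for on that side).
    """
    if not local_flushed:
        return False
    if peer_disk_state == "Diskless":
        return True
    return peer_acked


# Peer disk just failed, DRBD still believes it is UpToDate, no ack arrives:
print(write_completes(True, "UpToDate", False))
# Later, after the "pdsk( UpToDate -> Diskless )" transition:
print(write_completes(True, "Diskless", False))
```

Under this model the first call stays incomplete (the blocked "msync"), and the second completes as soon as the local flush is done.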
Could you help explain this?
How can I solve this problem?
+++++++++++++++++++++++++++++++ in /var/log/messages of primary node ++++++++++++++++++++++++++++++
Nov 16 17:19:27 linux-17 kernel: drbd0: Got NegAck packet. Peer is in troubles?
Nov 16 17:19:27 linux-17 kernel: drbd0: pdsk( UpToDate -> Diskless )
Nov 16 17:19:27 linux-17 kernel: drbd0: Creating new current UUID
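
To line up the time of a state change with the time a call blocks, the transitions can be pulled out of the log with a small script. This is a sketch that assumes the "field( Old -> New )" format shown in the lines above; the function name is hypothetical.

```python
import re

# Assumed format, taken from the log lines above:
#   "... kernel: drbd0: pdsk( UpToDate -> Diskless )"
DEVICE = re.compile(r"^(.*?) \S+ kernel: (drbd\d+): (.*)$")
TRANSITION = re.compile(r"(\w+)\( (\w+) -> (\w+) \)")


def state_changes(lines):
    """Return (timestamp, device, field, old, new) for each transition found."""
    found = []
    for line in lines:
        match = DEVICE.match(line.strip())
        if not match:
            continue
        timestamp, device, message = match.groups()
        for field, old, new in TRANSITION.findall(message):
            found.append((timestamp, device, field, old, new))
    return found


LOG = [
    "Nov 16 17:19:27 linux-17 kernel: drbd0: Got NegAck packet. Peer is in troubles?",
    "Nov 16 17:19:27 linux-17 kernel: drbd0: pdsk( UpToDate -> Diskless )",
    "Nov 16 17:19:27 linux-17 kernel: drbd0: Creating new current UUID",
]

for change in state_changes(LOG):
    print(change)
```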
+++++++++++++++++++++++++++++++ below is on primary node ++++++++++++++++++++++++++++++++++++++++++++++
linux-17:~ # cat /proc/drbd
version: 8.3.0 (api:88/proto:86-89)
GIT-hash: 9ba8b93e24d842f0dd3fb1f9b90e8348ddb95829 build by root at linux-17, 2009-02-21 11:09:33
0: cs:Connected ro:Primary/Secondary ds:UpToDate/Diskless C r---
ns:1243340 nr:168424 dw:1554336 dr:70703 al:28 bm:63 lo:0 pe:2 ua:0 ap:1 ep:1 wo:b oos:28
1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r---
ns:0 nr:53248 dw:6283648 dr:1833 al:3 bm:16 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
2: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r---
ns:0 nr:8 dw:11 dr:96 al:1 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
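
The per-device status lines can also be checked by a script, for example to notice that a peer disk is no longer UpToDate. This is a sketch that assumes the 8.3-style /proc/drbd layout shown above; other DRBD versions may format this file differently, and the function names are hypothetical.

```python
import re

# Matches lines such as:
#   "0: cs:Connected ro:Primary/Secondary ds:UpToDate/Diskless C r---"
STATUS = re.compile(
    r"^\s*(\d+): cs:(\S+) ro:(\w+)/(\w+) ds:(\w+)/(\w+)\s+(\w)",
    re.MULTILINE,
)


def parse_proc_drbd(text):
    """Return one dict per configured minor found in /proc/drbd text."""
    devices = []
    for m in STATUS.finditer(text):
        devices.append({
            "minor": int(m.group(1)),
            "cs": m.group(2),
            "local_role": m.group(3),
            "peer_role": m.group(4),
            "local_disk": m.group(5),
            "peer_disk": m.group(6),
            "protocol": m.group(7),
        })
    return devices


def minors_with_bad_peer_disk(text):
    """Minor numbers whose peer disk state is anything but UpToDate."""
    return [d["minor"] for d in parse_proc_drbd(text)
            if d["peer_disk"] != "UpToDate"]


SAMPLE = """version: 8.3.0 (api:88/proto:86-89)
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/Diskless C r---
    ns:1243340 nr:168424 dw:1554336 dr:70703 al:28 bm:63 lo:0 pe:2 ua:0 ap:1 ep:1 wo:b oos:28
 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r---
 2: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r---
"""

print(minors_with_bad_peer_disk(SAMPLE))
```

On a live system the text would come from reading /proc/drbd instead of the SAMPLE string.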
> Date: Mon, 16 Nov 2009 09:42:25 +0100
> From: lars.ellenberg at linbit.com
> To: drbd-user at lists.linbit.com
> Subject: Re: [DRBD-user] kernel: drbd0: Got NegAck packet. Peer is in troubles?
> On Sat, Nov 14, 2009 at 09:27:12AM +0800, guohuai li wrote:
> > i use drbd8.3.0
> we have 8.3.5 respective 8.3.6 now.
> > i also met such problem.
> > I used msync linux api (sync mode) to write on primary node.
> > And also it blocks at this case.
> In _which_ case does _what_ block,
> and what exactly do you mean by "block"?
> when "it" "blocks",
> I'd like to know what happened up to that point,
> have the kernel logs,
> cat /proc/drbd
> ps -eo pid,state,wchan:30,cmd
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> please don't Cc me, but send to list -- I'm subscribed