Hello,

I have a pair of Xen hosts, and the guests are pacemaker resources each of which has its own underlying DRBD storage device. I have drbd version 8.4.5 [0], and "minor-count 200;" in global_common.conf.[1]

Last night, attempting to create a new DRBD resource failed, the error being (having passed through a few layers of reporting):

"drbd.d/mws-priv-95.res:7: in resource mws-priv-95, on agogue: IP fd19:1b70:f7a6:1ae5::8d:6 not found on this host."

This is incorrect, and I can verify that

fd191b70f7a61ae500000000008d0006 03 40 00 80 eth1

appears in /proc/net/if_inet6. But perhaps this is all a red herring?

Then today the cluster tried to migrate some guests[2] and all hell broke loose with xenstore unable to talk to the block devices any more, and drbdadm failed to be able to bring /any/ devices up or into the primary state, again complaining about missing IP.

The output of strace -fvy -s 512 drbdadm up mws-priv-18 is at
http://www.chiark.greenend.org.uk/~matthewv/junk/drbdupstrace

[that's the other Xen host, and it has, in /proc/net/if_inet6:
fd191b70f7a61ae500000000008d0007 03 40 00 80 eth1 ]

This is a pretty serious problem, and I could only resolve it by rebooting both guests in turn. Any ideas of how to debug it if it happens again, resolve it without rebooting, or ideally stop it happening again? 90 drbd devices doesn't seem like it should be too many...

Regards,

Matthew

[0] top of /proc/drbd - version: 8.4.5 (api:1/proto:86-101) srcversion: 5A4F43804B37BB28FCB1F47
[1] I appreciate that this should be unnecessary, but it seemed to help when I saw a similar issue in the past (see thread title "Misleading error messages from drbdadm up (IP not found on this host)" from 27 Jan)
[2] example xen logging of failed migration - you can see xen having problems with the backing store
http://www.chiark.greenend.org.uk/~matthewv/junk/migration-fail.txt
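
For anyone wanting to repeat the manual check against /proc/net/if_inet6 described above, here is a minimal Python sketch. It converts an address as written in a .res file into the 32-hex-digit form the kernel prints, then looks it up in the contents of /proc/net/if_inet6. The function names (if_inet6_hex, parse_if_inet6, find_interface) are made up for illustration, and this is an independent cross-check only; it does not reproduce whatever lookup drbdadm itself performs.

```python
import ipaddress


def if_inet6_hex(addr: str) -> str:
    """Return the 32-hex-digit form used in column 1 of /proc/net/if_inet6."""
    return ipaddress.IPv6Address(addr).packed.hex()


def parse_if_inet6(text: str) -> dict:
    """Map hex address -> interface name from /proc/net/if_inet6 content.

    Each line is assumed to hold six whitespace-separated fields:
    address, interface index, prefix length, scope, flags, device name.
    """
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 6:
            table[fields[0].lower()] = fields[5]
    return table


def find_interface(addr: str, if_inet6_text: str):
    """Return the interface carrying addr, or None if it is not listed."""
    return parse_if_inet6(if_inet6_text).get(if_inet6_hex(addr))
```

On a live host one would read the file and call, for example, find_interface("fd19:1b70:f7a6:1ae5::8d:6", open("/proc/net/if_inet6").read()). Getting an interface name back while drbdadm still reports "not found on this host" would suggest that the address really is configured and the problem lies elsewhere.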