version: 8.3.11 (api:88/proto:86-96)

May 10 21:11:55 scale-192-168-54-14 kernel: block drbd1: [drbd1_worker/8936] sock_sendmsg time expired, ko = 4294967295
... (long countdown that will never get anywhere)

This is a "dual primary" setup (underneath GPFS) over a failover-bonded network interface. Everything works fine (read/write/reboot/etc.) until I attempt a verify.

My configuration has no reference to ko-count, which according to the documentation means it should be 0 and therefore disabled. Does the documentation actually intend to say that the default is 2^32?

I'm building and running all of this on a clone of RHEL6.2. The problem occurs during an attempt to 'verify' a dual primary DRBD device. Originally I received this message on every verify attempt, but after I reduced syncer { rate }, the message only crops up after a few iterations.

There is no network/connectivity problem during this time period, yet drbd commands hang, such as:

strace -f drbdsetup 1 disconnect --force
...
stat("/proc/drbd", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
open("/var/lock/drbd-147-1", O_RDWR|O_CREAT, 0600) = 3
rt_sigaction(SIGALRM, {0x406b30, [], SA_RESTORER, 0x3935232900}, {SIG_DFL, [], 0}, 8) = 0
alarm(1) = 0
fcntl(3, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0
alarm(0) = 1
rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x3935232900}, NULL, 8) = 0
socket(PF_NETLINK, SOCK_DGRAM, 11) = 4
getpid() = 13360
bind(4, {sa_family=AF_NETLINK, pid=13360, groups=ffffffff}, 12) = 0
sendto(4, "9\0\0\0\3\0\0\0\1\0\0\00004\0\0\4\0\0\0\1\0\0\0\1\0\0\00004\0\0"..., 57, 0, NULL, 0) = 57
poll([{fd=4, events=POLLIN}], 1, 120000
<< this is where it hangs and exits after a terminate (ctrl-c) >>

All that's going on in the dmesg output is sock_sendmsg expiration reports.

The documentation here would also be better if *count* and *number* were consistent (either 'count' or 'number').

> ko-count *number*
> In case the secondary node fails to complete a single write request for
> *count* times the *timeout*, it is expelled from the cluster. (I.e. the
> primary node goes into StandAlone mode.) The default value is 0, which
> disables this feature.
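
A note on the arithmetic: the logged value 4294967295 is 2^32 - 1, not 2^32. One possible reading, which is an assumption and not something confirmed in this message, is that a counter holding the documented default of 0 was decremented once as a 32-bit unsigned integer and wrapped around. The sketch below only illustrates that wrap-around and the "count times the timeout" rule from the quoted documentation. The function names and the sample values passed to them are hypothetical and are not taken from the DRBD source.

```python
UINT32_MODULUS = 2 ** 32  # number of distinct values in a 32-bit unsigned integer


def decrement_u32(value: int) -> int:
    """Decrement a counter the way 32-bit unsigned arithmetic would.

    Going below zero wraps around to the largest representable value.
    """
    return (value - 1) % UINT32_MODULUS


def expel_after_seconds(ko_count: int, timeout_seconds: float) -> float:
    """Time a peer may stall on one write before being expelled.

    Follows the quoted rule: ko-count times the timeout.
    Both arguments are illustrative inputs, not DRBD defaults.
    """
    return ko_count * timeout_seconds


if __name__ == "__main__":
    # Hypothetical scenario: counter starts at the documented default of 0.
    print(decrement_u32(0))
    # Hypothetical settings: ko-count of 4 with a 6 second timeout.
    print(expel_after_seconds(4, 6.0))
```

If that reading is right, the "long countdown" in the log would be the same counter stepping down from 2^32 - 1, which would explain why it never gets anywhere.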