Hi All,

I've run into some problems on my DRBD cluster this week. This cluster has been running fine for over a year. All of a sudden the secondary failed:

Jan 14 08:37:58 data2 kernel: [4895290.318176] drbd r0: meta connection shut down by peer.
Jan 14 08:37:58 data2 kernel: [4895290.318361] drbd r0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Jan 14 08:37:58 data2 kernel: [4895290.318532] drbd r0: asender terminated
Jan 14 08:37:58 data2 kernel: [4895290.318534] drbd r0: Terminating drbd_a_r0
Jan 14 08:38:07 data2 kernel: [4895298.502391] drbd r0: Connection closed
Jan 14 08:38:07 data2 kernel: [4895298.502405] drbd r0: conn( NetworkFailure -> Unconnected )
Jan 14 08:38:07 data2 kernel: [4895298.502406] drbd r0: receiver terminated
Jan 14 08:38:07 data2 kernel: [4895298.502408] drbd r0: Restarting receiver thread
Jan 14 08:38:07 data2 kernel: [4895298.502409] drbd r0: receiver (re)started
Jan 14 08:38:07 data2 kernel: [4895298.502415] drbd r0: conn( Unconnected -> WFConnection )
Jan 14 08:38:07 data2 kernel: [4895299.002586] drbd r0: Handshake successful: Agreed network protocol version 101
Jan 14 08:38:07 data2 kernel: [4895299.002592] drbd r0: Agreed to support TRIM on protocol level
Jan 14 08:38:07 data2 kernel: [4895299.002813] drbd r0: Peer authenticated using 20 bytes HMAC
Jan 14 08:38:07 data2 kernel: [4895299.002848] drbd r0: conn( WFConnection -> WFReportParams )
Jan 14 08:38:07 data2 kernel: [4895299.002852] drbd r0: Starting asender thread (from drbd_r_r0 [3400])

It would reconnect, sync, and disconnect again.
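To make the reconnect cycle in the kernel log above easier to follow, here is a minimal sketch that pulls out only the connection-state transitions. The function name conn_transitions and the shortened sample lines are illustrative assumptions, not part of DRBD or of my setup; the sample text is copied from the log above with the timestamps dropped.

```shell
#!/bin/sh
# Hypothetical helper: print each DRBD connection-state transition
# found on stdin, one per line (e.g. "Connected -> NetworkFailure").
conn_transitions() {
    grep -o 'conn( [A-Za-z]* -> [A-Za-z]* )' | sed 's/^conn( \(.*\) )$/\1/'
}

# Sample lines taken from the kernel log above (timestamps removed).
sample_log='drbd r0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
drbd r0: Connection closed
drbd r0: conn( NetworkFailure -> Unconnected )
drbd r0: conn( Unconnected -> WFConnection )
drbd r0: conn( WFConnection -> WFReportParams )'

printf '%s\n' "$sample_log" | conn_transitions
```

Against a real log it could be run as something like `grep 'drbd r0' /var/log/kern.log | conn_transitions`, assuming the kernel messages end up in /var/log/kern.log on the node in question.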
I stopped the node, checked the hardware (all seems fine), rebooted and tried to start drbd again:

root@data2:/var/log# drbdadm connect r0
r0: Failure: (158) Unknown resource
additional info from kernel:
unknown resource
Command 'drbdsetup-84 connect r0 ipv4:172.16.0.2:7789 ipv4:172.16.0.1:7789 --max-epoch-size=8000 --max-buffers=8000 --sndbuf-size=0 --after-sb-2pri=disconnect --after-sb-1pri=disconnect --after-sb-0pri=disconnect --shared-secret=1e69dc721fd2e65368ae3ba1e5929979 --verify-alg=sha1 --cram-hmac-alg=sha1 --protocol=C' terminated with exit code 10

My resource is:

resource r0 {
    on data1 {
        device /dev/drbd0;
        disk /dev/sda1;
        address ipv4 172.16.0.1:7789;
        meta-disk internal;
    }
    on data2 {
        device /dev/drbd0;
        disk /dev/sda1;
        address ipv4 172.16.0.2:7789;
        meta-disk internal;
    }
}

And yes, /dev/sda1 does exist.

I tried different things, including updating from 8.4.5 to 8.4.7-1, but I can't get it to work. I'm kind of stuck here and have no idea what is going wrong.

Any help would be greatly appreciated.

Kind regards,

Dirk

-- 
ProActive Software <http://www.proactive.nl>
*T* 023 - 5422299
*M* 06 - 25078793
*W* www.proactive.nl <http://www.proactive.nl>
Twitter <https://twitter.com/ProActive_nl>
YouTube <http://www.youtube.com/user/ProActiveSoftware>