Hi All,
I've run into some problems on my DRBD cluster this week. This cluster
has been running fine for over a year. All of a sudden the secondary failed:
Jan 14 08:37:58 data2 kernel: [4895290.318176] drbd r0: meta connection shut down by peer.
Jan 14 08:37:58 data2 kernel: [4895290.318361] drbd r0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Jan 14 08:37:58 data2 kernel: [4895290.318532] drbd r0: asender terminated
Jan 14 08:37:58 data2 kernel: [4895290.318534] drbd r0: Terminating drbd_a_r0
Jan 14 08:38:07 data2 kernel: [4895298.502391] drbd r0: Connection closed
Jan 14 08:38:07 data2 kernel: [4895298.502405] drbd r0: conn( NetworkFailure -> Unconnected )
Jan 14 08:38:07 data2 kernel: [4895298.502406] drbd r0: receiver terminated
Jan 14 08:38:07 data2 kernel: [4895298.502408] drbd r0: Restarting receiver thread
Jan 14 08:38:07 data2 kernel: [4895298.502409] drbd r0: receiver (re)started
Jan 14 08:38:07 data2 kernel: [4895298.502415] drbd r0: conn( Unconnected -> WFConnection )
Jan 14 08:38:07 data2 kernel: [4895299.002586] drbd r0: Handshake successful: Agreed network protocol version 101
Jan 14 08:38:07 data2 kernel: [4895299.002592] drbd r0: Agreed to support TRIM on protocol level
Jan 14 08:38:07 data2 kernel: [4895299.002813] drbd r0: Peer authenticated using 20 bytes HMAC
Jan 14 08:38:07 data2 kernel: [4895299.002848] drbd r0: conn( WFConnection -> WFReportParams )
Jan 14 08:38:07 data2 kernel: [4895299.002852] drbd r0: Starting asender thread (from drbd_r_r0 [3400])
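For anyone who wants to see the connection-state changes at a glance, something along these lines could reduce the kernel log to just the conn( A -> B ) transitions. This is only a rough sketch: the function name conn_transitions is made up for illustration, and it assumes unwrapped syslog lines in exactly the format shown above, for resource r0.

```shell
#!/bin/sh
# Hypothetical helper (not part of DRBD): print every conn( A -> B )
# transition for resource r0, prefixed with its syslog timestamp.
# Assumes one log entry per line, in the format quoted above.
conn_transitions() {
    # $1: path of the log file to read
    grep -E 'drbd r0: .*conn\( [A-Za-z]+ -> [A-Za-z]+ \)' "$1" |
        sed -E 's/^([A-Z][a-z]{2} +[0-9]+ [0-9:]{8}) .*conn\( ([A-Za-z]+) -> ([A-Za-z]+) \).*$/\1 \2 -> \3/'
}
```

It would be called with whichever file holds the kernel messages on the node, for example conn_transitions /var/log/kern.log (that path is an assumption and differs between distributions). Each output line has the timestamp followed by the old and new connection state, which makes a repeating connect/disconnect cycle easy to spot.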
The secondary would reconnect, sync, and then disconnect again. I stopped
the node, checked the hardware (all seems fine), rebooted and tried to
start DRBD again:
root@data2:/var/log# drbdadm connect r0
r0: Failure: (158) Unknown resource
additional info from kernel:
unknown resource
Command 'drbdsetup-84 connect r0 ipv4:172.16.0.2:7789 ipv4:172.16.0.1:7789 --max-epoch-size=8000 --max-buffers=8000 --sndbuf-size=0 --after-sb-2pri=disconnect --after-sb-1pri=disconnect --after-sb-0pri=disconnect --shared-secret=1e69dc721fd2e65368ae3ba1e5929979 --verify-alg=sha1 --cram-hmac-alg=sha1 --protocol=C' terminated with exit code 10
My resource is:
resource r0 {
    on data1 {
        device /dev/drbd0;
        disk /dev/sda1;
        address ipv4 172.16.0.1:7789;
        meta-disk internal;
    }
    on data2 {
        device /dev/drbd0;
        disk /dev/sda1;
        address ipv4 172.16.0.2:7789;
        meta-disk internal;
    }
}
And yes, /dev/sda1 does exist.
I have tried several things, including upgrading from 8.4.5 to 8.4.7-1,
but I can't get it to work. I'm stuck and have no idea what is going
wrong. Any help would be greatly appreciated.
Kind regards,
Dirk
--
ProActive Software <http://www.proactive.nl>
*T* 023 - 5422299
*M* 06 - 25078793
*W* www.proactive.nl <http://www.proactive.nl>
Twitter <https://twitter.com/ProActive_nl> YouTube
<http://www.youtube.com/user/ProActiveSoftware>