Late last night I started getting paged for a DRBD issue. It appears that
the two servers have lost their connection for an unknown reason.
Here is an excerpt from the logs; it should cover a complete startup:
Aug 13 08:53:36 plccnfs02 kernel: drbd0: drbdsetup [7274]: cstate WFConnection --> Unconnected
Aug 13 08:53:36 plccnfs02 kernel: drbd0: worker terminated
Aug 13 08:53:36 plccnfs02 kernel: drbd0: drbd0_receiver [6956]: cstate Unconnected --> StandAlone
Aug 13 08:53:36 plccnfs02 kernel: drbd0: Connection lost.
Aug 13 08:53:36 plccnfs02 kernel: drbd0: Discarding network configuration.
Aug 13 08:53:36 plccnfs02 kernel: drbd0: drbd0_receiver [6956]: cstate StandAlone --> StandAlone
Aug 13 08:53:36 plccnfs02 kernel: drbd0: receiver terminated
Aug 13 08:53:36 plccnfs02 kernel: drbd0: drbdsetup [7274]: cstate StandAlone --> StandAlone
Aug 13 08:53:36 plccnfs02 kernel: drbd0: drbdsetup [7274]: cstate StandAlone --> Unconfigured
Aug 13 08:53:36 plccnfs02 kernel: drbd0: worker terminated
Aug 13 08:53:42 plccnfs02 kernel: drbd0: resync bitmap: bits=10453652 words=326678
Aug 13 08:53:42 plccnfs02 kernel: drbd0: size = 39 GB (41814608 KB)
Aug 13 08:53:43 plccnfs02 kernel: drbd0: 1116 KB marked out-of-sync by on disk bit-map.
Aug 13 08:53:43 plccnfs02 kernel: drbd0: Found 4 transactions (192 active extents) in activity log.
Aug 13 08:53:43 plccnfs02 kernel: drbd0: drbdsetup [7284]: cstate Unconfigured --> StandAlone
Aug 13 08:53:43 plccnfs02 kernel: drbd0: drbdsetup [7287]: cstate StandAlone --> Unconnected
Aug 13 08:53:43 plccnfs02 kernel: drbd0: drbd0_receiver [7288]: cstate Unconnected --> WFConnection
Aug 13 08:54:01 plccnfs02 kernel: drbd0: drbdsetup [7295]: cstate WFConnection --> Unconnected
Aug 13 08:54:01 plccnfs02 kernel: drbd0: worker terminated
Aug 13 08:54:01 plccnfs02 kernel: drbd0: drbd0_receiver [7288]: cstate Unconnected --> StandAlone
Aug 13 08:54:01 plccnfs02 kernel: drbd0: Connection lost.
Aug 13 08:54:01 plccnfs02 kernel: drbd0: Discarding network configuration.
Aug 13 08:54:01 plccnfs02 kernel: drbd0: drbd0_receiver [7288]: cstate StandAlone --> StandAlone
Aug 13 08:54:01 plccnfs02 kernel: drbd0: receiver terminated
Aug 13 08:54:01 plccnfs02 kernel: drbd0: drbdsetup [7295]: cstate StandAlone --> StandAlone
Aug 13 08:54:01 plccnfs02 kernel: drbd0: drbdsetup [7295]: cstate StandAlone --> Unconfigured
Aug 13 08:54:01 plccnfs02 kernel: drbd0: worker terminated
Aug 13 08:54:05 plccnfs02 kernel: drbd0: resync bitmap: bits=10453652 words=326678
Aug 13 08:54:05 plccnfs02 kernel: drbd0: size = 39 GB (41814608 KB)
Aug 13 08:54:05 plccnfs02 kernel: drbd0: 1116 KB marked out-of-sync by on disk bit-map.
Aug 13 08:54:05 plccnfs02 kernel: drbd0: Found 4 transactions (192 active extents) in activity log.
Aug 13 08:54:05 plccnfs02 kernel: drbd0: drbdsetup [7304]: cstate Unconfigured --> StandAlone
Aug 13 08:54:05 plccnfs02 kernel: drbd0: drbdsetup [7307]: cstate StandAlone --> Unconnected
Aug 13 08:54:05 plccnfs02 kernel: drbd0: drbd0_receiver [7308]: cstate Unconnected --> WFConnection
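
The repeating pattern in this excerpt is easier to see if only the real connection-state changes are listed. The following is a minimal Python sketch for that, not a DRBD tool: the function names unwrap and cstate_transitions are made up for illustration, and the regular expressions assume the exact log layout shown above (including lines that the mail client may have wrapped).

```python
import re

# Syslog entries start with a timestamp such as "Aug 13 08:53:36".
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} ")
# Matches e.g. "drbdsetup [7274]: cstate WFConnection --> Unconnected".
CSTATE = re.compile(r"(\S+) \[(\d+)\]: cstate (\S+) --> (\S+)")


def unwrap(lines):
    """Join mail-wrapped continuation lines onto the preceding log entry."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if TIMESTAMP.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def cstate_transitions(lines):
    """Return (time, process, old_state, new_state) for each real change.

    Entries such as "StandAlone --> StandAlone" are skipped because the
    state does not actually change.
    """
    result = []
    for entry in unwrap(lines):
        m = CSTATE.search(entry)
        if m and m.group(3) != m.group(4):
            # Characters 7..14 of a syslog entry hold the HH:MM:SS part.
            result.append((entry[7:15], m.group(1), m.group(3), m.group(4)))
    return result


# A few entries copied from the excerpt, in their wrapped form.
sample = """\
Aug 13 08:53:36 plccnfs02 kernel: drbd0: drbdsetup [7274]: cstate
WFConnection --> Unconnected
Aug 13 08:53:36 plccnfs02 kernel: drbd0: drbd0_receiver [6956]: cstate
Unconnected --> StandAlone
Aug 13 08:53:36 plccnfs02 kernel: drbd0: drbd0_receiver [6956]: cstate
StandAlone --> StandAlone
""".splitlines()

for transition in cstate_transitions(sample):
    print(*transition)
```

Run against the full excerpt, this should show the same sequence twice: the resource is taken down to Unconfigured, set up again, and then sits in WFConnection until the next restart attempt.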
Here is /etc/drbd.conf (the same on both machines):
resource drbd0 {
  protocol C;
  incon-degr-cmd "halt -f"; # killall heartbeat would be a good alternative :->
  startup {
    degr-wfc-timeout 120; # 2 minutes
  }
  disk {
    on-io-error detach;
  }
  syncer {
    rate 10M; # Note: 'M' is MegaBytes, not MegaBits
  }
  on plccnfs01 {
    device /dev/drbd0;
    disk /dev/cciss/c0d1p1;
    address 10.1.100.173:7789;
    meta-disk internal;
  }
  on plccnfs02 {
    device /dev/drbd0;
    disk /dev/sdb1;
    address 10.1.100.172:7789;
    meta-disk internal;
  }
}
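
Since the secondary never gets past WFConnection, one basic thing to rule out is whether the peer's replication port can be reached over TCP at all. The sketch below is only an illustration under that assumption, not a diagnosis: the helper name port_reachable is made up, the addresses are the ones from the config above, and a successful TCP connect says nothing about whether DRBD itself will complete its handshake. The peer may also log the short-lived connection.

```python
import socket

# Addresses and port taken from the "on" sections of the config above.
PEERS = {
    "plccnfs01": ("10.1.100.173", 7789),
    "plccnfs02": ("10.1.100.172", 7789),
}


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

For example, calling port_reachable(*PEERS["plccnfs01"]) on plccnfs02 would show whether anything on the primary accepts connections on port 7789. A False result points toward a firewall, routing, or listener problem rather than DRBD metadata.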
On plccnfs01 there are no DRBD messages in /var/log/messages at all. The
log above is from the secondary, plccnfs02. At this time I cannot get the
secondary to rejoin the cluster. I have tried restarting DRBD, rebooting
the machine, using drbdadm, and pretty much everything else I could think
of. Any help at all would be greatly appreciated.
Best Regards,
Mark L. Potter
Systems Engineer
Academy Sports & Outdoors