Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi,
Problem:
-----------------
In a Pacemaker/GFS2/DRBD dual-Primary setup, one node (server4) was
accidentally shut down before the initial sync between the two nodes had
completed, i.e. server4 crashed while the initial DRBD sync from
server4 --> server7 was still running. server7 was left in the
Inconsistent state.
On the surviving node (server7) I could see these errors in /var/log/messages:
Apr 2 00:41:04 server7 kernel: block drbd0: State change failed: Need
access to UpToDate data
Apr 2 00:41:04 server7 kernel: block drbd0: state = { cs:SyncTarget
ro:Primary/Secondary ds:Inconsistent/UpToDate r----- }
Apr 2 00:41:04 server7 kernel: block drbd0: wanted = { cs:TearDown
ro:Primary/Unknown ds:Inconsistent/Outdated r----- }
Apr 2 00:41:04 server7 kernel: drbd vDrbd: State change failed: Need
access to UpToDate data
Apr 2 00:41:04 server7 kernel: drbd vDrbd: mask = 0x1e1f0 val = 0xa070
Apr 2 00:41:04 server7 kernel: drbd vDrbd: old_conn:WFReportParams
wanted_conn:TearDown
Apr 2 00:41:05 server7 kernel: block drbd0: State change failed: Need
access to UpToDate data
Apr 2 00:41:05 server7 kernel: block drbd0: state = { cs:SyncTarget
ro:Primary/Secondary ds:Inconsistent/UpToDate r----- }
Apr 2 00:41:05 server7 kernel: block drbd0: wanted = { cs:TearDown
ro:Primary/Unknown ds:Inconsistent/DUnknown s---F- }
Apr 2 00:41:05 server7 kernel: drbd vDrbd: State change failed: Need
access to UpToDate data
Apr 2 00:41:05 server7 kernel: drbd vDrbd: mask = 0x1f0 val = 0x70
Apr 2 00:41:05 server7 kernel: drbd vDrbd: old_conn:WFReportParams
wanted_conn:TearDown
Apr 2 00:41:05 server7 kernel: block drbd0: State change failed: Need
access to UpToDate data
Apr 2 00:41:05 server7 kernel: block drbd0: state = { cs:SyncTarget
ro:Primary/Secondary ds:Inconsistent/UpToDate r----- }
Apr 2 00:41:05 server7 kernel: block drbd0: wanted = { cs:TearDown
ro:Primary/Unknown ds:Inconsistent/Outdated r----- }
Apr 2 00:41:05 server7 kernel: drbd vDrbd: State change failed: Need
access to UpToDate data
Apr 2 00:41:05 server7 kernel: drbd vDrbd: mask = 0x1e1f0 val = 0xa070
Apr 2 00:41:05 server7 kernel: drbd vDrbd: old_conn:WFReportParams
wanted_conn:TearDown
Apr 2 00:41:06 server7 kernel: block drbd0: State change failed: Need
access to UpToDate data
Apr 2 00:41:06 server7 kernel: block drbd0: state = { cs:SyncTarget
ro:Primary/Secondary ds:Inconsistent/UpToDate r----- }
Apr 2 00:41:06 server7 kernel: block drbd0: wanted = { cs:TearDown
ro:Primary/Unknown ds:Inconsistent/DUnknown s---F- }
Apr 2 00:41:06 server7 kernel: drbd vDrbd: State change failed: Need
access to UpToDate data
Apr 2 00:41:06 server7 kernel: drbd vDrbd: mask = 0x1f0 val = 0x70
Apr 2 00:41:06 server7 kernel: drbd vDrbd: old_conn:WFReportParams
wanted_conn:TearDown
Apr 2 00:41:06 server7 kernel: block drbd0: State change failed: Need
access to UpToDate data
Apr 2 00:41:06 server7 kernel: block drbd0: state = { cs:SyncTarget
ro:Primary/Secondary ds:Inconsistent/UpToDate r----- }
Apr 2 00:41:06 server7 kernel: block drbd0: wanted = { cs:TearDown
ro:Primary/Unknown ds:Inconsistent/Outdated r----- }
Apr 2 00:41:06 server7 kernel: drbd vDrbd: State change failed: Need
access to UpToDate data
Apr 2 00:41:22 server7 kernel: drbd vDrbd: PingAck did not arrive in time.
Apr 2 00:41:22 server7 kernel: drbd vDrbd: peer( Secondary -> Unknown )
conn( SyncTarget -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0
-> 1 )
Apr 2 00:41:22 server7 kernel: block drbd0: helper command: /sbin/drbdadm
pri-on-incon-degr minor-0
Apr 2 00:41:22 server7 kernel: block drbd0: helper command: /sbin/drbdadm
pri-on-incon-degr minor-0 exit code 0 (0x0)
Apr 2 00:41:22 server7 kernel: drbd vDrbd: ack_receiver terminated
Apr 2 00:41:22 server7 kernel: drbd vDrbd: Terminating drbd_a_vDrbd
Apr 2 00:41:22 server7 kernel: drbd vDrbd: Connection closed
Apr 2 00:41:22 server7 kernel: drbd vDrbd: conn( NetworkFailure ->
Unconnected )
Apr 2 00:41:22 server7 kernel: drbd vDrbd: receiver terminated
Apr 2 00:41:22 server7 kernel: drbd vDrbd: Restarting receiver thread
Apr 2 00:41:22 server7 kernel: drbd vDrbd: receiver (re)started
Apr 2 00:41:22 server7 kernel: drbd vDrbd: conn( Unconnected ->
WFConnection )
*Apr 2 00:41:22 server7 kernel: drbd vDrbd: Not fencing peer, I'm not even
Consistent myself.*
Apr 2 00:41:22 server7 kernel: drbd vDrbd: susp( 1 -> 0 )
Apr 2 00:41:22 server7 kernel: block drbd0: IO ERROR: neither local nor
remote data, sector 0+0
Apr 2 00:41:22 server7 kernel: block drbd0: IO ERROR: neither local nor
remote data, sector 344936+8
Apr 2 00:41:22 server7 kernel: GFS2: fsid=vCluster:vGFS2.1: Error -5
writing to log
Apr 2 00:41:22 server7 kernel: block drbd0: IO ERROR: neither local nor
remote data, sector 344944+24
Apr 2 00:41:22 server7 kernel: GFS2: fsid=vCluster:vGFS2.1: Error -5
writing to log
Apr 2 00:41:22 server7 kernel: block drbd0: IO ERROR: neither local nor
remote data, sector 0+0
Apr 2 00:41:22 server7 kernel: block drbd0: IO ERROR: neither local nor
remote data, sector 344968+8
Apr 2 00:41:22 server7 kernel: GFS2: fsid=vCluster:vGFS2.1: Error -5
writing to log
Apr 2 00:41:22 server7 kernel: Buffer I/O error on dev dm-0, logical block
66218, lost async page write
Apr 2 00:41:22 server7 kernel: GFS2: fsid=vCluster:vGFS2.1: Error -5
writing to log
Apr 2 00:41:22 server7 kernel: GFS2: fsid=vCluster:vGFS2.1: Error -5
writing to log
DRBD state on the surviving node (server7):
---------------------------------------------------------------
version: 8.4.9-1 (api:1/proto:86-101)
GIT-hash: 9976da086367a2476503ef7f6b13d4567327a280 build by akemi@Build64R7,
2016-12-04 01:08:48
0: cs:WFConnection ro:Primary/Unknown ds:Inconsistent/DUnknown C r-----
ns:3414 nr:1438774 dw:1441849 dr:72701144 al:25 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:29623116
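For reference, the same state can also be read per resource with drbdadm; a
quick sketch of the equivalent checks (vDrbd is the resource name from the
attached config, and the expected output matches the /proc/drbd snippet above):

drbdadm cstate vDrbd   # connection state -> WFConnection
drbdadm dstate vDrbd   # disk states      -> Inconsistent/DUnknown
drbdadm role vDrbd     # roles            -> Primary/Unknown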
Questions:
------------------
Are these errors serious?
When the crashed node comes up again and rejoins the cluster, will it cause
any problems?
How can this be avoided if a node crashes before the initial sync completes?
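On the last point: as far as I understand from the DRBD 8.4 user guide, the
usual safeguard in a Pacemaker dual-Primary setup is resource-level fencing
wired into the cluster, so that a node does not keep running as Primary on
Inconsistent data when it loses its peer. Roughly (a sketch only; the handler
paths are the stock scripts shipped with drbd-utils, and our actual vDrbd.res
is attached and may differ):

resource vDrbd {
  net {
    protocol C;
    allow-two-primaries yes;        # dual-Primary for GFS2
  }
  disk {
    fencing resource-and-stonith;   # freeze I/O and call the fence-peer handler on peer loss
  }
  handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";            # puts a constraint on the peer in Pacemaker
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh"; # removes it once resync has finished
  }
}

Is that the right direction for avoiding this, or is something else needed for
the crash-during-initial-sync case?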
Env:
---------
CentOS 7.3
DRBD 8.4
gfs2-utils-3.1.9-3.el7.x86_64
Pacemaker 1.1.15-11.el7_3.4
Pacemaker:
---------------------
[root@server7 ~]# pcs status
Cluster name: vCluster
Stack: corosync
Current DC: server7ha (version 1.1.15-11.el7_3.4-e174ec8) - partition with
quorum
Last updated: Sun Apr 2 01:01:43 2017 Last change: Sun Apr 2
00:28:39 2017 by root via cibadmin on server4ha
2 nodes and 9 resources configured
Online: [ server7ha ]
OFFLINE: [ server4ha ]
Full list of resources:
vCluster-VirtualIP-10.168.10.199 (ocf::heartbeat:IPaddr2): Started server7ha
vCluster-Stonith-server7ha (stonith:fence_ipmilan): Stopped
vCluster-Stonith-server4ha (stonith:fence_ipmilan): Started server7ha
Clone Set: dlm-clone [dlm]
Started: [ server7ha ]
Stopped: [ server4ha ]
Clone Set: clvmd-clone [clvmd]
Started: [ server7ha ]
Stopped: [ server4ha ]
Master/Slave Set: drbd_data_clone [drbd_data]
Masters: [ server7ha ]
Stopped: [ server4ha ]
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
[root@server7 ~]#
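For completeness, the DRBD master/slave resource was set up roughly along
these lines (generic pcs 0.9 syntax as on CentOS 7.3, not necessarily our
exact commands; master-max=2 is what allows both nodes to be Primary):

pcs resource create drbd_data ocf:linbit:drbd drbd_resource=vDrbd \
    op monitor interval=30s
pcs resource master drbd_data_clone drbd_data \
    master-max=2 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
# DLM (and everything above it) should only start after DRBD is promoted:
pcs constraint order promote drbd_data_clone then start dlm-clone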
I am attaching the DRBD config files (global_common.conf and vDrbd.res).
--Raman