Hi,

Problem:
-----------------
In a Pacemaker + GFS2 + DRBD dual-primary setup, one node (server4) was accidentally shut down before the initial sync between the two nodes had completed, i.e. server4 crashed while the initial DRBD sync from server4 --> server7 was still running. This left server7 in the Inconsistent state.

On the surviving node (server7) I could see these errors in /var/log/messages:

Apr 2 00:41:04 server7 kernel: block drbd0: State change failed: Need access to UpToDate data
Apr 2 00:41:04 server7 kernel: block drbd0: state = { cs:SyncTarget ro:Primary/Secondary ds:Inconsistent/UpToDate r----- }
Apr 2 00:41:04 server7 kernel: block drbd0: wanted = { cs:TearDown ro:Primary/Unknown ds:Inconsistent/Outdated r----- }
Apr 2 00:41:04 server7 kernel: drbd vDrbd: State change failed: Need access to UpToDate data
Apr 2 00:41:04 server7 kernel: drbd vDrbd: mask = 0x1e1f0 val = 0xa070
Apr 2 00:41:04 server7 kernel: drbd vDrbd: old_conn:WFReportParams wanted_conn:TearDown
Apr 2 00:41:05 server7 kernel: block drbd0: State change failed: Need access to UpToDate data
Apr 2 00:41:05 server7 kernel: block drbd0: state = { cs:SyncTarget ro:Primary/Secondary ds:Inconsistent/UpToDate r----- }
Apr 2 00:41:05 server7 kernel: block drbd0: wanted = { cs:TearDown ro:Primary/Unknown ds:Inconsistent/DUnknown s---F- }
Apr 2 00:41:05 server7 kernel: drbd vDrbd: State change failed: Need access to UpToDate data
Apr 2 00:41:05 server7 kernel: drbd vDrbd: mask = 0x1f0 val = 0x70
Apr 2 00:41:05 server7 kernel: drbd vDrbd: old_conn:WFReportParams wanted_conn:TearDown
Apr 2 00:41:05 server7 kernel: block drbd0: State change failed: Need access to UpToDate data
Apr 2 00:41:05 server7 kernel: block drbd0: state = { cs:SyncTarget ro:Primary/Secondary ds:Inconsistent/UpToDate r----- }
Apr 2 00:41:05 server7 kernel: block drbd0: wanted = { cs:TearDown ro:Primary/Unknown ds:Inconsistent/Outdated r----- }
Apr 2 00:41:05 server7 kernel: drbd vDrbd: State change failed: Need access to UpToDate data
Apr 2 00:41:05 server7 kernel: drbd vDrbd: mask = 0x1e1f0 val = 0xa070
Apr 2 00:41:05 server7 kernel: drbd vDrbd: old_conn:WFReportParams wanted_conn:TearDown
Apr 2 00:41:06 server7 kernel: block drbd0: State change failed: Need access to UpToDate data
Apr 2 00:41:06 server7 kernel: block drbd0: state = { cs:SyncTarget ro:Primary/Secondary ds:Inconsistent/UpToDate r----- }
Apr 2 00:41:06 server7 kernel: block drbd0: wanted = { cs:TearDown ro:Primary/Unknown ds:Inconsistent/DUnknown s---F- }
Apr 2 00:41:06 server7 kernel: drbd vDrbd: State change failed: Need access to UpToDate data
Apr 2 00:41:06 server7 kernel: drbd vDrbd: mask = 0x1f0 val = 0x70
Apr 2 00:41:06 server7 kernel: drbd vDrbd: old_conn:WFReportParams wanted_conn:TearDown
Apr 2 00:41:06 server7 kernel: block drbd0: State change failed: Need access to UpToDate data
Apr 2 00:41:06 server7 kernel: block drbd0: state = { cs:SyncTarget ro:Primary/Secondary ds:Inconsistent/UpToDate r----- }
Apr 2 00:41:06 server7 kernel: block drbd0: wanted = { cs:TearDown ro:Primary/Unknown ds:Inconsistent/Outdated r----- }
Apr 2 00:41:06 server7 kernel: drbd vDrbd: State change failed: Need access to UpToDate data
Apr 2 00:41:22 server7 kernel: drbd vDrbd: PingAck did not arrive in time.
Apr 2 00:41:22 server7 kernel: drbd vDrbd: peer( Secondary -> Unknown ) conn( SyncTarget -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
Apr 2 00:41:22 server7 kernel: block drbd0: helper command: /sbin/drbdadm pri-on-incon-degr minor-0
Apr 2 00:41:22 server7 kernel: block drbd0: helper command: /sbin/drbdadm pri-on-incon-degr minor-0 exit code 0 (0x0)
Apr 2 00:41:22 server7 kernel: drbd vDrbd: ack_receiver terminated
Apr 2 00:41:22 server7 kernel: drbd vDrbd: Terminating drbd_a_vDrbd
Apr 2 00:41:22 server7 kernel: drbd vDrbd: Connection closed
Apr 2 00:41:22 server7 kernel: drbd vDrbd: conn( NetworkFailure -> Unconnected )
Apr 2 00:41:22 server7 kernel: drbd vDrbd: receiver terminated
Apr 2 00:41:22 server7 kernel: drbd vDrbd: Restarting receiver thread
Apr 2 00:41:22 server7 kernel: drbd vDrbd: receiver (re)started
Apr 2 00:41:22 server7 kernel: drbd vDrbd: conn( Unconnected -> WFConnection )
*Apr 2 00:41:22 server7 kernel: drbd vDrbd: Not fencing peer, I'm not even Consistent myself.*
Apr 2 00:41:22 server7 kernel: drbd vDrbd: susp( 1 -> 0 )
Apr 2 00:41:22 server7 kernel: block drbd0: IO ERROR: neither local nor remote data, sector 0+0
Apr 2 00:41:22 server7 kernel: block drbd0: IO ERROR: neither local nor remote data, sector 344936+8
Apr 2 00:41:22 server7 kernel: GFS2: fsid=vCluster:vGFS2.1: Error -5 writing to log
Apr 2 00:41:22 server7 kernel: block drbd0: IO ERROR: neither local nor remote data, sector 344944+24
Apr 2 00:41:22 server7 kernel: GFS2: fsid=vCluster:vGFS2.1: Error -5 writing to log
Apr 2 00:41:22 server7 kernel: block drbd0: IO ERROR: neither local nor remote data, sector 0+0
Apr 2 00:41:22 server7 kernel: block drbd0: IO ERROR: neither local nor remote data, sector 344968+8
Apr 2 00:41:22 server7 kernel: GFS2: fsid=vCluster:vGFS2.1: Error -5 writing to log
Apr 2 00:41:22 server7 kernel: Buffer I/O error on dev dm-0, logical block 66218, lost async page write
Apr 2 00:41:22 server7 kernel: GFS2: fsid=vCluster:vGFS2.1: Error -5 writing to log
Apr 2 00:41:22 server7 kernel: GFS2: fsid=vCluster:vGFS2.1: Error -5 writing to log

DRBD state on surviving node server7
---------------------------------------------------------------
version: 8.4.9-1 (api:1/proto:86-101)
GIT-hash: 9976da086367a2476503ef7f6b13d4567327a280 build by akemi@Build64R7, 2016-12-04 01:08:48
 0: cs:WFConnection ro:Primary/Unknown ds:Inconsistent/DUnknown C r-----
    ns:3414 nr:1438774 dw:1441849 dr:72701144 al:25 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:29623116

Question:
------------------
Are these errors serious? When the crashed node comes up again and rejoins the cluster, will it cause any problems? And how can this be avoided if a node crashes before the initial sync completes?
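(For reference, the degraded state can be watched from the shell with the standard drbdadm subcommands -- a minimal sketch, assuming only the resource name vDrbd taken from the logs above:)

    # Connection state: WFConnection while the peer is down,
    # back to SyncTarget once server4 returns and the resync resumes.
    drbdadm cstate vDrbd
    # Disk states (local/peer): currently Inconsistent/DUnknown.
    drbdadm dstate vDrbd
    # Roles (local/peer): currently Primary/Unknown.
    drbdadm role vDrbd
    # Full picture including resync progress (the oos: counter should shrink).
    cat /proc/drbd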
Env:
---------
CentOS 7.3
DRBD 8.4
gfs2-utils-3.1.9-3.el7.x86_64
Pacemaker 1.1.15-11.el7_3.4

Pacemaker:
---------------------
[root@server7 ~]# pcs status
Cluster name: vCluster
Stack: corosync
Current DC: server7ha (version 1.1.15-11.el7_3.4-e174ec8) - partition with quorum
Last updated: Sun Apr 2 01:01:43 2017
Last change: Sun Apr 2 00:28:39 2017 by root via cibadmin on server4ha

2 nodes and 9 resources configured

Online: [ server7ha ]
OFFLINE: [ server4ha ]

Full list of resources:

 vCluster-VirtualIP-10.168.10.199 (ocf::heartbeat:IPaddr2): Started server7ha
 vCluster-Stonith-server7ha (stonith:fence_ipmilan): Stopped
 vCluster-Stonith-server4ha (stonith:fence_ipmilan): Started server7ha
 Clone Set: dlm-clone [dlm]
     Started: [ server7ha ]
     Stopped: [ server4ha ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ server7ha ]
     Stopped: [ server4ha ]
 Master/Slave Set: drbd_data_clone [drbd_data]
     Masters: [ server7ha ]
     Stopped: [ server4ha ]

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
[root@server7 ~]#

Attaching the DRBD config files.

--Raman

[Attachments scrubbed by the list archive: global_common.conf (367 bytes), vDrbd.res (629 bytes)]
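(Since the archive scrubbed the attachments, the block below is only a rough sketch of what a dual-primary vDrbd.res of this shape typically looks like under DRBD 8.4 -- it is not the poster's actual file. The resource name, device minor and node names come from the output above; the backing disks, IP addresses and port are placeholders:)

    resource vDrbd {
        net {
            protocol C;                   # dual-primary requires protocol C
            allow-two-primaries yes;      # lets both nodes be Primary for GFS2
        }
        disk {
            fencing resource-and-stonith; # recommended with Pacemaker + a shared FS
        }
        handlers {
            fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
            after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
        }
        on server4 {
            device    /dev/drbd0;
            disk      /dev/sdb1;          # placeholder backing device
            address   10.168.10.4:7789;   # placeholder IP:port
            meta-disk internal;
        }
        on server7 {
            device    /dev/drbd0;
            disk      /dev/sdb1;          # placeholder backing device
            address   10.168.10.7:7789;   # placeholder IP:port
            meta-disk internal;
        }
    }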
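(Likewise, a Master/Slave set such as drbd_data_clone in the pcs status above is typically created with pcs commands along these lines -- a sketch only, with illustrative monitor intervals; master-max=2 is what allows both nodes to be promoted:)

    pcs resource create drbd_data ocf:linbit:drbd drbd_resource=vDrbd \
        op monitor interval=30s role=Master \
        op monitor interval=31s role=Slave
    pcs resource master drbd_data_clone drbd_data \
        master-max=2 master-node-max=1 clone-max=2 clone-node-max=1 notify=true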