[DRBD-user] Mount problem at boot on multimaster config

Ervin Hegedüs airween at gmail.com
Sun Sep 1 19:09:10 CEST 2019


Hi there,

I have an up-to-date Debian 10 system, with drbd 8.4.10, and ocfs2
1.8.5.

Looks like everything works, but when a node restarted, then my
/dev/drbd0 device doesn't mounted automatically. After the boot I
can log in to system, and can mount it by hand - the `mount -a`
command takes about 8 seconds...

# time mount -a

real	0m8,173s
user	0m0,009s
sys	0m0,019s

Before I mount the volume by hand, I checked the /proc/drbd, and
I think there is everything well:

# cat /proc/drbd 
version: 8.4.10 (api:1/proto:86-101)
srcversion: 9B4D87C5E865DF526864868 
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:124 nr:16511 dw:16635 dr:646 al:2 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

(this is what I see after the system booted and I logged in,
before I mount the device)

Here are the relevant configs:

# cat /etc/drbd.d/r0.res 
resource r0 {
  meta-disk internal;
  device /dev/drbd0;
  syncer {
    verify-alg sha1;
  }
  net {
    allow-two-primaries;
  }
  startup {
    become-primary-on both;
  }
  on t2app1 {
    disk /dev/vg-t2app1/lvdrbd0;
    address 192.168.72.21:7789;
  }
  on t2app2 {
    disk /dev/vg-t2app2/lvdrbd0;
    address 192.168.72.22:7789;
  }
}

(Note, that the configs are same on both nodes)

# cat /etc/ocfs2/cluster.conf 
cluster:
    node_count = 2
    name = data

node:
    ip_port = 7777
    ip_address = 192.168.72.21
    number = 1
    name = t2app1
    cluster = data

node:
    ip_port = 7777
    ip_address = 192.168.72.22
    number = 2
    name = t2app2
    cluster = data

(also the configs are same on both nodes)

# grep drbd /etc/fstab 
/dev/drbd0	/drbd	ocfs2	_netdev,defaults	0	0

on both nodes.

I've found some interesting lines in syslog from the boot
intervall:

# grep "\(o2\|ocfs\|drbd\)" /var/log/syslog


Sep  1 18:47:37 t2app2 systemd[1]: Stopping Mount ocfs2 Filesystems...
Sep  1 18:47:52 t2app2 systemd-modules-load[361]: Inserted module 'drbd'
Sep  1 18:47:52 t2app2 systemd[1]: Starting Load o2cb Modules...
Sep  1 18:47:52 t2app2 kernel: [    3.822815] drbd: initialized.  Version: 8.4.10 (api:1/proto:86-101)
Sep  1 18:47:52 t2app2 kernel: [    3.822816] drbd: srcversion: 9B4D87C5E865DF526864868 
Sep  1 18:47:52 t2app2 kernel: [    3.822817] drbd: registered as block device major 147
Sep  1 18:47:52 t2app2 o2cb[659]: checking debugfs...
Sep  1 18:47:53 t2app2 o2cb[659]: Loading filesystem "configfs": OK
Sep  1 18:47:53 t2app2 o2cb[659]: Mounting configfs filesystem at /sys/kernel/config: mount: /sys/kernel/config: configfs already mounted on /sys/kernel/config.
Sep  1 18:47:53 t2app2 o2cb[659]: Loading stack plugin "o2cb": OK
Sep  1 18:47:53 t2app2 kernel: [    4.544893] ocfs2: Registered cluster interface o2cb
Sep  1 18:47:53 t2app2 o2cb[659]: Loading filesystem "ocfs2_dlmfs": OK
Sep  1 18:47:53 t2app2 o2cb[659]: Mounting ocfs2_dlmfs filesystem at /dlm: OK
Sep  1 18:47:53 t2app2 o2cb[659]: Setting cluster stack "o2cb": OK
Sep  1 18:47:53 t2app2 drbd[696]: Starting DRBD resources:[
Sep  1 18:47:53 t2app2 drbd[696]:      create res: r0
Sep  1 18:47:53 t2app2 drbd[696]:    prepare disk: r0
Sep  1 18:47:53 t2app2 systemd[1]: Found device /dev/drbd0.
Sep  1 18:47:53 t2app2 systemd[1]: Mounting /drbd...
Sep  1 18:47:53 t2app2 mount[726]: mount.ocfs2: I/O error on channel while opening device /dev/drbd0
Sep  1 18:47:53 t2app2 systemd[1]: drbd.mount: Mount process exited, code=exited, status=1/FAILURE
Sep  1 18:47:53 t2app2 systemd[1]: drbd.mount: Failed with result 'exit-code'.
Sep  1 18:47:53 t2app2 systemd[1]: Failed to mount /drbd.
Sep  1 18:47:53 t2app2 kernel: [    4.581222] drbd r0: Starting worker thread (from drbdsetup-84 [729])
Sep  1 18:47:53 t2app2 drbd[696]:     adjust disk: r0
Sep  1 18:47:53 t2app2 kernel: [    4.581729] block drbd0: disk( Diskless -> Attaching ) 
Sep  1 18:47:53 t2app2 kernel: [    4.581782] drbd r0: Method to ensure write ordering: flush
Sep  1 18:47:53 t2app2 kernel: [    4.581784] block drbd0: max BIO size = 1048576
Sep  1 18:47:53 t2app2 kernel: [    4.581786] block drbd0: drbd_bm_resize called with capacity == 20970808
Sep  1 18:47:53 t2app2 kernel: [    4.581827] block drbd0: resync bitmap: bits=2621351 words=40959 pages=80
Sep  1 18:47:53 t2app2 kernel: [    4.581828] block drbd0: size = 10 GB (10485404 KB)
Sep  1 18:47:53 t2app2 kernel: [    4.583737] block drbd0: recounting of set bits took additional 0 jiffies
Sep  1 18:47:53 t2app2 kernel: [    4.583738] block drbd0: 16 MB (4096 bits) marked out-of-sync by on disk bit-map.
Sep  1 18:47:53 t2app2 kernel: [    4.583741] block drbd0: disk( Attaching -> UpToDate ) 
Sep  1 18:47:53 t2app2 kernel: [    4.583743] block drbd0: attached to UUIDs 1E63CB670138DCE9:0000000000000000:A7B3A04F6A72E7D9:A7B2A04F6A72E7D9
Sep  1 18:47:53 t2app2 kernel: [    4.586632] drbd r0: conn( StandAlone -> Unconnected ) 
Sep  1 18:47:53 t2app2 kernel: [    4.586642] drbd r0: Starting receiver thread (from drbd_w_r0 [731])
Sep  1 18:47:53 t2app2 kernel: [    4.586811] drbd r0: receiver (re)started
Sep  1 18:47:53 t2app2 kernel: [    4.586817] drbd r0: conn( Unconnected -> WFConnection ) 
Sep  1 18:47:53 t2app2 drbd[696]:      adjust net: r0
Sep  1 18:47:53 t2app2 drbd[696]: ]
Sep  1 18:47:53 t2app2 o2cb[659]: Registering O2CB cluster "data": OK
Sep  1 18:47:53 t2app2 o2cb[659]: Setting O2CB cluster timeouts : OK
Sep  1 18:47:53 t2app2 kernel: [    4.634584] o2hb: Heartbeat mode set to local
Sep  1 18:47:53 t2app2 systemd[1]: Started Load o2cb Modules.
Sep  1 18:47:53 t2app2 o2hbmonitor: Starting
Sep  1 18:47:53 t2app2 systemd[1]: Starting Mount ocfs2 Filesystems...
Sep  1 18:47:53 t2app2 ocfs2[794]: Starting Oracle Cluster File System (OCFS2) mount.ocfs2: I/O error on channel while opening device /dev/drbd0
Sep  1 18:47:53 t2app2 ocfs2[794]: Failed
Sep  1 18:47:53 t2app2 systemd[1]: Started Mount ocfs2 Filesystems.
Sep  1 18:47:53 t2app2 kernel: [    5.107027] drbd r0: Handshake successful: Agreed network protocol version 101
Sep  1 18:47:53 t2app2 kernel: [    5.107028] drbd r0: Feature flags enabled on protocol level: 0x7 TRIM THIN_RESYNC WRITE_SAME.
Sep  1 18:47:53 t2app2 kernel: [    5.107049] drbd r0: conn( WFConnection -> WFReportParams ) 
Sep  1 18:47:53 t2app2 kernel: [    5.107050] drbd r0: Starting ack_recv thread (from drbd_r_r0 [738])
Sep  1 18:47:53 t2app2 kernel: [    5.153635] block drbd0: drbd_sync_handshake:
Sep  1 18:47:53 t2app2 kernel: [    5.153637] block drbd0: self 1E63CB670138DCE8:0000000000000000:A7B3A04F6A72E7D9:A7B2A04F6A72E7D9 bits:4096 flags:0
Sep  1 18:47:53 t2app2 kernel: [    5.153638] block drbd0: peer DA6EC47089E7218F:1E63CB670138DCE9:A7B3A04F6A72E7D9:A7B2A04F6A72E7D9 bits:1 flags:0
Sep  1 18:47:53 t2app2 kernel: [    5.153639] block drbd0: uuid_compare()=-1 by rule 50
Sep  1 18:47:53 t2app2 kernel: [    5.153643] block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) disk( UpToDate -> Outdated ) pdsk( DUnknown -> UpToDate ) 
Sep  1 18:47:53 t2app2 kernel: [    5.154072] block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 25(1), total 25; compression: 100.0%
Sep  1 18:47:53 t2app2 kernel: [    5.154125] block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 44(1), total 44; compression: 100.0%
Sep  1 18:47:53 t2app2 kernel: [    5.154127] block drbd0: conn( WFBitMapT -> WFSyncUUID ) 
Sep  1 18:47:53 t2app2 kernel: [    5.155755] block drbd0: updated sync uuid 1E64CB670138DCE8:0000000000000000:A7B3A04F6A72E7D9:A7B2A04F6A72E7D9
Sep  1 18:47:53 t2app2 kernel: [    5.156150] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
Sep  1 18:47:53 t2app2 kernel: [    5.157047] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
Sep  1 18:47:53 t2app2 kernel: [    5.157054] block drbd0: conn( WFSyncUUID -> SyncTarget ) disk( Outdated -> Inconsistent ) 
Sep  1 18:47:53 t2app2 kernel: [    5.157067] block drbd0: Began resync as SyncTarget (will sync 16384 KB [4096 bits set]).
Sep  1 18:47:53 t2app2 kernel: [    5.157500] block drbd0: role( Secondary -> Primary ) 
Sep  1 18:47:53 t2app2 drbd[696]: WARN: stdin/stdout is not a TTY; using /dev/console.
Sep  1 18:48:00 t2app2 kernel: [   11.508179] block drbd0: Resync done (total 6 sec; paused 0 sec; 2728 K/sec)
Sep  1 18:48:00 t2app2 kernel: [   11.508183] block drbd0: updated UUIDs DA6EC47089E7218F:0000000000000000:1E64CB670138DCE9:1E63CB670138DCE9
Sep  1 18:48:00 t2app2 kernel: [   11.508186] block drbd0: conn( SyncTarget -> Connected ) disk( Inconsistent -> UpToDate ) 
Sep  1 18:48:00 t2app2 kernel: [   11.508516] block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0
Sep  1 18:48:00 t2app2 kernel: [   11.509496] block drbd0: helper command: /sbin/drbdadm after-resync-target minor-0 exit code 0 (0x0)
Sep  1 18:48:58 t2app2 kernel: [   69.610107] o2net: Connected to node t2app1 (num 1) at 192.168.72.21:7777
Sep  1 18:49:02 t2app2 kernel: [   73.701897] o2dlm: Joining domain 34CC3A6762474F669C72F63604E47EE4 
Sep  1 18:49:02 t2app2 kernel: [   73.706853] ocfs2: Mounting device (147,0) on (node 2, slot 1) with ordered data mode.

note, that the last 3 lines (18:48:58 - 18:49:02) comes when I
mounted the device by hand.

Looks like the /dev/drbd0 doesn't exists at boot time, but when I
log in, it builded.

Am I missing something? Or there is some network issue, eg. slow
network...?

This config (which is _same_) works on Debian 9 without any
issue.


Any helps are welcome,


thanks,


a.



More information about the drbd-user mailing list