[DRBD-user] sync problems when disabling barriers/flushes

Sun Apr 6 00:23:20 CEST 2014

Kernel: 3.2.0
DRBD Utils: 8.3.11

I've been testing whether I can get better performance from a
Postgres database that runs on DRBD. When I add no-disk-flushes and
no-disk-drain to the disk section of the configuration, the following
errors are written to the system log:

Mar 31 16:52:24 ha-portal-1-2 kernel: [6399277.608376] block drbd0: BAD! BarrierAck #89430650 received, expected #89430649!
Mar 31 16:52:24 ha-portal-1-2 kernel: [6399277.616118] block drbd0: peer( Secondary -> Unknown ) conn( SyncSource -> ProtocolError ) 
Mar 31 16:52:24 ha-portal-1-2 kernel: [6399277.683664] block drbd0: asender terminated
Mar 31 16:52:24 ha-portal-1-2 kernel: [6399277.683674] block drbd0: Terminating drbd0_asender
Mar 31 16:52:24 ha-portal-1-2 kernel: [6399277.683841] block drbd0: Connection closed
Mar 31 16:52:24 ha-portal-1-2 kernel: [6399277.683853] block drbd0: conn( ProtocolError -> Unconnected ) 
Mar 31 16:52:25 ha-portal-1-2 kernel: [6399278.361416] block drbd0: bitmap WRITE of 29532 pages took 170 jiffies
Mar 31 16:52:25 ha-portal-1-2 kernel: [6399278.512469] block drbd0: 32 GB (8413485 bits) marked out-of-sync by on disk bit-map.
Mar 31 16:52:25 ha-portal-1-2 kernel: [6399278.512518] block drbd0: receiver terminated
Mar 31 16:52:25 ha-portal-1-2 kernel: [6399278.512526] block drbd0: Restarting drbd0_receiver
Mar 31 16:52:25 ha-portal-1-2 kernel: [6399278.512533] block drbd0: receiver (re)started
Mar 31 16:52:25 ha-portal-1-2 kernel: [6399278.512545] block drbd0: conn( Unconnected -> WFConnection ) 
Mar 31 16:52:26 ha-portal-1-2 kernel: [6399279.276235] block drbd0: Handshake successful: Agreed network protocol version 96
Mar 31 16:52:26 ha-portal-1-2 kernel: [6399279.276248] block drbd0: conn( WFConnection -> WFReportParams ) 
Mar 31 16:52:26 ha-portal-1-2 kernel: [6399279.276292] block drbd0: Starting asender thread (from drbd0_receiver [25810])

The resync restarts a few times, fails again each time, and finally
gives up. I verified that /proc/drbd on both hosts shows "wo:n", so I
didn't expect to see any messages about barriers. I didn't add the
no-disk-barrier option to the configuration because in its default
state /proc/drbd shows "wo:f".

Does that message really mean that barriers are being used? Do I need
to explicitly add the no-disk-barrier option?
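
For reference, a minimal sketch of the disk section being discussed,
in DRBD 8.3 syntax. The resource name "r0" is only a placeholder, and
the rest of the resource definition is omitted. The commented-out
line is the option the question above is about:

```
resource r0 {            # "r0" is a placeholder resource name
  disk {
    no-disk-flushes;     # disable the flush write-ordering method
    no-disk-drain;       # disable the drain write-ordering method
    # no-disk-barrier;   # not set here; this is the option in question
  }
}
```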

Thanks,
Wayne