[DRBD-user] drbd on ramdisks

Wed Dec 8 14:29:42 CET 2010

On Tue, 2010-12-07 at 14:47 -0200, Andre Nathan wrote:
> It seems the problem only occurs when you have resources on physical
> volumes and on ram disks simultaneously. I can create these resources as
> usual, but after a reboot the resource on the ramdisk stops working (as
> expected, I guess, because the metadata was lost on the reboot) but then
> drbdadm create-md results in the error I mentioned.

I tried working around this by using tmpfs. I created a large file in a
tmpfs volume, and used losetup to map the file to a device. Then I
created a DRBD volume on /dev/loop0.
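
For reference, the setup above can be sketched roughly as follows. This is only an illustration: the mount point /mnt/ramdisk, the file name drbd2.img, the 64 MiB of headroom, and the helper name ramdisk_loop_commands are all assumptions, not the exact values used here. The script only prints the commands; it does not run them.

```python
def ramdisk_loop_commands(mount_point, image_name, size_mib, loop_dev):
    """Build the commands for a tmpfs-backed loop device (dry run only)."""
    image = f"{mount_point}/{image_name}"
    return [
        # Give the tmpfs an explicit size, slightly larger than the image
        # (the 64 MiB of headroom is an arbitrary illustrative choice).
        ["mount", "-t", "tmpfs", "-o", f"size={size_mib + 64}m", "tmpfs", mount_point],
        # Write zeros so every page of the backing file is allocated up front.
        ["dd", "if=/dev/zero", f"of={image}", "bs=1M", f"count={size_mib}"],
        # Map the file to a loop device that DRBD can use as its backing disk.
        ["losetup", loop_dev, image],
    ]

# Hypothetical paths and sizes, for illustration only.
for cmd in ramdisk_loop_commands("/mnt/ramdisk", "drbd2.img", 1024, "/dev/loop0"):
    print(" ".join(cmd))
```

Writing zeros with dd, rather than creating a sparse file, means that a tmpfs that is too small should fail when the file is created instead of later during I/O.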

I can bring up the resource fine, but it breaks during synchronization:

Dec  8 10:26:58 wcluster1 kernel: [ 1699.000927] block drbd2: Handshake
successful: Agreed network protocol version 94
Dec  8 10:26:58 wcluster1 kernel: [ 1699.000944] block drbd2:
conn( WFConnection -> WFReportParams ) 
Dec  8 10:26:58 wcluster1 kernel: [ 1699.000976] block drbd2: Starting
asender thread (from drbd2_receiver [2161])
Dec  8 10:26:58 wcluster1 kernel: [ 1699.001098] block drbd2:
data-integrity-alg: <not-used>
Dec  8 10:26:58 wcluster1 kernel: [ 1699.001191] block drbd2:
drbd_sync_handshake:
Dec  8 10:26:58 wcluster1 kernel: [ 1699.001198] block drbd2: self
D608531CD3061EE3:0000000000000004:0000000000000000:0000000000000000
bits:262127 flags:0
Dec  8 10:26:58 wcluster1 kernel: [ 1699.001204] block drbd2: peer
0000000000000004:0000000000000000:0000000000000000:0000000000000000
bits:262127 flags:4
Dec  8 10:26:58 wcluster1 kernel: [ 1699.001209] block drbd2:
uuid_compare()=2 by rule 30
Dec  8 10:26:58 wcluster1 kernel: [ 1699.001212] block drbd2: Becoming
sync source due to disk states.
Dec  8 10:26:58 wcluster1 kernel: [ 1699.001216] block drbd2: Writing
the whole bitmap, full sync required after drbd_sync_handshake.
Dec  8 10:26:58 wcluster1 kernel: [ 1699.001380] block drbd2: 1024 MB
(262127 bits) marked out-of-sync by on disk bit-map.
Dec  8 10:26:58 wcluster1 kernel: [ 1699.001427] block drbd2:
peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS )
pdsk( Outdated -> Inconsistent ) 
Dec  8 10:26:58 wcluster1 kernel: [ 1699.241078] block drbd2:
conn( WFBitMapS -> SyncSource ) 
Dec  8 10:26:58 wcluster1 kernel: [ 1699.241098] block drbd2: Began
resync as SyncSource (will sync 1048508 KB [262127 bits set]).
Dec  8 10:27:19 wcluster1 kernel: [ 1719.729138] block drbd2: read:
error=-28 s=523520s
Dec  8 10:27:19 wcluster1 kernel: [ 1719.729148] block drbd2: Resync
aborted.
Dec  8 10:27:19 wcluster1 kernel: [ 1719.729156] block drbd2:
conn( SyncSource -> Connected ) disk( UpToDate -> Failed ) 
Dec  8 10:27:19 wcluster1 kernel: [ 1719.729165] block drbd2: Local IO
failed in drbd_endio_read_sec_final.Detaching...
Dec  8 10:27:19 wcluster1 kernel: [ 1719.729223] block drbd2: read:
error=-28 s=523584s
Dec  8 10:27:19 wcluster1 kernel: [ 1719.729235] block drbd2: read:
error=-28 s=523648s
Dec  8 10:27:19 wcluster1 kernel: [ 1719.729241] receive_DataRequest: 14
callbacks suppressed
Dec  8 10:27:19 wcluster1 kernel: [ 1719.729246] block drbd2: Can not
satisfy peer's read request, no local data.

Then I get lots of error=-28 messages, and DRBD goes into the Diskless
state.
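
As a first step toward decoding those messages, here is a small sketch that translates the error=-28 value and the s= sector number from the log. It assumes that the kernel reports a negated Linux errno and that s= counts 512-byte sectors; the helper names describe_errno and sector_to_mib are made up for illustration.

```python
import errno
import os

SECTOR_SIZE = 512  # assumption: the s= value is in 512-byte sectors

def describe_errno(kernel_error):
    """Translate a negated kernel error code into its errno name and message."""
    code = abs(kernel_error)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"

def sector_to_mib(sector):
    """Convert a sector number into an offset in MiB."""
    return sector * SECTOR_SIZE / (1024 * 1024)

print(describe_errno(-28))                  # errno 28 is ENOSPC on Linux
print(f"{sector_to_mib(523520):.3f} MiB")   # offset of the first failed read
```

On Linux, errno 28 is ENOSPC ("No space left on device"), and under the 512-byte assumption the first failing read sits at about 255.6 MiB into the 1024 MB device.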

Dec  8 10:27:21 wcluster1 kernel: [ 1721.880693] block drbd2: helper
command: /sbin/drbdadm pri-on-incon-degr minor-2
Dec  8 10:27:21 wcluster1 kernel: [ 1721.884967] block drbd2: helper
command: /sbin/drbdadm pri-on-incon-degr minor-2 exit code 0 (0x0)
Dec  8 10:27:21 wcluster1 kernel: [ 1721.884989] block drbd2:
disk( Failed -> Diskless ) 
Dec  8 10:27:21 wcluster1 kernel: [ 1721.885224] block drbd2: Notified
peer that my disk is broken.
Dec  8 10:27:21 wcluster1 kernel: [ 1721.887452] block drbd2: in
got_BlockAck:4348: rs_pending_cnt = -1 < 0 !

This last message is repeated many times with varying rs_pending_cnt values.

Is there a way to get a better clue about what's going on?

Thanks in advance,
Andre