On Tue, 2010-12-07 at 14:47 -0200, Andre Nathan wrote:
> It seems the problem only occurs when you have resources on physical
> volumes and on ram disks simultaneously. I can create these resources as
> usual, but after a reboot the resource on the ramdisk stops working (as
> expected, I guess, because the metadata was lost on the reboot) but then
> drbdadm create-md results in the error I mentioned.

I tried working around this by using tmpfs. I created a large file in a
tmpfs volume, and used losetup to map the file to a device. Then I
created a DRBD volume on /dev/loop0.

I can bring up the resource fine, but it breaks during synchronization:

Dec 8 10:26:58 wcluster1 kernel: [ 1699.000927] block drbd2: Handshake successful: Agreed network protocol version 94
Dec 8 10:26:58 wcluster1 kernel: [ 1699.000944] block drbd2: conn( WFConnection -> WFReportParams )
Dec 8 10:26:58 wcluster1 kernel: [ 1699.000976] block drbd2: Starting asender thread (from drbd2_receiver [2161])
Dec 8 10:26:58 wcluster1 kernel: [ 1699.001098] block drbd2: data-integrity-alg: <not-used>
Dec 8 10:26:58 wcluster1 kernel: [ 1699.001191] block drbd2: drbd_sync_handshake:
Dec 8 10:26:58 wcluster1 kernel: [ 1699.001198] block drbd2: self D608531CD3061EE3:0000000000000004:0000000000000000:0000000000000000 bits:262127 flags:0
Dec 8 10:26:58 wcluster1 kernel: [ 1699.001204] block drbd2: peer 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:262127 flags:4
Dec 8 10:26:58 wcluster1 kernel: [ 1699.001209] block drbd2: uuid_compare()=2 by rule 30
Dec 8 10:26:58 wcluster1 kernel: [ 1699.001212] block drbd2: Becoming sync source due to disk states.
Dec 8 10:26:58 wcluster1 kernel: [ 1699.001216] block drbd2: Writing the whole bitmap, full sync required after drbd_sync_handshake.
Dec 8 10:26:58 wcluster1 kernel: [ 1699.001380] block drbd2: 1024 MB (262127 bits) marked out-of-sync by on disk bit-map.
Dec 8 10:26:58 wcluster1 kernel: [ 1699.001427] block drbd2: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( Outdated -> Inconsistent )
Dec 8 10:26:58 wcluster1 kernel: [ 1699.241078] block drbd2: conn( WFBitMapS -> SyncSource )
Dec 8 10:26:58 wcluster1 kernel: [ 1699.241098] block drbd2: Began resync as SyncSource (will sync 1048508 KB [262127 bits set]).
Dec 8 10:27:19 wcluster1 kernel: [ 1719.729138] block drbd2: read: error=-28 s=523520s
Dec 8 10:27:19 wcluster1 kernel: [ 1719.729148] block drbd2: Resync aborted.
Dec 8 10:27:19 wcluster1 kernel: [ 1719.729156] block drbd2: conn( SyncSource -> Connected ) disk( UpToDate -> Failed )
Dec 8 10:27:19 wcluster1 kernel: [ 1719.729165] block drbd2: Local IO failed in drbd_endio_read_sec_final.Detaching...
Dec 8 10:27:19 wcluster1 kernel: [ 1719.729223] block drbd2: read: error=-28 s=523584s
Dec 8 10:27:19 wcluster1 kernel: [ 1719.729235] block drbd2: read: error=-28 s=523648s
Dec 8 10:27:19 wcluster1 kernel: [ 1719.729241] receive_DataRequest: 14 callbacks suppressed
Dec 8 10:27:19 wcluster1 kernel: [ 1719.729246] block drbd2: Can not satisfy peer's read request, no local data.

Then I get lots of error=-28 messages, and DRBD goes into the Diskless
state.

Dec 8 10:27:21 wcluster1 kernel: [ 1721.880693] block drbd2: helper command: /sbin/drbdadm pri-on-incon-degr minor-2
Dec 8 10:27:21 wcluster1 kernel: [ 1721.884967] block drbd2: helper command: /sbin/drbdadm pri-on-incon-degr minor-2 exit code 0 (0x0)
Dec 8 10:27:21 wcluster1 kernel: [ 1721.884989] block drbd2: disk( Failed -> Diskless )
Dec 8 10:27:21 wcluster1 kernel: [ 1721.885224] block drbd2: Notified peer that my disk is broken.
Dec 8 10:27:21 wcluster1 kernel: [ 1721.887452] block drbd2: in got_BlockAck:4348: rs_pending_cnt = -1 < 0 !

This last message is repeated many times with varying rs_pending_cnt.

Is there a way to get a better clue about what's going on?

Thanks in advance,
Andre
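
One way to get a first clue from the "read: error=-28 s=523520s" lines is
to decode them. The following is a minimal Python sketch, not part of DRBD:
the helper name decode_read_error is hypothetical, and it assumes that the
error value is a negated Linux errno and that the "s=" value is a start
sector counted in 512-byte units.

```python
import errno
import os
import re

# Assumption: DRBD's "read: error=<n> s=<m>s" line reports a negated errno
# and a start sector counted in 512-byte units.
SECTOR_BYTES = 512
READ_ERROR = re.compile(r"read: error=(-?\d+) s=(\d+)s")


def decode_read_error(line):
    """Return (errno name, errno message, byte offset), or None if no match."""
    match = READ_ERROR.search(line)
    if match is None:
        return None
    code = abs(int(match.group(1)))      # kernel messages carry -errno
    sector = int(match.group(2))
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code), sector * SECTOR_BYTES


if __name__ == "__main__":
    line = "block drbd2: read: error=-28 s=523520s"
    print(decode_read_error(line))
```

Under those assumptions, errno 28 is ENOSPC and sector 523520 lies at byte
offset 268042240, roughly 255.6 MiB into the backing device.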