[DRBD-user] weird behavior with metadata

Alex Vasilenko aa.vasilenko at gmail.com
Sat Jan 24 15:01:11 CET 2015


Hello,

I'm seeing strange behavior with DRBD 8.4.5: after a restart, one of my two
resources becomes diskless.
Steps to reproduce:

> $ sudo service drbd status
> drbd driver loaded OK; device status:
> version: 8.4.5 (api:1/proto:86-101)
> GIT-hash: 1d360bde0e095d495786eaeb2a1ac76888e4db96 build by phil at Build64R6, 2014-10-28 10:32:53
> m:res     cs          ro                 ds                     p  mounted                      fstype
> 0:main    Connected   Primary/Secondary  UpToDate/UpToDate      C  /home/www/storage            ext4
> ...       sync'ed:    0.1%               (1048376/1048540)M
> 1:backup  SyncSource  Primary/Secondary  UpToDate/Inconsistent  C  /home/www/backup.looplr.com  ext4


OK, backup is resyncing after the same failure earlier. Now I restart the
drbd service:

> $ sudo service drbd restart
> Stopping all DRBD resources:
> .
> Starting DRBD resources: [
>      create res: backup main
>    prepare disk: backup main
>     adjust disk: backup:failed(apply-al:255) main
>      adjust net: backup main
> ]
> .


And bam, the backup disk is Diskless:

> $ sudo service drbd status
> drbd driver loaded OK; device status:
> version: 8.4.5 (api:1/proto:86-101)
> GIT-hash: 1d360bde0e095d495786eaeb2a1ac76888e4db96 build by phil at Build64R6, 2014-10-28 10:32:53
> m:res     cs         ro                   ds                     p  mounted  fstype
> 0:main    Connected  Secondary/Secondary  UpToDate/UpToDate      C
> 1:backup  Connected  Secondary/Secondary  Diskless/Inconsistent  C


And now the log (backup is drbd1):

> Jan 24 14:42:39 de kernel: block drbd0: role( Primary -> Secondary )
> Jan 24 14:42:39 de kernel: block drbd0: bitmap WRITE of 0 pages took 0 jiffies
> Jan 24 14:42:39 de kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> Jan 24 14:42:39 de kernel: drbd main: peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
> Jan 24 14:42:39 de kernel: drbd main: asender terminated
> Jan 24 14:42:39 de kernel: drbd main: Terminating drbd_a_main
> Jan 24 14:42:39 de kernel: drbd main: Connection closed
> Jan 24 14:42:39 de kernel: drbd main: conn( Disconnecting -> StandAlone )
> Jan 24 14:42:39 de kernel: drbd main: receiver terminated
> Jan 24 14:42:39 de kernel: drbd main: Terminating drbd_r_main
> Jan 24 14:42:39 de kernel: block drbd0: disk( UpToDate -> Failed )
> Jan 24 14:42:39 de kernel: block drbd0: bitmap WRITE of 0 pages took 0 jiffies
> Jan 24 14:42:39 de kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> Jan 24 14:42:39 de kernel: block drbd0: disk( Failed -> Diskless )
> Jan 24 14:42:39 de kernel: block drbd0: drbd_bm_resize called with capacity == 0
> Jan 24 14:42:39 de kernel: drbd main: Terminating drbd_w_main
> Jan 24 14:42:39 de kernel: block drbd1: role( Primary -> Secondary )
> Jan 24 14:42:39 de kernel: drbd backup: peer( Secondary -> Unknown ) conn( SyncSource -> Disconnecting )
> Jan 24 14:42:39 de kernel: drbd backup: asender terminated
> Jan 24 14:42:39 de kernel: drbd backup: Terminating drbd_a_backup
> Jan 24 14:42:39 de kernel: drbd backup: Connection closed
> Jan 24 14:42:39 de kernel: drbd backup: conn( Disconnecting -> StandAlone )
> Jan 24 14:42:39 de kernel: drbd backup: receiver terminated
> Jan 24 14:42:39 de kernel: drbd backup: Terminating drbd_r_backup
> Jan 24 14:42:39 de kernel: block drbd1: disk( UpToDate -> Failed )
> Jan 24 14:42:39 de kernel: block drbd1: bitmap WRITE of 0 pages took 1 jiffies
> Jan 24 14:42:39 de kernel: block drbd1: 1024 GB (268383732 bits) marked out-of-sync by on disk bit-map.
> Jan 24 14:42:39 de kernel: block drbd1: disk( Failed -> Diskless )
> Jan 24 14:42:39 de kernel: block drbd1: drbd_bm_resize called with capacity == 0
> Jan 24 14:42:39 de kernel: drbd backup: Terminating drbd_w_backup
> Jan 24 14:42:39 de kernel: drbd: module cleanup done.
> Jan 24 14:42:39 de kernel: drbd: events: mcg drbd: 2
> Jan 24 14:42:39 de kernel: drbd: initialized. Version: 8.4.5 (api:1/proto:86-101)
> Jan 24 14:42:39 de kernel: drbd: GIT-hash: 1d360bde0e095d495786eaeb2a1ac76888e4db96 build by phil at Build64R6, 2014-10-28 10:32:53
> Jan 24 14:42:39 de kernel: drbd: registered as block device major 147
> Jan 24 14:42:39 de kernel: drbd main: Starting worker thread (from drbdsetup-84 [29262])
> Jan 24 14:42:39 de kernel: block drbd0: disk( Diskless -> Attaching )
> Jan 24 14:42:39 de kernel: drbd main: Method to ensure write ordering: flush
> Jan 24 14:42:39 de kernel: block drbd0: max BIO size = 327680
> Jan 24 14:42:39 de kernel: block drbd0: drbd_bm_resize called with capacity == 1825502744
> Jan 24 14:42:39 de kernel: block drbd0: resync bitmap: bits=228187843 words=3565436 pages=6964
> Jan 24 14:42:39 de kernel: block drbd0: size = 870 GB (912751372 KB)
> Jan 24 14:42:40 de kernel: block drbd0: recounting of set bits took additional 12 jiffies
> Jan 24 14:42:40 de kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> Jan 24 14:42:40 de kernel: block drbd0: disk( Attaching -> UpToDate )
> Jan 24 14:42:40 de kernel: block drbd0: attached to UUIDs C0027C4A2E905388:0000000000000000:2CE80910F236732B:2CE70910F236732B
> Jan 24 14:42:40 de kernel: drbd backup: Starting worker thread (from drbdsetup-84 [29265])
> Jan 24 14:42:40 de kernel: drbd backup: conn( StandAlone -> Unconnected )
> Jan 24 14:42:40 de kernel: drbd backup: Starting receiver thread (from drbd_w_backup [29267])
> Jan 24 14:42:40 de kernel: drbd backup: receiver (re)started
> Jan 24 14:42:40 de kernel: drbd backup: conn( Unconnected -> WFConnection )
> Jan 24 14:42:40 de kernel: drbd main: conn( StandAlone -> Unconnected )
> Jan 24 14:42:40 de kernel: drbd main: Starting receiver thread (from drbd_w_main [29264])
> Jan 24 14:42:40 de kernel: drbd main: receiver (re)started
> Jan 24 14:42:40 de kernel: drbd main: conn( Unconnected -> WFConnection )
> Jan 24 14:42:40 de kernel: drbd main: Handshake successful: Agreed network protocol version 101
> Jan 24 14:42:40 de kernel: drbd main: Agreed to support TRIM on protocol level
> Jan 24 14:42:40 de kernel: drbd main: conn( WFConnection -> WFReportParams )
> Jan 24 14:42:40 de kernel: drbd main: Starting asender thread (from drbd_r_main [29272])
> Jan 24 14:42:40 de kernel: drbd backup: Handshake successful: Agreed network protocol version 101
> Jan 24 14:42:40 de kernel: drbd backup: Agreed to support TRIM on protocol level
> Jan 24 14:42:40 de kernel: drbd backup: conn( WFConnection -> WFReportParams )
> Jan 24 14:42:40 de kernel: drbd backup: Starting asender thread (from drbd_r_backup [29269])
> Jan 24 14:42:40 de kernel: block drbd1: max BIO size = 4096
> Jan 24 14:42:40 de kernel: block drbd0: drbd_sync_handshake:
> Jan 24 14:42:40 de kernel: block drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> Inconsistent )
> Jan 24 14:42:40 de kernel: block drbd0: self C0027C4A2E905388:0000000000000000:2CE80910F236732B:2CE70910F236732B bits:0 flags:0
> Jan 24 14:42:40 de kernel: block drbd0: peer C0027C4A2E905388:0000000000000000:2CE80910F236732A:2CE70910F236732B bits:0 flags:0
> Jan 24 14:42:40 de kernel: block drbd0: uuid_compare()=0 by rule 40
> Jan 24 14:42:40 de kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
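
One way to read a log like this is to pull out only the local disk( old -> new )
transitions per device. The sketch below is a hypothetical helper, not part of
DRBD or drbd-utils: the function names and the regular expression are my own,
and the sample lines are copied from the log above with wrapped lines joined.
It makes visible that drbd0 goes Diskless -> Attaching -> UpToDate after the
module reload, while drbd1 never logs an Attaching transition at all.

```python
import re
from collections import defaultdict

# Excerpt of the kernel log from this post (wrapped lines joined).
LOG = """\
Jan 24 14:42:39 de kernel: block drbd0: disk( UpToDate -> Failed )
Jan 24 14:42:39 de kernel: block drbd0: disk( Failed -> Diskless )
Jan 24 14:42:39 de kernel: block drbd1: disk( UpToDate -> Failed )
Jan 24 14:42:39 de kernel: block drbd1: disk( Failed -> Diskless )
Jan 24 14:42:39 de kernel: block drbd0: disk( Diskless -> Attaching )
Jan 24 14:42:40 de kernel: block drbd0: disk( Attaching -> UpToDate )
Jan 24 14:42:40 de kernel: block drbd1: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> Inconsistent )
"""

# Matches the local "disk( A -> B )" field only; "pdsk( ... )" is the peer's
# disk state and is deliberately not matched.
DISK_RE = re.compile(r"block (drbd\d+): .*?\bdisk\( (\w+) -> (\w+) \)")


def disk_transitions(log_text):
    """Return {device: [(old_state, new_state), ...]} in log order."""
    result = defaultdict(list)
    for line in log_text.splitlines():
        match = DISK_RE.search(line)
        if match:
            device, old, new = match.groups()
            result[device].append((old, new))
    return dict(result)


def final_disk_state(log_text):
    """Return the last logged local disk state for each device."""
    return {dev: steps[-1][1] for dev, steps in disk_transitions(log_text).items()}


if __name__ == "__main__":
    for device, state in sorted(final_disk_state(LOG).items()):
        print(device, state)
```

On the excerpt this reports UpToDate for drbd0 and Diskless for drbd1, which
matches the status output after the restart. The kernel log alone does not say
why drbd1 was never attached; that part only shows up in the init script
output (apply-al:255).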


From this log I don't understand why the restart fails. Now trying to
restart the backup resource by hand:

> $ sudo drbdadm down backup
> $ sudo drbdadm up backup
> strange bm_offset -65608 (expected: -65600)
> No valid meta data found
> Command 'drbdmeta 1 v08 /dev/sda3 internal apply-al' terminated with exit code 255


Somehow the metadata is no longer there (?!). Note the warning "strange
bm_offset -65608 (expected: -65600)".
OK, how about recreating the metadata and resyncing the full partition?
Trying to wipe the old metadata reports that no metadata is present. OK,
let's take that at face value and create it:

> $ sudo drbdadm create-md backup
> strange bm_offset -65608 (expected: -65600)
> strange bm_offset -65608 (expected: -65600)
> md_offset 1099511623680
> al_offset 1099511590912
> bm_offset 1099478036480
> Found ext3 filesystem
>   1073709016 kB data area apparently used
>   1073709020 kB left usable by current configuration
> Even though it looks like this would place the new meta data into
> unused space, you still need to confirm, as this is only a guess.
> Do you want to proceed?
> [need to type 'yes' to confirm] yes
> initializing activity log
> NOT initializing bitmap
> Writing meta data...
> New drbd meta data block successfully created.
> success


The same warning appears twice, plus what looks like wrong filesystem
detection (it is actually ext4) and incorrect data usage detection (the data
actually takes 533 GB of the 1 TB partition).
Wiping and recreating the metadata makes the bm_offset warning go away until
the next drbd service restart.
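
To put numbers on the warning, here is a minimal arithmetic sketch using only
the byte offsets printed by create-md above. The helper names are mine, and
two things are assumptions rather than facts from the output: that the values
in the warning are 512-byte sectors relative to md_offset, and that the bitmap
uses one bit per 4 KiB of data.

```python
SECTOR = 512  # bytes; matches the 512-byte logical sectors fdisk reports

# Byte offsets printed by "drbdadm create-md backup" in this post.
MD_OFFSET = 1099511623680
AL_OFFSET = 1099511590912
BM_OFFSET = 1099478036480


def rel_sectors(offset, base=MD_OFFSET):
    """Offset relative to md_offset, in 512-byte sectors (assumed unit)."""
    return (offset - base) // SECTOR


def bitmap_sectors(device_bytes, bytes_per_bit=4096):
    """Sectors needed for a bitmap with one bit per bytes_per_bit of data.

    ASSUMPTION: 4 KiB of data per bit; this is not stated in the post.
    """
    bits = -(-device_bytes // bytes_per_bit)   # ceiling division
    return -(-bits // (8 * SECTOR))            # 8 bits per byte, 512 bytes per sector


if __name__ == "__main__":
    al_rel = rel_sectors(AL_OFFSET)
    bm_rel = rel_sectors(BM_OFFSET)
    print("al_offset relative to md_offset:", al_rel, "sectors")
    print("bm_offset relative to md_offset:", bm_rel, "sectors")
    print("bitmap area between them:", al_rel - bm_rel, "sectors")
    print("bitmap needed for 1 TiB:", bitmap_sectors(1 << 40), "sectors")
    print("warning discrepancy:", (-65600 - -65608) * SECTOR, "bytes")
    print("usable data area:", BM_OFFSET // 1024, "kB")
```

Under those assumptions the freshly created layout gives -64 and -65600
sectors, i.e. exactly the "expected" value in the warning, with a 65536-sector
bitmap area that is just what a 1 TiB device needs. The "strange" -65608 is 8
sectors (4096 bytes) further out. The usable data area comes to 1073709020 kB,
matching the create-md line, and the 1073709016 kB "apparently used" is 4 kB
less than that, the same size as the discrepancy. Whether that is more than a
coincidence I can't tell from this output.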

This is on the latest CentOS 6.5, kernel 2.6.32-504.1.3.el6.x86_64.
The disks sit on hardware RAID 1.

> $ sudo fdisk -l
> Disk /dev/sda: 3000.0 GB, 3000034656256 bytes
> 255 heads, 63 sectors/track, 364733 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0x0005a176
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sda1               1        1567    12582912+  83  Linux
> /dev/sda2            1567        1633      524288+  83  Linux
> /dev/sda3            1633      135307  1073741824+  83  Linux
> /dev/sda4          135307      364734  1842868224    f  W95 Ext'd (LBA)
> /dev/sda5          135308      251098   930086912+  83  Linux
> /dev/sda6          251098      364734   912779264   83  Linux


backup uses /dev/sda3 and main uses /dev/sda6
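
As a cross-check between the fdisk table and the DRBD output, the sketch below
recomputes where internal metadata would start on each partition. The helper
names are hypothetical, and the placement rule (superblock in the last
4 KiB-aligned 4 KiB block of the device) is an assumption on my part that
happens to reproduce the md_offset printed by create-md. I also assume the "+"
in the Blocks column means one extra 512-byte sector; the result is the same
with or without it.

```python
SECTOR = 512
MD_BLOCK = 4096  # ASSUMPTION: superblock sits in the last aligned 4 KiB block


def partition_bytes(blocks_1k, odd_sector=False):
    """Partition size in bytes from fdisk's 1 KiB "Blocks" column.

    odd_sector=True models the trailing "+" as one extra 512-byte sector
    (an assumption about fdisk's notation).
    """
    return blocks_1k * 1024 + (SECTOR if odd_sector else 0)


def internal_md_offset(device_bytes):
    """Assumed byte offset of the internal metadata superblock."""
    return (device_bytes // MD_BLOCK) * MD_BLOCK - MD_BLOCK


if __name__ == "__main__":
    # backup: /dev/sda3, Blocks = 1073741824+
    sda3 = partition_bytes(1073741824, odd_sector=True)
    print("sda3 md_offset:", internal_md_offset(sda3))

    # main: /dev/sda6, Blocks = 912779264; the kernel log reports a usable
    # capacity of 1825502744 sectors for drbd0.
    sda6_sectors = partition_bytes(912779264) // SECTOR
    print("sda6 metadata overhead:", sda6_sectors - 1825502744, "sectors")
```

For sda3 this gives 1099511623680, the same md_offset that create-md prints,
so the partition size and the metadata position at least agree with each
other. For sda6 the difference between the partition and the capacity in the
log is 55784 sectors, roughly 27 MiB of internal metadata.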

Can someone give me a hint as to what I'm doing wrong? I will provide any
additional info on request.

Thanks,
Alex