[DRBD-user] DRBD starts too long (in about of 1,5 minutes).

Igor Yu. Zhbanov bsg at uniyar.ac.ru
Sun Feb 26 05:35:55 CET 2006

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Hello!

I am using DRBD v0.7.16 on top of set of software RAIDs:

md8 = raid1 (sdc1, (md6 = raid1 (sdb1, sda1))); // DRBD-meta
md9 = raid1 (sdc2, (md7 = raid0 (sda2, sdb2))); // DRBD-data
The size of DRBD-data is about 400 GB.

Drives sda and sdb are plugged into onboard SATA controller and sdc is plugged
in external PCI SATA controller.  This is done to be able  to hot remove drive
from the system to process data on another server. Since modules handling SATA
controllers are different  for internal and  external controllers  in our case
(sata_promise vs sata_sil),  it is possible  to do a hot swap  of the drive by
unloading sata_sil module.  And after processing data  we just plug drive back
and make  full RAID  synchronization,  so we don't afraid  to mount  removable
drive on another server read/write and make it out of sync.

If it is important  filesystem on drbd device  was recently formatted and last
synchronization  was relatively long ago,  so there are must be  much data out
of sync.

That was a preface. I just don't know whether these details are significant or
not.

The story is that (at least when DRBD on second node is shut down) DRBD starts
in about  of 1,5 minutes.  This is suspiciously long.  DRBD's init-script just
displays
Starting DRBD resources:    [ d0
and you  must to wait  about 1.5 minutes  before it continues. After that time
everything is ok.

Looking in /var/log/messages I see following (output of start process):

Feb 26 05:56:48 fileserv1 kernel: drbd: initialised. Version: 0.7.16 (api:77/proto:74)
Feb 26 05:56:48 fileserv1 kernel: drbd: SVN Revision: 2066 build by root at fileserv1, 2006-02-15 02:04:23
Feb 26 05:56:48 fileserv1 kernel: drbd: registered as block device major 147
Feb 26 05:56:48 fileserv1 kernel: klogd 1.4.1, ---------- state change ----------
Feb 26 05:56:49 fileserv1 kernel: drbd0: resync bitmap: bits=95217088 words=2975534
Feb 26 05:56:49 fileserv1 kernel: drbd0: size = 363 GB (380868352 KB)
######### ^^^^^ A LOOOONG DELAY ##############################################
Feb 26 05:58:22 fileserv1 kernel: drbd0: 363 GB marked out-of-sync by on disk bit-map.
Feb 26 05:58:22 fileserv1 kernel: drbd0: Found 6 transactions (324 active extents) in activity log.
Feb 26 05:58:22 fileserv1 kernel: drbd0: drbdsetup [13313]: cstate Unconfigured --> StandAlone
Feb 26 05:58:23 fileserv1 kernel: drbd0: drbdsetup [13327]: cstate StandAlone --> Unconnected
Feb 26 05:58:23 fileserv1 kernel: drbd0: drbd0_receiver [13328]: cstate Unconnected --> WFConnection

Than I have compiled DRBD  with maximum debug options enabled  as I could found
in drbd_config.h. Here is output of how it starts from /var/log/messages:

Feb 26 06:01:27 fileserv1 kernel: drbd: initialised. Version: 0.7.16 (api:77/proto:74)
Feb 26 06:01:27 fileserv1 kernel: drbd: SVN Revision: 2066 build by root at fileserv1, 2006-02-26 06:00:38
Feb 26 06:01:27 fileserv1 kernel: drbd: registered as block device major 147
Feb 26 06:01:27 fileserv1 kernel: klogd 1.4.1, ---------- state change ----------
Feb 26 06:01:28 fileserv1 kernel: drbd0: drbdsetup [13769]:drbd_md_sync_page_io(,0,<NULL>)
Feb 26 06:01:28 fileserv1 kernel: drbd0: drbd_bm_resize: bm_set=95217088
Feb 26 06:01:28 fileserv1 kernel: drbd0: bm[2975533]=0xFFFFFFFF
Feb 26 06:01:28 fileserv1 kernel: drbd0: resync bitmap: bits=95217088 words=2975534
Feb 26 06:01:28 fileserv1 kernel: drbd0: size = 363 GB (380868352 KB)
Feb 26 06:01:28 fileserv1 kernel: drbd0: drbdsetup [13769]:drbd_md_sync_page_io(,72,<NULL>)
Feb 26 06:01:28 fileserv1 kernel: drbd0: read_sect: sector=0 offset=0 num_words=128
Feb 26 06:01:28 fileserv1 kernel: drbd0: drbdsetup [13769]:drbd_md_sync_page_io(,73,<NULL>)
Feb 26 06:01:28 fileserv1 kernel: drbd0: read_sect: sector=1 offset=128 num_words=128
Feb 26 06:01:28 fileserv1 kernel: drbd0: drbdsetup [13769]:drbd_md_sync_page_io(,74,<NULL>)
Feb 26 06:01:28 fileserv1 kernel: drbd0: read_sect: sector=2 offset=256 num_words=128
   ... SKIPPED ... SKIPPED ... SKIPPED ... SKIPPED ... SKIPPED ... SKIPPED ...
Feb 26 06:03:01 fileserv1 kernel: drbd0: drbdsetup [13769]:drbd_md_sync_page_io(,23317,<NULL>)
Feb 26 06:03:01 fileserv1 kernel: drbd0: read_sect: sector=23245 offset=2975360 num_words=128
Feb 26 06:03:01 fileserv1 kernel: drbd0: drbdsetup [13769]:drbd_md_sync_page_io(,23318,<NULL>)
Feb 26 06:03:01 fileserv1 kernel: drbd0: read_sect: sector=23246 offset=2975488 num_words=46
Feb 26 06:03:01 fileserv1 kernel: drbd0: drbd_bm_set_lel: bm_set=95217088
Feb 26 06:03:01 fileserv1 kernel: drbd0: bm[2975533]=0xFFFFFFFF
Feb 26 06:03:01 fileserv1 kernel: drbd0: 363 GB marked out-of-sync by on disk bit-map.
Feb 26 06:03:01 fileserv1 kernel: drbd0: drbdsetup [13769]:drbd_md_sync_page_io(,0,<NULL>)
Feb 26 06:03:01 fileserv1 kernel: drbd0: drbdsetup [13769]:drbd_md_sync_page_io(,8,<NULL>)
Feb 26 06:03:01 fileserv1 kernel: drbd0: drbdsetup [13769]:drbd_md_sync_page_io(,9,<NULL>)
Feb 26 06:03:01 fileserv1 kernel: drbd0: drbdsetup [13769]:drbd_md_sync_page_io(,10,<NULL>)
Feb 26 06:03:01 fileserv1 kernel: drbd0: drbdsetup [13769]:drbd_md_sync_page_io(,11,<NULL>)
Feb 26 06:03:01 fileserv1 kernel: drbd0: drbdsetup [13769]:drbd_md_sync_page_io(,12,<NULL>)
Feb 26 06:03:01 fileserv1 kernel: drbd0: drbdsetup [13769]:drbd_md_sync_page_io(,13,<NULL>)
Feb 26 06:03:01 fileserv1 kernel: drbd0: drbdsetup [13769]:drbd_md_sync_page_io(,12,<NULL>)
Feb 26 06:03:01 fileserv1 kernel: drbd0: drbdsetup [13769]:drbd_md_sync_page_io(,13,<NULL>)
Feb 26 06:03:01 fileserv1 kernel: drbd0: drbdsetup [13769]:drbd_md_sync_page_io(,8,<NULL>)
Feb 26 06:03:01 fileserv1 kernel: drbd0: drbdsetup [13769]:drbd_md_sync_page_io(,9,<NULL>)
Feb 26 06:03:01 fileserv1 kernel: drbd0: drbdsetup [13769]:drbd_md_sync_page_io(,10,<NULL>)
Feb 26 06:03:01 fileserv1 kernel: drbd0: drbdsetup [13769]:drbd_md_sync_page_io(,11,<NULL>)
Feb 26 06:03:01 fileserv1 kernel: drbd0: Found 6 transactions (324 active extents) in activity log.
Feb 26 06:03:01 fileserv1 kernel: drbd0: drbdsetup [13769]: cstate Unconfigured --> StandAlone
Feb 26 06:03:01 fileserv1 kernel: drbd0: drbdsetup [13785]: cstate StandAlone --> Unconnected
Feb 26 06:03:01 fileserv1 kernel: drbd0: drbd0_receiver [13786]: cstate Unconnected --> WFConnection

So,  I don't know  what drbd_md_sync_page_io  and  read_sect means  but  23246
sectors is about of 11 MB (provided the sector size of 512 bytes). I think, it
is too slow  to read or write 11 MB for 1,5 minutes.  I hope that here happens
something else.

And the question is what is it and how to fix? :-)
Is it because of RAID or because of unavailable second node  and much data out
of sync?

Thanks.




More information about the drbd-user mailing list