[DRBD-user] SCSI errors when SyncSourcing

rixed at happyleptic.org
Fri Nov 25 12:08:36 CET 2005


-[ Thu, Nov 10, 2005 at 11:30:49AM +0100, Lars Ellenberg ]----
> [ suggestion of a test involving dd + netcat ]
> if that triggers similar behaviour, your scsi/dma whatever is broken.

I've just run this test, and it performed perfectly well without
a single error.

So let's recap the situation.

(I) The situation
-----------------

- s2g1 is the SCSI host. It has 3 disks, partitioned as follows:

Disk /dev/sda: 73.5 GB, 73557090304 bytes
255 heads, 63 sectors/track, 8942 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sda1   *         1       261   2096451   83  Linux
/dev/sda2           262       392   1052257+  82  Linux swap
/dev/sda3           393      8942  68677875   83  Linux

Disk /dev/sdb: 73.5 GB, 73557090304 bytes
255 heads, 63 sectors/track, 8942 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sdb1             1      8942  71826583+  83  Linux

Disk /dev/sdc: 73.5 GB, 73557090304 bytes
255 heads, 63 sectors/track, 8942 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sdc1             1      8942  71826583+  83  Linux

The 3 disks are identical. sda3, sdb1 and sdc1 are replicated
with DRBD.


- s2g2 is the IDE host. It also has 3 disks, larger ones, partitioned
as follows:

Disk /dev/hda: 400.0 GB, 400088457216 bytes
255 heads, 63 sectors/track, 48641 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hda1   *         1        13    104391   83  Linux
/dev/hda2            14      8563  68677875   83  Linux
/dev/hda3          8564     48641 321926535   83  Linux

Disk /dev/hdb: 400.0 GB, 400088457216 bytes
255 heads, 63 sectors/track, 48641 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hdb1             1      8942  71826583+  83  Linux
/dev/hdb2          8943     48641 318882217+  83  Linux

Disk /dev/hdc: 400.0 GB, 400088457216 bytes
255 heads, 63 sectors/track, 48641 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hdc1             1      8942  71826583+  83  Linux
/dev/hdc2          8943      9007    522112+  82  Linux swap
/dev/hdc3          9008     48641 318360105   83  Linux

These 3 disks are also identical. hda2, hdb1 and hdc1 are the images of
sda3, sdb1 and sdc1.
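
As a sanity check on the two fdisk listings, the block counts can be
recomputed from the cylinder ranges. This is only a sketch: it assumes
fdisk reports 1 KB blocks and that a partition starting at cylinder 1
skips the first track (63 sectors). The function name is made up for the
illustration.

```python
SECTORS_PER_TRACK = 63
SECTORS_PER_CYLINDER = 255 * SECTORS_PER_TRACK  # 16065, as in the "Units" line
SECTOR_BYTES = 512


def partition_kib(start_cyl, end_cyl):
    """Size in KB of a partition spanning start_cyl..end_cyl inclusive.

    Assumption: a partition starting at cylinder 1 begins one track in,
    leaving room for the partition table.
    """
    sectors = (end_cyl - start_cyl + 1) * SECTORS_PER_CYLINDER
    if start_cyl == 1:
        sectors -= SECTORS_PER_TRACK
    return sectors * SECTOR_BYTES / 1024


# (SCSI partition, IDE partition) cylinder ranges taken from the listings.
pairs = {
    "sda3 / hda2": ((393, 8942), (14, 8563)),
    "sdb1 / hdb1": ((1, 8942), (1, 8942)),
    "sdc1 / hdc1": ((1, 8942), (1, 8942)),
}

for name, (scsi, ide) in pairs.items():
    a, b = partition_kib(*scsi), partition_kib(*ide)
    verdict = "same size" if a == b else "DIFFERENT"
    print(f"{name}: {a} KB vs {b} KB -> {verdict}")
```

Under those assumptions the recomputed values agree with the Blocks
column (a trailing "+" there marks a half block), and each SCSI
partition has the same size as its IDE image.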

Here is the drbd.conf file, identical on both hosts:

----[ drbd.conf ]----

resource dba {
  protocol C;
  incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
  startup {
    degr-wfc-timeout 120;    # 2 minutes.
  }
  disk {
    on-io-error   pass_on;
  }
  syncer {
    rate 10M;
    group 1;
    al-extents 257;
  }
  on s2g1 {
    device     /dev/drbd0;
    disk       /dev/sda3;
    address    192.168.1.221:7788;
    meta-disk  internal;
  }
  on s2g2 {
    device    /dev/drbd0;
    disk      /dev/hda2;
    address   192.168.1.222:7788;
    meta-disk internal;
  }
}

resource dbb {
  protocol C;
  incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
  startup {
    degr-wfc-timeout 120;    # 2 minutes.
  }
  disk {
    on-io-error   pass_on;
  }
  syncer {
    rate 10M;
    group 1;
    al-extents 257;
  }
  on s2g1 {
    device     /dev/drbd1;
    disk       /dev/sdb1;
    address    192.168.1.221:7789;
    meta-disk  internal;
  }
  on s2g2 {
    device    /dev/drbd1;
    disk      /dev/hdb1;
    address   192.168.1.222:7789;
    meta-disk internal;
  }
}

resource dbc {
  protocol C;
  incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
  startup {
    degr-wfc-timeout 120;    # 2 minutes.
  }
  disk {
    on-io-error   pass_on;
  }
  syncer {
    rate 10M;
    group 1;
    al-extents 257;
  }
  on s2g1 {
    device     /dev/drbd2;
    disk       /dev/sdc1;
    address    192.168.1.221:7790;
    meta-disk  internal;
  }
  on s2g2 {
    device    /dev/drbd2;
    disk      /dev/hdc1;
    address   192.168.1.222:7790;
    meta-disk internal;
  }
}

----[ EOF ]-----

The given IPs are correct.
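
For reference, here is a rough estimate of how long a full sync of each
device takes with "rate 10M". This is only a sketch: I am assuming the M
suffix means 1024 KB per second and that the syncer actually sustains
that rate. The sizes are the ones DRBD prints in its log on these hosts,
and the function name is made up for the illustration.

```python
KIB_PER_MIB = 1024


def full_sync_seconds(size_kib, rate_mib_per_s=10):
    """Seconds needed to copy size_kib at a steady rate_mib_per_s."""
    return size_kib / (rate_mib_per_s * KIB_PER_MIB)


# Device sizes in KB, as reported in the drbd logs.
sizes_kib = {"drbd0": 68546800, "drbd1": 71695508, "drbd2": 71695508}

for dev, size in sizes_kib.items():
    hours = full_sync_seconds(size) / 3600
    print(f"{dev}: about {hours:.1f} hours for a full sync")
```

So each device needs a bit under two hours per full sync at that rate,
if nothing else slows it down.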


(II) Various Troubles Observed
------------------------------

The first strange behaviour I observed was that the hosts were unable to
synchronize from s2g1 to s2g2: very quickly, many SCSI driver errors
were displayed on the console, and soon the system hung.

Synchronisation the other way round works very well. So s2g2 was the
primary that I used for testing my application (a database-like
application), and everything ran fine until I tried to switch the
hosts.

I unmounted /dev/drbd{0,1,2} and ran "drbdadm secondary all" on
s2g2, then ran "drbdadm primary all" on s2g1 and mounted
/dev/drbd{0,1,2} there. The files were there, but the root inode was
corrupted on /dev/drbd2 (in other tests it was on other devices, even on
all three devices at once). By corrupted, I mean that it was impossible
to create anything on this file-system, the 'df' command showed wrong
statistics (like 101% full), and the root directory had only 1 link.

e2fsck repaired the file-system without trouble each time this occurred.

(III) What I already tried
--------------------------

To see whether the meta-data was overlapping the file-system, I
tried repartitioning everything on both hosts to have a dedicated
256 MB partition on each drive for each 'meta-disk'. This changed
nothing (same synchronisation errors, same root inode corruption).

As the disk geometries differ from one host to the other, and the
partition sizes may therefore differ slightly, I would like to specify
the disk size explicitly but cannot find a way to do this. Anyway, the
drbd logs look OK:

drbd0: size = 65 GB (68546800 KB)
drbd1: size = 68 GB (71695508 KB)
drbd2: size = 68 GB (71695508 KB)

On both hosts.
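
These sizes can be cross-checked against the partition sizes from fdisk.
The sketch below assumes that, with internal meta-data, DRBD rounds the
partition down to a multiple of 4 KB and then reserves 128 MB at the
end. Both figures are my guess from the numbers, not something I have
confirmed, and the function name is made up for the illustration.

```python
META_KIB = 128 * 1024  # assumed size of the internal meta-data area
ALIGN_KIB = 4          # assumed alignment of the usable area


def expected_drbd_kib(partition_kib):
    """Usable device size in KB under the two assumptions above."""
    aligned = partition_kib // ALIGN_KIB * ALIGN_KIB
    return aligned - META_KIB


# (device, partition size from fdisk in KB, size from the drbd log in KB)
devices = [
    ("drbd0", 68677875, 68546800),
    ("drbd1", 71826583, 71695508),
    ("drbd2", 71826583, 71695508),
]

for dev, part_kib, logged_kib in devices:
    expected = expected_drbd_kib(part_kib)
    verdict = "matches" if expected == logged_kib else "differs from"
    print(f"{dev}: expected {expected} KB, {verdict} the logged size")
```

Under those assumptions the logged sizes are exactly what the partition
sizes predict on both hosts.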


What should I check, or what could I try now?





