[DRBD-user] It goes on with error got NegRSDreply for the second device

Fri May 18 12:54:50 CEST 2007

Hi, I'm new to the list. I'm using DRBD 0.7.23 on Gentoo Linux (one
node, the primary, runs kernel 2.6.17-r8; the other, the secondary,
runs kernel 2.6.18).
I want to set up High Availability for Apache and PostgreSQL. I tried
Apache first and it was successful: DRBD was working fine, with
synchronization and updates.
Now I've configured another device, drbd1, to store the PostgreSQL data,
but I still haven't been able to complete the synchronization. Let me
explain in more detail:
First node, cat /proc/partitions:
major minor  #blocks  name

   3     0   20010816 hda
   3     1      37768 hda1
   3     2     506520 hda2
   3     3    7174440 hda3
   3     4          1 hda4
   3     5    3069328 hda5
   3     6    3069328 hda6

Second node, cat /proc/partitions:

major minor  #blocks  name

   3     0   19551168 hda
   3     1      40131 hda1
   3     2     987997 hda2
   3     3    5084572 hda3
   3     4          1 hda4
   3     5    3068383 hda5
   3     6    3068383 hda6

I know the two hard disks are different, but the first device,
/dev/drbd0, is on /dev/hda5 on both nodes and the second, /dev/drbd1,
is on /dev/hda6 on both nodes.
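
To put a number on how much the backing partitions differ, the two listings above can be compared with a small script. This is only a sketch: the helper names (parse_partitions, size_differences) are made up for illustration, the input lines are copied from the listings above, and it says nothing about whether the size difference is related to the error.

```python
# Compare partition sizes (in 1 KiB blocks, as reported by /proc/partitions)
# between the two nodes. The data lines are copied from the listings above.

FIRST = """\
major minor  #blocks  name
   3     5    3069328 hda5
   3     6    3069328 hda6
"""

SECOND = """\
major minor  #blocks  name
   3     5    3068383 hda5
   3     6    3068383 hda6
"""


def parse_partitions(text):
    """Return {name: blocks} from /proc/partitions-style text."""
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        # Data lines have four fields: major, minor, #blocks, name.
        # The header line is skipped because "#blocks" is not a number.
        if len(fields) == 4 and fields[2].isdigit():
            sizes[fields[3]] = int(fields[2])
    return sizes


def size_differences(first, second, names):
    """Return {name: first_blocks - second_blocks} for the given partitions."""
    return {name: first[name] - second[name] for name in names}


first = parse_partitions(FIRST)
second = parse_partitions(SECOND)
diffs = size_differences(first, second, ["hda5", "hda6"])
for name, diff in diffs.items():
    print(f"{name}: first={first[name]} second={second[name]} diff={diff} blocks")
```

With the figures above, each partition on the second node comes out 945 blocks smaller than its counterpart on the first node.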
My drbd.conf is:
resource r0 {
protocol C;
net {
     timeout 15;
}
syncer {
group 0;
rate 5M;
}
on first {
device /dev/drbd0;
disk /dev/hda5;
address 192.168.0.1:7788;
meta-disk internal;
}
on second {
device /dev/drbd0;
disk /dev/hda5;
address 192.168.0.2:7788;
meta-disk internal;
}
}

resource r1 {
protocol C;
net {
     timeout 15;
}
syncer {
group 1;
rate 5M;
}

on first {
device /dev/drbd1;
disk /dev/hda6;
address 192.168.0.1:7789;
meta-disk internal;
}
on second {
device /dev/drbd1;
disk /dev/hda6;
address 192.168.0.2:7789;
meta-disk internal;
}
}

So I run drbdadm up all, and then on the primary I execute
drbdadm -- --do-what-I-say primary all
The first device synchronizes very quickly (it already has a reiserfs
filesystem on it, since it was used first for the apache2 documents),
and then the second starts synchronizing. After a short while I get
this error on the second node:
"drbd1 : got negRSDreply. WE ARE LOST ..."
and the kernel panic: drbd1 not syncing
I've tried many times, with the following:
- checking, after drbdadm up all, whether cat /proc/partitions showed
different sizes for drbd1 on the two nodes (this doesn't happen now)
- creating the partition in another area of the disk on both nodes
(since I looked up the error and saw that it can be caused by errors on
the disk)
- I've also created a filesystem on both disks to check whether there
is an error on the disk, but there's no error (then I recreated the
partition unformatted)
- I've tried swapping the roles of the nodes and synchronizing with the
secondary, but it doesn't work.
- I've done a resize of drbd1 to see if there was a problem with the
size of the devices (following advice found on this mailing list for a
case where two devices didn't connect correctly)
- I've compiled kernel 2.6.17-r8 on the second node as well (but it
didn't help)
- I've spent almost a whole day on repeated attempts like these, but it
doesn't get any better.
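
Creating a filesystem only touches part of a partition, so it may not reveal an unreadable block. A fuller check is to read the backing device from start to end and note where reads fail. The following is a minimal sketch of that idea, not a DRBD tool: scan_for_read_errors is a hypothetical helper, the demonstration runs on a temporary file rather than a real device, and reads go through the page cache, so treat the result as a rough indication only.

```python
import os
import tempfile


def scan_for_read_errors(path, chunk_size=1024 * 1024):
    """Read `path` front to back; return (readable_bytes, bad_offsets)."""
    readable = 0
    bad_offsets = []
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size for regular files and,
        # on Linux, for block devices as well.
        size = os.lseek(fd, 0, os.SEEK_END)
        offset = 0
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                data = os.pread(fd, want, offset)
            except OSError:
                # Record the start of the unreadable chunk and move on.
                bad_offsets.append(offset)
                offset += want
                continue
            if not data:
                break  # unexpected end of data
            readable += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return readable, bad_offsets


# Demonstration on a temporary file, so the sketch runs anywhere.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (3 * 1024 * 1024 + 123))
    demo_path = tmp.name
try:
    readable, bad = scan_for_read_errors(demo_path)
    print(f"read {readable} bytes, {len(bad)} unreadable chunk(s)")
finally:
    os.unlink(demo_path)
```

On the nodes described here, the path would be the backing partition of drbd1 (/dev/hda6), read as root on each node in turn.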

At some point during synchronization (sometimes at the
beginning, sometimes after 50%) I get
"drbd1 : got negRSDreply. WE ARE LOST. we lost our up-to-date disk."
and at the same time I get the Kernel panic : drbd1 not syncing.
WE ARE LOST...
on the second node,
and of course everything hangs, so I have to power the PCs off and
back on manually.
Could you please help me with this? Do you need any more information?
Thanks in advance.
Alberto