[DRBD-user] Re: DRBD with disk failure.

Brent A Nelson brent at phys.ufl.edu
Tue Jul 25 20:49:40 CEST 2006

Some additional info: the mkfs is still hung and a subsequent attempt also 
hung.  A short dd to the device did not hang, but it completed far too 
quickly and showed no activity on the secondary.  A longer dd did hang.

The machine now has three stuck processes, and top shows the CPU at 100% 
I/O wait.
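
One way to pin down which processes are stuck, and where in the kernel they 
are sleeping, is to scan /proc for tasks in the "D" (uninterruptible sleep) 
state. The following is only a minimal sketch, assuming a Linux /proc 
layout in which /proc/<pid>/stat holds "pid (comm) state ..." and 
/proc/<pid>/wchan is available (it may not be on every kernel build). The 
function names parse_stat_line and find_d_state are made up for this 
illustration and are not part of drbd or any other tool.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the text of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def find_d_state(proc_root="/proc"):
    """List (pid, comm, wchan) for every task currently in the D state."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or its stat file was unreadable
        if state != "D":
            continue
        try:
            with open(os.path.join(proc_root, entry, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            wchan = "?"  # assumption: wchan may be missing on some kernels
        stuck.append((pid, comm, wchan))
    return stuck


if __name__ == "__main__":
    for pid, comm, wchan in find_d_state():
        print(pid, comm, wchan)
```

The wchan column, where present, names the kernel function the task is 
sleeping in, which may help show whether the hang is inside drbd, LVM, or 
the SCSI layer.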

All 6 drbd devices have LVM logical volumes for their backing store (I 
used logical volumes so that the block devices wouldn't get reordered by 
the system if a disk disappeared; perhaps there's a better way).  On this 
machine, 3 of the devices are secondary (the other machine is primary for 
them) and 3 are primary.

Could this be an issue with drbd on LVM? Or maybe something that's fixed 
by a newer drbd version? A bug when compiled with gcc-3.4, maybe? Is there 
anything I should try to help diagnose the situation before I attempt to 
recover (these machines are not yet in production, so I can wait a bit, if 
needed)?

Thanks,

Brent

On Mon, 24 Jul 2006, Brent A Nelson wrote:

> I experienced a disk failure today when doing mkfs on one of 6 drbd devices, 
> which resulted in the process getting stuck in the "D" state.
>
> dmesg shows a series of SCSI errors and then the following on the primary:
>
> drbd3: drbd_md_sync_page_io(,390455306,WRITE) failed!
> drbd3: Notified peer that my disk is broken.
>
> The secondary went to the "ServerForDLess" state and the primary went to 
> "DiskLessClient".
>
> This all seems like a normal drbd response, right? But, although I think I 
> can read from the device (read attempts don't report any errors, and the 
> secondary drbd processes seem to be busy serving data when I attempt a read), 
> I can't seem to write to it.  I imagine if I switch the secondary over to 
> primary all will be well, but the primary should be able to pass both reads 
> and writes to the secondary in the event of its own disk failing, correct?
>
> Is there something I'm doing wrong or a bug in my drbd (version 0.7.15 in 
> Ubuntu Dapper but running a 2.6.12 kernel)?
>
> Thanks,
>
> Brent Nelson
> Director of Computing
> Dept. of Physics
> University of Florida
>



