[DRBD-user] b->n_req != set_size

Tom Brown brown at esteem.com
Sat Mar 8 23:27:04 CET 2008


Hardware: 2 Dell PowerEdge SC1435's
Mirrored Drive: 1TB Hitachi HUA72101 SATA
OS: Debian Etch 4.0r3
Kernel: vanilla kernel 2.6.24.3
DRBD: 8.2.5
Heartbeat: 2.1.3

I had DRBD and Heartbeat up and running. I did initial tests of the
fail-over and mirroring, and everything worked as expected. Then I
attached an external drive via FireWire to a SIIG FireWire card in the
primary node. I mounted the external drive on /backup. The /dev/drbd0
device is mounted on /ha. Then I issued the following command at 17:50
and left for the night:

tar cf /ha/fullbackup.tar /backup/ha

The /backup/ha directory contains 334GB of data. When I came in to work
this morning, I issued an 'ls -lh /ha' command and it hung. I checked
syslog and found this:

Mar  7 20:03:02 fs01 kernel: drbd0: FIXME (barrier_acked but pending)
f6af0688 W L-coNp-s-- 82821 (621446208s +4096) Connected
Mar  7 20:03:02 fs01 kernel: drbd0: ASSERT( b->n_req == set_size )
in /usr/src/drbd-8.2.5/drbd/drbd_main.c:238
Mar  7 20:03:02 fs01 kernel: drbd0: b->n_req = 592
in /usr/src/drbd-8.2.5/drbd/drbd_main.c:246
Mar  7 20:03:02 fs01 kernel: drbd0: set_size = 591
in /usr/src/drbd-8.2.5/drbd/drbd_main.c:247
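
For anyone checking their own logs for the same messages, here is a
minimal sketch. The function name drbd_asserts is made up for this
example, and the default path /var/log/syslog is an assumption (it is
where Debian usually writes kernel messages). The lines above are wrapped
by the mail client; in the log file each message sits on one line.

```shell
# Hypothetical helper: print DRBD kernel messages that report an
# ASSERT or a FIXME. The default log path is an assumption.
drbd_asserts() {
    logfile="${1:-/var/log/syslog}"
    # Match lines such as "kernel: drbd0: ASSERT( ... )".
    grep -E 'kernel: drbd[0-9]+: (ASSERT|FIXME)' "$logfile"
}
```

Calling drbd_asserts with no argument reads the default log; pass a path
to read a rotated or copied log instead.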

Any access to /ha hangs, and the tar command is hung as well. I found a
post from January 10, 2005 saying that the second line in the log is
nothing to worry about. I looked in drbd_main.c and didn't see anything
that indicated a major problem. It looks like it just reports the values
of b->n_req and set_size when they are not equal.

What does it mean when b->n_req != set_size? Is this an indicator of why
the drbd0 device is not accessible anymore? Or is the first line from
the log above (where it says FIXME) an indicator of a bigger problem? 

I had to restart the primary. I could access the drbd0 device after it
came back up. I found that the last write to the tar file was at 20:02.
That's about the same time those errors showed up in the log.
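
After a restart like this, one way to confirm that the nodes are talking
to each other again is to read the connection state from /proc/drbd. The
sketch below assumes the 8.x status format, in which each device line
carries a cs: field; the function name drbd_conn_state is made up for
this example.

```shell
# Hypothetical helper: print the connection state (the cs: field) of
# each DRBD device. Assumes the /proc/drbd format used by DRBD 8.x.
drbd_conn_state() {
    statusfile="${1:-/proc/drbd}"
    # Keep only the "cs:<State>" token, then drop the "cs:" prefix.
    grep -o 'cs:[A-Za-z]*' "$statusfile" | cut -d: -f2
}
```

A healthy pair should report Connected, which is also the state shown at
the end of the FIXME line in the log.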

I later found some posts about a Broadcom NetXtreme II BCM5708 NIC with
TOE causing DRBD lockups. I am using an onboard Broadcom NetXtreme
BCM5721 NIC without TOE. One of the posts said to try this:

ethtool -K ethX tx off
ethtool -K ethX rx off

I did that and tried the backup again. This time it worked. Did I fix
the problem, or just get lucky? Any ideas?
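
For reference, the two ethtool calls above turn off TX and RX checksum
offload, and they can be combined into one. The sketch below wraps them
in two small functions; the function names are made up for this example,
and the interface name passed in (ethX in the post) is a placeholder.
Lowercase -k only displays the current settings, while uppercase -K
changes them and needs root.

```shell
# Show the current offload settings of an interface (read-only).
show_offload() {
    ethtool -k "$1"
}

# Disable TX and RX checksum offload in a single call (needs root).
disable_csum_offload() {
    ethtool -K "$1" tx off rx off
}
```

Typical use would be show_offload eth0 before and after
disable_csum_offload eth0 to confirm the change took effect. Settings
made with ethtool -K do not survive a reboot, so they would have to be
reapplied at boot if they turn out to be the fix.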

Thanks,
Tom
