Hardware: 2 Dell PowerEdge SC1435s
Mirrored drive: 1TB Hitachi HUA72101 SATA
OS: Debian Etch 4.0r3
Kernel: vanilla 2.6.24.3
DRBD: 8.2.5
Heartbeat: 2.1.3

I had DRBD and Heartbeat up and running and did initial tests of fail-over and mirroring. Everything worked as expected.

Then I attached an external drive via FireWire to a SIIG FireWire card in the primary node and mounted it on /backup. The /dev/drbd0 device is mounted on /ha. At 17:50 I issued the following command and left for the night:

    tar cf /ha/fullbackup.tar /backup/ha

The /backup/ha directory contains 334GB of data.

When I came in to work this morning, I ran 'ls -lh /ha' and it hung. I checked syslog and found this:

    Mar 7 20:03:02 fs01 kernel: drbd0: FIXME (barrier_acked but pending) f6af0688 W L-coNp-s-- 82821 (621446208s +4096) Connected
    Mar 7 20:03:02 fs01 kernel: drbd0: ASSERT( b->n_req == set_size ) in /usr/src/drbd-8.2.5/drbd/drbd_main.c:238
    Mar 7 20:03:02 fs01 kernel: drbd0: b->n_req = 592 in /usr/src/drbd-8.2.5/drbd/drbd_main.c:246
    Mar 7 20:03:02 fs01 kernel: drbd0: set_size = 591 in /usr/src/drbd-8.2.5/drbd/drbd_main.c:247

Any access to /ha hangs, and the tar command is hung as well.

I found a post from January 10, 2005 saying that the second line in the log is nothing to worry about. I looked in drbd_main.c and didn't see anything that indicated a major problem; it looks like the code just reports the values of b->n_req and set_size when they are not equal. What does it mean when b->n_req != set_size? Is this an indicator of why the drbd0 device is no longer accessible? Or is the first line of the log (the FIXME message) a sign of a bigger problem?

I had to restart the primary. After it came back up, I could access the drbd0 device. I found that the last write to the tar file was at 20:02, about the same time those errors showed up in the log.

Well, I found some posts about a Broadcom NetXtreme II BCM5708 NIC with TOE causing DRBD lockups.
I am using an onboard Broadcom NetXtreme BCM5721 NIC without TOE. One of the posts said to try turning off checksum offloading:

    ethtool -K ethX tx off
    ethtool -K ethX rx off

I did that and ran the backup again, and this time it worked. Did I fix the problem, or did I just get lucky? Any ideas?

Thanks,
Tom
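The ASSERT lines in the log follow a log-and-continue shape: a failed check prints the expression with its file and line, then prints each value involved on its own line. The C sketch below reproduces only that log format, to illustrate the reading that the code "just reports" the two sizes when they differ. The macro, struct, and function names (SOFT_ASSERT, REPORT_UINT, write_batch, check_batch) are hypothetical and are not DRBD's actual code; the sketch assumes, as the log suggests, that the check reports the mismatch and keeps running rather than halting.

```c
#include <stdio.h>

/*
 * Hypothetical macros modeled only on the shape of the log lines above;
 * they are NOT DRBD's real definitions. A failed check is reported to
 * stderr and execution continues (no abort()).
 */
#define SOFT_ASSERT(exp)                                            \
	do {                                                        \
		if (!(exp))                                         \
			fprintf(stderr, "ASSERT( %s ) in %s:%d\n",  \
				#exp, __FILE__, __LINE__);          \
	} while (0)

/* Print "name = value in file:line", like the b->n_req / set_size lines. */
#define REPORT_UINT(var)                                            \
	fprintf(stderr, "%s = %u in %s:%d\n", #var, (unsigned)(var), \
		__FILE__, __LINE__)

/* Hypothetical stand-in for a group of writes tracked by one node. */
struct write_batch {
	unsigned int n_req; /* writes counted locally in this group */
};

/*
 * Compare the local count with set_size (assumed here, for illustration
 * only, to be the count reported back by the peer). Returns 1 on a
 * mismatch and 0 otherwise; either way the caller keeps running.
 */
int check_batch(const struct write_batch *b, unsigned int set_size)
{
	SOFT_ASSERT(b->n_req == set_size);
	if (b->n_req != set_size) {
		REPORT_UINT(b->n_req);
		REPORT_UINT(set_size);
		return 1;
	}
	return 0;
}
```

For example, calling check_batch() on a batch with n_req = 592 and set_size = 591 prints three lines shaped like the ASSERT, b->n_req, and set_size lines in the log and returns 1; with equal counts it prints nothing and returns 0.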