We finally have our systems configured and functioning in a DRBD environment with Red Hat 9.0 / 2.4.x kernel SMP. Thanks to all for the assistance.

Now that the partitions are syncing, we are getting some errors on the primary, as follows:

drbd0: Secondary/Secondary --> Primary/Secondary
kjournald starting. Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on drbd(147,0), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967294
drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967294
drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967294
drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967294
drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967294
drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
aacraid:ID(0:00:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295

Notice that I've only seen one occurrence of the aacraid error on the primary. Now see the output from the secondary box below. Both systems are Dell 2400s with the same configuration and hardware.

Is this pointing to a hardware issue? I don't think it is, but I'm not sure, which is why I'm asking whether anyone has any ideas. I notice the secondary never complains about devices (0:00:0) or (0:01:0), which would be hard drives 0 and 1. Does anyone have anything to add to this?

I believe the "time expired" errors are caused by the underlying issue, whatever that may be. I don't think any tweaking of DRBD will fix it; it is more likely a RAID issue or a hardware issue.

drbd0: Resync started as SyncTarget (need to sync 52863812 KB [13215953 bits set]).
aacraid:ID(0:05:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid:ID(0:02:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid:ID(0:03:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid:ID(0:04:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid:ID(0:02:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid:ID(0:02:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid:ID(0:05:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid:ID(0:04:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid:ID(0:02:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
drbd0: Secondary/Secondary --> Secondary/Primary
aacraid: <...repeats 1 more times>
aacraid:ID(0:03:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid:ID(0:03:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid:ID(0:04:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid:ID(0:05:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid:ID(0:02:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid:ID(0:02:0) Timeout detected on cmd[0x28]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid:ID(0:03:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid:ID(0:05:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid: <...repeats 2 more times>
aacraid:ID(0:02:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid: <...repeats 1 more times>
aacraid:ID(0:04:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid:ID(0:03:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid: <...repeats 1 more times>
aacraid:ID(0:02:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid: <...repeats 1 more times>
aacraid:ID(0:05:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid:ID(0:02:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid: <...repeats 1 more times>
aacraid:ID(0:04:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid:ID(0:02:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid:ID(0:04:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid: <...repeats 2 more times>
aacraid:ID(0:05:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid:ID(0:02:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid:ID(0:02:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid:ID(0:04:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid:ID(0:04:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid:ID(0:02:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid:ID(0:05:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid:ID(0:03:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
aacraid:ID(0:04:0) Timeout detected on cmd[0x2a]
aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
[root@linux2 src]#

Thanks,
Dan
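
For anyone triaging similar output, the question of which devices the controller complains about can be checked mechanically rather than by eye. The following is a minimal sketch, not part of the original post: the function name `tally_timeouts` and the embedded sample are illustrative assumptions, and the regular expression only assumes the line format shown in the logs above. The opcode names come from the SCSI block command set (0x28 is READ(10), 0x2a is WRITE(10)).

```python
import re
from collections import Counter

# SCSI opcodes that appear in the logs above; names per the SCSI block command set.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}

# Matches lines such as: aacraid:ID(0:05:0) Timeout detected on cmd[0x2a]
# The ID triple is kept as an opaque string exactly as the driver printed it.
TIMEOUT_RE = re.compile(
    r"aacraid:ID\((\d+:\d+:\d+)\) Timeout detected on cmd\[0x([0-9a-fA-F]+)\]"
)


def tally_timeouts(log_text):
    """Count aacraid timeout messages per (device ID, SCSI command).

    Hypothetical helper for illustration; it works on flattened or
    line-per-message log text alike because it scans for every match.
    """
    counts = Counter()
    for match in TIMEOUT_RE.finditer(log_text):
        scsi_id, opcode_hex = match.groups()
        opcode = int(opcode_hex, 16)
        counts[(scsi_id, OPCODES.get(opcode, hex(opcode)))] += 1
    return counts


if __name__ == "__main__":
    # A few lines copied from the secondary's output above.
    sample = """
    aacraid:ID(0:05:0) Timeout detected on cmd[0x2a]
    aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
    aacraid:ID(0:02:0) Timeout detected on cmd[0x2a]
    aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
    aacraid:ID(0:02:0) Timeout detected on cmd[0x28]
    aacraid:SCSI Channel[0]: Timeout Detected On 1 Command(s)
    """
    for (scsi_id, command), count in sorted(tally_timeouts(sample).items()):
        print(f"{scsi_id}  {command:<10} {count}")
```

Note that syslog's "<...repeats N more times>" lines are not expanded by this sketch, so the counts are a lower bound. Feeding it the full kernel log from each node would show whether the timeouts cluster on particular IDs and whether they are almost all writes, as the excerpts above suggest.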