[DRBD-user] sock_sendmsg time expired, ko = 4294967295 - could it be my Dell 2400 hardware?

Dan Didier dan at mapolce.com
Fri Sep 24 15:47:17 CEST 2004



> 
> We finally have our systems configured and functioning in a 
> drbd environment with redhat 9.0 / 2.4.x kernel SMP.
> Thanks to all for the assistance.  Now that the partitions 
> are syncing, we see that we are getting some errors on the 
> primary as follows:
> 
> drbd0: Secondary/Secondary --> Primary/Secondary
> kjournald starting.  Commit interval 5 seconds
> EXT3 FS 2.4-0.9.19, 19 August 2002 on drbd(147,0), internal journal
> EXT3-fs: mounted filesystem with ordered data mode.
> drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
> drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
> drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
> drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967294
> drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
> drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967294
> drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
> drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967294
> drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
> drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967294
> drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
> drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
> drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
> drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
> drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
> drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
> drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967294
> drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
> drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
> drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
> drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
> drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
> drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
> aacraid:ID(0:00:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
> drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
> drbd0: [drbd0_worker/2411] sock_sendmsg time expired, ko = 4294967295
> 
> Notice that I've only seen one occurrence of the aacraid error 
> on the primary...
> Now, see the output from the secondary box below.  These 
> systems are both Dell 2400 with the same configuration / 
> hardware.  Is this pointing to a hardware issue?  I don't 
> think it is, but I'm not sure, that's why I'm asking if 
> anyone has any ideas.  I notice it never complains about 
> devices (0:00:0) or (0:01:0) which would be hard drive 0 and 
> 1.  Anyone have anything to add to this?  I believe the time 
> expired errors are being caused by the underlying issue, 
> whatever that may be.  I don't think any tweaking of drbd 
> will fix it, but instead maybe a raid issue or hardware issue.
> 
> drbd0: Resync started as SyncTarget (need to sync 52863812 KB 
> [13215953 bits set]).
> aacraid:ID(0:05:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid:ID(0:02:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid:ID(0:03:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid:ID(0:04:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid:ID(0:02:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid:ID(0:02:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid:ID(0:05:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid:ID(0:04:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid:ID(0:02:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> drbd0: Secondary/Secondary --> Secondary/Primary
> aacraid: <...repeats 1 more times>
> aacraid:ID(0:03:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid:ID(0:03:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid:ID(0:04:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid:ID(0:05:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid:ID(0:02:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid:ID(0:02:0) Timeout detected on cmd[0x28] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid:ID(0:03:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid:ID(0:05:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid: <...repeats 2 more times>
> aacraid:ID(0:02:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid: <...repeats 1 more times>
> aacraid:ID(0:04:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid:ID(0:03:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid: <...repeats 1 more times>
> aacraid:ID(0:02:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid: <...repeats 1 more times>
> aacraid:ID(0:05:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid:ID(0:02:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid: <...repeats 1 more times>
> aacraid:ID(0:04:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid:ID(0:02:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid:ID(0:04:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid: <...repeats 2 more times>
> aacraid:ID(0:05:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid:ID(0:02:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid:ID(0:02:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid:ID(0:04:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid:ID(0:04:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid:ID(0:02:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid:ID(0:05:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid:ID(0:03:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> aacraid:ID(0:04:0) Timeout detected on cmd[0x2a] aacraid:SCSI 
> Channel[0]: Timeout Detected On 1 Command(s)
> [root@linux2 src]#
> 
> Thanks,
> Dan
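
A side note on the ko values in the log above: 4294967295 and 4294967294 are 2^32 - 1 and 2^32 - 2, which is how -1 and -2 appear when printed as unsigned 32-bit integers.  One possible reading (an assumption, nothing in this thread confirms it) is that a countdown counter has been decremented past zero.  A minimal Python sketch of the conversion; the function name as_signed32 is made up for illustration:

```python
def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed two's-complement int."""
    value &= 0xFFFFFFFF  # keep only the low 32 bits
    return value - (1 << 32) if value >= (1 << 31) else value


# The two ko values that appear in the primary's log.
for ko in (4294967295, 4294967294):
    print(ko, "->", as_signed32(ko))
```

This only shows the arithmetic; it says nothing about what the counter means inside drbd.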


Some more info for you.  On the system that is not showing the aacraid
errors (linux1 / primary) the system load is very low:
[root@linux1 /]# uptime
 09:40:23  up  1:07,  1 user,  load average: 0.02, 0.09, 0.08

However, on the system that is showing the problems (linux2 / secondary)
the system load is very high.  I suspect this is caused by whatever the
underlying issue is:
[root@linux2 mail]# uptime
 23:46:14  up  1:25,  1 user,  load average: 2.96, 2.85, 2.03

Any thoughts on this?
Thanks,
Dan
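
To check which devices and commands the aacraid timeouts cluster on, the log can be tallied by device ID and SCSI opcode.  The sketch below is only an illustration: the function name tally_timeouts and the three sample lines (copied from the secondary's log above) are my own choices, and the "<...repeats N more times>" lines are deliberately not counted because it is not clear which message they repeat.  Opcodes 0x2a and 0x28 are the standard SCSI WRITE(10) and READ(10) commands.

```python
import re
from collections import Counter

# Matches e.g. "aacraid:ID(0:05:0) Timeout detected on cmd[0x2a]"
TIMEOUT_RE = re.compile(
    r"aacraid:ID\((\d+):(\d+):(\d+)\) Timeout detected on cmd\[(0x[0-9a-fA-F]+)\]"
)

# Standard SCSI opcodes seen in this log.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def tally_timeouts(lines):
    """Count aacraid timeout messages per (device ID, command name)."""
    counts = Counter()
    for line in lines:
        for m in TIMEOUT_RE.finditer(line):
            device = "{}:{}:{}".format(m.group(1), m.group(2), m.group(3))
            opcode = int(m.group(4), 16)
            counts[(device, OPCODES.get(opcode, hex(opcode)))] += 1
    return counts


# Three lines taken from the secondary's log; quoting prefixes are harmless.
sample = [
    "> aacraid:ID(0:05:0) Timeout detected on cmd[0x2a] aacraid:SCSI",
    "> aacraid:ID(0:02:0) Timeout detected on cmd[0x2a] aacraid:SCSI",
    "> aacraid:ID(0:02:0) Timeout detected on cmd[0x28] aacraid:SCSI",
]

for (device, op), n in sorted(tally_timeouts(sample).items()):
    print(device, op, n)
```

Feeding the full kernel log to tally_timeouts (for example, the lines of a saved dmesg output) would show whether the timeouts are spread evenly over IDs 2 through 5 or concentrated on one device.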

