[DRBD-user] "Remote failed" and "State change failed" while trying the stress test

Junko IKEDA tsukishima.ha at gmail.com
Tue Sep 6 06:15:16 CEST 2011

Hi,

While running a stress test, I got the following message:
"Remote failed to finish a request within ko-count * timeout"

 * testing environment
CPU:	Intel(R) Xeon(R) CPU 5160 @3.00GHz (dual core x2)
Memory:	1024MB x4
HDD:	Smart Array P400, SAS 74GB x2 (RAID1+0)
OS:	RHEL 5.6 x86_64
DRBD:	8.3.11

 * "stress" tool options (CPU and memory utilization both reach almost 100%)
[root@dl380g5c ~]# stress --cpu 4 &
[root@dl380g5c ~]# stress --vm 1 --vm-bytes 4000000K &
[root@dl380g5c ~]# cd /drbd; stress -d 1 --hdd-bytes 1G &

The Primary's syslog said:
Aug 29 11:09:55 dl380g5c kernel: block drbd0: drbd_sync_handshake:
Aug 29 11:09:55 dl380g5c kernel: block drbd0: self C40A42F4B83EC72E:0000000000000000:0001000000000000:0001000000000000 bits:0 flags:0
Aug 29 11:09:55 dl380g5c kernel: block drbd0: peer C40A42F4B83EC72E:0000000000000000:0001000000000000:0001000000000000 bits:0 flags:0
Aug 29 11:09:55 dl380g5c kernel: block drbd0: uuid_compare()=0 by rule 40
Aug 29 11:09:55 dl380g5c kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) disk( Consistent -> UpToDate ) pdsk( DUnknown -> UpToDate )
Aug 29 11:09:58 dl380g5c kernel: block drbd0: role( Secondary -> Primary )
Aug 29 11:10:03 dl380g5c kernel: kjournald starting.  Commit interval 5 seconds
Aug 29 11:10:03 dl380g5c kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
Aug 29 11:10:03 dl380g5c kernel: EXT3 FS on drbd0, internal journal
Aug 29 11:10:03 dl380g5c kernel: EXT3-fs: drbd0: 1 orphan inode deleted
Aug 29 11:10:03 dl380g5c kernel: EXT3-fs: recovery complete.
Aug 29 11:10:03 dl380g5c kernel: EXT3-fs: mounted filesystem with ordered data mode.
Aug 29 11:13:37 dl380g5c ntpd[3813]: synchronized to 172.30.17.226, stratum 4
Aug 29 11:13:36 dl380g5c ntpd[3813]: time reset -0.475105 s
Aug 29 11:13:36 dl380g5c ntpd[3813]: kernel time sync enabled 0001

 * Then a timeout occurred:
Aug 29 11:13:43 dl380g5c kernel: block drbd0: Remote failed to finish a request within ko-count * timeout
Aug 29 11:13:45 dl380g5c kernel: block drbd0: State change failed: Refusing to be Primary while peer is not outdated
Aug 29 11:13:45 dl380g5c kernel: block drbd0:   state = { cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate r----- }
Aug 29 11:13:45 dl380g5c kernel: block drbd0:  wanted = { cs:Timeout ro:Primary/Unknown ds:UpToDate/DUnknown r----- }
Aug 29 11:17:59 dl380g5c ntpd[3813]: synchronized to 172.30.17.226, stratum 4
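
For reference, the threshold named in that message is the net-section "timeout" (given in tenths of a second) multiplied by "ko-count". The snippet below is only a sketch of that arithmetic. The attached drbd.conf did not survive in the archive, so the ko-count of 4 used here is a hypothetical value; timeout 60 (6 s) is the documented DRBD 8.3 default. The helper name ko_deadline_ms is made up for illustration.

```shell
#!/bin/sh
# Sketch of the "ko-count * timeout" threshold from the log message above.
# ASSUMPTION: the values passed below are illustrative only; the poster's
# real drbd.conf settings are unknown (the attachment was scrubbed).

# ko_deadline_ms (hypothetical helper): prints timeout * ko-count in ms.
#   $1 = net { timeout } value, in tenths of a second
#   $2 = net { ko-count } value, a plain count
ko_deadline_ms() {
    timeout_ds=$1
    ko_count=$2
    # tenths of a second -> milliseconds, then scale by ko-count
    echo $(( timeout_ds * 100 * ko_count ))
}

# timeout 60 (6 s) with a hypothetical ko-count of 4
ko_deadline_ms 60 4
```

With these assumed values the snippet prints 24000, i.e. a request that the peer has not completed within roughly 24 seconds would trigger that message.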

 * The ko-count has expired, but it is not zero:
Aug 29 11:24:13 dl380g5c kernel: block drbd0: [drbd0_worker/4001] sock_sendmsg time expired, ko = 2
Aug 30 04:35:00 dl380g5c ntpd[3813]: no servers reachable
Aug 30 04:52:04 dl380g5c ntpd[3813]: synchronized to 172.30.17.226, stratum 4

 * At this point I tried writing and reading some files on the replicated
area, and it succeeded.

Should I disconnect the Secondary node manually?

Regards,
Junko IKEDA

NTT DATA INTELLILINK CORPORATION
-------------- next part --------------
A non-text attachment was scrubbed...
Name: syslog
Type: application/octet-stream
Size: 4819 bytes
Desc: not available
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20110906/8ce19405/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: drbd.conf
Type: application/octet-stream
Size: 1768 bytes
Desc: not available
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20110906/8ce19405/attachment-0001.obj>

