> Serial console?
> Netconsole?
> Logs?

Which logs are you interested in? This is the first time I am seriously
troubleshooting a DRBD problem. /var/log/messages simply stops receiving
messages at the time of the freeze (see the snippet below). Is there a
debug level I can increase for DRBD?

> Network stress tests not using DRBD?
> General stress tests?
> Memtest?

The problem happens on the "production LAN" as well as on a 4-port
"1Gig staging switch". iperf shows normal values in all cases.
The problem happens on Fujitsu Siemens RX200/RX300 servers; six
Fujitsu Siemens servers in total are affected. Other servers I have
installed do not have this problem. The Fujitsu Siemens servers have
onboard Broadcom interfaces ("NIC: NetXtreme II BCM5708 Gigabit Ethernet").

---------- /var/log/messages on the target machine --------------
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: PingAck did not arrive in time.
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: asender terminated
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: Terminating asender thread
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: short read expecting header on sock: r=-512
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: Connection closed
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: conn( NetworkFailure -> Unconnected )
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: receiver terminated
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: Restarting receiver thread
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: receiver (re)started
Sep 25 11:33:13 Cluster3Node1 kernel: block drbd2: conn( Unconnected -> WFConnection )
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: PingAck did not arrive in time.
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: asender terminated
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: Terminating asender thread
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: short read expecting header on sock: r=-512
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: Connection closed
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: conn( NetworkFailure -> Unconnected )
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: receiver terminated
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: Restarting receiver thread
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: receiver (re)started
Sep 25 11:33:19 Cluster3Node1 kernel: block drbd0: conn( Unconnected -> WFConnection )
---------- here it is frozen -------------------------------
---------- /var/log/messages on the target machine --------------

Here it stops until the boot messages of the reboot show up.

mfg,
jeroen.

Lars Ellenberg wrote:
> On Fri, Sep 25, 2009 at 01:10:24PM +0200, Jeroen Groenewegen van der Weyden wrote:
>
>> Anybody?
>>
>> The same seems to happen with 8.3.3RC2, although there the system
>> either freezes or disconnects all network interfaces.
>> Anybody?
>>
>> mfg,
>>
>> jeroen
>>
>> Jeroen Groenewegen van der Weyden wrote:
>>
>>> Hello,
>>>
>>> I have a problem: during a full sync with DRBD the target machine
>>> freezes. The scenario is simple: whenever a full sync is made, manually
>>> or automatically, the syncing stalls after some time. After the syncing
>>> reaches the stalled state, a few moments later the target machine
>>> freezes entirely.
>>>
>>> OpenSuse 11.1
>>> kernel 2.6.27.21-0.1-xen #
>>> drbd 8.3.1
>>>
>>> NIC: NetXtreme II BCM5708 Gigabit Ethernet
>>>
>>> On the Source Machine:
>>> cat /proc/drbd
>>> version: 8.3.1 (api:88/proto:86-89)
>>> GIT-hash: fd40f4a8f9104941537d1afc8521e584a6d3003c build by
>>> root at DefaultNode, 2009-04-27 11:34:17
>>> 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----
>>>    ns:324524 nr:0 dw:110988 dr:689400 al:263 bm:242 lo:0 pe:2131
>>>    ua:978 ap:36 ep:1 wo:b oos:1635880
>>>    [==>.................] sync'ed: 16.4% (1635880/1951768)K
>>>    stalled
>>>
>>> How to find out what is happening here?
>>>
>
> Serial console?
> Netconsole?
> Logs?
>
> Network stress tests not using DRBD?
> General stress tests?
> Memtest?
>
>>> (and prevent it in the future.)
>>>