Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hello, I have a problem with drbd-0.7.25 and drbd-0.8.2.6... my situation is: two servers Supermicro in company A connected with crossover cable and CentOS 5.2 (all updates installed) two servers Poweredge in company B connected with network fiber in separate sites and Citrix XenServer 4 installed. the problem is that time in time, both servers restart without apparent reason.. in logs, only show messages about network failure and after this, server restart. in company A... this problem occurred 2 o 3 times and the last incident is on 4 months ago.. and I had forget this problem.. because, I think that could be for electrical energy line in this company. but now, in company B.. I have the same problem for first time (after various months work fine) and this servers is connected in UPS line. two servers groups are running a Virtualization Server.. but from different vendors and configurations.. Memory, disks and network work fine in four servers and, DRBD resource contain only data from VMs, none files/data from owner server. and I don't understand why servers restart when recive a error from network !!??? and in case of problem..I think that restart of VMs is probably but not of complete Server. Above, logs and config file of two servers in company B... Any idea ??? ========= DRBD-1 berfore restart========= Feb 12 06:45:37 drbd1 kernel: drbd2: PingAck did not arrive in time. Feb 12 06:45:37 drbd1 kernel: drbd2: drbd2_asender [15826]: cstate Connected --> NetworkFailure Feb 12 06:45:37 drbd1 kernel: drbd2: asender terminated Feb 12 06:45:37 drbd1 kernel: drbd2: drbd2_receiver [6450]: cstate NetworkFailure --> BrokenPipe Feb 12 06:45:37 drbd1 kernel: drbd2: short read receiving data block: read 2336 expected 4096 Feb 12 06:45:37 drbd1 kernel: drbd2: error receiving Data, l: 4112! Feb 12 06:45:37 drbd1 kernel: drbd2: worker terminated Feb 12 06:45:37 drbd1 kernel: drbd2: drbd2_receiver [6450]: cstate BrokenPipe --> Unconnected Feb 12 06:45:37 drbd1 kernel: drbd2: Connection lost. Feb 12 06:45:37 drbd1 kernel: drbd2: drbd2_receiver [6450]: cstate Unconnected --> WFConnection Feb 12 06:45:37 drbd1 kernel: drbd1: PingAck did not arrive in time. Feb 12 06:45:37 drbd1 kernel: drbd1: drbd1_asender [15828]: cstate Connected --> NetworkFailure Feb 12 06:45:37 drbd1 kernel: drbd1: asender terminated Feb 12 06:45:37 drbd1 kernel: drbd1: drbd1_receiver [6442]: cstate NetworkFailure --> BrokenPipe Feb 12 06:45:37 drbd1 kernel: drbd1: short read expecting header on sock: r=-512 Feb 12 06:45:37 drbd1 kernel: drbd1: worker terminated Feb 12 06:45:37 drbd1 kernel: drbd1: drbd1_receiver [6442]: cstate BrokenPipe --> Unconnected Feb 12 06:45:37 drbd1 kernel: drbd1: Connection lost. Feb 12 06:45:37 drbd1 kernel: drbd1: drbd1_receiver [6442]: cstate Unconnected --> WFConnection Feb 12 06:49:29 drbd1 syslogd 1.4.1: restart. =================================== ========= DRBD-1 after restart========= Feb 12 06:45:37 drbd1 kernel: drbd2: PingAck did not arrive in time. Feb 12 06:45:37 drbd1 kernel: drbd2: drbd2_asender [15826]: cstate Connected --> NetworkFailure Feb 12 06:45:37 drbd1 kernel: drbd2: asender terminated Feb 12 06:45:37 drbd1 kernel: drbd2: drbd2_receiver [6450]: cstate NetworkFailure --> BrokenPipe Feb 12 06:45:37 drbd1 kernel: drbd2: short read receiving data block: read 2336 expected 4096 Feb 12 06:45:37 drbd1 kernel: drbd2: error receiving Data, l: 4112! Feb 12 06:45:37 drbd1 kernel: drbd2: worker terminated Feb 12 06:45:37 drbd1 kernel: drbd2: drbd2_receiver [6450]: cstate BrokenPipe --> Unconnected Feb 12 06:45:37 drbd1 kernel: drbd2: Connection lost. Feb 12 06:45:37 drbd1 kernel: drbd2: drbd2_receiver [6450]: cstate Unconnected --> WFConnection Feb 12 06:45:37 drbd1 kernel: drbd1: PingAck did not arrive in time. Feb 12 06:45:37 drbd1 kernel: drbd1: drbd1_asender [15828]: cstate Connected --> NetworkFailure Feb 12 06:45:37 drbd1 kernel: drbd1: asender terminated Feb 12 06:45:37 drbd1 kernel: drbd1: drbd1_receiver [6442]: cstate NetworkFailure --> BrokenPipe Feb 12 06:45:37 drbd1 kernel: drbd1: short read expecting header on sock: r=-512 Feb 12 06:45:37 drbd1 kernel: drbd1: worker terminated Feb 12 06:45:37 drbd1 kernel: drbd1: drbd1_receiver [6442]: cstate BrokenPipe --> Unconnected Feb 12 06:45:37 drbd1 kernel: drbd1: Connection lost. Feb 12 06:45:37 drbd1 kernel: drbd1: drbd1_receiver [6442]: cstate Unconnected --> WFConnection Feb 12 06:50:52 drbd1 kernel: drbd: initialised. Version: 0.7.25 (api:79/proto:74) Feb 12 06:50:52 drbd1 kernel: drbd: GIT-hash: 3a9c7c136a9af8df921b3628129dafbe212ace9f build by root at localhost.localdomain, 2008-04-28 17:15:49 Feb 12 06:50:52 drbd1 kernel: drbd: registered as block device major 147 Feb 12 06:50:53 drbd1 kernel: drbd1: resync bitmap: bits=191966063 words=5998940 Feb 12 06:50:53 drbd1 kernel: drbd1: size = 732 GB (767864252 KB) Feb 12 06:50:55 drbd1 kernel: drbd1: 0 KB marked out-of-sync by on disk bit-map. Feb 12 06:50:55 drbd1 kernel: drbd1: Found 6 transactions (324 active extents) in activity log. Feb 12 06:50:55 drbd1 kernel: drbd1: Marked additional 1028 MB as out-of-sync based on AL. Feb 12 06:50:55 drbd1 kernel: drbd1: drbdsetup [6426]: cstate Unconfigured --> StandAlone Feb 12 06:50:55 drbd1 kernel: drbd2: resync bitmap: bits=191966063 words=5998940 Feb 12 06:50:55 drbd1 kernel: drbd2: size = 732 GB (767864252 KB) Feb 12 06:50:57 drbd1 kernel: drbd2: 0 KB marked out-of-sync by on disk bit-map. Feb 12 06:50:57 drbd1 kernel: drbd2: Found 4 transactions (192 active extents) in activity log. Feb 12 06:50:57 drbd1 kernel: drbd2: drbdsetup [6430]: cstate Unconfigured --> StandAlone Feb 12 06:50:57 drbd1 kernel: drbd1: drbdsetup [6448]: cstate StandAlone --> Unconnected Feb 12 06:50:57 drbd1 kernel: drbd1: drbd1_receiver [6449]: cstate Unconnected --> WFConnection Feb 12 06:50:57 drbd1 kernel: drbd2: drbdsetup [6456]: cstate StandAlone --> Unconnected Feb 12 06:50:57 drbd1 kernel: drbd2: drbd2_receiver [6457]: cstate Unconnected --> WFConnection Feb 12 06:50:58 drbd1 kernel: drbd1: using degr_wfc_timeout=30 seconds Feb 12 06:50:58 drbd1 kernel: drbd2: drbd2_receiver [6457]: cstate WFConnection --> WFReportParams Feb 12 06:50:58 drbd1 kernel: drbd2: Handshake successful: DRBD Network Protocol version 74 Feb 12 06:50:58 drbd1 kernel: drbd2: Connection established. Feb 12 06:50:58 drbd1 kernel: drbd2: I am(S): 1:00000007:00000010:0000002a:00000005:00 Feb 12 06:50:58 drbd1 kernel: drbd2: Peer(S): 1:00000007:00000010:0000002a:00000005:10 Feb 12 06:50:58 drbd1 kernel: drbd2: drbd2_receiver [6457]: cstate WFReportParams --> WFBitMapT Feb 12 06:50:58 drbd1 kernel: drbd2: Secondary/Unknown --> Secondary/Secondary Feb 12 06:50:58 drbd1 kernel: drbd2: drbd2_receiver [6457]: cstate WFBitMapT --> SyncTarget Feb 12 06:50:58 drbd1 kernel: drbd2: Resync started as SyncTarget (need to sync 1052672 KB [263168 bits set]). Feb 12 06:50:59 drbd1 kernel: drbd1: drbd1_receiver [6449]: cstate WFConnection --> WFReportParams Feb 12 06:50:59 drbd1 kernel: drbd1: Handshake successful: DRBD Network Protocol version 74 Feb 12 06:50:59 drbd1 kernel: drbd1: Connection established. Feb 12 06:50:59 drbd1 kernel: drbd1: I am(S): 1:00000005:00000002:0000001f:00000002:10 Feb 12 06:50:59 drbd1 kernel: drbd1: Peer(S): 1:00000005:00000002:0000001e:00000002:00 Feb 12 06:50:59 drbd1 kernel: drbd1: drbd1_receiver [6449]: cstate WFReportParams --> WFBitMapS Feb 12 06:50:59 drbd1 kernel: drbd1: Secondary/Unknown --> Primary/Unknown Feb 12 06:50:59 drbd1 kernel: drbd1: Primary/Unknown --> Primary/Secondary Feb 12 06:50:59 drbd1 kernel: drbd1: drbd1_receiver [6449]: cstate WFBitMapS --> SyncSource Feb 12 06:50:59 drbd1 kernel: drbd1: Resync started as SyncSource (need to sync 1052672 KB [263168 bits set]). Feb 12 06:51:00 drbd1 kernel: drbd2: Secondary/Secondary --> Secondary/Primary Feb 12 06:51:00 drbd1 kernel: drbd2: Not in Primary state, no IO requests allowed Feb 12 06:51:00 drbd1 kernel: drbd2: ASSERT( mdev->state == Primary ) in /root/drbd-0.7.25/dist/BUILD/drbd-0.7.25/drbd/drbd_main.c:1430 Feb 12 06:51:00 drbd1 kernel: drbd2: Not in Primary state, no IO requests allowed Feb 12 06:51:00 drbd1 kernel: drbd2: ASSERT( mdev->state == Primary ) in /root/drbd-0.7.25/dist/BUILD/drbd-0.7.25/drbd/drbd_main.c:1430 Feb 12 06:51:08 drbd1 kernel: drbd2: Not in Primary state, no IO requests allowed Feb 12 06:51:08 drbd1 kernel: drbd2: ASSERT( mdev->state == Primary ) in /root/drbd-0.7.25/dist/BUILD/drbd-0.7.25/drbd/drbd_main.c:1430 Feb 12 06:51:08 drbd1 kernel: drbd2: Not in Primary state, no IO requests allowed Feb 12 06:51:08 drbd1 kernel: drbd2: ASSERT( mdev->state == Primary ) in /root/drbd-0.7.25/dist/BUILD/drbd-0.7.25/drbd/drbd_main.c:1430 Feb 12 06:51:08 drbd1 kernel: drbd2: Not in Primary state, no IO requests allowed Feb 12 06:51:08 drbd1 kernel: drbd2: ASSERT( mdev->state == Primary ) in /root/drbd-0.7.25/dist/BUILD/drbd-0.7.25/drbd/drbd_main.c:1430 Feb 12 06:51:22 drbd1 kernel: drbd1: Resync done (total 22 sec; paused 0 sec; 47848 K/sec) Feb 12 06:51:22 drbd1 kernel: drbd1: drbd1_worker [6427]: cstate SyncSource --> Connected Feb 12 06:51:22 drbd1 kernel: drbd2: Resync done (total 23 sec; paused 0 sec; 45768 K/sec) Feb 12 06:51:22 drbd1 kernel: drbd2: drbd2_worker [6431]: cstate SyncTarget --> Connected =================================== ========= DRBD-2 berfore restart========= Feb 12 06:40:22 drbd2 last message repeated 9 times Feb 12 06:49:43 drbd2 syslogd 1.4.1: restart. Feb 12 06:49:43 drbd2 kernel: klogd 1.4.1, log source = /proc/kmsg started. =================================== ========= DRBD-2 after restart========= Feb 12 06:50:19 drbd2 kernel: drbd: initialised. Version: 0.7.25 (api:79/proto:74) Feb 12 06:50:19 drbd2 kernel: drbd: GIT-hash: 3a9c7c136a9af8df921b3628129dafbe212ace9f build by root at localhost.localdomain, 2008-04-28 17:15:49 Feb 12 06:50:19 drbd2 kernel: drbd: registered as block device major 147 Feb 12 06:50:20 drbd2 kernel: drbd1: resync bitmap: bits=191966063 words=5998940 Feb 12 06:50:20 drbd2 kernel: drbd1: size = 732 GB (767864252 KB) Feb 12 06:50:22 drbd2 kernel: drbd1: 0 KB marked out-of-sync by on disk bit-map. Feb 12 06:50:22 drbd2 kernel: drbd1: drbd_al_read_log: (rv == 0) in /root/drbd-0.7.25/dist/BUILD/drbd-0.7.25/drbd/drbd_actlog.c:470 Feb 12 06:50:22 drbd2 kernel: drbd1: drbd_al_read_log: (rv == 0) in /root/drbd-0.7.25/dist/BUILD/drbd-0.7.25/drbd/drbd_actlog.c:470 Feb 12 06:50:22 drbd2 kernel: drbd1: Found 4 transactions (192 active extents) in activity log. Feb 12 06:50:22 drbd2 kernel: drbd1: drbdsetup [6363]: cstate Unconfigured --> StandAlone Feb 12 06:50:22 drbd2 kernel: drbd2: resync bitmap: bits=191966063 words=5998940 Feb 12 06:50:22 drbd2 kernel: drbd2: size = 732 GB (767864252 KB) Feb 12 06:50:24 drbd2 kernel: drbd2: 0 KB marked out-of-sync by on disk bit-map. Feb 12 06:50:24 drbd2 kernel: drbd2: Found 6 transactions (324 active extents) in activity log. Feb 12 06:50:24 drbd2 kernel: drbd2: Marked additional 1028 MB as out-of-sync based on AL. Feb 12 06:50:24 drbd2 kernel: drbd2: drbdsetup [6367]: cstate Unconfigured --> StandAlone Feb 12 06:50:24 drbd2 kernel: drbd1: drbdsetup [6385]: cstate StandAlone --> Unconnected Feb 12 06:50:24 drbd2 kernel: drbd1: drbd1_receiver [6386]: cstate Unconnected --> WFConnection Feb 12 06:50:24 drbd2 kernel: drbd2: drbdsetup [6393]: cstate StandAlone --> Unconnected Feb 12 06:50:24 drbd2 kernel: drbd2: drbd2_receiver [6394]: cstate Unconnected --> WFConnection Feb 12 06:50:58 drbd2 kernel: drbd2: drbd2_receiver [6394]: cstate WFConnection --> WFReportParams Feb 12 06:50:58 drbd2 kernel: drbd2: Handshake successful: DRBD Network Protocol version 74 Feb 12 06:50:58 drbd2 kernel: drbd2: Connection established. Feb 12 06:50:58 drbd2 kernel: drbd2: I am(S): 1:00000007:00000010:0000002a:00000005:10 Feb 12 06:50:58 drbd2 kernel: drbd2: Peer(S): 1:00000007:00000010:0000002a:00000005:00 Feb 12 06:50:58 drbd2 kernel: drbd2: drbd2_receiver [6394]: cstate WFReportParams --> WFBitMapS Feb 12 06:50:58 drbd2 kernel: drbd2: Secondary/Unknown --> Secondary/Secondary Feb 12 06:50:58 drbd2 kernel: drbd2: drbd2_receiver [6394]: cstate WFBitMapS --> SyncSource Feb 12 06:50:58 drbd2 kernel: drbd2: Resync started as SyncSource (need to sync 1052672 KB [263168 bits set]). Feb 12 06:50:59 drbd2 kernel: drbd1: drbd1_receiver [6386]: cstate WFConnection --> WFReportParams Feb 12 06:50:59 drbd2 kernel: drbd1: Handshake successful: DRBD Network Protocol version 74 Feb 12 06:50:59 drbd2 kernel: drbd1: Connection established. Feb 12 06:50:59 drbd2 kernel: drbd1: I am(S): 1:00000005:00000002:0000001e:00000002:00 Feb 12 06:50:59 drbd2 kernel: drbd1: Peer(S): 1:00000005:00000002:0000001f:00000002:10 Feb 12 06:50:59 drbd2 kernel: drbd1: drbd1_receiver [6386]: cstate WFReportParams --> WFBitMapT Feb 12 06:50:59 drbd2 kernel: drbd1: Secondary/Unknown --> Secondary/Secondary Feb 12 06:50:59 drbd2 kernel: drbd1: drbd1_receiver [6386]: cstate WFBitMapT --> SyncTarget Feb 12 06:50:59 drbd2 kernel: drbd1: Resync started as SyncTarget (need to sync 1052672 KB [263168 bits set]). Feb 12 06:50:59 drbd2 kernel: drbd1: Secondary/Secondary --> Secondary/Primary Feb 12 06:51:00 drbd2 kernel: drbd2: Secondary/Secondary --> Primary/Secondary Feb 12 06:51:01 drbd2 kernel: drbd1: Not in Primary state, no IO requests allowed Feb 12 06:51:01 drbd2 kernel: drbd1: ASSERT( mdev->state == Primary ) in /root/drbd-0.7.25/dist/BUILD/drbd-0.7.25/drbd/drbd_main.c:1430 Feb 12 06:51:01 drbd2 kernel: drbd1: Not in Primary state, no IO requests allowed Feb 12 06:51:01 drbd2 kernel: drbd1: ASSERT( mdev->state == Primary ) in /root/drbd-0.7.25/dist/BUILD/drbd-0.7.25/drbd/drbd_main.c:1430 Feb 12 06:51:04 drbd2 kernel: drbd1: Not in Primary state, no IO requests allowed Feb 12 06:51:04 drbd2 kernel: drbd1: ASSERT( mdev->state == Primary ) in /root/drbd-0.7.25/dist/BUILD/drbd-0.7.25/drbd/drbd_main.c:1430 Feb 12 06:51:04 drbd2 kernel: drbd1: Not in Primary state, no IO requests allowed Feb 12 06:51:04 drbd2 kernel: drbd1: ASSERT( mdev->state == Primary ) in /root/drbd-0.7.25/dist/BUILD/drbd-0.7.25/drbd/drbd_main.c:1430 Feb 12 06:51:04 drbd2 kernel: drbd1: Not in Primary state, no IO requests allowed Feb 12 06:51:04 drbd2 kernel: drbd1: ASSERT( mdev->state == Primary ) in /root/drbd-0.7.25/dist/BUILD/drbd-0.7.25/drbd/drbd_main.c:1430 Feb 12 06:51:22 drbd2 kernel: drbd1: Resync done (total 22 sec; paused 0 sec; 47848 K/sec) Feb 12 06:51:22 drbd2 kernel: drbd1: drbd1_worker [6364]: cstate SyncTarget --> Connected Feb 12 06:51:22 drbd2 kernel: drbd2: Resync done (total 23 sec; paused 0 sec; 45768 K/sec) Feb 12 06:51:22 drbd2 kernel: drbd2: drbd2_worker [6368]: cstate SyncSource --> Connected =================================== ============DRBD.CONF============= resource "disco-xen1" { protocol C; incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f"; startup { disk { on-io-error detach; net { syncer { rate 500M; al-extents 257; on drbd1.companya.com { device /dev/drbd1; disk /dev/sdb1; address 10.0.1.70:7789; meta-disk internal; on drbd2.companya.com { device /dev/drbd1; disk /dev/sdb1; address 10.0.1.60:7789; meta-disk internal; resource "disco-xen2" { protocol C; incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f"; startup { disk { on-io-error detach; net { syncer { rate 500M; al-extents 257; on drbd1.companya.com { device /dev/drbd2; disk /dev/sdc1; address 10.0.1.70:7788; meta-disk internal; on drbd2.companya.com { device /dev/drbd2; disk /dev/sdc1; address 10.0.1.60:7788; meta-disk internal; ======================== -- -- Victor Hugo dos Santos Linux Counter #224399