Hello,

I run drbd-0.7.20 on two Dell PowerEdge 2650s with RAID 5, Linux 2.4 and an ext3 filesystem. (I use 2.4 because, a year ago, it looked like the only driver/kernel combination that would work correctly with my PERC 3/Di controller.)

Tonight I had a problem with an unavailable service. The secondary tried to take over, but the filesystem on it was corrupted. My failover script tries to make the primary secondary, and then makes the secondary primary. It looks like it did that: I found the primary/secondary roles switched, the state was Connected, and both nodes were Consistent. The primary machine was actually fine; the script was triggered by an Apache restart (I'll work on that).

After I saw that the filesystem was impossible to use, I disconnected the devices, made the broken node primary again and ran fsck. It found 4 broken files, but other than that the filesystem was usable.

These nodes have been running for about 2 years. Since drbd-0.7 was released I have had no problems using it with ext3. I always upgraded DRBD by connecting nodes running 2 different DRBD versions (same protocol, though). The last time I switched primary/secondary was 2 weeks ago, and I had a full rsync. After that full rsync I switched them once more, and they ended up in the configuration they were in before the crash.

I configure the DRBD device manually:

    PROTOCOL="C"
    RATE="400M"
    DEVICE="/dev/drbd0"
    DISK="/dev/sda5"
    META="/dev/sda6"
    LOCAL_ADDRESS="192.168.3.2"
    REMOTE_ADDRESS="192.168.3.1"
    DRBDSETUP="/sbin/drbdsetup"

    # attach the lower-level disk; meta-data is external, on $META at index 0
    $DRBDSETUP $DEVICE disk $DISK $META 0
    # connect to the peer using protocol C
    $DRBDSETUP $DEVICE net $LOCAL_ADDRESS $REMOTE_ADDRESS $PROTOCOL
    # limit the resync rate
    $DRBDSETUP $DEVICE syncer --rate $RATE

On the node with the unusable filesystem I ran all the checks available on the Dell utility partition. Memory and disks are fine. I'm planning to do a backup, then a full rsync, and then run full checks on the machine that used to be primary. If that machine is good too, I have really run out of options to identify the problem. I guess I will test the latest DRBD with the latest 2.6 kernel.
Below is what I have in the logs for this week, on the machine that used to be primary. Please note that I concatenated the DRBD messages from syslog with the ones from messages. 19:35 is the time of the failover switch; the last disconnect was done manually by me. Thank you in advance for any suggestion about what could cause this situation (/proc/drbd reports that both machines are Consistent/Connected, but the filesystems differ).

    Aug 25 07:07:33 dell1 kernel: drbd0: drbd0_receiver [32273]: cstate SyncSource --> Connected
    Aug 25 07:07:33 dell1 kernel: drbd0: meta connection shut down by peer.
    Aug 25 07:07:33 dell1 kernel: drbd0: short read expecting header on sock: r=0
    Aug 25 07:07:38 dell1 kernel: drbd0: sock was shut down by peer
    Aug 25 07:07:38 dell1 kernel: drbd0: drbd0_asender [24289]: cstate Connected --> NetworkFailure
    Aug 25 07:07:38 dell1 kernel: drbd0: asender terminated
    Aug 25 07:07:38 dell1 kernel: drbd0: drbd0_receiver [32273]: cstate NetworkFailure --> BrokenPipe
    Aug 25 07:07:38 dell1 kernel: drbd0: worker terminated
    Aug 25 07:07:38 dell1 kernel: drbd0: drbd0_receiver [32273]: cstate BrokenPipe --> Unconnected
    Aug 25 07:07:38 dell1 kernel: drbd0: Connection lost.
    Aug 25 07:07:38 dell1 kernel: drbd0: drbd0_receiver [32273]: cstate Unconnected --> WFConnection
    Aug 25 07:07:38 dell1 kernel: drbd0: drbd0_receiver [32273]: cstate WFConnection --> WFReportParams
    Aug 25 07:07:38 dell1 kernel: drbd0: meta connection shut down by peer.
    Aug 25 07:07:38 dell1 kernel: drbd0: short read expecting header on sock: r=0
    Aug 25 07:07:39 dell1 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
    Aug 25 07:07:39 dell1 kernel: drbd0: Connection established.
    Aug 25 07:07:39 dell1 kernel: drbd0: I am(P): 1:00000002:00000001:0000002d:00000012:10
    Aug 25 07:07:39 dell1 kernel: drbd0: Peer(S): 1:00000002:00000001:0000002c:00000012:01
    Aug 25 07:07:39 dell1 kernel: drbd0: drbd0_receiver [32273]: cstate WFReportParams --> WFBitMapS
    Aug 25 07:07:39 dell1 kernel: drbd0: Primary/Unknown --> Primary/Secondary
    Aug 25 07:07:39 dell1 kernel: drbd0: drbd0_receiver [32273]: cstate WFBitMapS --> SyncSource
    Aug 25 07:07:39 dell1 kernel: drbd0: Resync started as SyncSource (need to sync 0 KB [0 bits set]).
    Aug 25 07:07:39 dell1 kernel: drbd0: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
    Aug 25 07:07:39 dell1 kernel: drbd0: drbd0_receiver [32273]: cstate SyncSource --> Connected
    Aug 25 07:07:48 dell1 kernel: drbd0: [kupdated/9] sock_sendmsg time expired, ko = 4294967295
    Aug 25 07:07:51 dell1 kernel: drbd0: drbd0_asender [24292]: cstate Connected --> NetworkFailure
    Aug 25 07:07:51 dell1 kernel: drbd0: asender terminated
    Aug 25 07:07:51 dell1 kernel: drbd0: drbd0_receiver [32273]: cstate NetworkFailure --> BrokenPipe
    Aug 25 07:07:51 dell1 kernel: drbd0: worker terminated
    Aug 25 07:07:51 dell1 kernel: drbd0: drbd0_receiver [32273]: cstate BrokenPipe --> Unconnected
    Aug 25 07:07:51 dell1 kernel: drbd0: Connection lost.
    Aug 25 07:07:51 dell1 kernel: drbd0: drbd0_receiver [32273]: cstate Unconnected --> WFConnection
    Aug 25 07:07:51 dell1 kernel: drbd0: [kupdated/9] sock_sendmsg time expired, ko = 4294967294
    Aug 25 07:07:51 dell1 kernel: drbd0: PingAck did not arrive in time.
    Aug 25 07:07:51 dell1 kernel: drbd0: short read expecting header on sock: r=-512
    Aug 25 07:07:51 dell1 kernel: drbd0: _drbd_send_page: size=4096 len=240 sent=-4
    Aug 25 07:07:51 dell1 kernel: drbd0: short sent UnplugRemote size=8 sent=-1001
    Aug 25 07:23:02 dell1 kernel: drbd0: drbd0_receiver [32273]: cstate WFConnection --> WFReportParams
    Aug 25 07:23:02 dell1 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
    Aug 25 07:23:02 dell1 kernel: drbd0: Connection established.
    Aug 25 07:23:02 dell1 kernel: drbd0: I am(P): 1:00000002:00000001:0000002e:00000012:10
    Aug 25 07:23:02 dell1 kernel: drbd0: Peer(S): 1:00000002:00000001:0000002d:00000012:01
    Aug 25 07:23:02 dell1 kernel: drbd0: drbd0_receiver [32273]: cstate WFReportParams --> WFBitMapS
    Aug 25 07:23:02 dell1 kernel: drbd0: Primary/Unknown --> Primary/Secondary
    Aug 25 07:23:02 dell1 kernel: drbd0: drbd0_receiver [32273]: cstate WFBitMapS --> SyncSource
    Aug 25 07:23:02 dell1 kernel: drbd0: Resync started as SyncSource (need to sync 39944 KB [9986 bits set]).
    Aug 25 07:23:07 dell1 kernel: drbd0: Resync done (total 4 sec; paused 0 sec; 9984 K/sec)
    Aug 25 07:23:07 dell1 kernel: drbd0: drbd0_worker [24297]: cstate SyncSource --> Connected
    Aug 31 19:35:49 dell1 kernel: drbd0: Primary/Secondary --> Secondary/Secondary
    Aug 31 19:36:19 dell1 kernel: drbd0: Secondary/Secondary --> Secondary/Primary
    Aug 31 19:48:11 dell1 kernel: drbd0: Secondary/Primary --> Secondary/Secondary
    Aug 31 19:53:17 dell1 kernel: drbd0: Secondary/Secondary --> Secondary/Primary
    Aug 31 19:56:26 dell1 kernel: drbd0: Not in Primary state, no IO requests allowed
    Aug 31 20:38:07 dell1 kernel: drbd0: sock was shut down by peer
    Aug 31 20:38:07 dell1 kernel: drbd0: drbd0_receiver [32273]: cstate Connected --> BrokenPipe
    Aug 31 20:38:07 dell1 kernel: drbd0: worker terminated
    Aug 31 20:38:07 dell1 kernel: drbd0: asender terminated
    Aug 31 20:38:07 dell1 kernel: drbd0: drbd0_receiver [32273]: cstate BrokenPipe --> Unconnected
    Aug 31 20:38:07 dell1 kernel: drbd0: Connection lost.
    Aug 31 20:38:07 dell1 kernel: drbd0: drbd0_receiver [32273]: cstate Unconnected --> WFConnection
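One way to test the "Consistent/Connected but the data differs" situation directly, without relying on DRBD's own state, is to checksum the backing disk in fixed-size chunks on each node and diff the two outputs. The sketch below assumes GNU dd and md5sum, an unmounted filesystem on both nodes and no resync in progress. It reads the lower-level disk (/dev/sda5 in the setup above) rather than /dev/drbd0, because the log shows DRBD refusing I/O on a node that is not Primary. The helper name chunk_sums and the 64 MiB default chunk size are illustrative choices, not part of DRBD.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print one "index md5" line per fixed-size chunk
# of a device or file, so the output from the two nodes can be compared with diff.
# Assumptions: GNU dd/md5sum, filesystem unmounted, no resync running.
EMPTY_MD5=d41d8cd98f00b204e9800998ecf8427e   # md5 of zero bytes, i.e. past the end

chunk_sums() {
    dev=$1
    mb=${2:-64}            # chunk size in MiB
    i=0
    while :; do
        # Read chunk $i: $mb MiB starting at offset i*mb MiB, then hash it.
        sum=$(dd if="$dev" bs=1048576 skip=$((i * mb)) count="$mb" 2>/dev/null | md5sum)
        sum=${sum%% *}     # keep only the hash, drop the trailing "  -"
        # An empty read means we are past the end of the device.
        [ "$sum" = "$EMPTY_MD5" ] && break
        echo "$i $sum"
        i=$((i + 1))
    done
}
```

For example, run chunk_sums /dev/sda5 > /tmp/sums.dell1 on one node and the equivalent on the other, copy one file across and diff them. A differing line with index i points at the region starting i x 64 MiB into the disk, which can then be mapped back to files. If the two disks are not exactly the same size, the last line may differ for that reason alone.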