Hi,

We are having trouble with our primary node locking up just about every night. This happened once about two weeks ago, then again on Monday, Wednesday and this morning. We didn't notice the node was hung until Tuesday morning because everybody was off work on Monday. The failover to the secondary node goes as expected, except that the secondary is not accessible via the network. That is another problem.

What I'm trying to determine is whether the problem with the primary is indicative of old, cheap hardware starting to go bad. The log of the failure is given below along with the drbd.conf. At the time the failure occurs, a backup of the drbd2 device to an external USB drive is running. The backup has been working just fine for us for a few months. No new software has been added to the system and no kernel upgrades have been done.

Any ideas?

Thanks,
Tom

Distro: Debian Etch
Kernel: 2.6.22.6
DRBD: 8.0.6
Heartbeat: 2.1.2

/var/log/syslog:

Feb 21 00:04:18 zan kernel: hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Feb 21 00:04:18 zan kernel: hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=1262256273, high=75, low=3965073, sector=1262256267
Feb 21 00:04:18 zan kernel: ide: failed opcode was: unknown
Feb 21 00:04:18 zan kernel: end_request: I/O error, dev hdc, sector 1262256267
Feb 21 00:04:18 zan kernel: drbd2: got an _req_mod() errno of -5
Feb 21 00:04:18 zan kernel: drbd2: Local READ failed sec=1261436952s size=4096
Feb 21 00:04:18 zan kernel: drbd2: disk( UpToDate -> Failed )
Feb 21 00:04:18 zan kernel: drbd2: Local IO failed. Detaching...
Feb 21 00:04:18 zan kernel: drbd2: helper command: /sbin/drbdadm pri-on-incon-degr
Feb 21 00:04:18 zan kernel: drbd2: Sorry, I have no access to good data anymore.
Feb 21 00:04:18 zan kernel: Buffer I/O error on device drbd2, logical block 8210
Feb 21 00:04:18 zan kernel: lost page write due to I/O error on drbd2
Feb 21 00:04:18 zan kernel: ReiserFS: drbd2: warning: journal-837: IO error during journal replay
Feb 21 00:04:18 zan kernel: REISERFS: abort (device drbd2): Write error while updating journal header in flush_journal_list
Feb 21 00:04:18 zan kernel: REISERFS: Aborting journal for filesystem on drbd2
Feb 21 00:04:18 zan kernel: drbd2: Sorry, I have no access to good data anymore.
Feb 21 00:04:18 zan kernel: Buffer I/O error on device drbd2, logical block 4284
Feb 21 00:04:18 zan kernel: lost page write due to I/O error on drbd2
Feb 21 00:04:18 zan kernel: drbd2: Sorry, I have no access to good data anymore.
Feb 21 00:04:18 zan kernel: Buffer I/O error on device drbd2, logical block 4285
Feb 21 00:04:18 zan kernel: lost page write due to I/O error on drbd2
Feb 21 00:04:18 zan kernel: drbd2: Sorry, I have no access to good data anymore.
Feb 21 00:04:18 zan kernel: Buffer I/O error on device drbd2, logical block 4286
Feb 21 00:04:18 zan kernel: lost page write due to I/O error on drbd2
Feb 21 00:04:18 zan kernel: drbd2: Sorry, I have no access to good data anymore.
Feb 21 00:04:18 zan kernel: Buffer I/O error on device drbd2, logical block 4287
Feb 21 00:04:18 zan kernel: lost page write due to I/O error on drbd2
Feb 21 00:04:18 zan kernel: Buffer I/O error on device drbd2, logical block 4288
Feb 21 00:04:18 zan kernel: lost page write due to I/O error on drbd2
Feb 21 00:04:18 zan kernel: Buffer I/O error on device drbd2, logical block 4289
Feb 21 00:04:18 zan kernel: lost page write due to I/O error on drbd2
Feb 21 00:04:18 zan kernel: Buffer I/O error on device drbd2, logical block 4290
Feb 21 00:04:18 zan kernel: lost page write due to I/O error on drbd2
Feb 21 00:04:18 zan kernel: Buffer I/O error on device drbd2, logical block 4291
Feb 21 00:04:18 zan kernel: lost page write due to I/O error on drbd2
Feb 21 00:04:18 zan kernel: Buffer I/O error on device drbd2, logical block 4292
Feb 21 00:04:18 zan kernel: lost page write due to I/O error on drbd2
Feb 21 00:04:18 zan kernel: SysRq : HELP : loglevel0-8 reBoot Crashdump tErm Full kIll saK showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync showTasks Unmount shoW-blocked-tasks
Feb 21 07:48:45 zan syslogd 1.4.1#18: restart.

/etc/drbd.conf:

global {
  usage-count yes;
}

common {
  syncer { rate 22M; }
}

resource r0 {
  protocol C;

  handlers {
    pri-on-incon-degr "echo O > /proc/sysrq-trigger ; halt -f";
    pri-lost-after-sb "echo O > /proc/sysrq-trigger ; halt -f";
    local-io-error "echo O > /proc/sysrq-trigger ; halt -f";
    outdate-peer "/usr/sbin/drbd-peer-outdater";
  }

  startup {
    wfc-timeout 20;
    degr-wfc-timeout 120; # 2 minutes.
  }

  disk {
    on-io-error detach;
  }

  net {
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }

  syncer {
    rate 22M;
    al-extents 257;
  }

  on zan {
    device /dev/drbd0;
    disk /dev/hdd1;
    address 192.168.1.3:7788;
    meta-disk /dev/hdc1 [0];
  }

  on jayna {
    device /dev/drbd0;
    disk /dev/hdd1;
    address 192.168.1.4:7788;
    meta-disk /dev/hdc1 [0];
  }
}

resource r1 {
  protocol C;

  handlers {
    pri-on-incon-degr "echo O > /proc/sysrq-trigger ; halt -f";
    pri-lost-after-sb "echo O > /proc/sysrq-trigger ; halt -f";
    local-io-error "echo O > /proc/sysrq-trigger ; halt -f";
    outdate-peer "/usr/sbin/drbd-peer-outdater";
  }

  startup {
    wfc-timeout 20;
    degr-wfc-timeout 120; # 2 minutes.
  }

  disk {
    on-io-error detach;
  }

  net {
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }

  syncer {
    rate 25M;
    after "r0";
    al-extents 257;
  }

  on zan {
    device /dev/drbd1;
    disk /dev/hdd2;
    address 192.168.2.3:7789;
    meta-disk /dev/hdc2 [0];
  }

  on jayna {
    device /dev/drbd1;
    disk /dev/hdd2;
    address 192.168.2.4:7789;
    meta-disk /dev/hdc2 [0];
  }
}

resource r2 {
  protocol C;

  handlers {
    pri-on-incon-degr "echo O > /proc/sysrq-trigger ; halt -f";
    pri-lost-after-sb "echo O > /proc/sysrq-trigger ; halt -f";
    local-io-error "echo O > /proc/sysrq-trigger ; halt -f";
    outdate-peer "/usr/sbin/drbd-peer-outdater";
  }

  startup {
    wfc-timeout 20;
    degr-wfc-timeout 120; # 2 minutes.
  }

  disk {
    on-io-error detach;
  }

  net {
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }

  syncer {
    rate 26M;
    al-extents 257;
  }

  on zan {
    device /dev/drbd2;
    disk /dev/hdc4;
    address 192.168.3.3:7790;
    meta-disk /dev/hdc3 [0];
  }

  on jayna {
    device /dev/drbd2;
    disk /dev/hdc4;
    address 192.168.3.4:7790;
    meta-disk /dev/hdc3 [0];
  }
}
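
A possible way to cross-check the disk error reported in the syslog above is to pull the numbers out of the hdc "dma_intr: error=" line and compare them. This is only a rough sketch, not a diagnostic tool: it assumes that "high" and "low" are the upper and lower 24 bits of the LBA sector number, which the values in this log are consistent with (75 * 2^24 + 3965073 = 1262256273). The names `parse_ide_error` and `lba_from_halves` are made up for illustration.

```python
import re

# One line copied verbatim from the syslog excerpt in this post.
SAMPLE = (
    "Feb 21 00:04:18 zan kernel: hdc: dma_intr: error=0x40 "
    "{ UncorrectableError }, LBAsect=1262256273, high=75, "
    "low=3965073, sector=1262256267"
)

# Matches the old IDE driver's error line and captures its numeric fields.
IDE_ERROR = re.compile(
    r"(?P<dev>hd[a-z]): dma_intr: error=.*?"
    r"LBAsect=(?P<lbasect>\d+), high=(?P<high>\d+), "
    r"low=(?P<low>\d+), sector=(?P<sector>\d+)"
)


def lba_from_halves(high, low):
    """Reassemble an LBA from two 24-bit halves (assumed layout)."""
    return (high << 24) | low


def parse_ide_error(line):
    """Return the device name and numeric fields of an IDE error line,
    or None if the line is not a dma_intr error line."""
    match = IDE_ERROR.search(line)
    if match is None:
        return None
    fields = {"dev": match.group("dev")}
    for key in ("lbasect", "high", "low", "sector"):
        fields[key] = int(match.group(key))
    return fields


if __name__ == "__main__":
    info = parse_ide_error(SAMPLE)
    rebuilt = lba_from_halves(info["high"], info["low"])
    print(info["dev"], "LBAsect:", info["lbasect"], "rebuilt:", rebuilt)
    print("consistent:", rebuilt == info["lbasect"])
```

If the rebuilt value matches LBAsect, the error line is at least internally consistent, and the same parser could be run over a whole syslog to see whether the failing sectors on hdc repeat from night to night.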