##### quartz is the primary node, turquoise the secondary one ##### I/O error occured on the disk corresponding to drbd1 ##### while we are performing daily indexing of large amount of data on drbd1 Aug 28 07:37:49 quartz kernel: SCSI error : <0 0 0 0> return code = 0x8000002 Aug 28 07:37:49 quartz kernel: Info fld=0x61d49b5, Current sda: sense key Hardware Error Aug 28 07:37:49 quartz kernel: ASC=40 ASCQ=8c Aug 28 07:37:49 quartz kernel: end_request: I/O error, dev sda, sector 102582695 #### quartz changed to diskless. Seems OK. Aug 28 07:37:49 quartz kernel: drbd1: Local IO failed. Detaching... Aug 28 07:37:49 quartz kernel: drbd1: local read failed, retrying remotely #### No other message from drbd. No message on turquoise, strange. #### Two days later while performing again indexing the kernel run out of memory Aug 30 07:42:26 quartz kernel: oom-killer: gfp_mask=0xd0 Aug 30 07:42:26 quartz kernel: Mem-info: Aug 30 07:42:26 quartz kernel: DMA per-cpu: Aug 30 07:42:26 quartz kernel: cpu 0 hot: low 2, high 6, batch 1 Aug 30 07:42:26 quartz kernel: cpu 0 cold: low 0, high 2, batch 1 Aug 30 07:42:26 quartz kernel: cpu 1 hot: low 2, high 6, batch 1 Aug 30 07:42:26 quartz kernel: cpu 1 cold: low 0, high 2, batch 1 Aug 30 07:42:26 quartz kernel: cpu 2 hot: low 2, high 6, batch 1 Aug 30 07:42:26 quartz kernel: cpu 2 cold: low 0, high 2, batch 1 Aug 30 07:42:26 quartz kernel: cpu 3 hot: low 2, high 6, batch 1 Aug 30 07:42:26 quartz kernel: cpu 3 cold: low 0, high 2, batch 1 Aug 30 07:42:26 quartz kernel: Normal per-cpu: Aug 30 07:42:26 quartz kernel: cpu 0 hot: low 32, high 96, batch 16 Aug 30 07:42:26 quartz kernel: cpu 0 cold: low 0, high 32, batch 16 Aug 30 07:42:26 quartz kernel: cpu 1 hot: low 32, high 96, batch 16 Aug 30 07:42:26 quartz kernel: cpu 1 cold: low 0, high 32, batch 16 Aug 30 07:42:26 quartz kernel: cpu 2 hot: low 32, high 96, batch 16 Aug 30 07:42:26 quartz kernel: cpu 2 cold: low 0, high 32, batch 16 Aug 30 07:42:26 quartz kernel: cpu 3 hot: low 32, high 96, batch 16 Aug 30 07:42:26 quartz kernel: cpu 3 cold: low 0, high 32, batch 16 Aug 30 07:42:26 quartz kernel: HighMem per-cpu: Aug 30 07:42:26 quartz kernel: cpu 0 hot: low 32, high 96, batch 16 Aug 30 07:42:26 quartz kernel: cpu 0 cold: low 0, high 32, batch 16 Aug 30 07:42:26 quartz kernel: cpu 1 hot: low 32, high 96, batch 16 Aug 30 07:42:26 quartz kernel: cpu 1 cold: low 0, high 32, batch 16 Aug 30 07:42:26 quartz kernel: cpu 2 hot: low 32, high 96, batch 16 Aug 30 07:42:26 quartz kernel: cpu 2 cold: low 0, high 32, batch 16 Aug 30 07:42:26 quartz kernel: cpu 3 hot: low 32, high 96, batch 16 Aug 30 07:42:26 quartz kernel: cpu 3 cold: low 0, high 32, batch 16 Aug 30 07:42:26 quartz kernel: Aug 30 07:42:26 quartz kernel: Free pages: 809172kB (795648kB HighMem) Aug 30 07:42:26 quartz kernel: Active:82994 inactive:10437 dirty:767 writeback:5 unstable:0 free:202293 slab:218109 mapped:69603 pagetables:1390 Aug 30 07:42:26 quartz kernel: DMA free:12636kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB pages_scanned:5925 all_unreclaimable? yes Aug 30 07:42:26 quartz kernel: protections[]: 0 0 0 Aug 30 07:42:26 quartz kernel: Normal free:888kB min:928kB low:1856kB high:2784kB active:708kB inactive:728kB present:901120kB pages_scanned:1617 all_unreclaimable? yes Aug 30 07:42:26 quartz kernel: protections[]: 0 0 0 Aug 30 07:42:26 quartz kernel: HighMem free:795648kB min:512kB low:1024kB high:1536kB active:331268kB inactive:41020kB present:1179484kB pages_scanned:0 all_unreclaimable? no Aug 30 07:42:26 quartz kernel: protections[]: 0 0 0 Aug 30 07:42:26 quartz kernel: DMA: 3*4kB 4*8kB 3*16kB 4*32kB 4*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 2*4096kB = 12636kB Aug 30 07:42:26 quartz kernel: Normal: 14*4kB 2*8kB 1*16kB 3*32kB 3*64kB 2*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 888kB Aug 30 07:42:26 quartz kernel: HighMem: 5384*4kB 11898*8kB 9293*16kB 5684*32kB 3049*64kB 949*128kB 112*256kB 6*512kB 0*1024kB 0*2048kB 0*4096kB = 795648kB Aug 30 07:42:26 quartz kernel: Swap cache: add 853, delete 602, find 594/626, race 0+0 Aug 30 07:42:26 quartz kernel: Free swap: 2094736kB Aug 30 07:42:26 quartz kernel: 524247 pages of RAM Aug 30 07:42:26 quartz kernel: 294871 pages of HIGHMEM Aug 30 07:42:26 quartz kernel: 5521 reserved pages Aug 30 07:42:26 quartz kernel: 84182 pages shared Aug 30 07:42:26 quartz kernel: 251 pages swap cached Aug 30 07:42:26 quartz kernel: Out of Memory: Killed process 2369 (slapd). Aug 30 07:42:34 quartz kernel: oom-killer: gfp_mask=0xd0 Aug 30 07:42:34 quartz kernel: Mem-info: Aug 30 07:42:34 quartz kernel: DMA per-cpu: Aug 30 07:42:34 quartz kernel: cpu 0 hot: low 2, high 6, batch 1 Aug 30 07:42:34 quartz kernel: cpu 0 cold: low 0, high 2, batch 1 Aug 30 07:42:34 quartz kernel: cpu 1 hot: low 2, high 6, batch 1 Aug 30 07:42:34 quartz kernel: cpu 1 cold: low 0, high 2, batch 1 Aug 30 07:42:34 quartz kernel: cpu 2 hot: low 2, high 6, batch 1 Aug 30 07:42:34 quartz kernel: cpu 2 cold: low 0, high 2, batch 1 Aug 30 07:42:34 quartz kernel: cpu 3 hot: low 2, high 6, batch 1 Aug 30 07:42:34 quartz kernel: cpu 3 cold: low 0, high 2, batch 1 Aug 30 07:42:34 quartz kernel: Normal per-cpu: Aug 30 07:42:34 quartz kernel: cpu 0 hot: low 32, high 96, batch 16 Aug 30 07:42:34 quartz kernel: cpu 0 cold: low 0, high 32, batch 16 Aug 30 07:42:34 quartz kernel: cpu 1 hot: low 32, high 96, batch 16 Aug 30 07:42:34 quartz kernel: cpu 1 cold: low 0, high 32, batch 16 Aug 30 07:42:34 quartz kernel: cpu 2 hot: low 32, high 96, batch 16 Aug 30 07:42:35 quartz kernel: cpu 2 cold: low 0, high 32, batch 16 Aug 30 07:42:35 quartz kernel: cpu 3 hot: low 32, high 96, batch 16 Aug 30 07:42:35 quartz kernel: cpu 3 cold: low 0, high 32, batch 16 Aug 30 07:42:35 quartz kernel: HighMem per-cpu: Aug 30 07:42:35 quartz kernel: cpu 0 hot: low 32, high 96, batch 16 Aug 30 07:42:35 quartz kernel: cpu 0 cold: low 0, high 32, batch 16 Aug 30 07:42:35 quartz kernel: cpu 1 hot: low 32, high 96, batch 16 Aug 30 07:42:35 quartz kernel: cpu 1 cold: low 0, high 32, batch 16 Aug 30 07:42:35 quartz kernel: cpu 2 hot: low 32, high 96, batch 16 Aug 30 07:42:35 quartz kernel: cpu 2 cold: low 0, high 32, batch 16 Aug 30 07:42:35 quartz kernel: cpu 3 hot: low 32, high 96, batch 16 Aug 30 07:42:35 quartz kernel: cpu 3 cold: low 0, high 32, batch 16 Aug 30 07:42:35 quartz kernel: Aug 30 07:42:35 quartz kernel: Free pages: 856172kB (842624kB HighMem) Aug 30 07:42:35 quartz kernel: Active:78698 inactive:2922 dirty:2 writeback:0 unstable:0 free:214043 slab:218172 mapped:66039 pagetables:1315 Aug 30 07:42:35 quartz kernel: DMA free:12636kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB pages_scanned:5933 all_unreclaimable? yes Aug 30 07:42:35 quartz kernel: protections[]: 0 0 0 Aug 30 07:42:35 quartz kernel: Normal free:912kB min:928kB low:1856kB high:2784kB active:620kB inactive:256kB present:901120kB pages_scanned:957 all_unreclaimable? yes Aug 30 07:42:36 quartz kernel: protections[]: 0 0 0 Aug 30 07:42:36 quartz kernel: HighMem free:842624kB min:512kB low:1024kB high:1536kB active:314172kB inactive:11432kB present:1179484kB pages_scanned:0 all_unreclaimable? no Aug 30 07:42:36 quartz kernel: protections[]: 0 0 0 Aug 30 07:42:37 quartz kernel: DMA: 3*4kB 4*8kB 3*16kB 4*32kB 4*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 2*4096kB = 12636kB Aug 30 07:42:37 quartz kernel: Normal: 10*4kB 7*8kB 1*16kB 3*32kB 3*64kB 2*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 912kB Aug 30 07:42:37 quartz kernel: HighMem: 6628*4kB 12812*8kB 9911*16kB 5773*32kB 3122*64kB 1028*128kB 134*256kB 9*512kB 0*1024kB 0*2048kB 0*4096kB = 842624kB Aug 30 07:42:37 quartz kernel: Swap cache: add 861, delete 713, find 594/628, race 0+0 Aug 30 07:42:37 quartz kernel: Free swap: 2095380kB Aug 30 07:42:37 quartz kernel: 524247 pages of RAM Aug 30 07:42:37 quartz kernel: 294871 pages of HIGHMEM Aug 30 07:42:37 quartz kernel: 5521 reserved pages Aug 30 07:42:37 quartz kernel: 74457 pages shared Aug 30 07:42:37 quartz kernel: 148 pages swap cached Aug 30 07:42:37 quartz kernel: Out of Memory: Killed process 30590 (amavisd). Aug 30 07:42:46 quartz kernel: oom-killer: gfp_mask=0xd0 .... a lot of messages suppressed .... #### quartz is now rebooted #### Aug 30 08:50:08 turquoise kernel: drbd1: PingAck did not arrive in time. Aug 30 08:50:08 turquoise kernel: drbd1: drbd1_asender [2750]: cstate Connected --> NetworkFailure Aug 30 08:50:08 turquoise kernel: drbd1: asender terminated Aug 30 08:50:08 turquoise kernel: drbd1: drbd1_receiver [2732]: cstate NetworkFailure --> BrokenPipe Aug 30 08:50:08 turquoise kernel: drbd1: short read expecting header on sock: r=-512 Aug 30 08:50:08 turquoise kernel: drbd1: worker terminated Aug 30 08:50:08 turquoise kernel: drbd1: drbd1_receiver [2732]: cstate BrokenPipe --> Unconnected Aug 30 08:50:08 turquoise kernel: drbd1: Connection lost. Aug 30 08:50:08 turquoise kernel: drbd1: drbd1_receiver [2732]: cstate Unconnected --> WFConnection Aug 30 09:41:31 quartz kernel: klogd 1.4.1, log source = /proc/kmsg started. Aug 30 09:41:31 quartz kernel: Linux version 2.6.9-11.ELsmp (bhcompile@decompose.build.redhat.com) (gcc version 3.4.3 20050227 (Red Hat 3.4.3-22)) #1 SMP Fri May 20 18:26:27 EDT 2005 Aug 30 09:41:31 quartz kernel: BIOS-provided physical RAM map: Aug 30 09:41:31 quartz kernel: BIOS-e820: 0000000000000000 - 00000000000a0000 (usable) ... messages suppressed ... Aug 30 09:41:49 quartz kernel: drbd: initialised. Version: 0.7.10 (api:77/proto:74) Aug 30 09:41:49 quartz kernel: drbd: SVN Revision: 1743 build by root@quartz.lmcp.jussieu.fr, 2005-06-29 14:44:43 Aug 30 09:41:49 quartz kernel: drbd: registered as block device major 147 Aug 30 09:41:57 quartz kernel: drbd1: resync bitmap: bits=11272192 words=352256 Aug 30 09:41:58 quartz kernel: drbd1: size = 43 GB (45088768 KB) Aug 30 09:42:00 quartz kernel: drbd1: 0 KB marked out-of-sync by on disk bit-map. Aug 30 09:42:00 quartz kernel: drbd1: Found 6 transactions (276 active extents) in activity log. Aug 30 09:42:00 quartz kernel: drbd1: Marked additional 128 MB as out-of-sync based on AL. Aug 30 09:42:03 quartz kernel: drbd1: drbdsetup [2961]: cstate Unconfigured --> StandAlone Aug 30 09:42:03 turquoise kernel: drbd1: drbd1_receiver [2732]: cstate WFConnection --> WFReportParams Aug 30 09:42:03 turquoise kernel: drbd1: Handshake successful: DRBD Network Protocol version 74 Aug 30 09:42:03 turquoise kernel: drbd1: Connection established. Aug 30 09:42:03 turquoise kernel: drbd1: I am(S): 1:00000002:00000001:00000019:00000003:01 Aug 30 09:42:03 turquoise kernel: drbd1: Peer(S): 1:00000002:00000001:00000019:00000003:11 Aug 30 09:42:03 turquoise kernel: drbd1: drbd1_receiver [2732]: cstate WFReportParams --> WFBitMapT Aug 30 09:42:03 quartz kernel: drbd1: drbdsetup [3045]: cstate StandAlone --> Unconnected Aug 30 09:42:04 turquoise kernel: drbd1: Secondary/Unknown --> Secondary/Secondary Aug 30 09:42:04 quartz kernel: drbd1: drbd1_receiver [3046]: cstate Unconnected --> WFConnection #### Why is turquoise SyncTarget ? The date are inconsistent on quartz. Aug 30 09:42:04 turquoise kernel: drbd1: drbd1_receiver [2732]: cstate WFBitMapT --> SyncTarget Aug 30 09:42:04 turquoise kernel: drbd1: Resync started as SyncTarget (need to sync 1052672 KB [263168 bits set]). Aug 30 09:42:05 quartz kernel: drbd1: drbd1_receiver [3046]: cstate WFConnection --> WFReportParams Aug 30 09:42:05 quartz kernel: drbd1: Handshake successful: DRBD Network Protocol version 74 Aug 30 09:42:06 quartz kernel: drbd1: Connection established. ##### The data on drbd1 are obviously not consistent but MDF_Consistent bit is set. Aug 30 09:42:06 quartz kernel: drbd1: I am(S): 1:00000002:00000001:00000019:00000003:11 Aug 30 09:42:07 quartz kernel: drbd1: Peer(S): 1:00000002:00000001:00000019:00000003:01 Aug 30 09:42:07 quartz kernel: drbd1: drbd1_receiver [3046]: cstate WFReportParams --> WFBitMapS Aug 30 09:42:09 quartz kernel: drbd1: Secondary/Unknown --> Secondary/Secondary Aug 30 09:42:09 quartz kernel: drbd1: drbd1_receiver [3046]: cstate WFBitMapS --> SyncSource Aug 30 09:42:09 quartz kernel: drbd1: Resync started as SyncSource (need to sync 1052672 KB [263168 bits set]). Aug 30 09:42:18 turquoise kernel: drbd1: Secondary/Secondary --> Secondary/Primary Aug 30 09:42:18 quartz kernel: drbd1: Secondary/Secondary --> Primary/Secondary Aug 30 09:42:20 quartz kernel: kjournald starting. Commit interval 5 seconds Aug 30 09:42:21 quartz kernel: EXT3-fs: recovery complete. Aug 30 09:42:21 quartz kernel: EXT3-fs: mounted filesystem with ordered data mode. Aug 30 09:42:28 quartz kernel: kjournald starting. Commit interval 5 seconds Aug 30 09:42:28 quartz kernel: EXT3 FS on drbd1, internal journal Aug 30 09:42:33 quartz kernel: EXT3-fs: drbd1: 79 orphan inodes deleted Aug 30 09:42:33 quartz kernel: EXT3-fs: recovery complete. Aug 30 09:42:34 quartz kernel: EXT3-fs: mounted filesystem with ordered data mode. Aug 30 09:44:23 quartz kernel: drbd1: Resync done (total 140 sec; paused 0 sec; 7516 K/sec) Aug 30 09:44:23 quartz kernel: drbd1: drbd1_worker [3020]: cstate SyncSource --> Connected Aug 30 09:44:23 turquoise kernel: drbd1: Resync done (total 140 sec; paused 0 sec; 7516 K/sec) Aug 30 09:44:23 turquoise kernel: drbd1: drbd1_worker [1646]: cstate SyncTarget --> Connected