Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Dear Florian,

It seems the problem is solved by running:

  # mount -t rpc-pipefs sunrpc /var/lib/nfs/rpc-pipefs/

We have checked all the cron jobs, but none of them explains why this happened; nothing is actually scheduled at the times when the I/O errors come up.

I am not sure how to get the log that you requested, but I would like to know how for the future. Can you please tell me how? So far, the only way I can see what DRBD and HA are doing is through dmesg, ha-log, ha-debug, cat /proc/drbd, or by checking whether the smb and nfs services are running. Is there any other way to make sure everything is doing its job? (A sketch of what I plan to collect next time is below, after the config.)

The reason we are still using DRBD 0.7 is that I dare not migrate to 0.8 yet, because of the slight differences in the configuration; when I tried last year, it stopped working. We are planning to upgrade both the OS and DRBD in the near future. Basically I will reinstall both servers with Fedora 8 and set up DRBD, HA, HP tapeware and VMware again. I think DRBD and HA should be fine, but I am not sure the tapeware will survive, because it depends on some old libraries. *sigh*

My final question: is my drbd.conf maximizing throughput? I only ventured into setting max-buffers, unplug-watermark, max-epoch-size and al-extents after I hit those I/O errors. Basically what I did was uncomment the net{} section and raise al-extents, but only for r0, not r1. Should I do that for both resources? Should r0 and r1 have the same settings? (A sketch of what that would look like is right after the config below.) Again, FYI: we are using a gigabit LAN.

Hope to hear from you soon. Thank you.

Warm Regards,
Cindy KS TOH

*************drbd.conf**************************
resource r0 {
  protocol C;
  incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";

  startup {
    wfc-timeout      60;
    degr-wfc-timeout 60;   # 1 minute.
  }

  disk {
    on-io-error detach;
  }

  net {
    max-buffers      2048;
    unplug-watermark  128;
    max-epoch-size   2048;
  }

  syncer {
    rate 100M;
    group 1;
    al-extents 1801;
  }

  on dfs1 {
    device    /dev/drbd0;
    disk      /dev/VolGroup01/LogVol02;
    address   192.168.1.10:7788;
    meta-disk /dev/VolGroup01/LogVol00[0];
  }

  on dfs2 {
    device    /dev/drbd0;
    disk      /dev/VolGroup01/LogVol02;
    address   192.168.1.12:7788;
    meta-disk /dev/VolGroup01/LogVol00[0];
  }
}

resource r1 {
  protocol C;
  incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";

  startup {
    wfc-timeout      60;   ## 1 minute.
    degr-wfc-timeout 60;   ## 1 minute.
  }

  disk {
    on-io-error detach;
  }

  net {
  }

  syncer {
    rate 100M;
    group 1;   # sync concurrently with r0
  }

  on dfs1 {
    device    /dev/drbd1;
    disk      /dev/VolGroup01/LogVol03;
    address   192.168.1.10:7789;
    meta-disk /dev/VolGroup01/LogVol00[1];
  }

  on dfs2 {
    device    /dev/drbd1;
    disk      /dev/VolGroup01/LogVol03;
    address   192.168.1.12:7789;
    meta-disk /dev/VolGroup01/LogVol00[1];
  }
}
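To make the r1 question concrete, this is roughly what r1 would look like if I simply copied r0's tuning into it. The numbers are just the ones from r0 above, not values I know to be right for r1's workload, so please treat it as a sketch:

resource r1 {
  # protocol, incon-degr-cmd, startup, disk and on {} sections as in the config above

  net {
    max-buffers      2048;   # copied from r0
    unplug-watermark  128;   # copied from r0
    max-epoch-size   2048;   # copied from r0
  }

  syncer {
    rate 100M;
    group 1;                 # still syncs concurrently with r0
    al-extents 1801;         # copied from r0
  }
}

If the two resources carry very different write loads, I suppose the right numbers would differ as well.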
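As mentioned above, this is roughly what I was planning to collect the next time the I/O error shows up, so you can tell me whether it is the kind of log you meant. The /var/log/messages path assumes the default Fedora syslog setup, and the ha-log path is simply where ours happens to live:

  # DRBD and Heartbeat state at the time of the error
  cat /proc/drbd
  dmesg | grep -i drbd
  tail -n 200 /var/log/ha-log            # adjust to wherever your ha-log is written

  # kernel/syslog entries around the error (default Fedora syslog path assumed)
  grep -iE 'drbd|i/o error' /var/log/messages

  # double-checking that nothing is scheduled around the time the error appears
  crontab -l
  cat /etc/crontab
  ls /etc/cron.d /etc/cron.hourly /etc/cron.daily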
> Date: Mon, 14 Jul 2008 21:07:23 +0200
> From: Florian Haas <florian.haas at linbit.com>
> Subject: Re: [DRBD-user] DRBD I/O error? or....
> To: drbd-user at lists.linbit.com
> Message-ID: <487BA3EB.6050406 at linbit.com>
> Content-Type: text/plain; charset=UTF-8
>
> It's nice of you to include all that information about your config, but
> it would be a good idea to include some details of the _errors_ you are
> actually getting (system log snippets, primarily).
>
> About these recurring issues you seem to be having "daily after 8pm",
> it's probably wise to investigate your cron setup.
>
> And, is there any particular reason you're still using DRBD 0.7?
>
> Cheers,
> Florian
>
> Cindy KS TOH wrote:
>> Hi,
>>
>> I hope someone can help me and understand what i am trying to say.
>> We have encounter some I/O error according to our programmer team.
>> The DRBD + HA servers were setup and tested by myself. (I am quite new
>> to this myself)
>> I have provided all information that i can think of in this email.
>>
>> Our problem is, from now and then 'something' will be running and
>> transmit between my 2 drbd servers. They suspect it's drbd issue that
>> caused them the problem connecting back to the storage server.
>>
>> [...]