On 20 Sep 2016 5:00 pm, "Marco Marino" <marino.mrc at gmail.com> wrote:
>
> Furthermore there are logs from the secondary node:
>
> http://pastebin.com/A2ySXDCB
>
> Please compare the times. It seems that drbd goes to diskless mode on the secondary node too. Why?
>

In the secondary log you can see I/O errors too:

Sep 7 19:55:19 iscsi2 kernel: end_request: I/O error, dev sdb, sector 685931856
Sep 7 19:55:19 iscsi2 kernel: block drbd1: write: error=-5 s=685931856s
Sep 7 19:55:19 iscsi2 kernel: block drbd1: disk( UpToDate -> Failed )
Sep 7 19:55:19 iscsi2 kernel: block drbd1: Local IO failed in drbd_endio_write_sec_final. Detaching...

and since your policy is:

disk { on-io-error detach; }

that's what drbd did. No disk => no master.

>
> 2016-09-20 8:44 GMT+02:00 Marco Marino <marino.mrc at gmail.com>:
>>
>> Hi, logs can be found here: http://pastebin.com/BGR33jN6
>>
>> @digimer:
>> Using local-io-error should power off the node and switch the cluster to the remaining node.... is this a good idea?
>>
>> Regards,
>> Marco
>>
>> 2016-09-19 12:58 GMT+02:00 Adam Goryachev <adam at websitemanagers.com.au>:
>>>
>>> On 19/09/2016 19:06, Marco Marino wrote:
>>>>
>>>> 2016-09-19 10:50 GMT+02:00 Igor Cicimov <igorc at encompasscorporation.com>:
>>>>>
>>>>> On 19 Sep 2016 5:45 pm, "Marco Marino" <marino.mrc at gmail.com> wrote:
>>>>> >
>>>>> > Hi, I'm trying to build an active/passive cluster with drbd and pacemaker for a san. I'm using 2 nodes with one raid controller (megaraid) on each one. Each node has an ssd disk that works as cache for read (and write?), implementing the proprietary CacheCade technology.
>>>>> >
>>>>> Did you configure the CacheCade? If the write cache was enabled in write-back mode, then suddenly removing the device from under the controller would have caused serious problems, I guess, since the controller expects to write to the ssd cache first and then flush to the hdd's. Maybe this explains the read-only mode?
>>>>
>>>> Good point. It is exactly as you wrote. How can I mitigate this behavior in a clustered (active/passive) environment? As I said in the other post, I think the best solution is to power off the node using local-io-error and switch all resources to the other node.... But please give me some suggestions....
>>>>
>>>>>
>>>>> > Basically, the structure of the san is:
>>>>> >
>>>>> > Physical disks -> RAID -> Device /dev/sdb in the OS -> Drbd resource (that uses /dev/sdb as backend) (using pacemaker with a master/slave resource) -> VG (managed with pacemaker) -> Iscsi target (with pacemaker) -> Iscsi LUNS (one for each logical volume in the VG, managed with pacemaker)
>>>>> >
>>>>> > A few days ago, the ssd disk was wrongly removed from the primary node of the cluster and this caused a lot of problems: the drbd resource and all logical volumes went into read-only mode with a lot of I/O errors, but the cluster did not switch to the other node. All filesystems on the initiators went to read-only mode. There are 2 problems involved here (I think):
>>>>> > 1) Why does removing the ssd disk cause a read-only mode with I/O errors? This means that the ssd is a single point of failure for a single node san with megaraid controllers and CacheCade technology.....
>>>>> > 2) Why did drbd not work as expected?
>>>>>
>>>>> What was the state in /proc/drbd ?
>>>>
>>>
>>> I think you will need to examine the logs to find out what happened. It would appear (just making a wild guess) that the caching is happening between DRBD and iSCSI instead of between DRBD and RAID. If it happened under DRBD then DRBD should see the read/write error, and should automatically fail the local storage. It wouldn't necessarily fail over to the secondary, but it would do all reads/writes from the secondary node. The fact that this didn't happen makes it look like the failure happened above DRBD.
>>>
>>> At least that is my understanding of how it will work in that scenario.
>>>
>>> Regards,
>>> Adam
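
For reference, the two I/O error policies discussed in this thread can be sketched as a DRBD 8.4-style resource fragment. This is only a sketch under assumptions, not the poster's actual configuration: the resource name r0 is hypothetical, the host and volume sections are omitted, and the handler command is modelled on the example in DRBD's sample global_common.conf, so the script paths may differ on your installation.

```
resource r0 {                       # hypothetical resource name
  disk {
    # Policy quoted in the thread: on a lower-level I/O error, drop the
    # backing device and continue in diskless mode.
    # on-io-error detach;

    # Alternative raised in the thread: run the local-io-error handler.
    on-io-error call-local-io-error;
  }
  handlers {
    # Assumed handler: notify, then power the node off so that the
    # cluster manager can move the resources to the peer.
    local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
  }
}
```

Neither policy helps if the peer's backing disk fails as well. According to the logs quoted above, the secondary also detached, which is why no node could be promoted.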