Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
2016-09-19 10:50 GMT+02:00 Igor Cicimov <igorc at encompasscorporation.com>:

> On 19 Sep 2016 5:45 pm, "Marco Marino" <marino.mrc at gmail.com> wrote:
> >
> > Hi, I'm trying to build an active/passive cluster with DRBD and
> > Pacemaker for a SAN. I'm using 2 nodes, each with one RAID controller
> > (MegaRAID). Each node has an SSD that acts as a cache for reads (and
> > writes?) using the proprietary CacheCade technology.
>
> Did you configure the CacheCade? If the write cache was enabled in
> write-back mode, then suddenly removing the device from under the
> controller would have caused serious problems, I guess, since the
> controller expects to write to the SSD cache first and then flush to
> the HDDs. Maybe this explains the read-only mode?

Good point. It is exactly as you wrote. How can I mitigate this behavior
in a clustered (active/passive) environment? As I said in the other post,
I think the best solution is to power off the node using local-io-error
and switch all resources to the other node... But please give me some
suggestions.

> > Basically, the structure of the SAN is:
> >
> > Physical disks -> RAID -> device /dev/sdb in the OS -> DRBD resource
> > (using /dev/sdb as its backing device, managed by Pacemaker as a
> > master/slave resource) -> VG (managed by Pacemaker) -> iSCSI target
> > (managed by Pacemaker) -> iSCSI LUNs (one for each logical volume in
> > the VG, managed by Pacemaker)
> >
> > A few days ago the SSD was wrongly removed from the primary node of
> > the cluster and this caused a lot of problems: the DRBD resource and
> > all logical volumes went into read-only mode with a lot of I/O errors,
> > but the cluster did not switch to the other node. All filesystems on
> > the initiators went read-only. There are two problems involved here
> > (I think): 1) Why does removing the SSD cause read-only mode with I/O
> > errors? This means that the SSD is a single point of failure for a
> > single-node SAN with MegaRAID controllers and CacheCade technology...
> > And 2) Why did DRBD not work as expected?
>
> What was the state in /proc/drbd ?

Diskless

> > For point 1) I'm checking with the vendor, and I doubt that I can do
> > anything about it.
> >
> > For point 2) I have errors in the DRBD configuration. My idea is that
> > when an I/O error happens on the primary node, the cluster should
> > switch to the secondary node and shut down the damaged node.
> >
> > Here -> http://pastebin.com/79dDK66m it is possible to see the current
> > DRBD configuration, but I need to change a lot of things and I want to
> > share my ideas here:
> >
> > 1) The "handlers" section should be moved into the "common" section of
> > global_common.conf rather than kept in the resource file.
> >
> > 2) I'm thinking of modifying the "handlers" section as follows:
> >
> >   handlers {
> >     pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh;
> >       /usr/lib/drbd/notify-emergency-reboot.sh;
> >       echo b > /proc/sysrq-trigger ; reboot -f";
> >     pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh;
> >       /usr/lib/drbd/notify-emergency-reboot.sh;
> >       echo b > /proc/sysrq-trigger ; reboot -f";
> >     local-io-error "/usr/lib/drbd/notify-io-error.sh;
> >       /usr/lib/drbd/notify-emergency-shutdown.sh;
> >       echo o > /proc/sysrq-trigger ; halt -f";
> >     # Hook into Pacemaker's fencing.
> >     fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
> >   }
> >
> > In this way, when an I/O error happens, the node will be powered off
> > and Pacemaker will switch resources to the other node (or at least
> > will not create problematic behavior...)
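One note on point 2): as far as I know, the local-io-error handler only
runs if the disk section's on-io-error policy is set to
call-local-io-error; with on-io-error detach the resource just drops to
Diskless (the state reported above) and no handler is invoked. A minimal
sketch of that disk section, assuming DRBD 8.4 syntax:

    disk {
        # Run the local-io-error handler on a backing-device error
        # instead of detaching and continuing Diskless.
        on-io-error call-local-io-error;
    }

Whether powering the node off is better than simply detaching depends on
how much you trust the fencing setup, so treat this as a sketch rather
than a recommendation.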
> > 3) I'm thinking of moving the "fencing" directive from the resource
> > file to global_common.conf. Furthermore, I want to change it to:
> >
> >   fencing resource-and-stonith;
> >
> > 4) Finally, in the global "net" section I need to add:
> >
> >   after-sb-0pri discard-zero-changes;
> >   after-sb-1pri discard-secondary;
> >   after-sb-2pri disconnect;
> >
> > At the end of the work the configuration will be ->
> > http://pastebin.com/r3N1gzwx
> >
> > Please give me suggestions about mistakes and possible changes.
> >
> > Thank you
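For reference, a minimal sketch of how points 3) and 4) could look once
consolidated in global_common.conf, assuming DRBD 8.4 syntax (where the
fencing policy normally lives in the disk section); the full configuration
at http://pastebin.com/r3N1gzwx is the authoritative version:

    common {
        disk {
            # Freeze I/O and call the fence-peer handler while the peer
            # is being fenced; this relies on working STONITH in Pacemaker.
            fencing resource-and-stonith;
        }
        net {
            # Automatic split-brain recovery policies.
            after-sb-0pri discard-zero-changes;
            after-sb-1pri discard-secondary;
            after-sb-2pri disconnect;
        }
    }

Keep in mind that resource-and-stonith only makes sense together with a
fence-peer handler such as crm-fence-peer.sh and a tested STONITH
configuration; otherwise I/O on the resource can stay frozen after a peer
failure.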