[DRBD-user] Reboot either machine in the pair during active transfer and the other reboots

Henri Cook drbd at theplayboymansion.net
Sun Sep 7 00:37:54 CEST 2008



Sorry for another post; this is something I'm working on quite actively.

So the problem appears to be that when a DRBD peer is rebooted while
the mount is in use (e.g. while a file is being transferred to it), the
other node gets hard-rebooted (no shutdown actions are run). Should I
assume this is a kernel error, or something that has already been dealt
with, and raise a bug with the Ubuntu-server team to port a version
> 8.0.11?
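
For anyone wanting to double-check the same thing on their own nodes,
here is a minimal sketch in Python that lists which entries in a
drbd.conf-style text would run a given script. It is only an
illustration: the function name handlers_calling and the inline sample
are made up for this example, and the regex is a rough match for
`name "command";` lines, not a real drbd.conf parser.

```python
import re

# Rough pattern for lines of the form:  name "command";
# The quoted command may span several lines. Lines starting with '#'
# do not match because '#' is not a valid name character here.
ENTRY_RE = re.compile(r'^\s*([a-z0-9-]+)\s+"([^"]*)"\s*;', re.MULTILINE)


def handlers_calling(conf_text, needle):
    """Return names of quoted-string entries whose command contains needle."""
    return [name for name, command in ENTRY_RE.findall(conf_text)
            if needle in command]


# Hypothetical sample, cut down from the handlers section quoted below.
SAMPLE = """
handlers {
    pri-lost "echo 'DRBD: Pri-lost, check log files.' >>
        /var/log/drbd.log ; /usr/local/sbin/reboot-sane";
    split-brain "echo 'DRBD: resolved' >> /var/log/drbd.log";
    # local-io-error "/usr/local/sbin/reboot-sane";
}
"""

if __name__ == "__main__":
    print(handlers_calling(SAMPLE, "reboot-sane"))
```

Assuming the config lives at /etc/drbd.conf, the same function could be
pointed at the real file with
handlers_calling(open("/etc/drbd.conf").read(), "reboot-sane").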

Henri Cook wrote:
> An off-list respondent suggested that I post my drbd.conf - I don't
> have any reboot options in there (that are being used) - but I may be
> missing something. (Below)
>
> Please note that reboot-sane (a custom reboot script) emails me,
> writes to a file, waits 2 seconds, then reboots the system and is NOT
> being called in the situation I describe (that's what I originally
> thought was happening).
>
> #
> # drbd.conf
> #
>
> skip {
>
> }
>
> global {
>     # minor-count 64;
>     # dialog-refresh 5; # 5 seconds
>     usage-count yes;
> }
>
> common {
>   syncer { rate 60M; }
>   protocol C;
> }
>
> resource shared {
>
>   handlers {
>     pri-on-incon-degr "echo 'DRBD: Inconsistent - Rebooting.' >> /var/log/drbd.log ; /usr/local/sbin/reboot-sane";
>     pri-lost-after-sb "echo 'DRBD: A split brain situation occured. This node lost. Rebooted' >> /var/log/drbd.log ; /usr/local/sbin/reboot-sane";
>     local-io-error "echo 'DRBD: A local IO error occurred, rebooting.' >> /var/log/drbd.log ; /usr/local/sbin/reboot-sane";
>     pri-lost "echo 'DRBD: Pri-lost, check log files.' >> /var/log/drbd.log ; /usr/local/sbin/reboot-sane";
>
>     outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5 -r shared -p dean"; # NB: The machine we're on now is torvil, the -p option here is different on the other host
>
>     split-brain "echo 'DRBD: A split-brain situation occurred and was resolved successfully.' >> /var/log/drbd.log";
>   }
>
>   startup {
>     wfc-timeout  15;
>
>     degr-wfc-timeout 30;
>
>     # In case you are using DRBD for GFS/OCFS2 you want that the
>     # startup script promotes it to primary. Nodenames are also
>     # possible instead of "both".
>     become-primary-on both;
>   }
>
>   disk {
>     on-io-error   detach;
>
>     fencing resource-only;
>   }
>
>   net {
>
>     # timeout       60;    #  6 seconds  (unit = 0.1 seconds)
>     # connect-int   10;    # 10 seconds  (unit = 1 second)
>     # ping-int      10;    # 10 seconds  (unit = 1 second)
>     # ping-timeout   5;    # 500 ms (unit = 0.1 seconds)
>
>     # max-buffers     2048;
>
>     # unplug-watermark   128;
>
>     # max-epoch-size  2048;
>
>     ko-count 4;
>
>     allow-two-primaries;
>
>     cram-hmac-alg "sha256";
>     shared-secret "w405FDS^%tngpDSFg^";
>
>     after-sb-0pri discard-older-primary;
>     after-sb-1pri discard-secondary;
>     after-sb-2pri call-pri-lost-after-sb;
>
>     rr-conflict call-pri-lost;
>
>   }
>
>   syncer {
>     al-extents 257;
>   }
>
>   on torvil {
>     device     /dev/drbd0;
>     disk       /dev/md4;
>     address    10.0.0.2:7788;
>     meta-disk  /dev/md5[0];
>   }
>
>   on dean {
>     device    /dev/drbd0;
>     disk      /dev/md4;
>     address   10.0.0.3:7788;
>     meta-disk /dev/md5[0];
>   }
> }
>
>
>
> Henri Cook wrote:
>> Please, can anyone help? This is severely affecting my setup.
>>
>> If I start, say, an FTP file transfer to my drbd /shared directory on
>> node A, then reboot node B (the other machine in the Primary-Primary
>> configuration), DRBD on node A registers a NetworkFailure, which
>> appears to trigger a reboot action. I can't find anywhere to define
>> this behaviour; I'd very much like to stop the reboot happening.
>>
>> So to confirm the behaviour: during a transfer to A onto /shared, if I
>> reboot B, then as soon as A loses the connection to B, A reboots too,
>> crippling the pair.
>>
>> Henri
>> _______________________________________________
>> drbd-user mailing list
>> drbd-user at lists.linbit.com
>> http://lists.linbit.com/mailman/listinfo/drbd-user
>>   