Hi,

Using drbd 0.7.10 on kernel 2.6.9 (Red Hat Enterprise Linux AS v4), I experienced what seems to be a bug.

Here is the scenario:

1) An I/O error occurred on the disk of the primary node.
2) The disk was detached and the primary node was set diskless.
3) Everything ran without problems for 2 days.
4) The kernel on the primary node ran out of memory. I don't know why. Perhaps drbd requires too much memory (due to a memory leak?), but I don't think that is the most important issue.
5) The primary froze and had to be rebooted.
6) drbd on the primary restarted, and a synchronisation from primary to secondary was then performed.

Obviously that was the wrong thing to do, and some data was lost. Before the reboot the primary was diskless, so the consistent data was on the secondary and the synchronisation should have gone from secondary to primary.

Analysing the log, I found:

1) The secondary never logged a message saying that the primary had switched to diskless because of a disk error. It probably never received the information.
2) The data on the primary is tagged as consistent after the restart.

After looking at the source, I have some questions.

The log message "Local IO failed. Detaching" is written by the function drbd_chk_io_error. In this function the bits MD_IO_ALLOWED, MDF_FullSync, DISKLESS and, indirectly, MD_DIRTY are set, and the bit MDF_Consistent is cleared. But unlike the function drbd_io_error, it makes no call to drbd_send_param or to drbd_md_write. In the source, a call to drbd_chk_io_error is often followed by a call to drbd_io_error, but not everywhere.

I suppose the problem arose because the inconsistent status was never written to the disk and never sent to the other node. So after a restart, nothing in the meta data or from the other node could tell that the disk was inconsistent.

Am I wrong?

I attach the configuration and log files.

-- 
François Morris
Francois.Morris at impmc.jussieu.fr
IMPMC, Université P. et M. Curie, CNRS
case 115 - 4, place Jussieu - 75252 PARIS CEDEX 05 - FRANCE
Tel: +33144275073 Fax: +33144273785
http://www.impmc.jussieu.fr/~morris

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: log.txt
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20050921/9bf88e3a/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: drbd.conf
URL: <http://lists.linbit.com/pipermail/drbd-user/attachments/20050921/9bf88e3a/attachment-0001.txt>
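
The suspected sequence in the message (flags changed in memory, but never written to the meta data and never sent to the peer) can be illustrated with a toy model. This is a minimal sketch in Python, not DRBD code. The class `Node`, the functions `choose_sync_source` and `scenario`, and the rule used to pick the sync source are all hypothetical simplifications, and DRBD's real handshake is more involved. Only the names drbd_chk_io_error, drbd_io_error, drbd_md_write and drbd_send_param come from the message, and here they appear in comments only.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical model of the primary node's state (not DRBD's real structures)."""
    consistent: bool = True            # in-memory view of the "consistent" flag
    diskless: bool = False             # in-memory "diskless" flag
    on_disk_consistent: bool = True    # what the persisted meta data says
    peer_view_consistent: bool = True  # what the peer believes about this node

    def chk_io_error(self) -> None:
        # Models the behaviour the message attributes to drbd_chk_io_error:
        # the flags change in memory only.
        self.diskless = True
        self.consistent = False

    def io_error(self) -> None:
        # Models the follow-up the message attributes to drbd_io_error.
        self.on_disk_consistent = self.consistent    # stands in for drbd_md_write
        self.peer_view_consistent = self.consistent  # stands in for drbd_send_param

    def reboot(self) -> None:
        # In-memory state is lost; the flag is reloaded from the meta data.
        self.diskless = False
        self.consistent = self.on_disk_consistent


def choose_sync_source(primary: Node) -> str:
    """Simplified rule (an assumption): the primary is the sync source only if
    both its own meta data and the peer's view say its data is consistent."""
    if primary.consistent and primary.peer_view_consistent:
        return "primary"
    return "secondary"


def scenario(call_io_error: bool) -> str:
    """Replay: I/O error, optional persist/notify step, crash, restart."""
    primary = Node()
    primary.chk_io_error()
    if call_io_error:
        primary.io_error()
    primary.reboot()
    return choose_sync_source(primary)


if __name__ == "__main__":
    print("without persist/notify:", scenario(call_io_error=False))
    print("with persist/notify:   ", scenario(call_io_error=True))
```

In this model, skipping the persist/notify step leaves nothing after the restart that marks the primary's data as inconsistent, so the primary is chosen as the sync source, which matches the behaviour reported above. With the step included, the secondary is chosen.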