Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi,

I'm running an older version of drbd (0.6.12). I am planning on upgrading to 0.7.20 this week; however, I'm not sure that will necessarily solve my problem. I have three devices, and the third one fails to sync.

Filesystem            Size  Used Avail Use% Mounted on
/dev/nb0              105G   83G   17G  83% /mirror    (partition 1 on drive 1)
/dev/nb1              113G   93G   15G  86% /nbu       (partition 2 on drive 1)
/dev/nb2              688G  6.1G  646G   1% /mbu       (partition 1 on drive 2)

0: cs:SyncingAll st:Primary/Secondary ns:9832972 nr:0 dw:39756 dr:9821117 pe:41 ua:0
   [=>..................] sync'ed:  8.8% (99433/109003)M
   finish: 7:40:03h speed: 3,696 (2,738) K/sec
1: cs:SyncingAll st:Primary/Secondary ns:9816424 nr:0 dw:9448 dr:9835789 pe:39 ua:0
   [=>..................] sync'ed:  8.2% (107782/117368)M
   finish: 8:27:08h speed: 3,631 (2,743) K/sec
2: cs:BrokenPipe st:Primary/Secondary ns:5521284 nr:0 dw:3352 dr:5525309 pe:7085 ua:0
   NEEDS_SYNC

They all sync through the same 100 Mb pipe. I'm guessing the third one, /dev/nb2, is timing out because of too much traffic, which would explain why its status shows 'BrokenPipe'. Is this a reasonable scenario? If so, putting a second network card in both nodes should keep it from happening. Or have I simply hit a limitation in 0.6.12 that upgrading will solve? Since I'm planning on upgrading anyway, what I'd really like to know is whether I need to purchase two more network cards.

Thanks for your help,
Tom

PS: Here's my drbd.conf:

resource drbd0 {
  protocol = C
  fsckcmd = /bin/true
  inittimeout = 120
  skip-wait

  disk {
    do-panic
    disk-size = 111619584k
  }
  net {
    sync-min = 999k
    sync-max = 12M      # maximal average syncer bandwidth
    tl-size = 8000      # transfer log size, ensures strict write ordering
    timeout = 60        # unit: 0.1 seconds
    connect-int = 10    # unit: seconds
    ping-int = 10       # unit: seconds
    ko-count = 4        # if some block send times out this many times,
                        # the peer is considered dead, even if it still
                        # answers ping requests
  }
  on zan {
    device = /dev/nb0
    disk = /dev/hdb1
    address = 192.168.1.3
    port = 7788
  }
  on jayna {
    device = /dev/nb0
    disk = /dev/hdb1
    address = 192.168.1.4
    port = 7788
  }
}

resource drbd1 {
  protocol = C
  fsckcmd = /bin/true
  inittimeout = 120
  skip-wait

  disk {
    do-panic
    disk-size = 120185300k
  }
  net {
    sync-min = 999k
    sync-max = 12M      # maximal average syncer bandwidth
    tl-size = 8000      # transfer log size, ensures strict write ordering
    timeout = 60        # unit: 0.1 seconds
    connect-int = 10    # unit: seconds
    ping-int = 10       # unit: seconds
    ko-count = 4        # if some block send times out this many times,
                        # the peer is considered dead, even if it still
                        # answers ping requests
  }
  on zan {
    device = /dev/nb1
    disk = /dev/hdb2
    address = 192.168.1.3
    port = 7789
  }
  on jayna {
    device = /dev/nb1
    disk = /dev/hdb2
    address = 192.168.1.4
    port = 7789
  }
}

resource drbd2 {
  protocol = C
  fsckcmd = /bin/true
  inittimeout = 120
  skip-wait

  disk {
    do-panic
    disk-size = 732572000k
  }
  net {
    sync-min = 999k
    sync-max = 12M      # maximal average syncer bandwidth
    tl-size = 8000      # transfer log size, ensures strict write ordering
    timeout = 60        # unit: 0.1 seconds
    connect-int = 10    # unit: seconds
    ping-int = 10       # unit: seconds
    ko-count = 4        # if some block send times out this many times,
                        # the peer is considered dead, even if it still
                        # answers ping requests
  }
  on zan {
    device = /dev/nb2
    disk = /dev/hdd1
    address = 192.168.1.3
    port = 7790
  }
  on jayna {
    device = /dev/nb2
    disk = /dev/hdd1
    address = 192.168.1.4
    port = 7790
  }
}
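PPS: As a rough sanity check on the "too much traffic" theory: a 100 Mbit link carries at most around 11-12 MB/s, while each of the three resources above is allowed sync-max = 12M, so with all three syncing at once the syncers can ask for far more bandwidth than the wire can deliver. If throttling the syncer is a reasonable stopgap, this is roughly the change I had in mind for each resource's net section (the numbers are only a guess on my part, not something I've tested):

  net {
    sync-min = 999k
    sync-max = 3M       # guess: ~1/3 of the 100 Mbit link per resource,
                        # leaving headroom for normal replication traffic
    tl-size = 8000      # transfer log size, ensures strict write ordering
    timeout = 60        # unit: 0.1 seconds
    connect-int = 10    # unit: seconds
    ping-int = 10       # unit: seconds
    ko-count = 4        # unchanged from the config above
  }

The idea is that the three resources together then stay under the link capacity instead of fighting over it, but I don't know if that alone would keep resource 2 from hitting BrokenPipe.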