Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hi,
I'm running an older version of drbd (0.6.12). I am planning on upgrading to
0.7.20 this week. However, I'm not sure if that will necessarily solve my
problem. I have three devices and the third one fails to sync.
Filesystem      Size  Used Avail Use% Mounted on
/dev/nb0        105G   83G   17G  83% /mirror    (partition 1 on drive 1)
/dev/nb1        113G   93G   15G  86% /nbu       (partition 2 on drive 1)
/dev/nb2        688G  6.1G  646G   1% /mbu       (partition 1 on drive 2)
0: cs:SyncingAll st:Primary/Secondary ns:9832972 nr:0 dw:39756 dr:9821117 pe:41 ua:0
   [=>..................] sync'ed: 8.8% (99433/109003)M
   finish: 7:40:03h speed: 3,696 (2,738) K/sec
1: cs:SyncingAll st:Primary/Secondary ns:9816424 nr:0 dw:9448 dr:9835789 pe:39 ua:0
   [=>..................] sync'ed: 8.2% (107782/117368)M
   finish: 8:27:08h speed: 3,631 (2,743) K/sec
2: cs:BrokenPipe st:Primary/Secondary ns:5521284 nr:0 dw:3352 dr:5525309 pe:7085 ua:0
   NEEDS_SYNC
All three devices sync through the same 100 Mbit pipe. I'm guessing the third
one, /dev/nb2, is timing out because of too much traffic, which would explain
why its status shows 'BrokenPipe'. Is this a reasonable scenario?
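Here's the rough math behind that guess (assuming the link really tops out
around 100 Mbit/s; the other numbers come from my config and the status above):

    100 Mbit/s link         ~ 12 MB/s raw, shared by all three resources
    devices 0 + 1 syncing   ~ 3.7 + 3.6 = ~7.3 MB/s already in use
    sync-max = 12M          per resource, so the syncers alone could try to
                            push far more than the link can carry
    timeout = 60 (0.1 s)    = 6 seconds before a slow ack counts as a timeout

And the pe:7085 on device 2 (versus 41 and 39 on the other two) looks to me
like requests piling up behind the saturated pipe.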
If so, then putting a second network card in both nodes should keep this from
happening. Or have I just hit a limitation in 0.6.12 that upgrading will fix?
Since I'm planning on upgrading anyway, what I'd really like to know is whether
it's necessary to purchase two more network cards. (At the very bottom, after
my current config, is a sketch of how I imagine drbd2 would look on a dedicated
second link.)
Thanks for your help,
Tom
PS: Here's my drbd.conf
resource drbd0 {
  protocol = C
  fsckcmd = /bin/true
  inittimeout = 120
  skip-wait

  disk {
    do-panic
    disk-size = 111619584k
  }

  net {
    sync-min    = 999k
    sync-max    = 12M   # maximal average syncer bandwidth
    tl-size     = 8000  # transfer log size, ensures strict write ordering
    timeout     = 60    # unit: 0.1 seconds
    connect-int = 10    # unit: seconds
    ping-int    = 10    # unit: seconds
    ko-count    = 4     # if some block send times out this many times,
                        # the peer is considered dead, even if it still
                        # answers ping requests
  }

  on zan {
    device  = /dev/nb0
    disk    = /dev/hdb1
    address = 192.168.1.3
    port    = 7788
  }

  on jayna {
    device  = /dev/nb0
    disk    = /dev/hdb1
    address = 192.168.1.4
    port    = 7788
  }
}
resource drbd1 {
  protocol = C
  fsckcmd = /bin/true
  inittimeout = 120
  skip-wait

  disk {
    do-panic
    disk-size = 120185300k
  }

  net {
    sync-min    = 999k
    sync-max    = 12M   # maximal average syncer bandwidth
    tl-size     = 8000  # transfer log size, ensures strict write ordering
    timeout     = 60    # unit: 0.1 seconds
    connect-int = 10    # unit: seconds
    ping-int    = 10    # unit: seconds
    ko-count    = 4     # if some block send times out this many times,
                        # the peer is considered dead, even if it still
                        # answers ping requests
  }

  on zan {
    device  = /dev/nb1
    disk    = /dev/hdb2
    address = 192.168.1.3
    port    = 7789
  }

  on jayna {
    device  = /dev/nb1
    disk    = /dev/hdb2
    address = 192.168.1.4
    port    = 7789
  }
}
resource drbd2 {
  protocol = C
  fsckcmd = /bin/true
  inittimeout = 120
  skip-wait

  disk {
    do-panic
    disk-size = 732572000k
  }

  net {
    sync-min    = 999k
    sync-max    = 12M   # maximal average syncer bandwidth
    tl-size     = 8000  # transfer log size, ensures strict write ordering
    timeout     = 60    # unit: 0.1 seconds
    connect-int = 10    # unit: seconds
    ping-int    = 10    # unit: seconds
    ko-count    = 4     # if some block send times out this many times,
                        # the peer is considered dead, even if it still
                        # answers ping requests
  }

  on zan {
    device  = /dev/nb2
    disk    = /dev/hdd1
    address = 192.168.1.3
    port    = 7790
  }

  on jayna {
    device  = /dev/nb2
    disk    = /dev/hdd1
    address = 192.168.1.4
    port    = 7790
  }
}
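And for what it's worth, if I do buy the second pair of cards, this is roughly
how I picture drbd2 afterwards: only the two host sections would change, with
the address lines pointing at the new NICs on their own subnet. The 192.168.2.x
addresses below are just made up for illustration.

  # resource drbd2, host sections only -- everything else stays as above
  on zan {
    device  = /dev/nb2
    disk    = /dev/hdd1
    address = 192.168.2.3   # second NIC on zan (hypothetical address)
    port    = 7790
  }

  on jayna {
    device  = /dev/nb2
    disk    = /dev/hdd1
    address = 192.168.2.4   # second NIC on jayna (hypothetical address)
    port    = 7790
  }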