Hi,
My setup:
drbd-0.7.24
Two drbd devices (active/active in the 0.7.x sense), one for NFS (drbd0)
and one for database files/PostgreSQL (drbd1).
For simplicity's sake, let's call the servers nfs and database as well.
This morning, I failed over the nfs (drbd0) device to the second server,
upgraded nfs-utils on the nfs server (now secondary), failed the resource
back over to it and restarted the services. Everything went as planned...
...not quite:
Although this part worked, nfs is now occasionally logging messages like
this:
---snip---
Dec 7 10:29:43 nfs kernel: drbd1: 195 messages suppressed in /usr/src/2.6/2.6/2.6.12.6/drbd-0.7.24/drbd/drbd_req.c:214.
Dec 7 10:29:43 nfs kernel: drbd1: Not in Primary state, no IO requests allowed
Dec 7 10:29:43 nfs kernel: printk: 119 messages suppressed.
Dec 7 10:29:43 nfs kernel: Buffer I/O error on device drbd1, logical block 11645072
Dec 7 10:29:43 nfs kernel: drbd1: Not in Primary state, no IO requests allowed
Dec 7 10:29:43 nfs kernel: Buffer I/O error on device drbd1, logical block 11645072
Dec 7 10:29:43 nfs kernel: drbd1: Not in Primary state, no IO requests allowed
Dec 7 10:29:43 nfs kernel: Buffer I/O error on device drbd1, logical block 0
Dec 7 10:29:43 nfs kernel: drbd1: Not in Primary state, no IO requests allowed
Dec 7 10:29:43 nfs kernel: Buffer I/O error on device drbd1, logical block 1
Dec 7 10:29:43 nfs kernel: drbd1: Not in Primary state, no IO requests allowed
Dec 7 10:29:43 nfs kernel: Buffer I/O error on device drbd1, logical block 2
Dec 7 10:29:43 nfs kernel: Buffer I/O error on device drbd1, logical block 3
Dec 7 10:29:43 nfs kernel: Buffer I/O error on device drbd1, logical block 4
Dec 7 10:29:43 nfs kernel: Buffer I/O error on device drbd1, logical block 5
Dec 7 10:29:43 nfs kernel: Buffer I/O error on device drbd1, logical block 6
Dec 7 10:29:43 nfs kernel: Buffer I/O error on device drbd1, logical block 7
---snip---
Note that drbd1 was not involved in this procedure at all and, as far as
I can tell, no one touched it.
The only thing out of the ordinary I noticed is that the servers' clocks
had drifted apart by 20 seconds or so, due to a hastily written init
script for the NTP daemon.
I have also invalidated the secondary side of the database device (drbd1)
and ran a full sync, but the messages keep coming at 5-10 minute intervals.
Any ideas what these mean, how to make them stop, or how to correct the
problem (if any)?
(I am certain that no daemon, programme, script, whatever is trying to
do direct i/o to /dev/drbd1 - apart from drbd itself, that is)
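One way to double-check a claim like this is to walk /proc and look for
file descriptors that point at the device node. The following is only a
minimal sketch: the function name find_openers and the proc_root
parameter are made up for illustration, and it only sees processes that
hold the device open at the moment it runs, so a short-lived open that
happens every few minutes would slip past it unless it is run in a loop.

```python
import os


def find_openers(device, proc_root="/proc"):
    """Return the sorted PIDs that currently hold `device` open.

    Hypothetical helper: compares the symlink target of every
    <proc_root>/<pid>/fd/<n> entry against the device path.
    """
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed in the meantime
            if target == device:
                pids.append(int(entry))
                break
    return sorted(pids)
```

Run as root, find_openers("/dev/drbd1") would list the matching PIDs;
an empty list means nothing had the device open at that instant.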
Also note that everything, apart from those messages, seems to be in
order, i.e. cat /proc/drbd shows
---snip---
version: 0.7.24 (api:79/proto:74)
SVN Revision: 2875 build by @nfs, 2007-09-27 21:19:02
 0: cs:Connected st:Primary/Secondary ld:Consistent
    ns:1791478308 nr:168499164 dw:1966908328 dr:832356966 al:15116024 bm:20140 lo:0 pe:0 ua:0 ap:0
 1: cs:Connected st:Secondary/Primary ld:Consistent
    ns:1653508 nr:166246668 dw:166908692 dr:8437441 al:12636 bm:8913 lo:0 pe:0 ua:0 ap:0
---snip---
on nfs, and the reverse (roles swapped) on database.
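For comparing the two nodes, the status lines above can be reduced to
one role pair per device. The sketch below assumes the 0.7.x
/proc/drbd layout shown in this post (a "N: cs:... st:Local/Peer ld:..."
line per device); parse_drbd_status and the SAMPLE text are
illustrative names, not part of drbd.

```python
import re

# Status lines as shown above (counters omitted, they are not parsed).
SAMPLE = """\
version: 0.7.24 (api:79/proto:74)
SVN Revision: 2875 build by @nfs, 2007-09-27 21:19:02
 0: cs:Connected st:Primary/Secondary ld:Consistent
 1: cs:Connected st:Secondary/Primary ld:Consistent
"""

# One match per device line: minor, connection state, local/peer role, disk state.
DEVICE_LINE = re.compile(
    r"^\s*(\d+):\s+cs:(\S+)\s+st:(\w+)/(\w+)\s+ld:(\S+)", re.MULTILINE
)


def parse_drbd_status(text):
    """Map each device minor number to its connection, role and disk state."""
    status = {}
    for match in DEVICE_LINE.finditer(text):
        minor, cs, local, peer, ld = match.groups()
        status[int(minor)] = {"cs": cs, "local": local, "peer": peer, "ld": ld}
    return status


def secondary_minors(status):
    """Return the minors for which this node is Secondary (no local I/O allowed)."""
    return sorted(m for m, s in status.items() if s["local"] == "Secondary")
```

On nfs, parse_drbd_status(open("/proc/drbd").read()) would report
drbd1 as locally Secondary, which matches the device named in the
"Not in Primary state" messages.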