[DRBD-user] Someone's trying to do i/o on the secondary ?

Fri Dec 7 10:08:48 CET 2007

Hi,

my setup:
drbd-0.7.24
2 drbd devices (active/active in the 0.7.x sense), one for nfs (drbd0) 
and one for database files/postgresql (drbd1).
For simplicity's sake, let's call the servers nfs and database as well.

This morning, I failed over the nfs (drbd0) device to the second server,
upgraded nfs-utils on the nfs (now secondary) server, failed the resource 
back over to it and restarted the services. Everything went as planned...
...well, not quite:
although the failover itself worked, nfs is now occasionally spewing 
messages like this:

---snip---
Dec  7 10:29:43 nfs kernel: drbd1: 195 messages suppressed in /usr/src/2.6/2.6/2.6.12.6/drbd-0.7.24/drbd/drbd_req.c:214.
Dec  7 10:29:43 nfs kernel: drbd1: Not in Primary state, no IO requests allowed
Dec  7 10:29:43 nfs kernel: printk: 119 messages suppressed.
Dec  7 10:29:43 nfs kernel: Buffer I/O error on device drbd1, logical block 11645072
Dec  7 10:29:43 nfs kernel: drbd1: Not in Primary state, no IO requests allowed
Dec  7 10:29:43 nfs kernel: Buffer I/O error on device drbd1, logical block 11645072
Dec  7 10:29:43 nfs kernel: drbd1: Not in Primary state, no IO requests allowed
Dec  7 10:29:43 nfs kernel: Buffer I/O error on device drbd1, logical block 0
Dec  7 10:29:43 nfs kernel: drbd1: Not in Primary state, no IO requests allowed
Dec  7 10:29:43 nfs kernel: Buffer I/O error on device drbd1, logical block 1
Dec  7 10:29:43 nfs kernel: drbd1: Not in Primary state, no IO requests allowed
Dec  7 10:29:43 nfs kernel: Buffer I/O error on device drbd1, logical block 2
Dec  7 10:29:43 nfs kernel: Buffer I/O error on device drbd1, logical block 3
Dec  7 10:29:43 nfs kernel: Buffer I/O error on device drbd1, logical block 4
Dec  7 10:29:43 nfs kernel: Buffer I/O error on device drbd1, logical block 5
Dec  7 10:29:43 nfs kernel: Buffer I/O error on device drbd1, logical block 6
Dec  7 10:29:43 nfs kernel: Buffer I/O error on device drbd1, logical block 7
---snip---
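
To get a feel for how often each message turns up and which blocks are 
involved, something like the following could be run over the syslog. This 
is only a rough sketch: the function name summarize() is made up for 
illustration, and it assumes each kernel message sits on a single line in 
the raw log file (unlike in a mail, where long lines get wrapped).

```python
import re
from collections import Counter

# Patterns modelled on the two message types in the excerpt above.
REFUSED = re.compile(
    r"kernel: (drbd\d+): Not in Primary state, no IO requests allowed"
)
BUFFER_ERR = re.compile(
    r"kernel: Buffer I/O error on device (drbd\d+), logical block (\d+)"
)


def summarize(lines):
    """Count refused-IO messages and collect failing blocks per device.

    summarize() is a hypothetical helper, not part of drbd or its tools.
    """
    refused = Counter()   # device -> number of "Not in Primary state" lines
    blocks = {}           # device -> set of logical block numbers
    for line in lines:
        m = REFUSED.search(line)
        if m:
            refused[m.group(1)] += 1
            continue
        m = BUFFER_ERR.search(line)
        if m:
            blocks.setdefault(m.group(1), set()).add(int(m.group(2)))
    return refused, blocks
```

Feeding it the lines of the log file (for instance from an open file 
object) returns a per-device message count and the set of block numbers 
that were hit. Note that the kernel suppresses repeated messages, so the 
counts are a lower bound.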

Note that drbd1 was not even touched in this procedure and indeed, 
no-one messed with it as far as I can tell.
The only thing out of the ordinary I noticed is that the servers' clocks 
had drifted apart by about 20 seconds due to a hastily written ntp 
daemon init script.
I have also invalidated the secondary database device and done a full 
sync, but the messages keep coming at 5-10 minute intervals.

Any ideas what these mean / how to make them stop / correct the problem 
(if any) ?

(I am certain that no daemon, programme, script, whatever is trying to 
do direct i/o to /dev/drbd1 - apart from drbd itself, that is)
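
One way to double-check that claim would be to scan /proc for processes 
holding the device open. The sketch below is an assumption about how such 
a check might look, not something from drbd: find_openers() is a made-up 
name, it only sees userspace processes that have the device open at the 
moment of the scan (a short-lived open every few minutes would be missed), 
and it needs root to read other users' fd directories.

```python
import os


def find_openers(device="/dev/drbd1", proc_root="/proc"):
    """Return sorted (pid, fd) pairs whose fd symlink points at `device`.

    find_openers() is a hypothetical helper. The match is on the literal
    path, so a device opened under a different name would not be found.
    """
    hits = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited meanwhile, or permission denied
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # fd was closed between listdir and readlink
            if target == device:
                hits.append((int(entry), int(fd)))
    return sorted(hits)
```

An empty list from find_openers("/dev/drbd1") on the secondary would be 
consistent with the statement above, although it cannot rule out 
something that opens the device only briefly.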

Also note that everything, apart from those messages, seems to be in 
order, i.e. cat /proc/drbd shows
---snip---
version: 0.7.24 (api:79/proto:74)
SVN Revision: 2875 build by @nfs, 2007-09-27 21:19:02
 0: cs:Connected st:Primary/Secondary ld:Consistent
    ns:1791478308 nr:168499164 dw:1966908328 dr:832356966 al:15116024 bm:20140 lo:0 pe:0 ua:0 ap:0
 1: cs:Connected st:Secondary/Primary ld:Consistent
    ns:1653508 nr:166246668 dw:166908692 dr:8437441 al:12636 bm:8913 lo:0 pe:0 ua:0 ap:0
---snip---
on nfs and the opposite on the database.
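
For comparing the state on both nodes without eyeballing it, the status 
lines can be pulled apart with a small parser. This is a sketch that 
assumes the 0.7-style layout shown above (cs:, st:local/peer, ld:); 
parse_proc_drbd() is a name invented here, and other drbd versions may 
format /proc/drbd differently.

```python
import re

# Matches device lines such as:
#  " 1: cs:Connected st:Secondary/Primary ld:Consistent"
DEVICE_LINE = re.compile(
    r"^\s*(\d+): cs:(\S+) st:([^/\s]+)/(\S+) ld:(\S+)"
)


def parse_proc_drbd(text):
    """Map each minor number to its connection state, roles and disk state.

    parse_proc_drbd() is a hypothetical helper written against the
    0.7.24 output quoted in this mail.
    """
    devices = {}
    for line in text.splitlines():
        m = DEVICE_LINE.match(line)
        if m:
            minor, cs, local, peer, ld = m.groups()
            devices[int(minor)] = {
                "cs": cs,        # connection state, e.g. Connected
                "local": local,  # role of this node
                "peer": peer,    # role of the other node
                "ld": ld,        # local disk state, e.g. Consistent
            }
    return devices
```

Calling it with the contents of /proc/drbd and listing the minors whose 
"local" role is Secondary shows which devices should refuse IO on this 
node; with the output above that would be minor 1 on nfs.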