[DRBD-user] Problems upgrading to pre9

Jeff Tucker jeff at jltnet.com
Wed Jul 7 16:25:55 CEST 2004



Hi,

I tried to upgrade a system running 0.7-pre8 to 0.7-pre9 last night and ran 
into serious problems. I never got it working and eventually managed to 
downgrade back to pre8.

When I brought up the primary with pre9, I got lots of log entries like this:
Jul  6 18:12:57 omega3 kernel: drbd0: ASSERT( b->bm[b->bm_words] == DRBD_MAGIC ) in /home/jeff/Software/drbd-0.7_pre9/drbd/drbd_bitmap.c:706
Jul  6 18:13:34 omega3 last message repeated 80 times
Jul  6 18:14:29 omega3 last message repeated 274 times
Jul  6 18:15:30 omega3 last message repeated 87 times
Jul  6 18:16:40 omega3 last message repeated 397 times
Jul  6 18:16:40 omega3 kernel: drbd0: ASSERT( b->bm[b->bm_words] == DRBD_MAGIC ) in /home/jeff/Software/drbd-0.7_pre9/drbd/drbd_bitmap.c:483
Jul  6 18:16:40 omega3 last message repeated 3292 times
Jul  6 18:16:40 omega3 kernel: drbd0: ASSERT( b->bm[b->bm_words] == DRBD_MAGIC ) in /home/jeff/Software/drbd-0.7_pre9/drbd/drbd_bitmap.c:792
Jul  6 18:16:40 omega3 kernel: drbd0: ASSERT( b->bm[b->bm_words] == DRBD_MAGIC ) in /home/jeff/Software/drbd-0.7_pre9/drbd/drbd_bitmap.c:792
Jul  6 18:16:40 omega3 kernel: drbd0: ASSERT( b->bm[b->bm_words] == DRBD_MAGIC ) in /home/jeff/Software/drbd-0.7_pre9/drbd/drbd_bitmap.c:729
Jul  6 18:16:40 omega3 last message repeated 7 times
Jul  6 18:16:40 omega3 kernel: drbd0: receive_Data: (data_size & 0x1ff) in /home/jeff/Software/drbd-0.7_pre9/drbd/drbd_receiver.c:1029
Jul  6 18:16:40 omega3 kernel: drbd0: error receiving Data, l: 0!
Jul  6 18:16:40 omega3 kernel: drbd0: ASSERT( mdev->cstate < Connected ) in /home/jeff/Software/drbd-0.7_pre9/drbd/drbd_receiver.c:1659
Jul  6 18:16:40 omega3 kernel: drbd0: ASSERT( b->bm[b->bm_words] == DRBD_MAGIC ) in /home/jeff/Software/drbd-0.7_pre9/drbd/drbd_bitmap.c:483
Jul  6 18:16:40 omega3 kernel: drbd0: ASSERT( thi->t_state == Restarting ) in /home/jeff/Software/drbd-0.7_pre9/drbd/drbd_receiver.c:1790
Jul  6 18:16:40 omega3 kernel: drbd0: ASSERT( b->bm[b->bm_words] == DRBD_MAGIC ) in /home/jeff/Software/drbd-0.7_pre9/drbd/drbd_bitmap.c:483

These continued, but the primary was able to run as long as it wasn't 
connected to the secondary.

But after the secondary connected (I think that's what triggered it), I got:

Jul  6 18:16:54 omega3 kernel: Unable to handle kernel NULL pointer dereference at virtual address 0000000b
Jul  6 18:16:54 omega3 kernel:  printing eip:
Jul  6 18:16:54 omega3 kernel: f8cc13f9
Jul  6 18:16:54 omega3 kernel: *pde = 36ec9001
Jul  6 18:16:54 omega3 kernel: *pte = 00000000
Jul  6 18:16:54 omega3 kernel: Oops: 0000 [#1]
Jul  6 18:16:54 omega3 kernel: PREEMPT SMP
Jul  6 18:16:54 omega3 kernel: Modules linked in: drbd
Jul  6 18:16:54 omega3 kernel: CPU:    1
Jul  6 18:16:54 omega3 kernel: EIP:    0060:[<f8cc13f9>]    Not tainted
Jul  6 18:16:54 omega3 kernel: EFLAGS: 00010286   (2.6.7)
Jul  6 18:16:54 omega3 kernel: EIP is at got_BlockAck+0x5e/0x1ef [drbd]
Jul  6 18:16:54 omega3 kernel: eax: f6ed62b4   ebx: 0000b0c8   ecx: 00000000   edx: ffffffff
Jul  6 18:16:54 omega3 kernel: esi: 00000000   edi: f6ed6000   ebp: 00000000   esp: f4f09f64
Jul  6 18:16:54 omega3 kernel: ds: 007b   es: 007b   ss: 0068
Jul  6 18:16:54 omega3 kernel: Process drbd0_asender (pid: 1062, threadinfo=f4f08000 task=f7f5ac50)
Jul  6 18:16:54 omega3 kernel: Stack: f4f09f54 00000001 00000000 f4f08000 00000286 00000018 f6ed62d4 f6ed6000
Jul  6 18:16:54 omega3 kernel:        00000020 f8cc1aed f6ed6000 f6ed62b4 00000018 f6ed6440 00000282 0000000f
Jul  6 18:16:54 omega3 kernel:        00000020 f6ed62b4 00000003 00000001 00000000 00000000 f4f08000 f6ed6438
Jul  6 18:16:54 omega3 kernel: Call Trace:
Jul  6 18:16:54 omega3 kernel:  [<f8cc1aed>] drbd_asender+0x1bc/0x458 [drbd]
Jul  6 18:16:54 omega3 kernel:  [<f8cc6eb2>] drbd_thread_setup+0x79/0xff [drbd]
Jul  6 18:16:54 omega3 kernel:  [<f8cc6e39>] drbd_thread_setup+0x0/0xff [drbd]
Jul  6 18:16:54 omega3 kernel:  [<c0104271>] kernel_thread_helper+0x5/0xb
Jul  6 18:16:54 omega3 kernel:
Jul  6 18:16:54 omega3 kernel: Code: 8b 42 0c 35 67 02 74 83 39 d0 74 05 b9 01 00 00 00 85 c9 74
Jul  6 18:16:54 omega3 kernel:  <3>drbd0: /home/jeff/Software/drbd-0.7_pre9/drbd/drbd_receiver.c:1102: sector: 0, size: 0
Jul  6 18:16:54 omega3 kernel: drbd0: error receiving RSDataRequest, l: 24!
Jul  6 18:16:54 omega3 kernel: drbd0: ASSERT( mdev->cstate < Connected ) in /home/jeff/Software/drbd-0.7_pre9/drbd/drbd_receiver.c:1659

followed by a bunch more errors. At that point I couldn't un-export the NFS 
partition, couldn't unmount /dev/nb0, and couldn't reboot the machine 
without driving to it and hitting the reset button. (If anyone knows a way 
around that, I'd like to hear it. The machine itself was up and running, but 
these filesystems were blocking the normal shutdown procedure.)

The slave shows similar ASSERT messages from drbd_bitmap.c.

I rebooted the machines one at a time. They were in use during the 
upgrade, so data was definitely being written to the primary while the 
secondary was offline being rebooted with the new module. Whenever the 
secondary started up and the machines connected, the master showed that it 
was syncing to the secondary. However, the sync never completed or even 
progressed very far. I think it was the start of the sync that made it 
crash.

All of this was done with kernel 2.6.7. I was hoping to be able to upgrade 
on the fly, but I can do some testing and can bring the machines down 
entirely for short periods if needed; they are pre-production and are 
currently being used for beta testing. I eventually moved both machines 
back to -pre8, where they connect and sync just fine.

Thanks
Jeff

-- 
jeff at jltnet.com


