Hi,

I tried to upgrade a system running 0.7-pre8 to 0.7-pre9 last night and had real problems. I never was able to make it work and finally managed to downgrade back to pre8.

On bringing up the primary with pre9, I get lots of log entries like this:

Jul 6 18:12:57 omega3 kernel: drbd0: ASSERT( b->bm[b->bm_words] == DRBD_MAGIC ) in /home/jeff/Software/drbd-0.7_pre9/drbd/drbd_bitmap.c:706
Jul 6 18:13:34 omega3 last message repeated 80 times
Jul 6 18:14:29 omega3 last message repeated 274 times
Jul 6 18:15:30 omega3 last message repeated 87 times
Jul 6 18:16:40 omega3 last message repeated 397 times
Jul 6 18:16:40 omega3 kernel: drbd0: ASSERT( b->bm[b->bm_words] == DRBD_MAGIC ) in /home/jeff/Software/drbd-0.7_pre9/drbd/drbd_bitmap.c:483
Jul 6 18:16:40 omega3 last message repeated 3292 times
Jul 6 18:16:40 omega3 kernel: drbd0: ASSERT( b->bm[b->bm_words] == DRBD_MAGIC ) in /home/jeff/Software/drbd-0.7_pre9/drbd/drbd_bitmap.c:792
Jul 6 18:16:40 omega3 kernel: drbd0: ASSERT( b->bm[b->bm_words] == DRBD_MAGIC ) in /home/jeff/Software/drbd-0.7_pre9/drbd/drbd_bitmap.c:792
Jul 6 18:16:40 omega3 kernel: drbd0: ASSERT( b->bm[b->bm_words] == DRBD_MAGIC ) in /home/jeff/Software/drbd-0.7_pre9/drbd/drbd_bitmap.c:729
Jul 6 18:16:40 omega3 last message repeated 7 times
Jul 6 18:16:40 omega3 kernel: drbd0: receive_Data: (data_size & 0x1ff) in /home/jeff/Software/drbd-0.7_pre9/drbd/drbd_receiver.c:1029
Jul 6 18:16:40 omega3 kernel: drbd0: error receiving Data, l: 0!
Jul 6 18:16:40 omega3 kernel: drbd0: ASSERT( mdev->cstate < Connected ) in /home/jeff/Software/drbd-0.7_pre9/drbd/drbd_receiver.c:1659
Jul 6 18:16:40 omega3 kernel: drbd0: ASSERT( b->bm[b->bm_words] == DRBD_MAGIC ) in /home/jeff/Software/drbd-0.7_pre9/drbd/drbd_bitmap.c:483
Jul 6 18:16:40 omega3 kernel: drbd0: ASSERT( thi->t_state == Restarting ) in /home/jeff/Software/drbd-0.7_pre9/drbd/drbd_receiver.c:1790
Jul 6 18:16:40 omega3 kernel: drbd0: ASSERT( b->bm[b->bm_words] == DRBD_MAGIC ) in /home/jeff/Software/drbd-0.7_pre9/drbd/drbd_bitmap.c:483

These continued, but the primary was able to run as long as it wasn't connected to the secondary. But after the secondary was connected (I think that's what did it) I got:

Jul 6 18:16:54 omega3 kernel: Unable to handle kernel NULL pointer dereference at virtual address 0000000b
Jul 6 18:16:54 omega3 kernel: printing eip:
Jul 6 18:16:54 omega3 kernel: f8cc13f9
Jul 6 18:16:54 omega3 kernel: *pde = 36ec9001
Jul 6 18:16:54 omega3 kernel: *pte = 00000000
Jul 6 18:16:54 omega3 kernel: Oops: 0000 [#1]
Jul 6 18:16:54 omega3 kernel: PREEMPT SMP
Jul 6 18:16:54 omega3 kernel: Modules linked in: drbd
Jul 6 18:16:54 omega3 kernel: CPU: 1
Jul 6 18:16:54 omega3 kernel: EIP: 0060:[<f8cc13f9>] Not tainted
Jul 6 18:16:54 omega3 kernel: EFLAGS: 00010286 (2.6.7)
Jul 6 18:16:54 omega3 kernel: EIP is at got_BlockAck+0x5e/0x1ef [drbd]
Jul 6 18:16:54 omega3 kernel: eax: f6ed62b4 ebx: 0000b0c8 ecx: 00000000 edx: ffffffff
Jul 6 18:16:54 omega3 kernel: esi: 00000000 edi: f6ed6000 ebp: 00000000 esp: f4f09f64
Jul 6 18:16:54 omega3 kernel: ds: 007b es: 007b ss: 0068
Jul 6 18:16:54 omega3 kernel: Process drbd0_asender (pid: 1062, threadinfo=f4f08000 task=f7f5ac50)
Jul 6 18:16:54 omega3 kernel: Stack: f4f09f54 00000001 00000000 f4f08000 00000286 00000018 f6ed62d4 f6ed6000
Jul 6 18:16:54 omega3 kernel: 00000020 f8cc1aed f6ed6000 f6ed62b4 00000018 f6ed6440 00000282 0000000f
Jul 6 18:16:54 omega3 kernel: 00000020 f6ed62b4 00000003 00000001 00000000 00000000 f4f08000 f6ed6438
Jul 6 18:16:54 omega3 kernel: Call Trace:
Jul 6 18:16:54 omega3 kernel: [<f8cc1aed>] drbd_asender+0x1bc/0x458 [drbd]
Jul 6 18:16:54 omega3 kernel: [<f8cc6eb2>] drbd_thread_setup+0x79/0xff [drbd]
Jul 6 18:16:54 omega3 kernel: [<f8cc6e39>] drbd_thread_setup+0x0/0xff [drbd]
Jul 6 18:16:54 omega3 kernel: [<c0104271>] kernel_thread_helper+0x5/0xb
Jul 6 18:16:54 omega3 kernel:
Jul 6 18:16:54 omega3 kernel: Code: 8b 42 0c 35 67 02 74 83 39 d0 74 05 b9 01 00 00 00 85 c9 74
Jul 6 18:16:54 omega3 kernel: <3>drbd0: /home/jeff/Software/drbd-0.7_pre9/drbd/drbd_receiver.c:1102: sector: 0, size: 0
Jul 6 18:16:54 omega3 kernel: drbd0: error receiving RSDataRequest, l: 24!
Jul 6 18:16:54 omega3 kernel: drbd0: ASSERT( mdev->cstate < Connected ) in /home/jeff/Software/drbd-0.7_pre9/drbd/drbd_receiver.c:1659

and a bunch more errors.

At that point, I couldn't un-export the NFS partition, couldn't unmount /dev/nb0 and couldn't reboot the machine without driving to it and hitting the reset button. (If anyone knows a way around that, I'd like to hear it. It was up and running, but these filesystems were blocking the normal shutdown procedures.)

The slave has similar error messages about the ASSERT in drbd_bitmap.c.

I rebooted the machines one at a time. The machines were active during the upgrade, so there was definitely data being written to the primary with the secondary offline while it was being rebooted with the new module. When the secondary would start up and the machines would connect, the master would show that it was syncing to the secondary. However, the sync never completed or progressed very far. I think it was the start of the sync that was making it crash.

All of this was done with kernel 2.6.7.

I can do some testing and I can bring down the machines entirely for short periods if I need to. I was hoping to be able to upgrade on the fly. However, these machines are pre-production and being used for beta testing at the present time. I was eventually able to move both machines back to -pre8 where they connect and sync just fine.

Thanks
Jeff
--
jeff at jltnet.com