Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.
Hello, I'm trying to set up DRBD 0.7.5 on 2.6.10-rc1 with two 4GB RAM dual Xeon machines replicating ten ~500GB volumes. My drbd.conf is mostly standard stuff, I left all the timeouts , protocol, startup, and disk directives as default. I should probably mention that I had to patch the original 0.7.5 drbd because of a variable name change in 2.6.10-rc1 (can't remember which variable it was right now, but I doubt it has anything to do with my problem, it's a pretty trivial patch) So I have 10 volumes on this machine going from vol21 to vol30. Problem 1: --------- When I start drbd for the first time I get something like this : /etc/init.d/drbd start Starting DRBD resources: [vol21]Child process does not terminate! Exiting. If I start again: Starting DRBD resources: [vol21][vol22]Child process does not terminate! Exiting. and so on : one more volume is completed every time I launch the script. Looking at the source code, I believe the timeout has to do with the SLEEPS_LONG / SLEEPS_VERY_LONG values that probably aren't enough in my case (but i'm not sure if I'm hitting the "long" or "very long" timeout). Can these be made into module parameters, or at least #define's ? Eventually, after enough launches, all volumes are corrected initialized, except... Problem 2: --------- It goes on like this until [vol27], where the init script hangs, and I have this in the kernel logs: drbd8: Creating state block allocation failed: out of vmalloc space - use vmalloc=<size> to increase size. drbd8: BM resizing failed. Leaving size unchanged at size = 488333312 KB drbd8: Assuming that all blocks are out of sync (aka FullSync) drbd8: drbd_bm_set_all: (!(b && b->bm)) in drivers/block/drbd/drbd_bitmap.c:521 [<f88ea0b9>] drbd_bm_set_all+0x15a/0x15f [drbd] [<f88ec6b5>] drbd_ioctl_set_disk+0x625/0x739 [drbd] [<c0240164>] __copy_to_user_ll+0x68/0x6c [<f88edd9b>] drbd_ioctl+0x7d7/0xc04 [drbd] [<f88ed5c4>] drbd_ioctl+0x0/0xc04 [drbd] [<c026a650>] blkdev_ioctl+0x86/0x41b [<c015bf06>] block_ioctl+0x0/0x15 [<c01659fb>] sys_ioctl+0x101/0x27a [<c0105963>] syscall_call+0x7/0xb Unable to handle kernel NULL pointer dereference at virtual address 00000000 printing eip: f88e9f9c *pde = 00000000 Oops: 0000 [#1] SMP Modules linked in: drbd tg3 CPU: 0 EIP: 0060:[<f88e9f9c>] Not tainted VLI EFLAGS: 00010046 (2.6.10-rc1) EIP is at drbd_bm_set_all+0x3d/0x15f [drbd] eax: 00000000 ebx: 00000000 ecx: c036b154 edx: 00000000 esi: f5fe6180 edi: f5bd2960 ebp: f6479c00 esp: f5a92e40 ds: 007b es: 007b ss: 0068 Process drbdsetup (pid: 18448, threadinfo=f5a92000 task=f63a5a20) Stack: f8900540 00000008 f88fff2d f8900320 00000209 f6479c04 00000000 f5fe6180 f5bd2960 bffff220 f88ec6b5 f5bd2960 00000008 000001c9 f7eaac58 c03b8380 00000008 f5fe6b80 f76420bc f7f6e280 f65dc680 00000001 00000005 00000000 Call Trace: [<f88ec6b5>] drbd_ioctl_set_disk+0x625/0x739 [drbd] [<c0240164>] __copy_to_user_ll+0x68/0x6c [<f88edd9b>] drbd_ioctl+0x7d7/0xc04 [drbd] [<f88ed5c4>] drbd_ioctl+0x0/0xc04 [drbd] [<c026a650>] blkdev_ioctl+0x86/0x41b [<c015bf06>] block_ioctl+0x0/0x15 [<c01659fb>] sys_ioctl+0x101/0x27a [<c0105963>] syscall_call+0x7/0xb Code: 00 85 ed 74 07 8b 4d 00 85 c9 75 05 b8 01 00 00 00 85 c0 0f 85 f1 00 00 00 8d 45 04 89 44 24 14 e8 eb 68 a2 c7 8b 45 00 8b 55 14 <81>3c 90 67 02 74 83 74 31 89 f8 c7 44 24 0c 0e 02 00 00 c7 44 As the kernel log hinted, I was able to work around this by booting the kernel with the vmalloc=256M option. All volumes were then able to initialize (given enough /etc/init.d/drbd starts, see problem 1) with no error. But maybe an oops is not the most appropriate behaviour for drbd to have in this case ? Problem 3 (most serious) --------- There's a new, more serious oops that happened on the target machine, a few minutes after I finally started synchronisation: Nov 29 17:49:01 pers23fs kernel: drbd8: drbd8_receiver [12097]: cstate Connected --> WFBitMapT Nov 29 17:49:01 pers23fs kernel: drbd8: Secondary/Secondary --> Secondary/Primary Nov 29 17:49:01 pers23fs kernel: drbd9: I am(S): 0:00000001:00000001:00000001:00000001:01 Nov 29 17:49:01 pers23fs kernel: drbd9: Peer(P): 1:00000002:00000001:00000001:00000001:11 Nov 29 17:49:01 pers23fs kernel: drbd9: drbd9_receiver [12879]: cstate Connected --> WFBitMapT Nov 29 17:49:01 pers23fs kernel: drbd9: Secondary/Secondary --> Secondary/Primary Nov 29 17:49:26 pers23fs kernel: drbd1: drbd1_receiver [6773]: cstate WFBitMapT --> SyncTarget Nov 29 17:49:26 pers23fs kernel: drbd1: Resync started as SyncTarget (need to sync 488325120 KB [122081280 bits set]). Nov 29 17:49:27 pers23fs kernel: drbd0: drbd0_receiver [6471]: cstate WFBitMapT --> SyncTarget Nov 29 17:49:27 pers23fs kernel: drbd0: Resync started as SyncTarget (need to sync 488333312 KB [122083328 bits set]). Nov 29 17:49:27 pers23fs kernel: ------------[ cut here ]------------ Nov 29 17:49:27 pers23fs kernel: kernel BUG at kernel/timer.c:267! Nov 29 17:49:27 pers23fs kernel: invalid operand: 0000 [#1] Nov 29 17:49:27 pers23fs kernel: SMP Nov 29 17:49:27 pers23fs kernel: Modules linked in: drbd tg3 Nov 29 17:49:27 pers23fs kernel: CPU: 2 Nov 29 17:49:27 pers23fs kernel: EIP: 0060:[mod_timer+96/106] Not tainted VLI Nov 29 17:49:27 pers23fs kernel: EFLAGS: 00010246 (2.6.10-rc1) Nov 29 17:49:27 pers23fs kernel: EIP is at mod_timer+0x60/0x6a Nov 29 17:49:27 pers23fs kernel: eax: 00000000 ebx: ce57032c ecx: c0369a88 edx: 00000282 Nov 29 17:49:27 pers23fs kernel: esi: 0026b60e edi: ce570000 ebp: 0000000e esp: d6094f20 Nov 29 17:49:27 pers23fs kernel: ds: 007b es: 007b ss: 0068 Nov 29 17:49:27 pers23fs kernel: Process drbd0_receiver (pid: 6471, threadinfo=d6094000 task=eee58020) Nov 29 17:49:27 pers23fs kernel: Stack: ce570000 c011b1ce 00000292 ce5703c8 f0910978 ce57032c 0026b60e f0926d36 Nov 29 17:49:27 pers23fs kernel: 1d1b6000 0746d800 00000000 00000000 003a36c0 ce570000 f09152dc ce570000 Nov 29 17:49:27 pers23fs kernel: 0000000e 000003e8 f0871000 ce5701dc 00000001 f0871000 003a36c0 ce570000 Nov 29 17:49:27 pers23fs kernel: Call Trace: Nov 29 17:49:27 pers23fs kernel: [printk+23/27] printk+0x17/0x1b Nov 29 17:49:27 pers23fs kernel: [pg0+809961848/1068884992] drbd_start_resync+0x1f1/0x2a1 [drbd] Nov 29 17:49:27 pers23fs kernel: [pg0+809980636/1068884992] receive_bitmap+0x230/0x27d [drbd] Nov 29 17:49:27 pers23fs kernel: [pg0+809980076/1068884992] receive_bitmap+0x0/0x27d [drbd] Nov 29 17:49:27 pers23fs kernel: [pg0+809982014/1068884992] drbdd+0x59/0x13f [drbd] Nov 29 17:49:27 pers23fs kernel: [pg0+809984814/1068884992] drbdd_init+0x70/0x16e [drbd] Nov 29 17:49:27 pers23fs kernel: [pg0+810007830/1068884992] drbd_thread_setup+0x78/0xe0 [drbd] Nov 29 17:49:27 pers23fs kernel: [pg0+810007710/1068884992] drbd_thread_setup+0x0/0xe0 [drbd] Nov 29 17:49:27 pers23fs kernel: [kernel_thread_helper+5/11] kernel_thread_helper+0x5/0xb Nov 29 17:49:27 pers23fs kernel: Code: 5c 24 14 8b 74 24 0c 8b 5c 24 08 83 c4 10 e9 12 fe ff ff 8b 43 1c 85 c0 74 e1 b8 01 00 00 00 8b 5c 24 08 8b 74 24 0c 83 c4 10 c3 <0f> 0b 0b 01 75 3d 32 c0 eb b0 56 53 83 ec 04 8b 74 24 10 81 7e Nov 29 17:49:27 pers23fs kernel: <6>drbd6: drbd6_receiver [10242]: cstate WFBitMapT --> SyncTarget Nov 29 17:49:27 pers23fs kernel: drbd6: Resync started as SyncTarget (need to sync 488325120 KB [122081280 bits set]). Nov 29 17:49:27 pers23fs kernel: drbd6: drbd6_receiver [10242]: cstate SyncTarget --> PausedSyncT Nov 29 17:49:27 pers23fs kernel: drbd6: Syncer waits for sync group. Nov 29 17:49:27 pers23fs kernel: drbd8: drbd8_receiver [12097]: cstate WFBitMapT --> SyncTarget Nov 29 17:49:27 pers23fs kernel: drbd8: Resync started as SyncTarget (need to sync 488325120 KB [122081280 bits set]). Nov 29 17:49:27 pers23fs kernel: drbd8: drbd8_receiver [12097]: cstate SyncTarget --> PausedSyncT Nov 29 17:49:27 pers23fs kernel: drbd8: Syncer waits for sync group. Nov 29 17:49:27 pers23fs kernel: drbd3: drbd3_receiver [8077]: cstate WFBitMapT --> SyncTarget Nov 29 17:49:27 pers23fs kernel: drbd3: Resync started as SyncTarget (need to sync 488325120 KB [122081280 bits set]). Nov 29 17:49:27 pers23fs kernel: drbd3: drbd3_receiver [8077]: cstate SyncTarget --> PausedSyncT Nov 29 17:49:27 pers23fs kernel: drbd3: Syncer waits for sync group. Nov 29 17:49:27 pers23fs kernel: drbd9: drbd9_receiver [12879]: cstate WFBitMapT --> SyncTarget Nov 29 17:49:27 pers23fs kernel: drbd9: Resync started as SyncTarget (need to sync 488333312 KB [122083328 bits set]). Nov 29 17:49:27 pers23fs kernel: drbd9: drbd9_receiver [12879]: cstate SyncTarget --> PausedSyncT Nov 29 17:49:27 pers23fs kernel: drbd9: Syncer waits for sync group. Nov 29 17:49:27 pers23fs kernel: drbd2: drbd2_receiver [7154]: cstate WFBitMapT --> SyncTarget Nov 29 17:49:27 pers23fs kernel: drbd2: Resync started as SyncTarget (need to sync 488333312 KB [122083328 bits set]). Nov 29 17:49:27 pers23fs kernel: drbd2: drbd2_receiver [7154]: cstate SyncTarget --> PausedSyncT Nov 29 17:49:27 pers23fs kernel: drbd2: Syncer waits for sync group. Nov 29 17:49:27 pers23fs kernel: drbd3: Syncer continues. Nov 29 17:49:27 pers23fs kernel: drbd3: drbd2_worker [7152]: cstate PausedSyncT --> SyncTarget Nov 29 17:49:27 pers23fs kernel: drbd8: Syncer continues. Nov 29 17:49:27 pers23fs kernel: drbd8: drbd2_worker [7152]: cstate PausedSyncT --> SyncTarget Nov 29 17:49:27 pers23fs kernel: drbd5: drbd5_receiver [9607]: cstate WFBitMapT --> SyncTarget Nov 29 17:49:27 pers23fs kernel: drbd5: Resync started as SyncTarget (need to sync 488333312 KB [122083328 bits set]). Nov 29 17:49:27 pers23fs kernel: drbd1: drbd5_receiver [9607]: cstate SyncTarget --> PausedSyncT Nov 29 17:49:27 pers23fs kernel: drbd1: Syncer waits for sync group. Nov 29 17:49:27 pers23fs kernel: drbd3: drbd5_receiver [9607]: cstate SyncTarget --> PausedSyncT Nov 29 17:49:27 pers23fs kernel: drbd3: Syncer waits for sync group. Nov 29 17:49:27 pers23fs kernel: drbd8: drbd5_receiver [9607]: cstate SyncTarget --> PausedSyncT Nov 29 17:49:27 pers23fs kernel: drbd8: Syncer waits for sync group. Nov 29 17:49:27 pers23fs kernel: drbd4: drbd4_receiver [8581]: cstate WFBitMapT --> SyncTarget Nov 29 17:49:27 pers23fs kernel: drbd4: Resync started as SyncTarget (need to sync 488333312 KB [122083328 bits set]). Nov 29 17:49:27 pers23fs kernel: drbd4: drbd4_receiver [8581]: cstate SyncTarget --> PausedSyncT Nov 29 17:49:27 pers23fs kernel: drbd4: Syncer waits for sync group. Nov 29 17:49:27 pers23fs kernel: drbd2: Syncer continues. Nov 29 17:49:27 pers23fs kernel: drbd2: drbd1_worker [6771]: cstate PausedSyncT --> SyncTarget Nov 29 17:49:27 pers23fs kernel: drbd7: drbd7_receiver [10916]: cstate WFBitMapT --> SyncTarget Nov 29 17:49:27 pers23fs kernel: drbd7: Resync started as SyncTarget (need to sync 488333312 KB [122083328 bits set]). Nov 29 17:49:27 pers23fs kernel: drbd7: drbd7_receiver [10916]: cstate SyncTarget --> PausedSyncT Nov 29 17:49:27 pers23fs kernel: drbd7: Syncer waits for sync group. Nov 29 17:49:27 pers23fs kernel: drbd4: Syncer continues. Nov 29 17:49:27 pers23fs kernel: drbd4: drbd3_worker [8075]: cstate PausedSyncT --> SyncTarget Nov 29 17:49:27 pers23fs kernel: drbd9: Syncer continues. Nov 29 17:49:27 pers23fs kernel: drbd9: drbd3_worker [8075]: cstate PausedSyncT --> SyncTarget The machine was frozen after that. Am I doing anything wrong or is this a genuine bug ? Thanks for any help!