Hi Philipp,

I have redone my tests with kernel 2.6.10-rc2 and DRBD 0.7.6, on the same hardware and volume setup. With the increased sleep, the drbd init script now takes longer before it gives the "Child process does not terminate" error, but the error still appears. It even looks like things go faster if I just press Ctrl-C and start the script again, rather than waiting for it to time out and start again.

Eventually I was able to start all resources on both machines, and sync started at about 1500 KB/s on each resource. However, on the SyncTarget there was one resource where sync had not started (or had stopped); it was shown as "secondary/secondary", whereas on the SyncSource the same resource was "primary/secondary". Then another one stopped and showed the same thing. So I tried to force a connect again from the SyncTarget:

pers23fs:~# drbdadm connect vol24
Hangup
pers23fs:~# drbdadm connect vol24
Child process does not terminate! Exiting.
pers23fs:~# drbdadm connect vol24
Child process does not terminate! Exiting.

I found this strange, so I had a look at the kernel messages:

drbd3: drbd3_receiver [2748]: cstate WFBitMapT --> SyncTarget
drbd3: Resync started as SyncTarget (need to sync 488325120 KB [122081280 bits set]).
------------[ cut here ]------------
kernel BUG at kernel/timer.c:266!
invalid operand: 0000 [#1]
SMP
Modules linked in: drbd tg3
CPU:    2
EIP:    0060:[<c011f937>]    Not tainted VLI
EFLAGS: 00010246   (2.6.10-rc2)
EIP is at mod_timer+0x60/0x6a
eax: 00000000   ebx: ee0212d0   ecx: c035b9f0   edx: 00000286
esi: 000904fd   edi: ee020f9c   ebp: 0000000e   esp: ee22df20
ds: 007b   es: 007b   ss: 0068
Process drbd3_receiver (pid: 2748, threadinfo=ee22d000 task=efe78520)
Stack: ee020f9c c0117c5e 00000292 ee02136c f090fa95 ee0212d0 000904fd f0924d3e
       1d1b4000 0746d000 00000000 00000000 003a3680 ee020f9c f0913ef7 ee020f9c
       0000000e 000003a8 f08d9000 ee020f9c 00000001 f08d9000 003a3680 ee020f9c
Call Trace:
 [<c0117c5e>] printk+0x17/0x1b
 [<f090fa95>] drbd_start_resync+0x1f4/0x2a4 [drbd]
 [<f0913ef7>] receive_bitmap+0x230/0x27d [drbd]
 [<f0913cc7>] receive_bitmap+0x0/0x27d [drbd]
 [<f0914461>] drbdd+0x59/0x13f [drbd]
 [<f0914f2e>] drbdd_init+0x70/0x16e [drbd]
 [<f091a6f2>] drbd_thread_setup+0x78/0xe0 [drbd]
 [<f091a67a>] drbd_thread_setup+0x0/0xe0 [drbd]
 [<c0100761>] kernel_thread_helper+0x5/0xb
Code: 5c 24 14 8b 74 24 0c 8b 5c 24 08 83 c4 10 e9 12 fe ff ff 8b 43 1c 85 c0 74 e1 b8 01 00 00 00 8b 5c 24 08 8b 74 24 0c 83 c4 10 c3 <0f> 0b 0a 01 7a 61 31 c0 eb b0 56 53 83 ec 04 8b 74 24 10 81 7e
drbd4: drbd4_receiver [3203]: cstate WFBitMapT --> SyncTarget
drbd4: Resync started as SyncTarget (need to sync 488333312 KB [122083328 bits set]).
------------[ cut here ]------------
kernel BUG at kernel/timer.c:266!
invalid operand: 0000 [#2]
SMP
Modules linked in: drbd tg3
CPU:    2
EIP:    0060:[<c011f937>]    Not tainted VLI
EFLAGS: 00010246   (2.6.10-rc2)
EIP is at mod_timer+0x60/0x6a
eax: 00000000   ebx: ee021804   ecx: c035b9f0   edx: 00000286
esi: 00090516   edi: ee0214d0   ebp: 0000000e   esp: ef54bf20
ds: 007b   es: 007b   ss: 0068
Process drbd4_receiver (pid: 3203, threadinfo=ef54b000 task=ee72d520)
Stack: ee0214d0 c0117c5e 00000292 ee0218a0 f090fa95 ee021804 00090516 f0924d3e
       1d1b6000 0746d800 00000000 00000000 003a36c0 ee0214d0 f0913ef7 ee0214d0
       0000000e 000003e8 f08df000 ee0214d0 00000001 f08df000 003a36c0 ee0214d0
Call Trace:
 [<c0117c5e>] printk+0x17/0x1b
 [<f090fa95>] drbd_start_resync+0x1f4/0x2a4 [drbd]
 [<f0913ef7>] receive_bitmap+0x230/0x27d [drbd]
 [<f0913cc7>] receive_bitmap+0x0/0x27d [drbd]
 [<f0914461>] drbdd+0x59/0x13f [drbd]
 [<f0914f2e>] drbdd_init+0x70/0x16e [drbd]
 [<f091a6f2>] drbd_thread_setup+0x78/0xe0 [drbd]
 [<f091a67a>] drbd_thread_setup+0x0/0xe0 [drbd]
 [<c0100761>] kernel_thread_helper+0x5/0xb
Code: 5c 24 14 8b 74 24 0c 8b 5c 24 08 83 c4 10 e9 12 fe ff ff 8b 43 1c 85 c0 74 e1 b8 01 00 00 00 8b 5c 24 08 8b 74 24 0c 83 c4 10 c3 <0f> 0b 0a 01 7a 61 31 c0 eb b0 56 53 83 ec 04 8b 74 24 10 81 7e
drbd5: drbd5_receiver [3271]: cstate WFBitMapT --> SyncTarget
drbd5: Resync started as SyncTarget (need to sync 479191040 KB [119797760 bits set]).
drbd5: Secondary/Secondary --> Secondary/Primary
drbd8: drbd8_receiver [3473]: cstate WFBitMapT --> SyncTarget
drbd8: Resync started as SyncTarget (need to sync 484261888 KB [121065472 bits set]).
drbd8: Secondary/Secondary --> Secondary/Primary
drbd7: drbd7_receiver [3406]: cstate WFBitMapT --> SyncTarget
drbd7: Resync started as SyncTarget (need to sync 488333312 KB [122083328 bits set]).
drbd7: Secondary/Secondary --> Secondary/Primary
drbd9: drbd9_receiver [3540]: cstate WFBitMapT --> SyncTarget
drbd9: Resync started as SyncTarget (need to sync 486760448 KB [121690112 bits set]).
drbd9: Secondary/Secondary --> Secondary/Primary
drbd0: drbd0_receiver [2062]: cstate WFBitMapT --> SyncTarget
drbd0: Resync started as SyncTarget (need to sync 482697216 KB [120674304 bits set]).
drbd0: Secondary/Secondary --> Secondary/Primary
drbd2: drbd2_receiver [2351]: cstate WFBitMapT --> SyncTarget
drbd2: Resync started as SyncTarget (need to sync 488333312 KB [122083328 bits set]).
drbd2: Secondary/Secondary --> Secondary/Primary
drbd6: drbd6_receiver [3339]: cstate WFBitMapT --> SyncTarget
drbd6: Resync started as SyncTarget (need to sync 486801408 KB [121700352 bits set]).
drbd6: Secondary/Secondary --> Secondary/Primary
Unable to handle kernel NULL pointer dereference at virtual address 00000504
 printing eip:
c0302a87
*pde = 00000000
Oops: 0002 [#3]
SMP
Modules linked in: drbd tg3
CPU:    3
EIP:    0060:[<c0302a87>]    Not tainted VLI
EFLAGS: 00010002   (2.6.10-rc2)
EIP is at _spin_lock_irqsave+0x5/0x1d
eax: 00000202   ebx: 00000001   ecx: 00000000   edx: 00000504
esi: ee72d520   edi: ee0214d0   ebp: 00000001   esp: ee254d20
ds: 007b   es: 007b   ss: 0068
Process drbdsetup (pid: 3784, threadinfo=ee254000 task=ee26f520)
Stack: c0121588 00000000 00000000 00000000 00000002 ee0218d0 ee0214d0 c0121e2d
       00000001 00000001 ee72d520 f091a96e 00000001 ee72d520 ee108900 0000181e
       0000000a 0000000a 00000004 0000181e ee0218d0 f090ba7c ee0218d0 00000000
Call Trace:
 [<c0121588>] force_sig_info+0x27/0xa6
 [<c0121e2d>] force_sig+0x1f/0x23
 [<f091a96e>] _drbd_thread_stop+0x80/0x1d0 [drbd]
 [<f090ba7c>] drbd_ioctl_set_net+0x163/0x2c2 [drbd]
 [<f090ce25>] drbd_ioctl+0x875/0xd00 [drbd]
 [<c0111304>] do_page_fault+0x18b/0x58d
 [<c015a11a>] sys_fstat64+0x37/0x39
 [<f090c5b0>] drbd_ioctl+0x0/0xd00 [drbd]
 [<c025eaa8>] blkdev_ioctl+0x86/0x41b
 [<c0162385>] sys_ioctl+0xb7/0x20a
 [<c010244f>] syscall_call+0x7/0xb
Code: 00 01 74 05 e8 6b ef ff ff c3 f0 83 28 01 79 05 e8 7f ef ff ff c3 c6 00 01 c3 f0 81 00 00 00 00 01 c3 f0 ff 00 c3 89 c2 9c 58 fa <f0> fe 0a 79 12 a9 00 02 00 00 74 01 fb f3 90 80 3a 00 7e f9 fa

Now I'm in trouble... I think :/
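Oopses #1 and #2 in the kernel log above carry identical "Code:" bytes, and the byte at the faulting EIP (marked `<0f>`) starts the trap that produced "kernel BUG at kernel/timer.c:266!". The minimal sketch below decodes those bytes, assuming the 2.6-era i386 BUG() layout: a `ud2` instruction (`0f 0b`) followed by a little-endian 16-bit `__LINE__` and a 32-bit pointer to the `__FILE__` string. The helper name `decode_bug` is hypothetical; it only shows how the line number is embedded in the dump.

```python
# Hypothetical helper: decode an i386 BUG() trap from an oops "Code:" line.
# Assumed layout (2.6-era i386): ud2 (0f 0b), then a 16-bit __LINE__ and a
# 32-bit __FILE__ pointer, both little-endian.

CODE_LINE = (
    "5c 24 14 8b 74 24 0c 8b 5c 24 08 83 c4 10 e9 12 fe ff ff "
    "8b 43 1c 85 c0 74 e1 b8 01 00 00 00 8b 5c 24 08 8b 74 24 0c "
    "83 c4 10 c3 <0f> 0b 0a 01 7a 61 31 c0 eb b0 56 53 83 ec 04 "
    "8b 74 24 10 81 7e"
)


def decode_bug(code_line: str) -> tuple[int, int]:
    """Return (source line, __FILE__ pointer) for a BUG() at the faulting EIP."""
    tokens = code_line.split()
    # The oops marks the byte at the faulting EIP as <xx>.
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    raw = bytes(int(tok.strip("<>"), 16) for tok in tokens[start:start + 8])
    if raw[:2] != b"\x0f\x0b":
        raise ValueError("faulting instruction is not ud2")
    line = int.from_bytes(raw[2:4], "little")
    file_ptr = int.from_bytes(raw[4:8], "little")
    return line, file_ptr


if __name__ == "__main__":
    line, file_ptr = decode_bug(CODE_LINE)
    print(f"BUG() at line {line}, __FILE__ string at {file_ptr:#x}")
```

The bytes `0a 01` read as 0x010a, which is 266, matching the line number printed in the "kernel BUG" message; the trap was hit in `mod_timer`, called from `drbd_start_resync`.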
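In every "Resync started as SyncTarget" line of the kernel log above, the KB figure is exactly four times the number of bits set, so each bitmap bit stands for 4 KB of data. The sketch below checks that ratio and gives a rough time-to-completion per resource, assuming the ~1500 KB/s reported in the post holds steady (an assumption; the real rate may vary). The names `resync_summary` and `RATE_KB_PER_S` are hypothetical.

```python
import re

# Excerpt of the "Resync started" lines from the kernel log in the post.
LOG = """\
drbd3: Resync started as SyncTarget (need to sync 488325120 KB [122081280 bits set]).
drbd4: Resync started as SyncTarget (need to sync 488333312 KB [122083328 bits set]).
drbd5: Resync started as SyncTarget (need to sync 479191040 KB [119797760 bits set]).
drbd0: Resync started as SyncTarget (need to sync 482697216 KB [120674304 bits set]).
"""

RESYNC_RE = re.compile(
    r"(drbd\d+): Resync started as SyncTarget "
    r"\(need to sync (\d+) KB \[(\d+) bits set\]\)"
)

# Assumed constant per-resource rate, from the ~1500 KB/s mentioned in the post.
RATE_KB_PER_S = 1500


def resync_summary(log, rate_kb_per_s=RATE_KB_PER_S):
    """Map device -> (KB covered per bitmap bit, estimated hours to finish)."""
    summary = {}
    for dev, kb, bits in RESYNC_RE.findall(log):
        kb, bits = int(kb), int(bits)
        summary[dev] = (kb / bits, kb / rate_kb_per_s / 3600)
    return summary


if __name__ == "__main__":
    for dev, (kb_per_bit, hours) in resync_summary(LOG).items():
        print(f"{dev}: {kb_per_bit:g} KB per bit, ~{hours:.0f} h at {RATE_KB_PER_S} KB/s")
```

At that rate each of these resources would need on the order of 90 hours to finish a full resync.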
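In the kernel log above, every device that reached SyncTarget later logged "Secondary/Secondary --> Secondary/Primary", except drbd3 and drbd4, whose receiver threads hit the BUG in `drbd_start_resync`. That plausibly matches the two resources described as stuck in "secondary/secondary", though this is an inference from the log rather than something confirmed. The hypothetical helper below scans a log for such devices; it assumes one kernel message per line, and the patterns are unanchored so that a leaked `<6>` level prefix does not hide a match.

```python
import re

# Excerpt of the kernel log in the post; the oops dumps in between are omitted.
LOG = """\
drbd3: drbd3_receiver [2748]: cstate WFBitMapT --> SyncTarget
<6>drbd4: drbd4_receiver [3203]: cstate WFBitMapT --> SyncTarget
drbd5: drbd5_receiver [3271]: cstate WFBitMapT --> SyncTarget
drbd5: Secondary/Secondary --> Secondary/Primary
drbd8: drbd8_receiver [3473]: cstate WFBitMapT --> SyncTarget
drbd8: Secondary/Secondary --> Secondary/Primary
"""

SYNC_TARGET_RE = re.compile(r"(drbd\d+): drbd\d+_receiver \[\d+\]: cstate \w+ --> SyncTarget")
PEER_PRIMARY_RE = re.compile(r"(drbd\d+): Secondary/Secondary --> Secondary/Primary")


def stalled_sync_targets(log):
    """Devices that entered SyncTarget but never logged the peer turning Primary."""
    promoted = set(PEER_PRIMARY_RE.findall(log))
    return [dev for dev in SYNC_TARGET_RE.findall(log) if dev not in promoted]


if __name__ == "__main__":
    print(stalled_sync_targets(LOG))
```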