Hi Philipp,
I have run my tests again with kernel 2.6.10-rc2 and DRBD 0.7.6, on the same
hardware and volume setup.
With the increased sleep, the drbd init script now takes longer to give
the "child process does not terminate" error, but the error still
appears. Things even seem to go faster if I just press Ctrl-C and start
the script again, rather than waiting for it to time out before starting
it again.
Eventually I was able to start all resources on both machines, and
sync started at about 1500 KB/s on each resource.
However, on the SyncTarget there was one resource where sync had not
started (or had stopped); it was shown as "secondary/secondary",
whereas on the SyncSource the same resource was "primary/secondary".
Then another resource stopped and showed the same state.
So I tried to force a connect again from the SyncTarget:
pers23fs:~# drbdadm connect vol24
Hangup
pers23fs:~# drbdadm connect vol24
Child process does not terminate!
Exiting.
pers23fs:~# drbdadm connect vol24
Child process does not terminate!
Exiting.
I found this strange, so I had a look at kernel messages:
drbd3: drbd3_receiver [2748]: cstate WFBitMapT --> SyncTarget
drbd3: Resync started as SyncTarget (need to sync 488325120 KB
[122081280 bits set]).
------------[ cut here ]------------
kernel BUG at kernel/timer.c:266!
invalid operand: 0000 [#1]
SMP
Modules linked in: drbd tg3
CPU: 2
EIP: 0060:[<c011f937>] Not tainted VLI
EFLAGS: 00010246 (2.6.10-rc2)
EIP is at mod_timer+0x60/0x6a
eax: 00000000 ebx: ee0212d0 ecx: c035b9f0 edx: 00000286
esi: 000904fd edi: ee020f9c ebp: 0000000e esp: ee22df20
ds: 007b es: 007b ss: 0068
Process drbd3_receiver (pid: 2748, threadinfo=ee22d000 task=efe78520)
Stack: ee020f9c c0117c5e 00000292 ee02136c f090fa95 ee0212d0 000904fd
f0924d3e
1d1b4000 0746d000 00000000 00000000 003a3680 ee020f9c f0913ef7
ee020f9c
0000000e 000003a8 f08d9000 ee020f9c 00000001 f08d9000 003a3680
ee020f9c
Call Trace:
[<c0117c5e>] printk+0x17/0x1b
[<f090fa95>] drbd_start_resync+0x1f4/0x2a4 [drbd]
[<f0913ef7>] receive_bitmap+0x230/0x27d [drbd]
[<f0913cc7>] receive_bitmap+0x0/0x27d [drbd]
[<f0914461>] drbdd+0x59/0x13f [drbd]
[<f0914f2e>] drbdd_init+0x70/0x16e [drbd]
[<f091a6f2>] drbd_thread_setup+0x78/0xe0 [drbd]
[<f091a67a>] drbd_thread_setup+0x0/0xe0 [drbd]
[<c0100761>] kernel_thread_helper+0x5/0xb
Code: 5c 24 14 8b 74 24 0c 8b 5c 24 08 83 c4 10 e9 12 fe ff ff 8b 43 1c
85 c0 74 e1 b8 01 00 00 00 8b 5c 24 08 8b 74 24 0c 83 c4 10 c3 <0f> 0b
0a 01 7a 61 31 c0 eb b0 56 53 83 ec 04 8b 74 24 10 81 7e
<6>drbd4: drbd4_receiver [3203]: cstate WFBitMapT --> SyncTarget
drbd4: Resync started as SyncTarget (need to sync 488333312 KB
[122083328 bits set]).
------------[ cut here ]------------
kernel BUG at kernel/timer.c:266!
invalid operand: 0000 [#2]
SMP
Modules linked in: drbd tg3
CPU: 2
EIP: 0060:[<c011f937>] Not tainted VLI
EFLAGS: 00010246 (2.6.10-rc2)
EIP is at mod_timer+0x60/0x6a
eax: 00000000 ebx: ee021804 ecx: c035b9f0 edx: 00000286
esi: 00090516 edi: ee0214d0 ebp: 0000000e esp: ef54bf20
ds: 007b es: 007b ss: 0068
Process drbd4_receiver (pid: 3203, threadinfo=ef54b000 task=ee72d520)
Stack: ee0214d0 c0117c5e 00000292 ee0218a0 f090fa95 ee021804 00090516
f0924d3e
1d1b6000 0746d800 00000000 00000000 003a36c0 ee0214d0 f0913ef7
ee0214d0
0000000e 000003e8 f08df000 ee0214d0 00000001 f08df000 003a36c0
ee0214d0
Call Trace:
[<c0117c5e>] printk+0x17/0x1b
[<f090fa95>] drbd_start_resync+0x1f4/0x2a4 [drbd]
[<f0913ef7>] receive_bitmap+0x230/0x27d [drbd]
[<f0913cc7>] receive_bitmap+0x0/0x27d [drbd]
[<f0914461>] drbdd+0x59/0x13f [drbd]
[<f0914f2e>] drbdd_init+0x70/0x16e [drbd]
[<f091a6f2>] drbd_thread_setup+0x78/0xe0 [drbd]
[<f091a67a>] drbd_thread_setup+0x0/0xe0 [drbd]
[<c0100761>] kernel_thread_helper+0x5/0xb
Code: 5c 24 14 8b 74 24 0c 8b 5c 24 08 83 c4 10 e9 12 fe ff ff 8b 43 1c
85 c0 74 e1 b8 01 00 00 00 8b 5c 24 08 8b 74 24 0c 83 c4 10 c3 <0f> 0b
0a 01 7a 61 31 c0 eb b0 56 53 83 ec 04 8b 74 24 10 81 7e
<6>drbd5: drbd5_receiver [3271]: cstate WFBitMapT --> SyncTarget
drbd5: Resync started as SyncTarget (need to sync 479191040 KB
[119797760 bits set]).
drbd5: Secondary/Secondary --> Secondary/Primary
drbd8: drbd8_receiver [3473]: cstate WFBitMapT --> SyncTarget
drbd8: Resync started as SyncTarget (need to sync 484261888 KB
[121065472 bits set]).
drbd8: Secondary/Secondary --> Secondary/Primary
drbd7: drbd7_receiver [3406]: cstate WFBitMapT --> SyncTarget
drbd7: Resync started as SyncTarget (need to sync 488333312 KB
[122083328 bits set]).
drbd7: Secondary/Secondary --> Secondary/Primary
drbd9: drbd9_receiver [3540]: cstate WFBitMapT --> SyncTarget
drbd9: Resync started as SyncTarget (need to sync 486760448 KB
[121690112 bits set]).
drbd9: Secondary/Secondary --> Secondary/Primary
drbd0: drbd0_receiver [2062]: cstate WFBitMapT --> SyncTarget
drbd0: Resync started as SyncTarget (need to sync 482697216 KB
[120674304 bits set]).
drbd0: Secondary/Secondary --> Secondary/Primary
drbd2: drbd2_receiver [2351]: cstate WFBitMapT --> SyncTarget
drbd2: Resync started as SyncTarget (need to sync 488333312 KB
[122083328 bits set]).
drbd2: Secondary/Secondary --> Secondary/Primary
drbd6: drbd6_receiver [3339]: cstate WFBitMapT --> SyncTarget
drbd6: Resync started as SyncTarget (need to sync 486801408 KB
[121700352 bits set]).
drbd6: Secondary/Secondary --> Secondary/Primary
Unable to handle kernel NULL pointer dereference at virtual address 00000504
printing eip:
c0302a87
*pde = 00000000
Oops: 0002 [#3]
SMP
Modules linked in: drbd tg3
CPU: 3
EIP: 0060:[<c0302a87>] Not tainted VLI
EFLAGS: 00010002 (2.6.10-rc2)
EIP is at _spin_lock_irqsave+0x5/0x1d
eax: 00000202 ebx: 00000001 ecx: 00000000 edx: 00000504
esi: ee72d520 edi: ee0214d0 ebp: 00000001 esp: ee254d20
ds: 007b es: 007b ss: 0068
Process drbdsetup (pid: 3784, threadinfo=ee254000 task=ee26f520)
Stack: c0121588 00000000 00000000 00000000 00000002 ee0218d0 ee0214d0
c0121e2d
00000001 00000001 ee72d520 f091a96e 00000001 ee72d520 ee108900
0000181e
0000000a 0000000a 00000004 0000181e ee0218d0 f090ba7c ee0218d0
00000000
Call Trace:
[<c0121588>] force_sig_info+0x27/0xa6
[<c0121e2d>] force_sig+0x1f/0x23
[<f091a96e>] _drbd_thread_stop+0x80/0x1d0 [drbd]
[<f090ba7c>] drbd_ioctl_set_net+0x163/0x2c2 [drbd]
[<f090ce25>] drbd_ioctl+0x875/0xd00 [drbd]
[<c0111304>] do_page_fault+0x18b/0x58d
[<c015a11a>] sys_fstat64+0x37/0x39
[<f090c5b0>] drbd_ioctl+0x0/0xd00 [drbd]
[<c025eaa8>] blkdev_ioctl+0x86/0x41b
[<c0162385>] sys_ioctl+0xb7/0x20a
[<c010244f>] syscall_call+0x7/0xb
Code: 00 01 74 05 e8 6b ef ff ff c3 f0 83 28 01 79 05 e8 7f ef ff ff c3
c6 00 01 c3 f0 81 00 00 00 00 01 c3 f0 ff 00 c3 89 c2 9c 58 fa <f0> fe
0a 79 12 a9 00 02 00 00 74 01 fb f3 90 80 3a 00 7e f9 fa
Now I'm in trouble... I think :/
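
For anyone reading the log above, here is a small sketch that pulls the resync sizes out of the kernel messages and checks how many KB each bitmap bit covers. It is not part of DRBD; the regex and the function name `resync_sizes` are my own assumptions, written against the message format exactly as it appears in this mail (including the line wrap between "KB" and "[").

```python
import re

# Matches the resync messages as they appear in the log above. The \s*
# between "KB" and "[" absorbs the line break added by mail wrapping.
RESYNC_RE = re.compile(
    r"(drbd\d+): Resync started as SyncTarget "
    r"\(need to sync (\d+) KB\s*\[(\d+) bits set\]\)"
)


def resync_sizes(log_text):
    """Return a list of (device, kb_to_sync, bits_set) tuples."""
    return [
        (dev, int(kb), int(bits))
        for dev, kb, bits in RESYNC_RE.findall(log_text)
    ]


# Two messages copied from the log above.
sample = """drbd3: Resync started as SyncTarget (need to sync 488325120 KB
[122081280 bits set]).
drbd5: Resync started as SyncTarget (need to sync 479191040 KB
[119797760 bits set]).
"""

for dev, kb, bits in resync_sizes(sample):
    # kb // bits gives the amount of data covered by one bitmap bit.
    print(f"{dev}: {kb} KB to sync, {kb // bits} KB per bitmap bit")
```

For the messages in this log, the KB figure is exactly four times the bit count in every case, i.e. each set bit stands for 4 KB still to be synced.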