[DRBD-user] Oopses/timeouts on 0.7.5 with many big volumes

Renaud Guerin renaud.guerin at wanadooportails.com
Tue Nov 30 19:03:25 CET 2004


Hi Philipp,

I have rerun my tests with kernel 2.6.10-rc2 and DRBD 0.7.6, on the same 
hardware and volume setup.

With the increased sleep, the drbd init script now takes longer to give 
the "Child process does not terminate" error, but the error still 
appears. It even looks like things go faster if I just hit Ctrl-C and 
start the script again, rather than waiting for it to time out and 
restarting it then.

So eventually I was able to start all resources on both machines, and 
sync started at about 1500KB/s on each resource.

However, on the SyncTarget there was one resource where sync had not 
started (or had stopped). It was shown as "secondary/secondary", 
whereas on the SyncSource the same resource was "primary/secondary".
Then a second resource stopped and showed the same thing.

So I tried to force a connect again from the SyncTarget:

pers23fs:~# drbdadm connect vol24
Hangup
pers23fs:~# drbdadm connect vol24
Child process does not terminate!
Exiting.
pers23fs:~# drbdadm connect vol24
Child process does not terminate!
Exiting.

I found this strange, so I had a look at kernel messages:


drbd3: drbd3_receiver [2748]: cstate WFBitMapT --> SyncTarget
drbd3: Resync started as SyncTarget (need to sync 488325120 KB 
[122081280 bits set]).
------------[ cut here ]------------
kernel BUG at kernel/timer.c:266!
invalid operand: 0000 [#1]
SMP
Modules linked in: drbd tg3
CPU:    2
EIP:    0060:[<c011f937>]    Not tainted VLI
EFLAGS: 00010246   (2.6.10-rc2)
EIP is at mod_timer+0x60/0x6a
eax: 00000000   ebx: ee0212d0   ecx: c035b9f0   edx: 00000286
esi: 000904fd   edi: ee020f9c   ebp: 0000000e   esp: ee22df20
ds: 007b   es: 007b   ss: 0068
Process drbd3_receiver (pid: 2748, threadinfo=ee22d000 task=efe78520)
Stack: ee020f9c c0117c5e 00000292 ee02136c f090fa95 ee0212d0 000904fd 
f0924d3e
       1d1b4000 0746d000 00000000 00000000 003a3680 ee020f9c f0913ef7 
ee020f9c
       0000000e 000003a8 f08d9000 ee020f9c 00000001 f08d9000 003a3680 
ee020f9c
Call Trace:
 [<c0117c5e>] printk+0x17/0x1b
 [<f090fa95>] drbd_start_resync+0x1f4/0x2a4 [drbd]
 [<f0913ef7>] receive_bitmap+0x230/0x27d [drbd]
 [<f0913cc7>] receive_bitmap+0x0/0x27d [drbd]
 [<f0914461>] drbdd+0x59/0x13f [drbd]
 [<f0914f2e>] drbdd_init+0x70/0x16e [drbd]
 [<f091a6f2>] drbd_thread_setup+0x78/0xe0 [drbd]
 [<f091a67a>] drbd_thread_setup+0x0/0xe0 [drbd]
 [<c0100761>] kernel_thread_helper+0x5/0xb
Code: 5c 24 14 8b 74 24 0c 8b 5c 24 08 83 c4 10 e9 12 fe ff ff 8b 43 1c 
85 c0 74 e1 b8 01 00 00 00 8b 5c 24 08 8b 74 24 0c 83 c4 10 c3 <0f> 0b 
0a 01 7a 61 31 c0 eb b0 56 53 83 ec 04 8b 74 24 10 81 7e
 <6>drbd4: drbd4_receiver [3203]: cstate WFBitMapT --> SyncTarget
drbd4: Resync started as SyncTarget (need to sync 488333312 KB 
[122083328 bits set]).
------------[ cut here ]------------
kernel BUG at kernel/timer.c:266!
invalid operand: 0000 [#2]
SMP
Modules linked in: drbd tg3
CPU:    2
EIP:    0060:[<c011f937>]    Not tainted VLI
EFLAGS: 00010246   (2.6.10-rc2)
EIP is at mod_timer+0x60/0x6a
eax: 00000000   ebx: ee021804   ecx: c035b9f0   edx: 00000286
esi: 00090516   edi: ee0214d0   ebp: 0000000e   esp: ef54bf20
ds: 007b   es: 007b   ss: 0068
Process drbd4_receiver (pid: 3203, threadinfo=ef54b000 task=ee72d520)
Stack: ee0214d0 c0117c5e 00000292 ee0218a0 f090fa95 ee021804 00090516 
f0924d3e
       1d1b6000 0746d800 00000000 00000000 003a36c0 ee0214d0 f0913ef7 
ee0214d0
       0000000e 000003e8 f08df000 ee0214d0 00000001 f08df000 003a36c0 
ee0214d0
Call Trace:
 [<c0117c5e>] printk+0x17/0x1b
 [<f090fa95>] drbd_start_resync+0x1f4/0x2a4 [drbd]
 [<f0913ef7>] receive_bitmap+0x230/0x27d [drbd]
 [<f0913cc7>] receive_bitmap+0x0/0x27d [drbd]
 [<f0914461>] drbdd+0x59/0x13f [drbd]
 [<f0914f2e>] drbdd_init+0x70/0x16e [drbd]
 [<f091a6f2>] drbd_thread_setup+0x78/0xe0 [drbd]
 [<f091a67a>] drbd_thread_setup+0x0/0xe0 [drbd]
 [<c0100761>] kernel_thread_helper+0x5/0xb
Code: 5c 24 14 8b 74 24 0c 8b 5c 24 08 83 c4 10 e9 12 fe ff ff 8b 43 1c 
85 c0 74 e1 b8 01 00 00 00 8b 5c 24 08 8b 74 24 0c 83 c4 10 c3 <0f> 0b 
0a 01 7a 61 31 c0 eb b0 56 53 83 ec 04 8b 74 24 10 81 7e
 <6>drbd5: drbd5_receiver [3271]: cstate WFBitMapT --> SyncTarget
drbd5: Resync started as SyncTarget (need to sync 479191040 KB 
[119797760 bits set]).
drbd5: Secondary/Secondary --> Secondary/Primary
drbd8: drbd8_receiver [3473]: cstate WFBitMapT --> SyncTarget
drbd8: Resync started as SyncTarget (need to sync 484261888 KB 
[121065472 bits set]).
drbd8: Secondary/Secondary --> Secondary/Primary
drbd7: drbd7_receiver [3406]: cstate WFBitMapT --> SyncTarget
drbd7: Resync started as SyncTarget (need to sync 488333312 KB 
[122083328 bits set]).
drbd7: Secondary/Secondary --> Secondary/Primary
drbd9: drbd9_receiver [3540]: cstate WFBitMapT --> SyncTarget
drbd9: Resync started as SyncTarget (need to sync 486760448 KB 
[121690112 bits set]).
drbd9: Secondary/Secondary --> Secondary/Primary
drbd0: drbd0_receiver [2062]: cstate WFBitMapT --> SyncTarget
drbd0: Resync started as SyncTarget (need to sync 482697216 KB 
[120674304 bits set]).
drbd0: Secondary/Secondary --> Secondary/Primary
drbd2: drbd2_receiver [2351]: cstate WFBitMapT --> SyncTarget
drbd2: Resync started as SyncTarget (need to sync 488333312 KB 
[122083328 bits set]).
drbd2: Secondary/Secondary --> Secondary/Primary
drbd6: drbd6_receiver [3339]: cstate WFBitMapT --> SyncTarget
drbd6: Resync started as SyncTarget (need to sync 486801408 KB 
[121700352 bits set]).
drbd6: Secondary/Secondary --> Secondary/Primary
Unable to handle kernel NULL pointer dereference at virtual address 00000504
 printing eip:
c0302a87
*pde = 00000000
Oops: 0002 [#3]
SMP
Modules linked in: drbd tg3
CPU:    3
EIP:    0060:[<c0302a87>]    Not tainted VLI
EFLAGS: 00010002   (2.6.10-rc2)
EIP is at _spin_lock_irqsave+0x5/0x1d
eax: 00000202   ebx: 00000001   ecx: 00000000   edx: 00000504
esi: ee72d520   edi: ee0214d0   ebp: 00000001   esp: ee254d20
ds: 007b   es: 007b   ss: 0068
Process drbdsetup (pid: 3784, threadinfo=ee254000 task=ee26f520)
Stack: c0121588 00000000 00000000 00000000 00000002 ee0218d0 ee0214d0 
c0121e2d
       00000001 00000001 ee72d520 f091a96e 00000001 ee72d520 ee108900 
0000181e
       0000000a 0000000a 00000004 0000181e ee0218d0 f090ba7c ee0218d0 
00000000
Call Trace:
 [<c0121588>] force_sig_info+0x27/0xa6
 [<c0121e2d>] force_sig+0x1f/0x23
 [<f091a96e>] _drbd_thread_stop+0x80/0x1d0 [drbd]
 [<f090ba7c>] drbd_ioctl_set_net+0x163/0x2c2 [drbd]
 [<f090ce25>] drbd_ioctl+0x875/0xd00 [drbd]
 [<c0111304>] do_page_fault+0x18b/0x58d
 [<c015a11a>] sys_fstat64+0x37/0x39
 [<f090c5b0>] drbd_ioctl+0x0/0xd00 [drbd]
 [<c025eaa8>] blkdev_ioctl+0x86/0x41b
 [<c0162385>] sys_ioctl+0xb7/0x20a
 [<c010244f>] syscall_call+0x7/0xb
Code: 00 01 74 05 e8 6b ef ff ff c3 f0 83 28 01 79 05 e8 7f ef ff ff c3 
c6 00 01 c3 f0 81 00 00 00 00 01 c3 f0 ff 00 c3 89 c2 9c 58 fa <f0> fe 
0a 79 12 a9 00 02 00 00 74 01 fb f3 90 80 3a 00 7e f9 fa

Now I'm in trouble... I think :/
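
As a sanity check on the "Resync started" lines in the log above, here is a 
minimal Python sketch that works out how many KB each set bitmap bit stands 
for, and roughly how long a full resync of one volume would take. The two 
log lines are copied from the log with the mail line wrap rejoined. The 
function name summarize and the constant ASSUMED_RATE_KB_S are made up for 
this example, and the rate is only the approximate per-resource figure 
mentioned earlier, assumed constant for the whole sync.

```python
import re

# Two "Resync started" messages copied from the kernel log above,
# with the mail client's line wrap rejoined into single lines.
LOG = """\
drbd3: Resync started as SyncTarget (need to sync 488325120 KB [122081280 bits set]).
drbd5: Resync started as SyncTarget (need to sync 479191040 KB [119797760 bits set]).
"""

# Device name, amount to sync in KB, and number of bits set in the bitmap.
PATTERN = re.compile(
    r"(drbd\d+): Resync started as SyncTarget "
    r"\(need to sync (\d+) KB \[(\d+) bits set\]\)"
)

# Assumption: the sync rate seen right after startup, held constant.
ASSUMED_RATE_KB_S = 1500


def summarize(log, rate_kb_s=ASSUMED_RATE_KB_S):
    """Return one dict per resync message with KB per bit and estimated days."""
    rows = []
    for device, kb, bits in PATTERN.findall(log):
        kb, bits = int(kb), int(bits)
        rows.append({
            "device": device,
            "kb_per_bit": kb / bits,
            "days": kb / rate_kb_s / 86400,  # 86400 seconds per day
        })
    return rows


if __name__ == "__main__":
    for row in summarize(LOG):
        print("%(device)s: %(kb_per_bit).1f KB per bit, about %(days).1f days" % row)
```

For both volumes this gives 4 KB per bit, and a full resync at that rate 
would take somewhat under four days per volume.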


