[DRBD-user] DRBD 8.3.6 + 8.3.2 crash during first sync

Maxence DUNNEWIND maxence at dunnewind.net
Tue Jan 26 08:33:40 CET 2010


Hi,

My setup is:

Two hosts connected through an OpenVPN point-to-point tunnel in UDP mode.

I want to do the initial sync of a 100 GB device. The DRBD device uses two LVs
for data and metadata (it's only a part of my setup, so, no, I can't use the
'basic' layout of DRBD under a PV).
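For reference, the backing layout is roughly the following sketch (the VG and
LV names are placeholders, and the metadata size is just a safe estimate):

```shell
# Two LVs per DRBD resource: one for data, one for external metadata.
lvcreate -L 100G -n drbd4_data vg0
lvcreate -L 128M -n drbd4_meta vg0   # more than enough for 100 GB of data

# drbd.conf then points at them (8.3 syntax):
#   disk      /dev/vg0/drbd4_data;
#   meta-disk /dev/vg0/drbd4_meta[0];
```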

After about 10 hours of syncing, I got a kernel oops:

[54599.996257] block drbd4: error receiving RSDataReply, l: 4120!
[54599.996322] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[54599.996371] IP: [<ffffffff8040d3ab>] sk_stream_wait_memory+0x88/0x1e5
[54599.996404] PGD 400cb9067 PUD 41f156067 PMD 0
[54599.996431] Oops: 0002 [#1] SMP
[54599.996455] last sysfs file: /sys/devices/virtual/block/drbd4/dev
[54599.996483] CPU 0
[54599.996502] Modules linked in: hmac tun kvm_amd kvm bridge stp drbd cn loop snd_pcsp snd_pcm snd_timer i2c_nforce2 snd shpchp soundcore k8temp snd_page_alloc i2c_core serio_raw pci_hotplug psmouse evdev button processor ext3 jbd mbcache dm_mod sd_mod crc_t10dif ata_generic ide_pci_generic ohci_hcd sata_nv amd74xx libata forcedeth scsi_mod ide_core ehci_hcd floppy thermal fan thermal_sys
[54599.996692] Pid: 5185, comm: drbd4_worker Not tainted 2.6.30-2-amd64 #1 H8DMR-82
[54599.996736] RIP: 0010:[<ffffffff8040d3ab>]  [<ffffffff8040d3ab>] sk_stream_wait_memory+0x88/0x1e5
[54599.996783] RSP: 0018:ffff88040e139a40  EFLAGS: 00010246
[54599.996809] RAX: 0000000000000000 RBX: 00000000000005dc RCX: 000000000000c8d8
[54599.996838] RDX: 0000000000000008 RSI: 0000000000000000 RDI: ffffffff804065c6
[54599.996867] RBP: ffff8803e24f1440 R08: 0000000000000000 R09: ffff8803e24f1440
[54599.996896] R10: 0000000000000000 R11: ffff88040e139b08 R12: 00000000000005dc
[54599.996925] R13: 0000000000000000 R14: ffff88040e139b08 R15: 7fffffffffffffff
[54599.996954] FS:  00007fa0859c6950(0000) GS:ffffc20000000000(0000) knlGS:0000000000000000
[54599.996998] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[54599.997025] CR2: 0000000000000008 CR3: 00000004010aa000 CR4: 00000000000006e0
[54599.997054] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[54599.997083] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[54599.997112] Process drbd4_worker (pid: 5185, threadinfo ffff88040e138000, task ffff88041f176080)
[54599.997157] Stack:
[54599.997176]  0000000000000000 ffff88041f176080 ffffffff80254742 ffff88040e139a58
[54599.997207]  ffff88040e139a58 0000000000000000 ffff8802060f3880 ffff8803e35e6140
[54599.997252]  ffff8803e24f1440 ffff88040e139b14 ffff880401c9c000 0000000000000000
[54599.997313] Call Trace:
[54599.997334]  [<ffffffff80254742>] ? autoremove_wake_function+0x0/0x2e
[54599.997364]  [<ffffffff8043efda>] ? tcp_sendmsg+0x6fa/0x85b
[54599.997391]  [<ffffffff80403f24>] ? sock_sendmsg+0xa3/0xbb
[54599.997419]  [<ffffffff80254742>] ? autoremove_wake_function+0x0/0x2e
[54599.997447]  [<ffffffff8020e5a9>] ? __switch_to+0xae/0x263
[54599.997475]  [<ffffffff80235f65>] ? dequeue_entity+0xf/0x11f
[54599.997503]  [<ffffffff804041f2>] ? kernel_sendmsg+0x2c/0x3e
[54599.997530]  [<ffffffffa0222386>] ? drbd_send+0xb9/0x1cf [drbd]
[54599.997578]  [<ffffffff804b45e8>] ? schedule+0x9/0x1e
[54599.997605]  [<ffffffffa0222cec>] ? _drbd_send_cmd+0x170/0x184 [drbd]
[54599.997644]  [<ffffffffa022320d>] ? drbd_send_cmd+0x64/0x8c [drbd]
[54599.997682]  [<ffffffffa0223388>] ? drbd_send_b_ack+0x37/0x40 [drbd]
[54599.997720]  [<ffffffffa0210210>] ? drbd_may_finish_epoch+0x122/0x2f8 [drbd]
[54599.997759]  [<ffffffffa021070f>] ? w_flush+0x54/0x5d [drbd]
[54599.997795]  [<ffffffffa020a8b2>] ? drbd_worker+0x4c2/0x4cf [drbd]
[54599.997831]  [<ffffffff804b47df>] ? schedule_timeout+0x9b/0xb6
[54599.997859]  [<ffffffff804b47cf>] ? schedule_timeout+0x8b/0xb6
[54599.997886]  [<ffffffffa0223d03>] ? drbd_thread_setup+0x16f/0x231 [drbd]
[54599.997925]  [<ffffffff80210aca>] ? child_rip+0xa/0x20
[54599.997951]  [<ffffffffa0223b94>] ? drbd_thread_setup+0x0/0x231 [drbd]
[54599.997989]  [<ffffffff80210ac0>] ? child_rip+0x0/0x20
[54599.998015] Code: f4 ff ba 32 00 00 00 89 d1 31 d2 f7 f1 83 c2 02 41 89 d4 4d 89 e5 49 bf ff ff ff ff ff ff ff 7f 48 8b 85 e8 01 00 00 48 8d 50 08 <f0> 80 48 08 01 48 8b 7d 78 ba 01 00 00 00 48 89 e6 e8 09 75 e4
[54599.998161] RIP  [<ffffffff8040d3ab>] sk_stream_wait_memory+0x88/0x1e5
[54599.998190]  RSP <ffff88040e139a40>
[54599.998211] CR2: 0000000000000008
[54599.998505] ---[ end trace 71b4da7bbc4c7789 ]---


The DRBD device was then in "Timeout" state. I tried:
drbdsetup /dev/drbd4 down
(my DRBD setup is managed by software that handles all the virtualisation
stuff and uses drbdsetup directly). I got two messages:
No response from the DRBD driver! Is the module loaded?
No response from the DRBD driver! Is the module loaded?

Of course it is loaded; three other devices still work. The device is now in
Disconnecting state, but never actually disconnects:
 4: cs:Disconnecting ro:Secondary/Unknown ds:Inconsistent/DUnknown C r----
     ns:0 nr:75029472 dw:75029472 dr:0 al:0 bm:4574 lo:0 pe:0 ua:1278 ap:0 ep:2
     wo:b oos:30857852

On the primary, the device is in WFConnection state:
16: cs:WFConnection ro:Primary/Unknown ds:UpToDate/Inconsistent C r----
    ns:75034364 nr:0 dw:1038560 dr:75513180 al:284 bm:4544 lo:0 pe:0 ua:0 ap:0
    ep:1 wo:b oos:30942364

The crash occurs on the node running the 8.3.6 module; the node with 8.3.2
still works fine.

Now it seems the only way to free /dev/drbd4 is to reboot the host :'(
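For reference, this is roughly how I poke at it (drbdsetup 8.3 syntax; only
`down` was actually attempted, the other two are what the usual step-by-step
teardown would look like):

```shell
# Current state of all minors (the cs:/ro:/ds: lines quoted above)
cat /proc/drbd

# Normal teardown for a single device with drbdsetup 8.3:
drbdsetup /dev/drbd4 disconnect   # drop the network connection
drbdsetup /dev/drbd4 detach       # release the backing LVs
drbdsetup /dev/drbd4 down         # disconnect + detach + tear down
# -> "No response from the DRBD driver! Is the module loaded?"
```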

Any help would be welcome.

Regards,

Maxence

-- 
Maxence DUNNEWIND
Contact : maxence at dunnewind.net
Site : http://www.dunnewind.net
06 32 39 39 93
GPG : 18AE 61E4 D0B0 1C7C AAC9  E40D 4D39 68DB 0D2E B533