[DRBD-user] Kernel crash in drbd0_asender, drbd 8.3.9

Mrten mrten+drbd at ii.nl
Fri Jul 8 09:55:36 CEST 2011


Tonight my secondary crashed while the primary was importing a large
(OpenStreetMap) data set with PostgreSQL.

I had to extract the log below from the remote syslog facility, so I do not
know whether it is complete:

Jul  8 00:03:28 nadir kernel: [125018.133603] block drbd0: Digest integrity check FAILED.
Jul  8 00:03:28 nadir kernel: [125018.138980] block drbd0: error receiving Data, l: 131112!
Jul  8 00:03:28 nadir kernel: [125018.144492] block drbd0: peer( Primary -> Unknown ) conn( Connected -> ProtocolError ) pdsk( UpToDate -> DUnknown ) 
Jul  8 00:03:28 nadir kernel: [125018.144953] block drbd0: asender terminated
Jul  8 00:03:28 nadir kernel: [125018.144956] block drbd0: Terminating drbd0_asender
Jul  8 00:03:29 nadir kernel: [125018.440849] block drbd0: Connection closed
Jul  8 00:03:29 nadir kernel: [125018.440853] block drbd0: conn( ProtocolError -> Unconnected ) 
Jul  8 00:03:29 nadir kernel: [125018.440856] block drbd0: receiver terminated
Jul  8 00:03:29 nadir kernel: [125018.440858] block drbd0: Restarting drbd0_receiver
Jul  8 00:03:29 nadir kernel: [125018.440860] block drbd0: receiver (re)started
Jul  8 00:03:29 nadir kernel: [125018.440863] block drbd0: conn( Unconnected -> WFConnection ) 
Jul  8 00:03:29 nadir kernel: [125018.536341] block drbd0: Handshake successful: Agreed network protocol version 95
Jul  8 00:03:29 nadir kernel: [125018.536483] block drbd0: Peer authenticated using 20 bytes of 'sha1' HMAC
Jul  8 00:03:29 nadir kernel: [125018.536487] block drbd0: conn( WFConnection -> WFReportParams ) 
Jul  8 00:03:29 nadir kernel: [125018.536506] block drbd0: Starting asender thread (from drbd0_receiver [2131])
Jul  8 00:03:29 nadir kernel: [125018.536552] block drbd0: data-integrity-alg: md5
Jul  8 00:03:29 nadir kernel: [125018.536573] block drbd0: max_segment_size ( = BIO size ) = 65536
Jul  8 00:03:29 nadir kernel: [125018.536588] block drbd0: drbd_sync_handshake:
Jul  8 00:03:29 nadir kernel: [125018.536590] block drbd0: self E34E53A9081B1F64:0000000000000000:0F9D607E2DF004B4:4DEE98D54259C08D bits:0 flags:0
Jul  8 00:03:29 nadir kernel: [125018.536596] block drbd0: peer 55D7620CF1CEA711:E34E53A9081B1F65:0F9D607E2DF004B5:4DEE98D54259C08D bits:69526 flags:0
Jul  8 00:03:29 nadir kernel: [125018.536599] block drbd0: uuid_compare()=-1 by rule 50
Jul  8 00:03:29 nadir kernel: [125018.536603] block drbd0: peer( Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate ) 
Jul  8 00:03:30 nadir kernel: [125019.926570] block drbd0: conn( WFBitMapT -> WFSyncUUID ) 
Jul  8 00:03:30 nadir kernel: [125019.933012] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0
Jul  8 00:03:30 nadir kernel: [125019.934579] block drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0)
Jul  8 00:03:30 nadir kernel: [125019.934583] block drbd0: conn( WFSyncUUID -> SyncTarget ) disk( UpToDate -> Inconsistent ) 
Jul  8 00:03:30 nadir kernel: [125019.934588] block drbd0: Began resync as SyncTarget (will sync 313168 KB [78292 bits set]).
Jul  8 00:03:31 nadir kernel: [125020.350423] general protection fault: 0000 [#1] SMP 
Jul  8 00:03:31 nadir kernel: [125020.355525] last sysfs file: /sys/module/drbd/parameters/cn_idx
Jul  8 00:03:31 nadir kernel: [125020.361534] CPU 0 
Jul  8 00:03:31 nadir kernel: [125020.363457] Modules linked in: sha1_generic drbd lru_cache coretemp dcdbas pl2303 usbserial ghes ipmi_si ipmi_msghandler bonding power_meter hed lp parport xfs exportfs raid10 raid456 usb_storage async_pq async_xor uas xor async_memcpy async_raid6_recov raid6_pq async_tx raid1 ahci raid0 bnx2 libahci igb multipath dca linear
Jul  8 00:03:31 nadir kernel: [125020.393345] 
Jul  8 00:03:31 nadir kernel: [125020.394940] Pid: 30452, comm: drbd0_asender Tainted: G        W   2.6.38-8-server #42-Ubuntu Dell Inc. PowerEdge R210/05KX61
Jul  8 00:03:31 nadir kernel: [125020.406362] RIP: 0010:[<ffffffff811175c9>]  [<ffffffff811175c9>] put_page+0x9/0x40
Jul  8 00:03:31 nadir kernel: [125020.414055] RSP: 0018:ffff8803eed93ae0  EFLAGS: 00010246
Jul  8 00:03:31 nadir kernel: [125020.419455] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000004
Jul  8 00:03:31 nadir kernel: [125020.426689] RDX: ffff8804066a1500 RSI: ffff8804066a14b4 RDI: 656e6f6870656c65
Jul  8 00:03:31 nadir kernel: [125020.433921] RBP: ffff8803eed93ae0 R08: 00000000975c6e7f R09: 0000000000004100
Jul  8 00:03:31 nadir kernel: [125020.441155] R10: ffff8804060ab400 R11: 0000000000000018 R12: ffff88041e4c2000
Jul  8 00:03:31 nadir kernel: [125020.448390] R13: ffff8804060ab470 R14: ffff8804060ab874 R15: 0000000000000000
Jul  8 00:03:31 nadir kernel: [125020.455623] FS:  0000000000000000(0000) GS:ffff8800bf200000(0000) knlGS:0000000000000000
Jul  8 00:03:31 nadir kernel: [125020.463809] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jul  8 00:03:31 nadir kernel: [125020.469643] CR2: 0000000001f17130 CR3: 0000000001a03000 CR4: 00000000000006f0
Jul  8 00:03:31 nadir kernel: [125020.476875] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul  8 00:03:31 nadir kernel: [125020.484110] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul  8 00:03:31 nadir kernel: [125020.491344] Process drbd0_asender (pid: 30452, threadinfo ffff8803eed92000, task ffff8804133ddb80)
Jul  8 00:03:31 nadir kernel: [125020.500396] Stack:
Jul  8 00:03:31 nadir kernel: [125020.502513]  ffff8803eed93b00 ffffffff814d70c4 ffff88041e4c2000 0000000000000020
Jul  8 00:03:31 nadir kernel: [125020.510098]  ffff8803eed93b20 ffffffff814d710e 0000000000000020 ffff88041e4c2000
Jul  8 00:03:31 nadir kernel: [125020.517677]  ffff8803eed93bd0 ffffffff81527953 ffff8803eed93b50 0000000000000246
Jul  8 00:03:31 nadir kernel: [125020.525257] Call Trace:
Jul  8 00:03:31 nadir kernel: [125020.527803]  [<ffffffff814d70c4>] skb_release_data+0xb4/0xe0
Jul  8 00:03:31 nadir kernel: [125020.533555]  [<ffffffff814d710e>] __kfree_skb+0x1e/0xa0
Jul  8 00:03:31 nadir kernel: [125020.538871]  [<ffffffff81527953>] tcp_recvmsg+0xb03/0xbb0
Jul  8 00:03:31 nadir kernel: [125020.544361]  [<ffffffff8154a88b>] inet_recvmsg+0x6b/0x80
Jul  8 00:03:31 nadir kernel: [125020.549769]  [<ffffffff8104da6c>] ? resched_task+0x2c/0x80
Jul  8 00:03:31 nadir kernel: [125020.555344]  [<ffffffff814ce3cd>] sock_recvmsg+0xfd/0x130
Jul  8 00:03:31 nadir kernel: [125020.560834]  [<ffffffffa02495aa>] ? __bm_change_bits_to.clone.8+0xaa/0x140 [drbd]
Jul  8 00:03:31 nadir kernel: [125020.568418]  [<ffffffffa022264c>] ? lc_get+0x3c/0x130 [lru_cache]
Jul  8 00:03:31 nadir kernel: [125020.574602]  [<ffffffffa0253680>] drbd_recv_short.clone.22+0x70/0x80 [drbd]
Jul  8 00:03:31 nadir kernel: [125020.581656]  [<ffffffffa025d09f>] drbd_asender+0x15f/0x590 [drbd]
Jul  8 00:03:31 nadir kernel: [125020.587843]  [<ffffffffa02651a0>] ? drbd_thread_setup+0x0/0xf0 [drbd]
Jul  8 00:03:31 nadir kernel: [125020.594378]  [<ffffffffa0265204>] drbd_thread_setup+0x64/0xf0 [drbd]
Jul  8 00:03:31 nadir kernel: [125020.600823]  [<ffffffffa02651a0>] ? drbd_thread_setup+0x0/0xf0 [drbd]
Jul  8 00:03:31 nadir kernel: [125020.607356]  [<ffffffff810871f6>] kthread+0x96/0xa0
Jul  8 00:03:31 nadir kernel: [125020.612331]  [<ffffffff8100cde4>] kernel_thread_helper+0x4/0x10
Jul  8 00:03:31 nadir kernel: [125020.618339]  [<ffffffff81087160>] ? kthread+0x0/0xa0
Jul  8 00:03:31 nadir kernel: [125020.623393]  [<ffffffff8100cde0>] ? kernel_thread_helper+0x0/0x10
Jul  8 00:03:31 nadir kernel: [125020.629574] Code: de fe ff ff eb c9 48 8b 03 eb e6 89 c2 0f 1f 44 00 00 e9 5d ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00 <48> f7 07 00 c0 00 00 75 1f 8b 47 08 f0 ff 4f 08 0f 94 c0 84 c0 
Jul  8 00:03:31 nadir kernel: [125020.649899] RIP  [<ffffffff811175c9>] put_page+0x9/0x40
Jul  8 00:03:31 nadir kernel: [125020.655237]  RSP <ffff8803eed93ae0>
Jul  8 00:03:31 nadir kernel: [125020.659209] ---[ end trace 2aa310a34052a046 ]---
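
One detail in the register dump may be worth noting: the faulting
instruction in put_page dereferences RDI, and RDI (656e6f6870656c65) does
not look like a kernel address. A minimal sketch, assuming the value is
just 8 bytes of little-endian data, decodes it as text. The helper name
register_as_ascii is made up for illustration:

```python
import struct


def register_as_ascii(hex_value: str) -> str:
    """Interpret a 64-bit register value as 8 little-endian bytes of text.

    Non-printable bytes are shown as '.' so the result is always 8 chars.
    """
    raw = struct.pack("<Q", int(hex_value, 16))
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# RDI as reported in the oops above
print(register_as_ascii("656e6f6870656c65"))  # prints "elephone"
```

The result reads like a fragment of the word "telephone", which could be
payload from the import that ended up where a page pointer was expected.
That is only a guess from a single register, not a diagnosis.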

Is there anything I can do to help debug this? If I must, I can 
repeat the import of the data (220G, it's not done yet :). Is the 
config needed? Should I try upgrading the module?

Thanks for any reply,
Maarten.


