[DRBD-user] Kernel Oops, RH9.0, 2.4.20-19.9

Tim Hasson tim at aidasystems.com
Fri Jun 18 22:23:48 CEST 2004


Quoting Lars Ellenberg <Lars.Ellenberg at linbit.com>,
on Mon, 14 Jun 2004 09:40:47 +0200:

> / 2004-06-13 21:39:38 -0700
> \ Tim Hasson:
> > Here's the results after I upgraded to kernel 2.4.26 from kernel.org (also
> > recompiled drbd 0.6.12)
> > 
> > 
> > 
> 
> .--- slightly edited syslog:
> | drbd: ===> drbd start <===
> | drbd: modprobe -s drbd minor_count=2
> | kernel: drbd: initialised. Version: 0.6.12 (api:64/proto:62)
> | kernel: drbd0: Creating state file
> | kernel: "/var/lib/drbd/drbd0"
> | kernel: klogd 1.4.1, ---------- state change ----------
> | kernel: drbd1: Creating state file
> | kernel: "/var/lib/drbd/drbd1"
> | kernel: drbd0: Connection established. size=35277516 KB / blksize=4096 B
> | kernel: drbd1: Connection established. size=35277516 KB / blksize=4096 B
> | drbd: drbdsetup /dev/nb1 wait_connect -t 0
> | drbd: 'drbd0' SyncingAll, waiting for this to finish
> | drbd: 'drbd1' SyncingAll, waiting for this to finish
> | drbd: drbdsetup /dev/nb0 wait_sync
> | drbd: drbdsetup /dev/nb1 wait_sync
> | 
> | about one minute later:
> | 
> | kernel: KERNEL: assertion (tp->copied_seq == tp->rcv_nxt || \
> | 		(flags&(MSG_PEEK|MSG_TRUNC))) failed at tcp.c(1603)
> | last message repeated 11 times
> | kernel: KERNEL: assertion (flags&MSG_PEEK) failed at tcp.c(1540)
> | kernel: KERNEL: assertion (skb==NULL || before(tp->copied_seq, \
> | 		TCP_SKB_CB(skb)->end_seq)) failed at tcp.c(1290)
> | kernel: KERNEL: assertion (flags&MSG_PEEK) failed at tcp.c(1540)
> | kernel: KERNEL: assertion (skb==NULL || before(tp->copied_seq, \
> | 		TCP_SKB_CB(skb)->end_seq)) failed at tcp.c(1290)
> | kernel: KERNEL: assertion (flags&MSG_PEEK) failed at tcp.c(1540)
> | kernel: KERNEL: assertion (skb==NULL || before(tp->copied_seq, \
> | 		TCP_SKB_CB(skb)->end_seq)) failed at tcp.c(1290)
> | kernel: KERNEL: assertion (flags&MSG_PEEK) failed at tcp.c(1540)
> | last message repeated 3 times
> `---
> 
> > Any ideas?
> 
> your NIC is broken and cannot stand the load of a drbd full sync anymore.
> I bet you can crash that box by just running a 
>  netcat -l -p 7777 > /dev/null < /dev/null # on drbd2
> and do a
>  netcat drbd2 7777 < /dev/zero     # from drbd1 or any other box...
> 
> 
> 	Lars Ellenberg
> 


Well, nc didn't crash it.
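
For anyone who wants to repeat that load test on a box without netcat, a rough Python equivalent of the two commands might look like the sketch below. The names `sink`, `flood` and `run_local_demo` are made up for illustration, and the demo only talks to 127.0.0.1; for a real test the sink would run on drbd2 and the sender on the peer, with port 7777 as in the suggestion above.

```python
import socket
import threading


def sink(server_sock, result):
    """Accept one connection and discard everything received (like nc -l > /dev/null)."""
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            chunk = conn.recv(65536)
            if not chunk:  # peer closed the connection
                break
            total += len(chunk)
    result.append(total)


def flood(host, port, total_bytes, chunk_size=65536):
    """Send total_bytes of zeros to host:port (like nc host port < /dev/zero, but bounded)."""
    zeros = bytes(chunk_size)
    sent = 0
    with socket.create_connection((host, port)) as s:
        while sent < total_bytes:
            n = min(chunk_size, total_bytes - sent)
            s.sendall(zeros[:n])
            sent += n
    return sent


def run_local_demo(total_bytes=1 << 20):
    """Loopback-only demo: returns (bytes sent, bytes received)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen(1)
    port = srv.getsockname()[1]
    received = []
    t = threading.Thread(target=sink, args=(srv, received))
    t.start()
    sent = flood("127.0.0.1", port, total_bytes)
    t.join()
    srv.close()
    return sent, received[0]
```

Unlike reading from /dev/zero, the sender here stops after a fixed byte count, so the amount of traffic is known and the run ends on its own.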

Replaced the gigabit NIC (the one carrying the IP that's set up in drbd.conf)
with a 10/100 NIC. It locked up at 2% of the sync after drbd start.

Tried disconnecting one of the two drives: same thing. Swapped it with the
other drive: same thing.


Here's the last output (after replacing the gigabit NIC with a 10/100):


Jun 14 15:51:59 drbd2 drbd: ===> drbd start <===
Jun 14 15:51:59 drbd2 drbd: modprobe -s drbd minor_count=2
Jun 14 15:51:59 drbd2 kernel: drbd: initialised. Version: 0.6.12 (api:64/proto:62)
Jun 14 15:51:59 drbd2 drbd: drbdsetup /dev/nb0 disk /dev/sdb1 --do-panic --disk-size=35277516k
Jun 14 15:52:00 drbd2 drbd: drbdsetup /dev/nb0 net 192.168.0.87:7788 192.168.0.86:7788 C --sync-min=10M --sync-max=25M --sync-nice=-10 --tl-size=5000 --timeout=60 --connect-int=10 --ping-int=10
Jun 14 15:52:00 drbd2 drbd: drbdsetup /dev/nb1 disk /dev/sdc1 --do-panic --disk-size=35277516k
Jun 14 15:52:00 drbd2 kernel: drbd1: Creating state file
Jun 14 15:52:00 drbd2 kernel: "/var/lib/drbd/drbd1"
Jun 14 15:52:00 drbd2 kernel: klogd 1.4.1, ---------- state change ----------
Jun 14 15:52:00 drbd2 kernel: drbd0: Connection established. size=35277516 KB / blksize=4096 B
Jun 14 15:52:00 drbd2 drbd: drbdsetup /dev/nb1 net 192.168.0.87:7789 192.168.0.86:7789 C --sync-min=10M --sync-max=25M --sync-nice=-10 --tl-size=5000 --timeout=60 --connect-int=10 --ping-int=10
Jun 14 15:52:00 drbd2 drbd: drbdsetup /dev/nb0 wait_connect -t 0
Jun 14 15:52:00 drbd2 drbd: drbdsetup /dev/nb1 wait_connect -t 0
Jun 14 15:52:00 drbd2 kernel: drbd1: Connection established. size=35277516 KB / blksize=4096 B
Jun 14 15:52:00 drbd2 drbd: 'drbd0' SyncingAll, waiting for this to finish
Jun 14 15:52:00 drbd2 drbd: drbdsetup /dev/nb0 wait_sync
Jun 14 15:52:00 drbd2 drbd: 'drbd1' SyncingAll, waiting for this to finish
Jun 14 15:52:00 drbd2 drbd: drbdsetup /dev/nb1 wait_sync
Jun 14 15:52:41 drbd2 kernel: Unable to handle kernel NULL pointer dereference<1>Unable to handle kernel NULL pointer dereference at virtual address 00000038
Jun 14 15:52:41 drbd2 kernel:  printing eip:
Jun 14 15:52:41 drbd2 kernel: c01fd865
Jun 14 15:52:41 drbd2 kernel: *pde = 00000000
Jun 14 15:52:41 drbd2 kernel: Oops: 0000
Jun 14 15:52:41 drbd2 kernel: CPU:    0
Jun 14 15:52:41 drbd2 kernel: EIP:    0010:[<c01fd865>]    Not tainted
Jun 14 15:52:41 drbd2 kernel: EFLAGS: 00010286
Jun 14 15:52:41 drbd2 kernel: eax: 00004100   ebx: 00000008   ecx: 00000000   edx: 00000000
Jun 14 15:52:41 drbd2 kernel:  at virtual address 00000034
Jun 14 15:52:41 drbd2 kernel:  printing eip:
Jun 14 15:52:41 drbd2 kernel: c01fd7c2
Jun 14 15:52:41 drbd2 kernel: *pde = 00000000
Jun 14 15:52:41 drbd2 kernel: esi: de693ed0   edi: de825314   ebp: de693f3c   esp: de693ebc
Jun 14 15:52:41 drbd2 kernel: ds: 0018   es: 0018   ss: 0018
Jun 14 15:52:41 drbd2 kernel: Process drbdd_1 (pid: 1091, stackpage=de693000)
Jun 14 15:52:41 drbd2 kernel: Stack: de825314 de693f3c 00000008 00004100 de693ed0 00000000 00000000 00000000
Jun 14 15:52:41 drbd2 kernel:        00000000 00000000 de3bbc54 dfb2fe40 00000292 de825314 de90d310 c0000000
Jun 14 15:52:42 drbd2 kernel:        00000000 e091ab48 de825314 de693f3c 00000008 00004100 de693f9c 00000008
Jun 14 15:52:42 drbd2 kernel: Call Trace:    [<e091ab48>] [<e091aa30>] [<e091b25b>] [<e091be2e>] [<e091fe6a>]
Jun 14 15:52:42 drbd2 kernel:   [<e0916c39>] [<c010752e>] [<e0916c10>]
Jun 14 15:52:42 drbd2 kernel:
Jun 14 15:52:42 drbd2 kernel: Code: ff 52 38 85 c0 89 c3 78 24 8b 45 10 85 c0 75 3d 80 7f 2a 00
Jun 14 15:52:42 drbd2 kernel:  Oops: 0000
Jun 14 15:52:42 drbd2 kernel: CPU:    1
Jun 14 15:52:42 drbd2 kernel: EIP:    0010:[<c01fd7c2>]    Not tainted
Jun 14 15:52:42 drbd2 kernel: EFLAGS: 00010246
Jun 14 15:52:42 drbd2 kernel: eax: 00000018   ebx: 00000000   ecx: 00000000   edx: 00000000
Jun 14 15:52:42 drbd2 kernel: esi: de827ed4   edi: de827e20   ebp: de825314   esp: de827e10
Jun 14 15:52:42 drbd2 kernel: ds: 0018   es: 0018   ss: 0018
Jun 14 15:52:42 drbd2 kernel: Process drbd_asender_1 (pid: 1120, stackpage=de827000)
Jun 14 15:52:42 drbd2 kernel: Stack: de825314 de827ed4 00000018 de827e20 00000460 00000000 00000000 00000000
Jun 14 15:52:42 drbd2 kernel:        00000000 00000000 de826000 00000296 de826000 00000296 00000000 de825314
Jun 14 15:52:42 drbd2 kernel:        e0917c8b de825314 de827ed4 00000018 df117820 de7a45b8 00000018 de826000
Jun 14 15:52:42 drbd2 kernel: Call Trace:    [<e0917c8b>] [<e0917860>] [<e09174f3>] [<e091a6e5>] [<e091c03b>]
Jun 14 15:52:42 drbd2 kernel:   [<e091bec0>] [<e0916c39>] [<c010752e>] [<e0916c10>]
Jun 14 15:52:42 drbd2 kernel:
Jun 14 15:52:42 drbd2 kernel: Code: ff 52 34 89 c3 8b 44 24 1c 85 c0 75 11 83 c4 30 89 d8 5b 5e
Jun 14 15:52:42 drbd2 kernel:  <1>Unable to handle kernel paging request at virtual address 7567696a
Jun 14 15:52:42 drbd2 kernel:  printing eip:
Jun 14 15:52:42 drbd2 kernel: c012aa61
Jun 14 15:52:42 drbd2 kernel: *pde = 00000000
Jun 14 15:52:42 drbd2 kernel: Oops: 0002
Jun 14 15:52:42 drbd2 kernel: CPU:    1
Jun 14 15:52:42 drbd2 kernel: EIP:    0010:[<c012aa61>]    Not tainted
Jun 14 15:52:42 drbd2 kernel: EFLAGS: 00010002
Jun 14 15:52:42 drbd2 kernel: eax: de827e84   ebx: de53fe84   ecx: 0000012b   edx: 75676966
Jun 14 15:52:42 drbd2 kernel: esi: 00001770   edi: 00000000   ebp: de8f06f4   esp: de53fe3c
Jun 14 15:52:42 drbd2 kernel: ds: 0018   es: 0018   ss: 0018
Jun 14 15:52:42 drbd2 kernel: Process drbd_asender_0 (pid: 1062, stackpage=de53f000)
Jun 14 15:52:42 drbd2 kernel: Stack: 00000296 c0129e19 de53fe84 c022b412 de53e000 e091813c de53fe84 de53fed4
Jun 14 15:52:42 drbd2 kernel:        00000018 00000000 de68e5b8 de68e5b4 de53e000 00000000 00000000 ffffff95
Jun 14 15:52:42 drbd2 kernel:        00000000 de68e7c0 00000000 00000000 0000ed5f de53fe84 e0917860 de90d000
Jun 14 15:52:42 drbd2 kernel: Call Trace:    [<c0129e19>] [<c022b412>] [<e091813c>] [<e0917860>] [<e09174f3>]
Jun 14 15:52:42 drbd2 kernel:   [<e091a6e5>] [<e091c03b>] [<e091bec0>] [<e0916c39>] [<c010752e>] [<e0916c10>]
Jun 14 15:52:42 drbd2 kernel:
Jun 14 15:52:42 drbd2 kernel: Code: 89 5a 04 89 13 89 43 04 89 18 5b c3 8d 76 00 81 f9 ff 3f 00
Jun 14 16:00:03 drbd2 syslogd 1.4.1: restart.
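
A rough cross-check of the three oopses above, assuming the Code: bytes start at the faulting EIP (which I believe is how 2.4 on i386 prints them, but treat that as an assumption). If so, the first instruction in each dump is a one-byte opcode with a [reg+disp8] operand, and register plus displacement should reproduce the reported fault address. The helper name `fault_address` is made up for this sketch, and it handles only that one addressing form.

```python
# x86 register numbering as used in the ModRM byte's r/m field.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def fault_address(code_hex, regs):
    """Effective address of the first instruction's [reg+disp8] operand.

    code_hex: leading bytes of the oops "Code:" line, e.g. "ff 52 38"
    regs:     register values from the oops, e.g. {"edx": 0}
    """
    b = bytes.fromhex(code_hex)
    modrm, disp = b[1], b[2]      # b[0] is the one-byte opcode
    mod, rm = modrm >> 6, modrm & 7
    if mod != 1 or rm == 4:       # mod=01 means disp8; rm=100 would need a SIB byte
        raise ValueError("only the [reg+disp8] form without SIB is handled")
    if disp >= 0x80:              # disp8 is sign-extended
        disp -= 0x100
    return (regs[REGS[rm]] + disp) & 0xFFFFFFFF


# drbdd_1:        ff 52 38  (indirect call through [edx+0x38]), edx = 0
print(hex(fault_address("ff 52 38", {"edx": 0x00000000})))
# drbd_asender_1: ff 52 34  (indirect call through [edx+0x34]), edx = 0
print(hex(fault_address("ff 52 34", {"edx": 0x00000000})))
# drbd_asender_0: 89 5a 04  (store of ebx to [edx+4]), edx = 75676966
print(hex(fault_address("89 5a 04", {"edx": 0x75676966})))
```

Under that assumption the results line up with the addresses 00000038, 00000034 and 7567696a in the log: the first two look like indirect calls through a NULL pointer held in edx, and the third is a write through an edx value that does not look like a kernel pointer.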


Any ideas?

-- 
Respectfully,
Tim Hasson



