Hi,

I had a similar problem yesterday on 8.2.4. I created a new resource on my SUSE 10.3 box, which connected with no problem. The initial sync did not kick in, so I forced it using 'drbdadm invalidate'. (Is this normal? It feels like a bug to me.) This completed, but left the box in a state where no new processes could be created, with nothing in /var/log/messages, so I had to disconnect power. It worked the second time.

I tried to replicate this morning in order to capture more messages, but now have another problem: DRBD jumped into state 'stalled', from which it will not budge, as shown below. However, I can reboot this time, and the sync is now completing.

I didn't see these problems with my previous installs (8.2.1), which I have run for months on several servers without DRBD issues. This suggests to me that there may be problems with this version.

Regards,
Ben.

# uname -a
Linux 2.6.22.16-0.1-default #1 SMP 2008/01/23 14:28:52 UTC x86_64 x86_64 x86_64 GNU/Linux

# drbdadm invalidate all

# cat /proc/drbd
version: 8.2.4 (api:88/proto:86-88)
GIT-hash: fc00c6e00a1b6039bfcebe37afa3e7e28dbd92fa build by root at hp-tm-09, 2008-02-04 12:03:51
 0: cs:SyncTarget st:Primary/Secondary ds:Inconsistent/UpToDate B r---
    ns:1568354472 nr:435944 dw:1568790416 dr:1449550146 al:683817 bm:1390 lo:0 pe:2560 ua:0 ap:0
        [>....................] sync'ed:  0.1% (787456/787456)M
        stalled
        resync: used:5/31 hits:2555 misses:5 starving:0 dirty:0 changed:5
        act_log: used:4/257 hits:391404801 misses:970356 starving:8676 dirty:284404 changed:683817

# tail /var/log/messages
08:07:30 kernel: drbd0: conn( Connected -> StartingSyncT ) disk( UpToDate -> Inconsistent )
08:07:30 kernel: drbd0: Writing meta data super block now.
08:07:30 kernel: drbd0: writing of bitmap took 28 jiffies
08:07:30 kernel: drbd0: 769 GB (201588736 bits) marked out-of-sync by on disk bit-map.
08:07:30 kernel: drbd0: Writing meta data super block now.
08:07:30 kernel: drbd0: conn( StartingSyncT -> WFSyncUUID )
08:07:30 kernel: drbd0: conn( WFSyncUUID -> SyncTarget )
08:07:30 kernel: drbd0: Began resync as SyncTarget (will sync 806354944 KB [201588736 bits set]).
08:07:30 kernel: drbd0: Writing meta data super block now.

# drbdadm dump all
# /etc/drbd.conf
common {
    net {
        max-buffers       40000;
        unplug-watermark  40000;
        max-epoch-size    16384;
        after-sb-0pri     disconnect;
        after-sb-1pri     disconnect;
        after-sb-2pri     disconnect;
        rr-conflict       disconnect;
    }
    syncer {
        rate        100M;
        al-extents  257;
    }
    startup {
        degr-wfc-timeout 120;
    }
    handlers {
        pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
        pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
        local-io-error    "echo o > /proc/sysrq-trigger ; halt -f";
        outdate-peer      "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
    }
}

resource dbms-04 {
    protocol B;
    on hp-tm-09 {
        device     /dev/drbd0;
        disk       /dev/cciss/c0d0p4;
        address    192.168.95.18:7789;
        meta-disk  /dev/cciss/c0d0p3 [0];
    }
    on hp-tm-06 {
        device     /dev/drbd0;
        disk       /dev/cciss/c0d0p4;
        address    192.168.95.17:7788;
        meta-disk  /dev/cciss/c0d0p3 [0];
    }
    disk {
        on-io-error detach;
        size 769G;
    }
}

Jeffrey Goris wrote:
> Hi,
>
> A couple of weeks ago I upgraded from DRBD 0.7.25 to 8.2.4, and all seemed
> to be working fine. In the last two weeks I got my hands on some free
> hardware and upgraded one of the nodes. I was about to change the new node
> from DRBD secondary (HA standby) to primary (HA active) when I remembered
> that with DRBD 8.2.4 I could do an online verify. So I entered
> "drbdadm verify data1" on the primary node, and it all seemed to go
> swimmingly. However, after the verify completed (with oos:0, at least in
> the first 99% of the verify), the secondary node (the one with the new
> hardware) suffered a kernel oops.
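For reference, the outcome of an online verify like the one described here can be checked from the oos (out-of-sync) counter that DRBD 8.2.x reports in /proc/drbd. A minimal sketch against a sample status line (the field values below are illustrative only, not output from either poster's machine):

```shell
# Sample /proc/drbd status line after 'drbdadm verify <res>';
# values here are made up for illustration.
status='ns:0 nr:0 dw:0 dr:1048576 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 oos:0'

# oos reports the amount of out-of-sync data found by the verify.
oos=$(printf '%s\n' "$status" | grep -o 'oos:[0-9]*' | cut -d: -f2)
if [ "$oos" -eq 0 ]; then
    echo "verify clean: peers are identical"
else
    echo "verify found $oos KB out of sync"
fi
```

On a live node the same check would read /proc/drbd directly instead of the sample string.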
> I brought the secondary node back up, invalidated the data, connected it,
> let it completely synchronise, and performed an online verify again on the
> primary, with a kernel oops occurring again after the verify completed. I
> repeated this once more and got a kernel oops for the third time. The oops
> occurs about 2 minutes after the verify completes. The data on the primary
> appears to be completely fine. Any idea why this is occurring?
>
> A few other notes: after upgrading the hardware and prior to performing the
> online verify, I also did a couple of other things. In hindsight, I should
> not have tried to do so many things in one upgrade, because now I can't be
> sure whether any of the other changes had anything to do with the kernel
> oops.
> 1. The primary node is running Fedora 7 with kernel 2.6.22.9-91.fc7. The
>    upgraded node is running Fedora 8 with kernel 2.6.23.14-107.fc8.
> 2. I went from having two DRBD resources (data1 and data2) to just one
>    resource (data1). The resource data2 is completely unconfigured and
>    shows up in /proc/drbd as "1: cs:Unconfigured". The other node has only
>    ever had the one resource configured.
> 3. I have set up LVM2 on top of DRBD. This was the reason for point 2
>    above. I plan on having one big DRBD partition and using LVM to chop it
>    up and create smaller file systems. I'm not using LVM anywhere else on
>    this system. After the upgrade I plan to expand the /dev/drbd0 partition
>    (which requires an online software RAID expansion of /dev/md2
>    underneath), which should be fun. As an aside, does it look like I have
>    set up LVM2 on DRBD okay?
>
> All the gory details are below.
>
> Cheers,
> Jeff.
>
> ========= Kernel Oops ==========
> Message from syslogd at sauron at Feb 5 08:27:47 ...
> kernel: Oops: 0000 [#1] SMP
>
> Message from syslogd at sauron at Feb 5 08:27:47 ...
> kernel: EIP: 0060:[<c047d9a0>] Not tainted VLI
>
> Message from syslogd at sauron at Feb 5 08:27:47 ...
> kernel: CPU:    0
>
> Message from syslogd at sauron at Feb 5 08:27:47 ...
> kernel: EFLAGS: 00010086 (2.6.23.14-107.fc8 #1)
>
> Message from syslogd at sauron at Feb 5 08:27:47 ...
> kernel: EIP is at kmem_cache_alloc+0x5a/0x99
>
> Message from syslogd at sauron at Feb 5 08:27:47 ...
> kernel: esi: c0735230   edi: 00000292   ebp: 000080d0   esp: d15c1ebc
>
> Message from syslogd at sauron at Feb 5 08:27:47 ...
> kernel: ds: 007b   es: 007b   fs: 00d8   gs: 0033   ss: 0068
>
> Message from syslogd at sauron at Feb 5 08:27:47 ...
> kernel: eax: 00000000   ebx: 83c38953   ecx: deb7720e   edx: c119d8e0
>
> Message from syslogd at sauron at Feb 5 08:27:47 ...
> kernel: Stack: 00000000 00000000 db4d3800 00000000 d1877be0 d1877be0 deb7720e
> db4d3800
>
> Message from syslogd at sauron at Feb 5 08:27:47 ...
> kernel: 00000000 cd424718 c04af479 d1877be0 cd424718 00000000 c04af43d
> c047fbc1
>
> Message from syslogd at sauron at Feb 5 08:27:47 ...
> kernel: Process exim (pid: 5915, ti=d15c1000 task=cbb63230 task.ti=d15c1000)
>
> Message from syslogd at sauron at Feb 5 08:27:47 ...
> kernel: ddcf9e00 c9c23110 d1877be0 ffffff9c d15c1f30 00000004 c047fcf2
> d1877be0
>
> Message from syslogd at sauron at Feb 5 08:27:47 ...
> kernel: [<deb7720e>] if6_seq_open+0x14/0x46 [ipv6]
>
> Message from syslogd at sauron at Feb 5 08:27:47 ...
> kernel: Call Trace:
>
> Message from syslogd at sauron at Feb 5 08:27:47 ...
> kernel: [<c04af479>] proc_reg_open+0x3c/0x4c
>
> Message from syslogd at sauron at Feb 5 08:27:47 ...
> kernel: [<c04af43d>] proc_reg_open+0x0/0x4c
>
> Message from syslogd at sauron at Feb 5 08:27:47 ...
> kernel: [<c047fbc1>] __dentry_open+0xd5/0x18c
>
> Message from syslogd at sauron at Feb 5 08:27:47 ...
> kernel: [<c047fcf2>] nameidata_to_filp+0x24/0x33
>
> Message from syslogd at sauron at Feb 5 08:27:47 ...
> kernel: [<c047fa72>] get_unused_fd_flags+0x52/0xc5
>
> Message from syslogd at sauron at Feb 5 08:27:47 ...
> kernel: [<c047fd87>] do_sys_open+0x48/0xca
>
> Message from syslogd at sauron at Feb 5 08:27:47 ...
> kernel: [<c047fd38>] do_filp_open+0x37/0x3e
>
> Message from syslogd at sauron at Feb 5 08:27:47 ...
> kernel: [<c040518a>] syscall_call+0x7/0xb
>
> Message from syslogd at sauron at Feb 5 08:27:47 ...
> kernel: =======================
>
> Message from syslogd at sauron at Feb 5 08:27:47 ...
> kernel: [<c047fe42>] sys_open+0x1c/0x1e
>
> Message from syslogd at sauron at Feb 5 08:27:47 ...
> kernel: Code: 00 00 00 85 d2 74 06 83 7a 0c 00 75 17 89 54 24 04 89 f0 89 ea 89
> 0c 24 83 c9 ff e8 37 fa ff ff 89 c3 eb 0d 8b 5a 0c 0f b7 42 0a <8b> 04 83 89 42
> 0c 89 f8 50 9d 8d 04 05 00 00 00 00 90 66 85 ed
>
> Message from syslogd at sauron at Feb 5 08:27:47 ...
> kernel: EIP: [<c047d9a0>] kmem_cache_alloc+0x5a/0x99 SS:ESP 0068:d15c1ebc
>
> ========== /etc/drbd.conf ==========
> resource data1 {
>     protocol C;
>     handlers {
>         pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
>         pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
>         local-io-error    "echo o > /proc/sysrq-trigger ; halt -f";
>         outdate-peer      "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
>     }
>     startup {
>         wfc-timeout       0;    # Infinite!
>         degr-wfc-timeout  120;  # 2 minutes.
>     }
>     disk {
>         on-io-error detach;
>     }
>     net {
>         # timeout         60;   # 6 seconds (unit = 0.1 seconds)
>         # connect-int     10;   # 10 seconds (unit = 1 second)
>         # ping-int        10;   # 10 seconds (unit = 1 second)
>         # max-buffers     2048;
>         # max-epoch-size  2048;
>         # ko-count        4;
>         # on-disconnect   reconnect;
>     }
>     syncer {
>         rate 50M;               # 50 MByte/s
>         verify-alg crc32c;
>         # al-extents 257;
>     }
>     on sauron.whiterabbit.com.au {
>         device     /dev/drbd0;
>         disk       /dev/md2;
>         address    172.16.0.10:7788;
>         meta-disk  internal;
>     }
>     on shelob.whiterabbit.com.au {
>         device     /dev/drbd0;
>         disk       /dev/md2;
>         address    172.16.0.11:7788;
>         meta-disk  internal;
>     }
> }
>
>
> ========== /etc/lvm/lvm.conf ==========
> Just listing the modified/new entries.
> Everything else is stock lvm.conf.
>
> # (JG) Filter for DRBD devices only
> filter = [ "a/drbd.*/" , "r/.*/" ]
> # (JG) Types to allow DRBD block devices
> types = [ "drbd", 16 ]
> # (JG) Don't automatically activate any VGs or LVs, as this is done by Heartbeat
> volume_list = ""
>
> ========== /etc/ha.d/haresources ==========
> shelob.whiterabbit.com.au drbddisk::data1 LVM::VolGroup00
> Filesystem::/dev/VolGroup00/LogVol00::/mnt/home::ext3 172.16.0.9 60.241.247.218
> nfs cyrus-imapd httpd
>
> ========== LVM Steps ==========
> # On the primary node:
> pvcreate /dev/drbd0
> vgcreate VolGroup00 /dev/drbd0
> vgdisplay VolGroup00
>   --- Volume group ---
>   VG Name               VolGroup00
>   System ID
>   Format                lvm2
>   Metadata Areas        1
>   Metadata Sequence No  4
>   VG Access             read/write
>   VG Status             resizable
>   MAX LV                0
>   Cur LV                1
>   Open LV               1
>   Max PV                0
>   Cur PV                1
>   Act PV                1
>   VG Size               19.08 GB
>   PE Size               4.00 MB
>   Total PE              4884
>   Alloc PE / Size       4884 / 19.08 GB
>   Free  PE / Size       0 / 0
>   VG UUID               Mivt0j-0tk4-1DLM-9981-ilRy-yVFq-bptQLk
> lvcreate --extents 4884 --name LogVol00 VolGroup00
> lvscan
>   ACTIVE            '/dev/VolGroup00/LogVol00' [19.08 GB] inherit
> mkfs.ext3 -b 4096 -L "/mnt/data1" /dev/VolGroup00/LogVol00
>
> # On the secondary node
> lvscan
>   No volume groups found
> vgscan
>   Reading all physical volumes. This may take a while...
>   No volume groups found
> pvscan
>   No matching physical volumes found
> ====================
>
> As an aside, have I set up LVM2 on DRBD okay?
>
>
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
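On the lvm.conf question quoted above: LVM evaluates the filter entries in order, with "a/…/" accepting and "r/…/" rejecting by regular-expression match, so "a/drbd.*/" followed by "r/.*/" admits only DRBD devices. A rough shell simulation of that first-match accept/reject pass (the device list is invented for illustration, and shell globs stand in for LVM's regexes):

```shell
# Simulate the quoted lvm.conf filter:
#   filter = [ "a/drbd.*/" , "r/.*/" ]
# First matching rule wins: accept drbd devices, reject everything else.
candidates="/dev/drbd0 /dev/md2 /dev/cciss/c0d0p4"

accepted=""
for dev in $candidates; do
    case "$dev" in
        *drbd*)                       # stands in for "a/drbd.*/"
            echo "accept: $dev"
            accepted="$accepted$dev";;
        *)                            # stands in for "r/.*/"
            echo "reject: $dev";;
    esac
done
```

Only /dev/drbd0 survives the filter, which is likely also why pvscan on the secondary node finds nothing: while the device is in the Secondary role, DRBD refuses reads, so LVM cannot see the physical volume there until the node is promoted.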