On Fri, Jul 14, 2006 at 10:46:27PM +1000, Bradley Baetz wrote:
> [please cc me on replies; I'm not subscribed to the list]
> If I do reproduce it, I'll see if I can get a tcpdump too.

...and I managed it. To reproduce:

  Reboot the secondary
  Wait for it to come up and sync
  hb_standby the master

postgres/kjournald gets stuck in D, and heartbeat's sanity checks kick in
and it tries reboot -f, which also fails in D state

I also have:

[root@dbtools01 ~]# cat /proc/drbd
version: 0.7.20 (api:79/proto:74)
SVN Revision: 2260 build by root@build03, 2006-07-14 10:39:06
 0: cs:Connected st:Primary/Secondary ld:Consistent
    ns:760504 nr:452 dw:49876 dr:742518 al:11 bm:311 lo:0 pe:3 ua:0 ap:3

[root@dbtools02 ~]# cat /proc/drbd
version: 0.7.20 (api:79/proto:74)
SVN Revision: 2260 build by root@build03, 2006-07-14 10:39:06
 0: cs:Connected st:Secondary/Primary ld:Consistent
    ns:0 nr:208 dw:208 dr:0 al:0 bm:6 lo:0 pe:0 ua:0 ap:0

reboot the master with reboot -n -f, and when it comes back up:

[root@dbtools01 ~]# cat /proc/drbd
version: 0.7.20 (api:79/proto:74)
SVN Revision: 2260 build by root@build03, 2006-07-14 10:39:06
 0: cs:SyncSource st:Primary/Secondary ld:Consistent
    ns:68892 nr:0 dw:0 dr:72000 al:0 bm:223 lo:1 pe:7 ua:777 ap:0
        [=>..................] sync'ed:  8.4% (811772/880640)K
        stalled

[root@dbtools02 ~]# cat /proc/drbd
version: 0.7.20 (api:79/proto:74)
SVN Revision: 2260 build by root@build03, 2006-07-14 10:39:06
 0: cs:SyncTarget st:Secondary/Secondary ld:Inconsistent
    ns:0 nr:69076 dw:69076 dr:0 al:0 bm:14 lo:0 pe:819 ua:0 ap:0
        [=>..................] sync'ed:  8.4% (811772/880640)K
        stalled

tcpdump shows the same data being sent from the secondary to the primary:

83 74 02 67 00 0d 00 00

which is being acked appropriately, and then resent again and again and
again and....

I don't suppose there's an ethereal plugin for the DRBD protocol? :)

mount on the primary is stuck in D state:

Jul 14 23:16:15 dbtools01 kernel: mount         D 00000000  2184  2907   2887 (NOTLB)
Jul 14 23:16:15 dbtools01 kernel: f5d82ac8 00000082 c16ab2e0 00000000 00000000 00000001 00001000 00000001
Jul 14 23:16:15 dbtools01 kernel:        00000001 00000001 f7fcef30 c1816de0 00000001 00029d50 ba9003e8 00000061
Jul 14 23:16:15 dbtools01 kernel:        f7e110b0 f619c130 f619c29c f593365c 00000001 f5933284 00000246 f593328c
Jul 14 23:16:15 dbtools01 kernel: Call Trace:
Jul 14 23:16:15 dbtools01 kernel:  [<c02cfbb1>] __down+0x81/0xdb
Jul 14 23:16:15 dbtools01 kernel:  [<c011e71b>] default_wake_function+0x0/0xc
Jul 14 23:16:15 dbtools01 kernel:  [<c02cfd28>] __down_failed+0x8/0xc
Jul 14 23:16:15 dbtools01 kernel:  [<f8af934c>] .text.lock.drbd_main+0x41/0x18c [drbd]
Jul 14 23:16:15 dbtools01 kernel:  [<f8af263c>] drbd_make_request_common+0x499/0x744 [drbd]
Jul 14 23:16:15 dbtools01 kernel:  [<f885e89c>] __split_bio+0xfd/0x103 [dm_mod]
Jul 14 23:16:15 dbtools01 kernel:  [<f8af2aa7>] drbd_make_request_26+0x1c0/0x1c9 [drbd]
Jul 14 23:16:15 dbtools01 kernel:  [<c022431c>] generic_make_request+0x18e/0x19e
Jul 14 23:16:15 dbtools01 kernel:  [<c0120291>] autoremove_wake_function+0x0/0x2d
Jul 14 23:16:15 dbtools01 kernel:  [<c02243f6>] submit_bio+0xca/0xd2
Jul 14 23:16:15 dbtools01 kernel:  [<c015e7c9>] bio_alloc+0x100/0x168
Jul 14 23:16:15 dbtools01 kernel:  [<c015e180>] submit_bh+0x141/0x166
Jul 14 23:16:15 dbtools01 kernel:  [<c015cc59>] __block_write_full_page+0x1f0/0x2ea
Jul 14 23:16:15 dbtools01 kernel:  [<c0160822>] blkdev_get_block+0x0/0x46
Jul 14 23:16:15 dbtools01 kernel:  [<c015dfc8>] block_write_full_page+0xc5/0xce
Jul 14 23:16:15 dbtools01 kernel:  [<c0160822>] blkdev_get_block+0x0/0x46
Jul 14 23:16:15 dbtools01 kernel:  [<c0177bfa>] mpage_writepages+0x1c2/0x314
Jul 14 23:16:15 dbtools01 kernel:  [<c0160915>] blkdev_writepage+0x0/0xc
Jul 14 23:16:15 dbtools01 kernel:  [<c0144a4d>] do_writepages+0x19/0x27
Jul 14 23:16:15 dbtools01 kernel:  [<c013f8f7>] __filemap_fdatawrite_range+0x7a/0x85
Jul 14 23:16:15 dbtools01 kernel:  [<c013f911>] filemap_fdatawrite+0xf/0x13
Jul 14 23:16:15 dbtools01 kernel:  [<c015b5d9>] sync_blockdev+0x18/0x32
Jul 14 23:16:15 dbtools01 kernel:  [<f88714b6>] journal_recover+0xa2/0xab [jbd]
Jul 14 23:16:15 dbtools01 kernel:  [<f887410e>] journal_load+0x3c/0x6b [jbd]
Jul 14 23:16:15 dbtools01 kernel:  [<f88f760e>] ext3_load_journal+0x124/0x160 [ext3]
Jul 14 23:16:15 dbtools01 kernel:  [<f88f6f4e>] ext3_fill_super+0x70c/0x9a2 [ext3]
Jul 14 23:16:15 dbtools01 kernel:  [<c016025f>] get_sb_bdev+0xe3/0x120
Jul 14 23:16:15 dbtools01 kernel:  [<c02d0ca2>] __cond_resched+0x14/0x39
Jul 14 23:16:15 dbtools01 kernel:  [<f88f7f42>] ext3_get_sb+0xe/0x11 [ext3]
Jul 14 23:16:15 dbtools01 kernel:  [<f88f6842>] ext3_fill_super+0x0/0x9a2 [ext3]
Jul 14 23:16:15 dbtools01 kernel:  [<c016042b>] do_kern_mount+0x8a/0x147
Jul 14 23:16:15 dbtools01 kernel:  [<c01732a3>] do_new_mount+0x61/0x90
Jul 14 23:16:15 dbtools01 kernel:  [<c01738f0>] do_mount+0x178/0x190
Jul 14 23:16:15 dbtools01 kernel:  [<c0173c47>] sys_mount+0x91/0x108
Jul 14 23:16:15 dbtools01 kernel:  [<c02d268f>] syscall_call+0x7/0xb

It's a bit odd that it's doing this. Is there perhaps some bio that DRBD
isn't handling properly? And when it comes back up and replays the ext3
journal, it hits the same bio again?

Bradley
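
For anyone trying to catch the same stall automatically, here is a rough Python sketch that parses a /proc/drbd device line like the ones quoted above and compares two snapshots. The function names (parse_drbd_line, looks_stuck) and the "counters did not move" heuristic are illustrative assumptions only, not anything DRBD itself provides; the sample line is copied from the dbtools01 output in this message.

```python
import re

# Sample device line copied from the dbtools01 output quoted in this message.
SAMPLE = (
    "0: cs:SyncSource st:Primary/Secondary ld:Consistent "
    "ns:68892 nr:0 dw:0 dr:72000 al:0 bm:223 lo:1 pe:7 ua:777 ap:0"
)


def parse_drbd_line(line):
    """Split a DRBD 0.7 style /proc/drbd device line into a dict of fields."""
    minor, _, rest = line.strip().partition(":")
    fields = {"minor": int(minor)}
    # Every remaining token has the form key:value, e.g. cs:SyncSource or ns:68892.
    for key, value in re.findall(r"(\w+):(\S+)", rest):
        fields[key] = int(value) if value.isdigit() else value
    return fields


def looks_stuck(before, after):
    """Hypothetical heuristic (an assumption, not DRBD's own logic):
    a resync is in progress, yet neither the ns nor the nr counter
    changed between the two snapshots."""
    syncing = after["cs"] in ("SyncSource", "SyncTarget")
    unchanged = before["ns"] == after["ns"] and before["nr"] == after["nr"]
    return syncing and unchanged


if __name__ == "__main__":
    snapshot = parse_drbd_line(SAMPLE)
    print(snapshot["cs"], snapshot["pe"], snapshot["ua"])
    # Comparing a snapshot with itself stands in for two identical readings.
    print(looks_stuck(snapshot, snapshot))
```

In practice the two snapshots would be read from /proc/drbd a few seconds apart on the same node; the interval to use is a judgment call and is not taken from the report above.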