Hi,

Some more information that may help here.

When the oops occurs on the secondary node, I found that it logs
"meta connection shut down by peer":

<6>[ 7423.143569] block drbd0: peer( Primary -> Unknown ) conn( SyncTarget -> TearDown ) pdsk( UpToDate -> DUnknown )
<3>[ 7423.191698] drbd drbd0: meta connection shut down by peer.

When the disconnect succeeds, it logs "ack_receiver terminated" instead:

<6>[ 7407.584922] block drbd0: peer( Primary -> Unknown ) conn( SyncTarget -> TearDown ) pdsk( UpToDate -> DUnknown )
<6>[ 7407.585424] drbd drbd0: ack_receiver terminated

2017-08-10 15:37 GMT+08:00 li songmin <lisongmin9 at gmail.com>:

> After more testing, it seems to occur only from 8.3 to 8.4; 8.4 to 8.4
> works well.
>
> Can we be sure this problem does not affect 8.4 to 8.4?
>
> Also see the crash in the attachment.
>
> 2017-08-09 22:21 GMT+08:00 li songmin <lisongmin9 at gmail.com>:
>
>> Hi,
>>
>> When I upgrade drbd from 8.3.15 to 8.4.6-5, there is an oops caused by a
>> NULL pointer error.
>>
>> The upgrade steps are as follows:
>>
>> 1. The primary node works with drbd 8.3.15 as normal.
>> 2. Stop drbd 8.3.15 on the secondary node and upgrade it to 8.4.6-5.
>> 3. Start the secondary node; data now begins to sync from the primary node.
>> 4. Upgrade the primary node with the following steps:
>>    1. stop the business service on drbd
>>    2. disconnect drbd so that umount is quick  <-- oops on secondary node here?
>>    3. umount the filesystem
>>    4. primary -> secondary
>>    5. connect drbd and wait for the sync to complete
>>    6. the business service may start on the secondary node now
>>    7. stop drbd 8.3.15 on the primary node and upgrade it to 8.4.6-5
>>
>> Call stack:
>>
>> <6>[66071016.839607] block drbd0: peer( Primary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
>> <3>[66071016.840029] drbd drbd0: meta connection shut down by peer.
>> <6>[66071016.840037] drbd drbd0: ack_receiver terminated
>> <6>[66071016.840040] drbd drbd0: Terminating drbd_a_drbd0
>> <1>[66071017.154996] BUG: unable to handle kernel NULL pointer dereference at (null)
>> <1>[66071017.155013] IP: [<ffffffff8107b4e7>] __queue_work+0x17/0x3f0
>> <4>[66071017.155030] PGD 0
>> <0>[66071017.155037] Oops: 0000 [#1] SMP
>> <4>[66071017.155048] CPU 0
>> <4>[66071017.155051] Modules linked in: softdog drbd(FN) crc32c libcrc32c ib_ipoib ib_cm mlx4_en mlx4_ib ib_sa ib_mad ib_core mlx4_core mpt3sas mptctl mptbase lp parport_pc ppdev st parport ide_cd_mod ide_core joydev ipmi_devintf ipmi_si ipmi_msghandler tcp_diag inet_diag nls_utf8 dm_snapshot af_packet md5 binfmt_misc edd bonding cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod cdc_ether igb usbnet shpchp tpm_tis dca tpm ipv6 pci_hotplug sr_mod ipv6_lib tpm_bios ptp mii sg pcspkr i2c_i801 rtc_cmos cdrom pps_core wmi button ext3 jbd mbcache usbhid hid ttm drm_kms_helper drm i2c_algo_bit sysimgblt sysfillrect i2c_core syscopyarea ehci_hcd usbcore sd_mod usb_common crc_t10dif processor thermal_sys hwmon scsi_dh_alua scsi_dh_hp_sw scsi_dh_emc scsi_dh_rdac scsi_dh ahci libahci libata mpt2sas scsi_transport_sas raid_class megaraid_sas scsi_mod [last unloaded: drbd]
>> <4>[66071017.155207] Supported: No, Unsupported modules are loaded
>> <4>[66071017.155214]
>> <4>[66071017.155221] Pid: 0, comm: swapper Tainted: GF B N 3.0.76-0.11-default #1 IBM System x3650 M4 : -[7915OSC]-/00Y8494
>> <4>[66071017.155234] RIP: 0010:[<ffffffff8107b4e7>] [<ffffffff8107b4e7>] __queue_work+0x17/0x3f0
>> <4>[66071017.155246] RSP: 0018:ffff88047fc03cc0 EFLAGS: 00010086
>> <4>[66071017.155253] RAX: 0000000000000000 RBX: ffff880465032c00 RCX: 0000000000000000
>> <4>[66071017.155261] RDX: ffff8804294fb3e0 RSI: 0000000000000000 RDI: 0000000000000000
>> <4>[66071017.155269] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000004d9030553
>> <4>[66071017.155277] R10: 00000004d903059c R11: 0000000000000001 R12: ffff880465032800
>> <4>[66071017.155285] R13: ffff8804294fb3e0 R14: 0000000000000000 R15: 0000000000003800
>> <4>[66071017.155293] FS: 0000000000000000(0000) GS:ffff88047fc00000(0000) knlGS:0000000000000000
>> <4>[66071017.155302] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> <4>[66071017.155309] CR2: 0000000000000000 CR3: 0000000001a09000 CR4: 00000000000407f0
>> <4>[66071017.155317] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> <4>[66071017.155325] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> <4>[66071017.155333] Process swapper (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a11020)
>> <0>[66071017.155341] Stack:
>> <4>[66071017.155346] ffff88046557b000 ffff880438b41248 ffff880465bfe000 ffff880465032c00
>> <4>[66071017.155363] ffff880465032870 ffff880465032800 ffff88076b08e000 ffff8804294fb3c0
>> <4>[66071017.155376] 0000000000003800 ffffffff8107b906 ffff8804654e8e10 ffffffff8107b95a
>> <0>[66071017.155389] Call Trace:
>> <4>[66071017.155408] [<ffffffff8107b906>] queue_work_on+0x16/0x30
>> <4>[66071017.155419] [<ffffffff8107b95a>] queue_work+0x1a/0x20
>> <4>[66071017.155440] [<ffffffffa06aa8cd>] drbd_endio_write_sec_final+0x25d/0x650 [drbd]
>> <4>[66071017.155488] [<ffffffff8122404b>] blk_update_request+0x10b/0x440
>> <4>[66071017.155502] [<ffffffff8122439f>] blk_update_bidi_request+0x1f/0x90
>> <4>[66071017.155513] [<ffffffff81225517>] blk_end_bidi_request+0x27/0x80
>> <4>[66071017.155538] [<ffffffffa000a5ea>] scsi_end_request+0x3a/0xb0 [scsi_mod]
>> <4>[66071017.155572] [<ffffffffa000a9dc>] scsi_io_completion+0x10c/0x5b0 [scsi_mod]
>> <4>[66071017.155598] [<ffffffff8122b8b5>] blk_done_softirq+0x75/0x90
>> <4>[66071017.155611] [<ffffffff81066eaf>] __do_softirq+0xef/0x220
>> <4>[66071017.155627] [<ffffffff814657dc>] call_softirq+0x1c/0x30
>> <4>[66071017.155642] [<ffffffff81004445>] do_softirq+0x65/0xa0
>> <4>[66071017.155654] [<ffffffff81066ca5>] irq_exit+0xc5/0xe0
>> <4>[66071017.155667] [<ffffffff81465433>] call_function_single_interrupt+0x13/0x20
>> <4>[66071017.155683] [<ffffffff812ba85e>] intel_idle+0x9e/0x130
>> <4>[66071017.155697] [<ffffffff813769eb>] cpuidle_idle_call+0x11b/0x280
>> <4>[66071017.155710] [<ffffffff81002126>] cpu_idle+0x66/0xb0
>> <4>[66071017.155722] [<ffffffff81beeeff>] start_kernel+0x376/0x447
>> <4>[66071017.155736] [<ffffffff81bee3c9>] x86_64_start_kernel+0x123/0x13d
>> <0>[66071017.155746] Code: 8f b5 00 7d d4 5b c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 41 57 41 56 41 89 fe 41 55 49 89 d5 41 54 55 48 89 f5 53 48 83 ec 18 <8b> 16 f6 c2 40 0f 85 64 02 00 00 f6 c2 02 0f 85 f5 01 00 00 41
>> <1>[66071017.155810] RIP [<ffffffff8107b4e7>] __queue_work+0x17/0x3f0
>> <4>[66071017.155820] RSP <ffff88047fc03cc0>
>> <0>[66071017.155826] CR2: 0000000000000000
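
The two log patterns described at the top of this message can be told apart mechanically when scanning many logs. Below is a minimal Python sketch of that idea. It is not part of DRBD or of the original report: the names parse_transitions and classify_teardown and the labels "suspect" and "clean" are hypothetical, and the matching relies only on the message texts shown in the excerpts above. Since the original crash log contains both messages, the sketch keys on the "meta connection shut down by peer" line alone. It flags the pattern the report associates with the oops; it does not diagnose the cause.

```python
import re

# Matches DRBD state-change fragments such as "conn( SyncTarget -> TearDown )".
TRANSITION = re.compile(r"(\w+)\( (\w+) -> (\w+) \)")

# Message that the report associates with the oops case (assumption: the
# exact text is stable; it is copied from the log excerpts in this thread).
OOPS_HINT = "meta connection shut down by peer"


def parse_transitions(line):
    """Return {field: (old_state, new_state)} for one log line."""
    return {name: (old, new) for name, old, new in TRANSITION.findall(line)}


def classify_teardown(lines):
    """Label a log excerpt by the pattern described in this report.

    "suspect": a conn transition to TearDown is later followed by the
               "meta connection shut down by peer" message.
    "clean":   a TearDown transition is seen without that message.
    None:      no TearDown transition in the excerpt.
    """
    seen_teardown = False
    for line in lines:
        conn = parse_transitions(line).get("conn")
        if conn and conn[1] == "TearDown":
            seen_teardown = True
        elif seen_teardown and OOPS_HINT in line:
            return "suspect"
    return "clean" if seen_teardown else None
```

To use it, pass the lines of a kernel log excerpt, for example classify_teardown(open(path).read().splitlines()) with path pointing at a saved dmesg capture.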