<div dir="ltr"><div><div><div><br></div>Hi,<br><br></div>Some additional information that may help.<br><br></div>When the oops occurs on the secondary node, the log shows &quot;meta connection shut down by peer&quot;:<br><div><br>&lt;6&gt;[ 7423.143569] block drbd0: peer( Primary -&gt; Unknown ) conn( SyncTarget -&gt; TearDown ) pdsk( UpToDate -&gt; DUnknown ) <br>&lt;3&gt;[ 7423.191698] drbd drbd0: meta connection shut down by peer.<br><br>When the log shows &quot;ack_receiver terminated&quot; instead, the disconnect succeeds:<br><br>&lt;6&gt;[ 7407.584922] block drbd0: peer( Primary -&gt; Unknown ) conn( SyncTarget -&gt; TearDown ) pdsk( UpToDate -&gt; DUnknown ) <br>&lt;6&gt;[ 7407.585424] drbd drbd0: ack_receiver terminated<br><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">2017-08-10 15:37 GMT+08:00 li songmin <span dir="ltr">&lt;<a href="mailto:lisongmin9@gmail.com" target="_blank">lisongmin9@gmail.com</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>After more testing, it seems to occur only from 8.3 to 8.4; 8.4 to 8.4 works well.<br><br></div><div>Can we be sure this problem does not affect 8.4 to 8.4?<br><br></div><div>Please also see the crash in the attachment.<br></div><div><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">2017-08-09 22:21 GMT+08:00 li songmin <span dir="ltr">&lt;<a href="mailto:lisongmin9@gmail.com" target="_blank">lisongmin9@gmail.com</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_quote"><div dir="ltr"><div></div><div>Hi,<br><br></div>When I upgrade DRBD from 8.3.15 to 8.4.6-5, an oops occurs, caused by a NULL pointer dereference.<br><div><br></div><div>The upgrade steps are as follows:<br><br></div><div>1.  The primary node runs DRBD 8.3.15 as normal.<br></div><div>2. Stop DRBD 8.3.15 on the secondary node and upgrade it to 8.4.6-5.<br></div><div>3. 
Start the secondary node; data now begins syncing from the primary node.<br>4. Upgrade the primary node with the following steps:<br></div><div>     1. Stop the business service running on DRBD.<br>      2. Disconnect DRBD so the filesystem can be unmounted quickly.  &lt;--  oops on the secondary node here?<br></div><div>      3.  Unmount the filesystem.<br></div><div>      4. Demote primary -&gt; secondary.<br></div><div></div><div>      5. Connect DRBD and wait for the sync to complete.<br></div><div>      6. The business service may now start on the secondary node.<br></div><div>      7. Stop DRBD 8.3.15 on the primary node and upgrade it to 8.4.6-5.<br></div><div><br></div><div>Call stack:<br></div><div><br>&lt;6&gt;[66071016.839607] block drb<wbr>d0: peer( Primary -&gt; Unknown )<wbr> conn( Connected -&gt; TearDown )<wbr> pdsk( UpToDate -&gt; DUnknown ) <br>&lt;3&gt;[66071016.840029] drbd drbd<wbr>0: meta connection shut down b<wbr>y peer.<br>&lt;6&gt;[66071016.840037] drbd drbd<wbr>0: ack_receiver terminated<br>&lt;6&gt;[66071016.840040] drbd drbd<wbr>0: Terminating drbd_a_drbd0<br>&lt;1&gt;[66071017.154996] BUG: unab<wbr>le to handle kernel NULL point<wbr>er dereference at           (n<wbr>ull)<br>&lt;1&gt;[66071017.155013] IP: [&lt;fff<wbr>fffff8107b4e7&gt;] __queue_work+0<wbr>x17/0x3f0<br>&lt;4&gt;[66071017.155030] PGD 0 <br>&lt;0&gt;[66071017.155037] Oops: 000<wbr>0 [#1] SMP <br>&lt;4&gt;[66071017.155048] CPU 0 <br>&lt;4&gt;[66071017.155051] Modules l<wbr>inked in: softdog drbd(FN) crc<wbr>32c libcrc32c ib_ipoib ib_cm m<wbr>lx4_en mlx4_ib ib_sa ib_mad ib<wbr>_core mlx4_core mpt3sas mptctl<wbr> mptbase lp parport_pc ppdev s<wbr>t parport ide_cd_mod ide_core <wbr>joydev ipmi_devintf ipmi_si ip<wbr>mi_msghandler tcp_diag inet_di<wbr>ag nls_utf8 dm_snapshot af_pac<wbr>ket md5 binfmt_misc edd bondin<wbr>g cpufreq_conservative cpufreq<wbr>_userspace cpufreq_powersave <wbr>acpi_cpufreq mperf microcode <wbr>fuse loop dm_mod cdc_ether <wbr>igb usbnet shpchp tpm_tis dca <wbr>tpm ipv6 pci_hotplug sr_mod <wbr>ipv6_lib tpm_bios ptp mii sg <wbr>pcspkr i2c_i801 
rtc_cmos <wbr>cdrom pps_core wmi button <wbr>ext3 jbd mbcache usbhid hid <wbr>ttm drm_kms_helper drm i2c_<wbr>algo_bit sysimgblt sysfillrect<wbr> i2c_core syscopyarea ehci_<wbr>hcd usbcore sd_mod usb_common <wbr>crc_t10dif processor thermal_s<wbr>ys hwmon scsi_dh_alua scsi_dh_<wbr>hp_sw scsi_dh_emc scsi_dh_rdac<wbr> scsi_dh ahci libahci libata <wbr>mpt2sas scsi_transport_sas <wbr>raid_class megaraid_sas scsi_<wbr>mod [last unloaded: drbd]<br>&lt;4&gt;[66071017.155207] Supported<wbr>: No, Unsupported modules are <wbr>loaded<br>&lt;4&gt;[66071017.155214] <br>&lt;4&gt;[66071017.155221] Pid: 0, c<wbr>omm: swapper Tainted: GF   B  <wbr>    N  3.0.76-0.11-default #1 <wbr>IBM System x3650 M4 : -[7915OS<wbr>C]-/00Y8494<br>&lt;4&gt;[66071017.155234] RIP: 0010<wbr>:[&lt;ffffffff8107b4e7&gt;]  [&lt;fffff<wbr>fff8107b4e7&gt;] __queue_work+0x1<wbr>7/0x3f0<br>&lt;4&gt;[66071017.155246] RSP: 0018<wbr>:ffff88047fc03cc0  EFLAGS: 000<wbr>10086<br>&lt;4&gt;[66071017.155253] RAX: 0000<wbr>000000000000 RBX: ffff88046503<wbr>2c00 RCX: 0000000000000000<br>&lt;4&gt;[66071017.155261] RDX: ffff<wbr>8804294fb3e0 RSI: 000000000000<wbr>0000 RDI: 0000000000000000<br>&lt;4&gt;[66071017.155269] RBP: 0000<wbr>000000000000 R08: 000000000000<wbr>0000 R09: 00000004d9030553<br>&lt;4&gt;[66071017.155277] R10: 0000<wbr>0004d903059c R11: 000000000000<wbr>0001 R12: ffff880465032800<br>&lt;4&gt;[66071017.155285] R13: ffff<wbr>8804294fb3e0 R14: 000000000000<wbr>0000 R15: 0000000000003800<br>&lt;4&gt;[66071017.155293] FS:  0000<wbr>000000000000(0000) GS:ffff8804<wbr>7fc00000(0000) knlGS:000000000<wbr>0000000<br>&lt;4&gt;[66071017.155302] CS:  0010<wbr> DS: 0000 ES: 0000 CR0: 000000<wbr>008005003b<br>&lt;4&gt;[66071017.155309] CR2: 0000<wbr>000000000000 CR3: 0000000001a0<wbr>9000 CR4: 00000000000407f0<br>&lt;4&gt;[66071017.155317] DR0: 0000<wbr>000000000000 DR1: 000000000000<wbr>0000 DR2: 0000000000000000<br>&lt;4&gt;[66071017.155325] DR3: 0000<wbr>000000000000 DR6: 00000000ffff<wbr>0ff0 DR7: 
0000000000000400<br>&lt;4&gt;[66071017.155333] Process s<wbr>wapper (pid: 0, threadinfo fff<wbr>fffff81a00000, task ffffffff81<wbr>a11020)<br>&lt;0&gt;[66071017.155341] Stack:<br>&lt;4&gt;[66071017.155346]  ffff8804<wbr>6557b000 ffff880438b41248 ffff<wbr>880465bfe000 ffff880465032c00<br>&lt;4&gt;[66071017.155363]  ffff8804<wbr>65032870 ffff880465032800 ffff<wbr>88076b08e000 ffff8804294fb3c0<br>&lt;4&gt;[66071017.155376]  00000000<wbr>00003800 ffffffff8107b906 ffff<wbr>8804654e8e10 ffffffff8107b95a<br>&lt;0&gt;[66071017.155389] Call Trac<wbr>e:<br>&lt;4&gt;[66071017.155408]  [&lt;ffffff<wbr>ff8107b906&gt;] queue_work_on+0x1<wbr>6/0x30<br>&lt;4&gt;[66071017.155419]  [&lt;ffffff<wbr>ff8107b95a&gt;] queue_work+0x1a/0<wbr>x20<br>&lt;4&gt;[66071017.155440]  [&lt;ffffff<wbr>ffa06aa8cd&gt;] drbd_endio_write_<wbr>sec_final+0x25d/0x650 [drbd]<br>&lt;4&gt;[66071017.155488]  [&lt;ffffff<wbr>ff8122404b&gt;] blk_update_reques<wbr>t+0x10b/0x440<br>&lt;4&gt;[66071017.155502]  [&lt;ffffff<wbr>ff8122439f&gt;] blk_update_bidi_r<wbr>equest+0x1f/0x90<br>&lt;4&gt;[66071017.155513]  [&lt;ffffff<wbr>ff81225517&gt;] blk_end_bidi_requ<wbr>est+0x27/0x80<br>&lt;4&gt;[66071017.155538]  [&lt;ffffff<wbr>ffa000a5ea&gt;] scsi_end_request+<wbr>0x3a/0xb0 [scsi_mod]<br>&lt;4&gt;[66071017.155572]  [&lt;ffffff<wbr>ffa000a9dc&gt;] scsi_io_completio<wbr>n+0x10c/0x5b0 [scsi_mod]<br>&lt;4&gt;[66071017.155598]  [&lt;ffffff<wbr>ff8122b8b5&gt;] blk_done_softirq+<wbr>0x75/0x90<br>&lt;4&gt;[66071017.155611]  [&lt;ffffff<wbr>ff81066eaf&gt;] __do_softirq+0xef<wbr>/0x220<br>&lt;4&gt;[66071017.155627]  [&lt;ffffff<wbr>ff814657dc&gt;] call_softirq+0x1c<wbr>/0x30<br>&lt;4&gt;[66071017.155642]  [&lt;ffffff<wbr>ff81004445&gt;] do_softirq+0x65/0<wbr>xa0<br>&lt;4&gt;[66071017.155654]  [&lt;ffffff<wbr>ff81066ca5&gt;] irq_exit+0xc5/0xe<wbr>0<br>&lt;4&gt;[66071017.155667]  [&lt;ffffff<wbr>ff81465433&gt;] call_function_sin<wbr>gle_interrupt+0x13/0x20<br>&lt;4&gt;[66071017.155683]  [&lt;ffffff<wbr>ff812ba85e&gt;] 
intel_idle+0x9e/0<wbr>x130<br>&lt;4&gt;[66071017.155697]  [&lt;ffffff<wbr>ff813769eb&gt;] cpuidle_idle_call<wbr>+0x11b/0x280<br>&lt;4&gt;[66071017.155710]  [&lt;ffffff<wbr>ff81002126&gt;] cpu_idle+0x66/0xb<wbr>0<br>&lt;4&gt;[66071017.155722]  [&lt;ffffff<wbr>ff81beeeff&gt;] start_kernel+0x37<wbr>6/0x447<br>&lt;4&gt;[66071017.155736]  [&lt;ffffff<wbr>ff81bee3c9&gt;] x86_64_start_kern<wbr>el+0x123/0x13d<br>&lt;0&gt;[66071017.155746] Code: 8f <wbr>b5 00 7d d4 5b c3 66 66 66 66 <wbr>2e 0f 1f 84 00 00 00 00 00 41 <wbr>57 41 56 41 89 fe 41 55 49 89 <wbr>d5 41 54 55 48 89 f5 53 48 83 <wbr>ec 18 &lt;8b&gt; 16 f6 c2 40 0f 85 6<wbr>4 02 00 00 f6 c2 02 0f 85 f5 0<wbr>1 00 00 41 <br>&lt;1&gt;[66071017.155810] RIP  [&lt;ff<wbr>ffffff8107b4e7&gt;] __queue_work+<wbr>0x17/0x3f0<br>&lt;4&gt;[66071017.155820]  RSP &lt;fff<wbr>f88047fc03cc0&gt;<br>&lt;0&gt;[66071017.155826] CR2: 0000<wbr>000000000000<br><br><br></div></div>
</div><br></div>
</blockquote></div><br></div></div></div></div>
</blockquote></div><br></div>