When I debugged the crash, the connection's cstate at the time of the oops was C_WF_REPORT_PARAMS, not C_TEAR_DOWN. So in my opinion this problem may also occur in versions up to 8.4.10.

In conn_connect(), the state change and the initialization of ack_sender happen in this order:

```
	rv = conn_request_state(connection, NS(conn, C_WF_REPORT_PARAMS), CS_VERBOSE);   <-- change cstate here
	if (rv < SS_SUCCESS || connection->cstate != C_WF_REPORT_PARAMS) {
		clear_bit(STATE_SENT, &connection->flags);
		return 0;
	}

	drbd_thread_start(&connection->ack_receiver);
	/* opencoded create_singlethread_workqueue(),
	 * to be able to use format string arguments */
	connection->ack_sender =   <-- init ack_sender here
#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,3,0)
		alloc_ordered_workqueue("drbd_as_%s", WQ_MEM_RECLAIM, connection->resource->name);
#else
		create_singlethread_workqueue("drbd_ack_sender");
#endif
	if (!connection->ack_sender) {
		drbd_err(connection, "Failed to create workqueue ack_sender\n");
		return 0;
	}
```

The code where the oops occurs decides whether ack_sender is valid by looking at cstate alone:

```
	if (connection->cstate >= C_WF_REPORT_PARAMS) {
		kref_get(&device->kref); /* put is in drbd_send_acks_wf() */
		if (!queue_work(connection->ack_sender, &peer_device->send_acks_work))   <-- oops here.
			kref_put(&device->kref, drbd_destroy_device);
	}
```

So if this code runs after cstate has changed but before ack_sender has been assigned, ack_sender is still NULL.

2017-08-10 18:21 GMT+08:00 Lars Ellenberg <lars.ellenberg at linbit.com>:

> On Wed, Aug 09, 2017 at 05:20:22PM +0800, li songmin wrote:
> > Hi,
> >
> > when I upgrade drbd from 8.3.15 to 8.4.6-5, there is an oops caused by
> > a NULL pointer error.
>
> We are at 8.4.10 already.
> Just saying.
>
> > The upgrade steps are as follows:
> >
> > 1. primary node works as normal
> > 2. stop drbd 8.3.15 on secondary node, and upgrade it to 8.4.6-5.
> > 3. start secondary node, now data begins to sync from primary node.
> > 4. upgrade primary node with the following steps
> >    1. stop business service on drbd
> >    2. disconnect drbd for unmount quickly <-- oops on secondary node here?
>
> Why disconnect?
>
> >    3. umount filesystem
> >    4. primary -> secondary
> >    5. connect drbd and wait for sync to complete.
> >    6. business service may start on secondary node now.
> >    7. stop drbd 8.3.15 on primary node, and upgrade it to 8.4.6-5.
> >
> > call stack:
> >
> > <4>[66071017.155051] Modules linked in: softdog drbd(FN)
>
> What did you need to force the module for?
> Probably *that* is your problem right there.
>
> --
> : Lars Ellenberg
> : LINBIT | Keeping the Digital World Running
> : DRBD -- Heartbeat -- Corosync -- Pacemaker
>
> DRBD® and LINBIT® are registered trademarks of LINBIT
> __
> please don't Cc me, but send to list -- I'm subscribed
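
To make the suspected window concrete, here is a small user-space model of the ordering described at the top of this message: cstate is set to C_WF_REPORT_PARAMS before ack_sender is assigned, while another path trusts cstate alone. This is only a sketch, not DRBD code. The structs are reduced stand-ins, the enum is a subset that keeps only the relative order, and `queue_ack_state_only`, `queue_ack_guarded`, `connect_state_first` and `connect_alloc_first` are hypothetical names made up for this illustration. The "concurrent" path is modeled as a callback invoked inside the window, so the whole thing is single-threaded and deterministic.

```c
#include <stddef.h>

/* Reduced subset of the connection states; only the relative order matters. */
enum conn_state { C_TEAR_DOWN, C_WF_REPORT_PARAMS, C_CONNECTED };

/* Stand-in for a kernel workqueue: just counts queued items. */
struct workqueue { int queued; };

/* Stand-in for the connection: only the two fields discussed here. */
struct connection {
    enum conn_state cstate;
    struct workqueue *ack_sender;   /* NULL until allocated */
};

typedef int (*probe_fn)(struct connection *);

/* Mirrors the quoted check: trusts cstate alone.
 * Returns -1 where the kernel code would hand a NULL workqueue to queue_work(). */
static int queue_ack_state_only(struct connection *c)
{
    if (c->cstate >= C_WF_REPORT_PARAMS) {
        if (c->ack_sender == NULL)
            return -1;              /* the oops case */
        c->ack_sender->queued++;
        return 1;
    }
    return 0;
}

/* Hypothetical variant: additionally requires that the workqueue exists. */
static int queue_ack_guarded(struct connection *c)
{
    if (c->cstate >= C_WF_REPORT_PARAMS && c->ack_sender != NULL) {
        c->ack_sender->queued++;
        return 1;
    }
    return 0;
}

/* Order as quoted from conn_connect(): state first, workqueue second.
 * probe() stands for another path running inside the window. */
static int connect_state_first(struct connection *c, struct workqueue *wq,
                               probe_fn probe)
{
    c->cstate = C_WF_REPORT_PARAMS;
    int seen = probe(c);            /* window: state visible, ack_sender NULL */
    c->ack_sender = wq;
    return seen;
}

/* Alternative order: workqueue first, state second, so there is no window. */
static int connect_alloc_first(struct connection *c, struct workqueue *wq,
                               probe_fn probe)
{
    c->ack_sender = wq;
    c->cstate = C_WF_REPORT_PARAMS;
    return probe(c);
}
```

In this model, `connect_state_first()` with `queue_ack_state_only` hits the NULL case, while either the guarded check or the allocate-first order avoids it. Whether either change is appropriate for the real driver is a separate question: the kernel code would also need proper locking or memory ordering between the two paths, which this sketch does not model.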