[DRBD-user] drbd 8.4.6-5 oops when disconnect

li songmin lisongmin9 at gmail.com
Sun Aug 13 16:18:06 CEST 2017


It seems connection->cstate is only updated via P_CONN_ST_CHG_REQ, but drbd
8.3 sends the request via P_STATE_CHG_REQ.

2017-08-11 14:12 GMT+08:00 li songmin <lisongmin9 at gmail.com>:

>
> When I debugged the crash, the cstate of the connection at the time of the
> oops was C_WF_REPORT_PARAMS, not C_TEAR_DOWN.
>
> So in my opinion this problem may also occur in versions up to 8.4.10.
>
> The order of the state change and the ack_sender initialization in the
> conn_connect function is:
>
> ```
>     rv = conn_request_state(connection, NS(conn, C_WF_REPORT_PARAMS), CS_VERBOSE);  /* <-- change cstate here */
>     if (rv < SS_SUCCESS || connection->cstate != C_WF_REPORT_PARAMS) {
>         clear_bit(STATE_SENT, &connection->flags);
>         return 0;
>     }
>
>     drbd_thread_start(&connection->ack_receiver);
>     /* opencoded create_singlethread_workqueue(),
>      * to be able to use format string arguments */
>     connection->ack_sender =  /* <-- init ack_sender here */
> #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,3,0)
>         alloc_ordered_workqueue("drbd_as_%s", WQ_MEM_RECLAIM, connection->resource->name);
> #else
>         create_singlethread_workqueue("drbd_ack_sender");
> #endif
>     if (!connection->ack_sender) {
>         drbd_err(connection, "Failed to create workqueue ack_sender\n");
>         return 0;
>     }
>
> ```
>
> and the code at the oops point decides whether ack_sender is valid by checking cstate:
>
> ```
>     if (connection->cstate >= C_WF_REPORT_PARAMS) {
>         kref_get(&device->kref); /* put is in drbd_send_acks_wf() */
>         if (!queue_work(connection->ack_sender, &peer_device->send_acks_work))  /* <-- oops here */
>             kref_put(&device->kref, drbd_destroy_device);
>     }
> ```
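
The window described above can be illustrated with a small user-space sketch. Everything in it (the reduced enum, `fake_connection`, `fake_workqueue`, and the helper names) is a hypothetical stand-in, not the DRBD kernel structures, and the guarded check is only an assumption about what a defensive test could look like, not the upstream fix. It shows that between the state change and the workqueue creation, a check on cstate alone passes while ack_sender is still NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced state list; the real DRBD enum has more states. */
enum conn_state { C_STANDALONE, C_TEAR_DOWN, C_WF_REPORT_PARAMS, C_CONNECTED };

/* Hypothetical stand-in for a kernel workqueue. */
struct fake_workqueue { int queued; };

/* Hypothetical stand-in for struct drbd_connection. */
struct fake_connection {
    enum conn_state cstate;
    struct fake_workqueue *ack_sender; /* NULL until it has been created */
};

/* Mirrors the check at the oops point: only cstate is consulted. */
static bool would_queue_by_cstate(const struct fake_connection *c)
{
    return c->cstate >= C_WF_REPORT_PARAMS;
}

/* Defensive variant (an assumption): also require that the workqueue exists. */
static bool would_queue_guarded(const struct fake_connection *c)
{
    return c->cstate >= C_WF_REPORT_PARAMS && c->ack_sender != NULL;
}

/* Step 1 in the quoted conn_connect() excerpt: the state changes first. */
static void step_change_state(struct fake_connection *c)
{
    c->cstate = C_WF_REPORT_PARAMS;
}

/* Step 2 in the quoted excerpt: the workqueue is created afterwards. */
static void step_init_ack_sender(struct fake_connection *c,
                                 struct fake_workqueue *wq)
{
    c->ack_sender = wq;
}
```

Calling `step_change_state()` without `step_init_ack_sender()` reproduces the window: `would_queue_by_cstate()` returns true although `ack_sender` is NULL, while `would_queue_guarded()` returns false until the second step has run.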
>
> 2017-08-10 18:21 GMT+08:00 Lars Ellenberg <lars.ellenberg at linbit.com>:
>
>> On Wed, Aug 09, 2017 at 05:20:22PM +0800, li songmin wrote:
>> > Hi,
>> >
>> > When I upgraded drbd from 8.3.15 to 8.4.6-5, there was an oops caused by
>> > a NULL pointer error.
>>
>> We are at 8.4.10 already.
>> Just saying.
>>
>> >
>> > The upgrade steps were as follows:
>> >
>> > 1. The primary node works as normal.
>> > 2. Stop drbd 8.3.15 on the secondary node and upgrade it to 8.4.6-5.
>> > 3. Start the secondary node; data now begins to sync from the primary node.
>> > 4. Upgrade the primary node with the following steps:
>> >       1. Stop the business service on drbd.
>> >       2. Disconnect drbd so that it can be unmounted quickly.  <-- oops on the secondary node here?
>>
>> Why disconnect?
>>
>> >       3. Unmount the filesystem.
>> >       4. primary -> secondary
>> >       5. Connect drbd and wait for the sync to complete.
>> >       6. The business service may start on the secondary node now.
>> >       7. Stop drbd 8.3.15 on the primary node and upgrade it to 8.4.6-5.
>> >
>> > call stack:
>>
>> > <4>[66071017.155051] Modules linked in: softdog drbd(FN)
>>
>> What did you need to force the module for?
>> Probably *that* is your problem right there.
>>
>>
>> --
>> : Lars Ellenberg
>> : LINBIT | Keeping the Digital World Running
>> : DRBD -- Heartbeat -- Corosync -- Pacemaker
>>
>> DRBD® and LINBIT® are registered trademarks of LINBIT
>> __
>> please don't Cc me, but send to list -- I'm subscribed
>>
>
>