[DRBD-user] Oops in drbd_asender on Debian Jessie

Patrick Feisthammel (Citrin Informatik GmbH) patrick.feisthammel at citrin.ch
Wed Aug 5 08:45:26 CEST 2015



Hi Digimer

It is not related to the size of the drbd device. It is related to TRIM commands:

The bug occurs in this setup:

- source-drbd on thin provisioned LVM
- target-drbd on "classic" LVM
- virtual machine on source-drbd issues a TRIM command (e.g. fstrim)


Bug does not show up in this setup:

- source-drbd on thin provisioned LVM
- target-drbd on thin provisioned LVM


Bug does not show up in this setup:

- source-drbd on thin provisioned LVM
- target-drbd on "classic" LVM
- virtual machine on source-drbd does no TRIM commands


source-drbd and target-drbd are connected and UpToDate on both ends 
(fully synced) at the start of the test.
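
To confirm that starting condition before each test run, the state line in
/proc/drbd can be checked with a script. This is only a minimal Python sketch:
it assumes the usual 8.4 layout of the per-device line ("<minor>: cs:... ro:...
ds:local/peer ..."), and the function names are made up for illustration.

```python
import re

# Per-device line of /proc/drbd as printed by DRBD 8.4 (assumed layout):
#   <minor>: cs:<connection> ro:<local>/<peer> ds:<local>/<peer> ...
STATE_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+ro:(?P<ro>\S+)\s+ds:(?P<ds>\S+)"
)


def parse_states(text):
    """Map each minor number to its connection state and disk states."""
    states = {}
    for line in text.splitlines():
        match = STATE_RE.match(line)
        if not match:
            continue  # version banner, counters, empty lines
        local, _, peer = match.group("ds").partition("/")
        states[int(match.group("minor"))] = {
            "cs": match.group("cs"),
            "ds_local": local,
            "ds_peer": peer,
        }
    return states


def fully_synced(state):
    """True if the device is connected and UpToDate on both ends."""
    return (
        state["cs"] == "Connected"
        and state["ds_local"] == "UpToDate"
        and state["ds_peer"] == "UpToDate"
    )


if __name__ == "__main__":
    with open("/proc/drbd") as handle:
        for minor, state in sorted(parse_states(handle.read()).items()):
            print(minor, "ok" if fully_synced(state) else "NOT fully synced", state)
```

Run on both nodes before issuing the TRIM; anything other than "ok" for the
minor under test means the precondition described above is not met.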

My current guess is: drbd assumes TRIM is supported on the target side
(because it is supported on the source side) and then fails in its error
handling when the target rejects the command.
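
One way to check this guess would be to look at whether the backing device
under each drbd advertises discard support at all. A minimal Python sketch,
assuming the Linux sysfs attribute queue/discard_max_bytes (a value of 0 means
the device does not accept discards); the function names are hypothetical and
the device names (e.g. dm-3) have to be looked up for the LVs in question:

```python
import sys
from pathlib import Path


def discard_max_bytes(dev_name, sysfs_root="/sys/block"):
    """Read queue/discard_max_bytes for a block device such as 'dm-3'."""
    path = Path(sysfs_root) / dev_name / "queue" / "discard_max_bytes"
    return int(path.read_text().strip())


def supports_discard(dev_name, sysfs_root="/sys/block"):
    """A device advertises discard (TRIM) support if the limit is non-zero."""
    return discard_max_bytes(dev_name, sysfs_root) > 0


if __name__ == "__main__":
    # Usage: python3 check_discard.py dm-3 dm-4
    for name in sys.argv[1:]:
        verdict = "discard supported" if supports_discard(name) else "no discard"
        print(f"{name}: {verdict} (discard_max_bytes={discard_max_bytes(name)})")
```

If the guess is right, I would expect the thin provisioned LV on the source to
report a non-zero value and the "classic" LV on the target to report 0, but I
have not verified that on these machines.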


Cheers,
Patrick


Am 04.08.2015 um 16:18 schrieb Digimer:
> I've used it extensively on arrays up to ~40 TB without issue. So I
> suspect there is another problem at play.
>
> Do you have a test environment that you can reproduce this problem in?
> If so, then I would recommend testing upgrading the userland and the
> kernel modules. 8.4.6 is also out and there are a lot of bug fixes from
> .3 to .6. I understand wanting to stick with provided packages, which is
> why I am asking, as a test, to try the upgrade in a dev environment.
>
> On 04/08/15 10:13 AM, Patrick Feisthammel (Citrin Informatik GmbH) wrote:
>> Hi Digimer
>>
>> Version from /proc/drbd is
>>    version: 8.4.3 (api:1/proto:86-101)
>>
>> The policy is to stay on the packages provided by the platform, if
>> possible.
>>
>> Until now it has happened only with one 25 GB partition. But it is
>> worrying that drbd can cause repeated reboots of the physical server.
>>
>> Cheers,
>> Patrick
>>
>> Am 04.08.2015 um 15:17 schrieb Digimer:
>>> What version of DRBD itself? (cat /proc/drbd). Not sure if it will help,
>>> but 8.9.3 is out, can you try upgrading?
>>>
>>> On 04/08/15 03:16 AM, Patrick Feisthammel (Citrin Informatik GmbH) wrote:
>>>> Hi
>>>>
>>>> We have repeated Oopses with drbd.
>>>>
>>>> Kernel 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1+deb8u2 (2015-07-17)
>>>> x86_64 GNU/Linux
>>>> drbd-utils is 8.9.2~rc1-2
>>>>
>>>> This happens on different hardware (same software versions). It seems
>>>> to happen with only one specific drbd source.
>>>> The kernel Oops is on the receiving side.
>>>>
>>>> More Oopses can be produced if helpful. Any suggestions for solving the issue?
>>>>
>>>> Aug  4 08:23:15 octa12 kernel: [34843.532236] drbd k126nfs1: Starting
>>>> worker thread (from drbdsetup-84 [7570])
>>>> Aug  4 08:23:15 octa12 kernel: [34843.534488] block drbd133: disk(
>>>> Diskless -> Attaching )
>>>> Aug  4 08:23:15 octa12 kernel: [34843.535065] drbd k126nfs1: Method to
>>>> ensure write ordering: drain
>>>> Aug  4 08:23:15 octa12 kernel: [34843.535069] block drbd133: max BIO
>>>> size = 4096
>>>> Aug  4 08:23:15 octa12 kernel: [34843.535074] block drbd133:
>>>> drbd_bm_resize called with capacity == 52427128
>>>> Aug  4 08:23:15 octa12 kernel: [34843.535283] block drbd133: resync
>>>> bitmap: bits=6553391 words=102397 pages=200
>>>> Aug  4 08:23:15 octa12 kernel: [34843.535285] block drbd133: size = 25
>>>> GB (26213564 KB)
>>>> Aug  4 08:23:15 octa12 kernel: [34843.535345] block drbd133: Writing the
>>>> whole bitmap, size changed
>>>> Aug  4 08:23:15 octa12 kernel: [34843.537060] block drbd133: bitmap
>>>> WRITE of 200 pages took 0 jiffies
>>>> Aug  4 08:23:15 octa12 kernel: [34843.537064] block drbd133: 25 GB
>>>> (6553391 bits) marked out-of-sync by on disk bit-map.
>>>> Aug  4 08:23:15 octa12 kernel: [34843.539599] block drbd133: bitmap READ
>>>> of 200 pages took 1 jiffies
>>>> Aug  4 08:23:15 octa12 kernel: [34843.539730] block drbd133: recounting
>>>> of set bits took additional 0 jiffies
>>>> Aug  4 08:23:15 octa12 kernel: [34843.539732] block drbd133: 25 GB
>>>> (6553391 bits) marked out-of-sync by on disk bit-map.
>>>> Aug  4 08:23:15 octa12 kernel: [34843.539741] block drbd133: Suspended
>>>> AL updates
>>>> Aug  4 08:23:15 octa12 kernel: [34843.539744] block drbd133: disk(
>>>> Attaching -> Inconsistent )
>>>> Aug  4 08:23:15 octa12 kernel: [34843.539746] block drbd133: attached to
>>>> UUIDs
>>>> 0000000000000004:0000000000000000:0000000000000000:0000000000000000
>>>> Aug  4 08:23:15 octa12 kernel: [34843.541494] drbd k126nfs1: conn(
>>>> StandAlone -> Unconnected )
>>>> Aug  4 08:23:15 octa12 kernel: [34843.541509] drbd k126nfs1: Starting
>>>> receiver thread (from drbd_w_k126nfs1 [7572])
>>>> Aug  4 08:23:15 octa12 kernel: [34843.543448] drbd k126nfs1: receiver
>>>> (re)started
>>>> Aug  4 08:23:15 octa12 kernel: [34843.543459] drbd k126nfs1: conn(
>>>> Unconnected -> WFConnection )
>>>> Aug  4 08:23:19 octa12 kernel: [34847.542570] drbd k126nfs1: Handshake
>>>> successful: Agreed network protocol version 101
>>>> Aug  4 08:23:19 octa12 kernel: [34847.542572] drbd k126nfs1: Agreed to
>>>> support TRIM on protocol level
>>>> Aug  4 08:23:19 octa12 kernel: [34847.542601] drbd k126nfs1: conn(
>>>> WFConnection -> WFReportParams )
>>>> Aug  4 08:23:19 octa12 kernel: [34847.542603] drbd k126nfs1: Starting
>>>> asender thread (from drbd_r_k126nfs1 [7577])
>>>> Aug  4 08:23:19 octa12 kernel: [34847.558766] block drbd133: max BIO
>>>> size = 286720
>>>> Aug  4 08:23:19 octa12 kernel: [34847.558776] block drbd133:
>>>> drbd_sync_handshake:
>>>> Aug  4 08:23:19 octa12 kernel: [34847.558779] block drbd133: self
>>>> 0000000000000004:0000000000000000:0000000000000000:0000000000000000
>>>> bits:6553391 flags:0
>>>> Aug  4 08:23:19 octa12 kernel: [34847.558782] block drbd133: peer
>>>> D523A8E0A929C7CF:D9BDE1030AAC1F1B:BFE2851AAA4046D4:BFE1851AAA4046D4
>>>> bits:44229 flags:0
>>>> Aug  4 08:23:19 octa12 kernel: [34847.558783] block drbd133:
>>>> uuid_compare()=-2 by rule 20
>>>> Aug  4 08:23:19 octa12 kernel: [34847.558785] block drbd133: Becoming
>>>> sync target due to disk states.
>>>> Aug  4 08:23:19 octa12 kernel: [34847.558786] block drbd133: Writing the
>>>> whole bitmap, full sync required after drbd_sync_handshake.
>>>> Aug  4 08:23:19 octa12 kernel: [34847.560780] block drbd133: bitmap
>>>> WRITE of 200 pages took 0 jiffies
>>>> Aug  4 08:23:19 octa12 kernel: [34847.560784] block drbd133: 25 GB
>>>> (6553391 bits) marked out-of-sync by on disk bit-map.
>>>> Aug  4 08:23:19 octa12 kernel: [34847.560840] block drbd133: peer(
>>>> Unknown -> Primary ) conn( WFReportParams -> WFBitMapT ) pdsk( DUnknown
>>>> -> UpToDate )
>>>> Aug  4 08:23:19 octa12 kernel: [34847.560844] block drbd133: Resumed AL
>>>> updates
>>>> Aug  4 08:23:19 octa12 kernel: [34847.588181] block drbd133: receive
>>>> bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23;
>>>> compression: 100.0%
>>>> Aug  4 08:23:19 octa12 kernel: [34847.588290] block drbd133: send bitmap
>>>> stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression:
>>>> 100.0%
>>>> Aug  4 08:23:19 octa12 kernel: [34847.588294] block drbd133: conn(
>>>> WFBitMapT -> WFSyncUUID )
>>>> Aug  4 08:23:19 octa12 kernel: [34847.591990] block drbd133: updated
>>>> sync uuid
>>>> D9BEE1030AAC1F1A:0000000000000000:0000000000000000:0000000000000000
>>>> Aug  4 08:23:19 octa12 kernel: [34847.592078] block drbd133: helper
>>>> command: /sbin/drbdadm before-resync-target minor-133
>>>> Aug  4 08:23:19 octa12 kernel: [34847.596515] block drbd133: helper
>>>> command: /sbin/drbdadm before-resync-target minor-133 exit code 0 (0x0)
>>>> Aug  4 08:23:19 octa12 kernel: [34847.596528] block drbd133: conn(
>>>> WFSyncUUID -> SyncTarget )
>>>> Aug  4 08:23:19 octa12 kernel: [34847.596537] block drbd133: Began
>>>> resync as SyncTarget (will sync 26213564 KB [6553391 bits set]).
>>>> Aug  4 08:23:23 octa12 kernel: [34849.338284] PGD 1814067 PUD 281c11067
>>>> PMD 281a41067 PTE 8010000079ee3067
>>>> Aug  4 08:23:23 octa12 kernel: [34849.338312] Oops: 0011 [#1] SMP
>>>> Aug  4 08:23:23 octa12 kernel: [34849.338328] Modules linked in:
>>>> xt_comment xt_tcpudp ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6
>>>> nf_nat_ipv6 ip6table_filter ip6_tables nf_nat_ftp xt_REDIRECT
>>>> xt_conntrack iptable_mangle nf_conntrack_ftp ipt_REJECT xt_LOG xt_limit
>>>> iptable_filter xt_multiport iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
>>>> nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables xen_gntdev xen_evtchn
>>>> xenfs xen_privcmd nfsd auth_rpcgss oid_registry nfs_acl nfs lockd
>>>> fscache sunrpc bridge joydev hid_generic iTCO_wdt iTCO_vendor_support
>>>> evdev x86_pkg_temp_thermal intel_powerclamp coretemp crc32_pclmul
>>>> ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper
>>>> ablk_helper cryptd psmouse serio_raw pcspkr sb_edac edac_core usbhid hid
>>>> i2c_i801 ttm drm_kms_helper drm mei_me mei lpc_ich mfd_core ioatdma
>>>> shpchp tpm_tis wmi tpm ipmi_si ipmi_msghandler processor thermal_sys
>>>> button 8021q garp stp mrp llc drbd lru_cache libcrc32c autofs4 ext4
>>>> crc16 mbcache jbd2 btrfs xor raid6_pq dm_mod sg sd_mod crc_t10dif
>>>> crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel isci
>>>> ahci libahci libsas ehci_pci ehci_hcd megaraid_sas usbcore usb_common
>>>> libata scsi_transport_sas scsi_mod igb i2c_algo_bit i2c_core dca ptp
>>>> pps_core
>>>> Aug  4 08:23:23 octa12 kernel: [34849.338919] CPU: 1 PID: 7584 Comm:
>>>> drbd_a_k126nfs1 Not tainted 3.16.0-4-amd64 #1 Debian
>>>> 3.16.7-ckt11-1+deb8u2
>>>> Aug  4 08:23:23 octa12 kernel: [34849.338970] Hardware name:
>>>> Thomas-Krenn.AG X9DRW-3LN4F+/X9DRW-3TF+/X9DRW-3LN4F+/X9DRW-3TF+, BIOS
>>>> 3.2 01/15/2015
>>>> Aug  4 08:23:23 octa12 kernel: [34849.339021] task: ffff8801ff59f570 ti:
>>>> ffff880079ee0000 task.ti: ffff880079ee0000
>>>> Aug  4 08:23:23 octa12 kernel: [34849.339067] RIP:
>>>> e030:[<ffff880079ee3d88>]  [<ffff880079ee3d88>] 0xffff880079ee3d88
>>>> Aug  4 08:23:23 octa12 kernel: [34849.339116] RSP:
>>>> e02b:ffff880079ee3d90  EFLAGS: 00010212
>>>> Aug  4 08:23:23 octa12 kernel: [34849.339144] RAX: 00000000fffffffc RBX:
>>>> ffffffffffffffff RCX: 0000000000001ab3
>>>> Aug  4 08:23:23 octa12 kernel: [34849.339174] RDX: 0000000000001ab3 RSI:
>>>> 00000000fffffe01 RDI: ffffffff81463f75
>>>> Aug  4 08:23:23 octa12 kernel: [34849.339205] RBP: ffff8801ff59f570 R08:
>>>> ffff880079ee0000 R09: 0000000000000000
>>>> Aug  4 08:23:23 octa12 kernel: [34849.339236] R10: ffff8802001f4810 R11:
>>>> 0000000000000000 R12: 0000000000000001
>>>> Aug  4 08:23:23 octa12 kernel: [34849.339267] R13: 0000000000000000 R14:
>>>> 0000000000000010 R15: ffff8801ff729800
>>>> Aug  4 08:23:23 octa12 kernel: [34849.339302] FS: 0000000000000000(0000)
>>>> GS:ffff880274640000(0000) knlGS:0000000000000000
>>>> Aug  4 08:23:23 octa12 kernel: [34849.339349] CS:  e033 DS: 0000 ES:
>>>> 0000 CR0: 0000000080050033
>>>> Aug  4 08:23:23 octa12 kernel: [34849.339378] CR2: ffff880079ee3d88 CR3:
>>>> 00000001fe428000 CR4: 0000000000042660
>>>> Aug  4 08:23:23 octa12 kernel: [34849.339409] Stack:
>>>> Aug  4 08:23:23 octa12 kernel: [34849.339431]  ffff880079ee3d88
>>>> 0000000000000010 0000000000000000 0000000000000000
>>>> Aug  4 08:23:23 octa12 kernel: [34849.339487]  ffff880079ee3d90
>>>> 0000000000000001 0000000000000000 0000000000000000
>>>> Aug  4 08:23:23 octa12 kernel: [34849.339543]  0000000000004100
>>>> ffffffffa039a7be ffff8801ff729880 0000001000000000
>>>> Aug  4 08:23:23 octa12 kernel: [34849.339600] Call Trace:
>>>> Aug  4 08:23:23 octa12 kernel: [34849.339632] [<ffffffffa039a7be>] ?
>>>> drbd_asender+0x27e/0x750 [drbd]
>>>> Aug  4 08:23:23 octa12 kernel: [34849.339667] [<ffffffffa03a3d00>] ?
>>>> drbd_destroy_connection+0xc0/0xc0 [drbd]
>>>> Aug  4 08:23:23 octa12 kernel: [34849.339703] [<ffffffffa03a3d46>] ?
>>>> drbd_thread_setup+0x46/0x130 [drbd]
>>>> Aug  4 08:23:23 octa12 kernel: [34849.339737] [<ffffffffa03a3d00>] ?
>>>> drbd_destroy_connection+0xc0/0xc0 [drbd]
>>>>
>>>> _______________________________________________
>>>> drbd-user mailing list
>>>> drbd-user at lists.linbit.com
>>>> http://lists.linbit.com/mailman/listinfo/drbd-user
>



