[DRBD-user] resource stays in WFBitMapS/WFBitMapT state

Tomas Lavicky tomas.lavicky at racom.eu
Thu Mar 5 18:10:15 CET 2009

Note: "permalinks" may not be as permanent as we would like,
direct links of old sources may well be a few messages off.


Hello,
I run two node HP Proliant DL140 / Ubuntu 8.04 LTS / Heartbeat 2.1.3-2 cluster 
with bunch of KVM virtual servers. Images for virtual servers are stored in 
two DRBD 8.0.11 / OCFS2 1.3.9 dual-primary mode resources.
Yesterday one node hung up and other one was killed probably via watchdog. 
After boot one resource succesfly went to " 0: cs:Connected 
st:Primary/Primary ds:UpToDate/UpToDate C r---" but other one remains in 
Partial synchronization state:

" 1: cs:WFBitMapS st:Secondary/Secondary ds:UpToDate/Outdated C r-p-"

" 1: cs:WFBitMapT st:Secondary/Secondary ds:Outdated/UpToDate C r---"

Neither source nor target change to SyncSource or SyncTarget as expected.

Strange messages appear in /var/log/kern.log on target:

root at rrhb-gandalfb:~# grep drbd /var/log/kern.log|head -n 50
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.710796] drbd: initialised. 
Version: 8.0.11 (api:86/proto:86)
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.710804] drbd: GIT-hash: 
b3fe2bdfd3b9f7c2f923186883eb9e2a0d3a5b1b build by phil at mescal, 2008-02-12 
11:56:43
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.710807] drbd: registered as block 
device major 147
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.710808] drbd: minor_table @ 
0xffff810222551900
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.732238] drbd0: disk( Diskless -> 
Attaching )
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.732245] drbd0: Starting worker 
thread (from cqueue/3 [5075])
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.752807] drbd0: Found 6 
transactions (324 active extents) in activity log.
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.752813] drbd0: max_segment_size ( 
= BIO size ) = 32768
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.752818] drbd0: drbd_bm_resize 
called with capacity == 97656016
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.753340] drbd0: resync bitmap: 
bits=12207002 words=190735
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.753346] drbd0: size = 46 GB 
(48828008 KB)
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.783630] drbd0: reading of bitmap 
took 3 jiffies
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.784968] drbd0: recounting of set 
bits took additional 0 jiffies
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.784973] drbd0: 0 KB (0 bits) 
marked out-of-sync by on disk bit-map.
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.785036] drbd0: Marked additional 
1028 MB as out-of-sync based on AL.
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.915641] drbd0: disk( Attaching -> 
Consistent )
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.915649] drbd0: Writing meta data 
super block now.
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.937355] drbd1: disk( Diskless -> 
Attaching )
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.937365] drbd1: Starting worker 
thread (from cqueue/0 [5072])
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.963938] drbd1: Found 6 
transactions (154 active extents) in activity log.
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.963946] drbd1: max_segment_size ( 
= BIO size ) = 32768
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.963950] drbd1: drbd_bm_resize 
called with capacity == 195312160
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.965077] drbd1: resync bitmap: 
bits=24414020 words=381470
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.965087] drbd1: size = 93 GB 
(97656080 KB)
Mar  4 21:55:16 rrhb-gandalfb kernel: [  112.005957] drbd1: reading of bitmap 
took 4 jiffies
Mar  4 21:55:16 rrhb-gandalfb kernel: [  112.008637] drbd1: recounting of set 
bits took additional 0 jiffies
Mar  4 21:55:16 rrhb-gandalfb kernel: [  112.008644] drbd1: 0 KB (0 bits) 
marked out-of-sync by on disk bit-map.
Mar  4 21:55:16 rrhb-gandalfb kernel: [  112.008695] drbd1: Marked additional 
584 MB as out-of-sync based on AL.
Mar  4 21:55:16 rrhb-gandalfb kernel: [  112.095467] drbd1: disk( Attaching -> 
Consistent )
Mar  4 21:55:16 rrhb-gandalfb kernel: [  112.095477] drbd1: Writing meta data 
super block now.
Mar  4 21:55:16 rrhb-gandalfb kernel: [  112.223170] drbd0: conn( 
StandAlone -> Unconnected )
Mar  4 21:55:16 rrhb-gandalfb kernel: [  112.223193] drbd0: Starting receiver 
thread (from drbd0_worker [5086])
Mar  4 21:55:16 rrhb-gandalfb kernel: [  112.223218] drbd0: receiver 
(re)started
Mar  4 21:55:16 rrhb-gandalfb kernel: [  112.223222] drbd0: conn( 
Unconnected -> WFConnection )
Mar  4 21:55:16 rrhb-gandalfb kernel: [  112.224981] drbd1: conn( 
StandAlone -> Unconnected )
Mar  4 21:55:16 rrhb-gandalfb kernel: [  112.225003] drbd1: Starting receiver 
thread (from drbd1_worker [5094])
Mar  4 21:55:16 rrhb-gandalfb kernel: [  112.225029] drbd1: receiver 
(re)started
Mar  4 21:55:16 rrhb-gandalfb kernel: [  112.225034] drbd1: conn( 
Unconnected -> WFConnection )
Mar  4 21:56:49 rrhb-gandalfb kernel: [  205.574743] drbd1: Handshake 
successful: DRBD Network Protocol version 86
Mar  4 21:56:49 rrhb-gandalfb kernel: [  205.574749] drbd0: Handshake 
successful: DRBD Network Protocol version 86
Mar  4 21:56:49 rrhb-gandalfb kernel: [  205.575131] drbd0: Peer authenticated 
using 20 bytes of 'sha1' HMAC
Mar  4 21:56:49 rrhb-gandalfb kernel: [  205.575142] drbd0: conn( 
WFConnection -> WFReportParams )
Mar  4 21:56:49 rrhb-gandalfb kernel: [  205.575148] drbd0: Starting asender 
thread (from drbd0_receiver [5113])
Mar  4 21:56:49 rrhb-gandalfb kernel: [  205.575174] drbd1: Peer authenticated 
using 20 bytes of 'sha1' HMAC
Mar  4 21:56:49 rrhb-gandalfb kernel: [  205.575184] drbd1: conn( 
WFConnection -> WFReportParams )
Mar  4 21:56:49 rrhb-gandalfb kernel: [  205.575190] drbd1: Starting asender 
thread (from drbd1_receiver [5116])
Mar  4 21:56:50 rrhb-gandalfb kernel: [  205.646147] drbd0: peer( Unknown -> 
Secondary ) conn( WFReportParams -> WFBitMapS ) disk( Consistent -> 
UpToDate ) pdsk( DUnknown -> Outdated )
Mar  4 21:56:50 rrhb-gandalfb kernel: [  205.646160] drbd0: Writing meta data 
super block now.
Mar  4 21:56:50 rrhb-gandalfb kernel: [  205.647489] drbd1: peer( Unknown -> 
Secondary ) conn( WFReportParams -> WFBitMapT ) disk( Consistent -> 
Outdated ) pdsk( DUnknown -> UpToDate )
Mar  4 21:56:50 rrhb-gandalfb kernel: [  205.647501] drbd1: Writing meta data 
super block now.
root at rrhb-gandalfb:~/tl# grep drbd /var/log/kern.log|head -n 60
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.710796] drbd: initialised. 
Version: 8.0.11 (api:86/proto:86)
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.710804] drbd: GIT-hash: 
b3fe2bdfd3b9f7c2f923186883eb9e2a0d3a5b1b build by phil at mescal, 2008-02-12 
11:56:43
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.710807] drbd: registered as block 
device major 147
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.710808] drbd: minor_table @ 
0xffff810222551900
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.732238] drbd0: disk( Diskless -> 
Attaching )
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.732245] drbd0: Starting worker 
thread (from cqueue/3 [5075])
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.752807] drbd0: Found 6 
transactions (324 active extents) in activity log.
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.752813] drbd0: max_segment_size ( 
= BIO size ) = 32768
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.752818] drbd0: drbd_bm_resize 
called with capacity == 97656016
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.753340] drbd0: resync bitmap: 
bits=12207002 words=190735
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.753346] drbd0: size = 46 GB 
(48828008 KB)
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.783630] drbd0: reading of bitmap 
took 3 jiffies
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.784968] drbd0: recounting of set 
bits took additional 0 jiffies
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.784973] drbd0: 0 KB (0 bits) 
marked out-of-sync by on disk bit-map.
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.785036] drbd0: Marked additional 
1028 MB as out-of-sync based on AL.
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.915641] drbd0: disk( Attaching -> 
Consistent )
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.915649] drbd0: Writing meta data 
super block now.
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.937355] drbd1: disk( Diskless -> 
Attaching )
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.937365] drbd1: Starting worker 
thread (from cqueue/0 [5072])
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.963938] drbd1: Found 6 
transactions (154 active extents) in activity log.
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.963946] drbd1: max_segment_size ( 
= BIO size ) = 32768
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.963950] drbd1: drbd_bm_resize 
called with capacity == 195312160
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.965077] drbd1: resync bitmap: 
bits=24414020 words=381470
Mar  4 21:55:16 rrhb-gandalfb kernel: [  111.965087] drbd1: size = 93 GB 
(97656080 KB)
Mar  4 21:55:16 rrhb-gandalfb kernel: [  112.005957] drbd1: reading of bitmap 
took 4 jiffies
Mar  4 21:55:16 rrhb-gandalfb kernel: [  112.008637] drbd1: recounting of set 
bits took additional 0 jiffies
Mar  4 21:55:16 rrhb-gandalfb kernel: [  112.008644] drbd1: 0 KB (0 bits) 
marked out-of-sync by on disk bit-map.
Mar  4 21:55:16 rrhb-gandalfb kernel: [  112.008695] drbd1: Marked additional 
584 MB as out-of-sync based on AL.
Mar  4 21:55:16 rrhb-gandalfb kernel: [  112.095467] drbd1: disk( Attaching -> 
Consistent )
Mar  4 21:55:16 rrhb-gandalfb kernel: [  112.095477] drbd1: Writing meta data 
super block now.
Mar  4 21:55:16 rrhb-gandalfb kernel: [  112.223170] drbd0: conn( 
StandAlone -> Unconnected )
Mar  4 21:55:16 rrhb-gandalfb kernel: [  112.223193] drbd0: Starting receiver 
thread (from drbd0_worker [5086])
Mar  4 21:55:16 rrhb-gandalfb kernel: [  112.223218] drbd0: receiver 
(re)started
Mar  4 21:55:16 rrhb-gandalfb kernel: [  112.223222] drbd0: conn( 
Unconnected -> WFConnection )
Mar  4 21:55:16 rrhb-gandalfb kernel: [  112.224981] drbd1: conn( 
StandAlone -> Unconnected )
Mar  4 21:55:16 rrhb-gandalfb kernel: [  112.225003] drbd1: Starting receiver 
thread (from drbd1_worker [5094])
Mar  4 21:55:16 rrhb-gandalfb kernel: [  112.225029] drbd1: receiver 
(re)started
Mar  4 21:55:16 rrhb-gandalfb kernel: [  112.225034] drbd1: conn( 
Unconnected -> WFConnection )
Mar  4 21:56:49 rrhb-gandalfb kernel: [  205.574743] drbd1: Handshake 
successful: DRBD Network Protocol version 86
Mar  4 21:56:49 rrhb-gandalfb kernel: [  205.574749] drbd0: Handshake 
successful: DRBD Network Protocol version 86
Mar  4 21:56:49 rrhb-gandalfb kernel: [  205.575131] drbd0: Peer authenticated 
using 20 bytes of 'sha1' HMAC
Mar  4 21:56:49 rrhb-gandalfb kernel: [  205.575142] drbd0: conn( 
WFConnection -> WFReportParams )
Mar  4 21:56:49 rrhb-gandalfb kernel: [  205.575148] drbd0: Starting asender 
thread (from drbd0_receiver [5113])
Mar  4 21:56:49 rrhb-gandalfb kernel: [  205.575174] drbd1: Peer authenticated 
using 20 bytes of 'sha1' HMAC
Mar  4 21:56:49 rrhb-gandalfb kernel: [  205.575184] drbd1: conn( 
WFConnection -> WFReportParams )
Mar  4 21:56:49 rrhb-gandalfb kernel: [  205.575190] drbd1: Starting asender 
thread (from drbd1_receiver [5116])
Mar  4 21:56:50 rrhb-gandalfb kernel: [  205.646147] drbd0: peer( Unknown -> 
Secondary ) conn( WFReportParams -> WFBitMapS ) disk( Consistent -> 
UpToDate ) pdsk( DUnknown -> Outdated )
Mar  4 21:56:50 rrhb-gandalfb kernel: [  205.646160] drbd0: Writing meta data 
super block now.
Mar  4 21:56:50 rrhb-gandalfb kernel: [  205.647489] drbd1: peer( Unknown -> 
Secondary ) conn( WFReportParams -> WFBitMapT ) disk( Consistent -> 
Outdated ) pdsk( DUnknown -> UpToDate )
Mar  4 21:56:50 rrhb-gandalfb kernel: [  205.647501] drbd1: Writing meta data 
super block now.
Mar  4 21:56:50 rrhb-gandalfb kernel: [  205.718045] drbd0: peer( Secondary -> 
Primary )
Mar  4 21:56:50 rrhb-gandalfb kernel: [  205.756389] drbd0: conn( WFBitMapS -> 
SyncSource ) pdsk( Outdated -> Inconsistent )
Mar  4 21:56:50 rrhb-gandalfb kernel: [  205.756407] drbd1: aftr_isp( 0 -> 1 )
Mar  4 21:56:50 rrhb-gandalfb kernel: [  205.756412] drbd0: Began resync as 
SyncSource (will sync 2088960 KB [522240 bits set]).
Mar  4 21:56:50 rrhb-gandalfb kernel: [  205.756415] drbd0: Writing meta data 
super block now.
Mar  4 21:56:50 rrhb-gandalfb kernel: [  205.788553] drbd0: role( Secondary -> 
Primary )
Mar  4 21:57:08 rrhb-gandalfb kernel: [  224.007713] drbd1: 
[drbd1_receiver/5116] sock_sendmsg time expired, ko = 4294967295
Mar  4 21:57:14 rrhb-gandalfb kernel: [  230.000650] drbd1: 
[drbd1_receiver/5116] sock_sendmsg time expired, ko = 4294967294
Mar  4 21:57:20 rrhb-gandalfb kernel: [  235.995426] drbd1: 
[drbd1_receiver/5116] sock_sendmsg time expired, ko = 4294967293
Mar  4 21:57:26 rrhb-gandalfb kernel: [  241.989622] drbd1: 
[drbd1_receiver/5116] sock_sendmsg time expired, ko = 4294967292

I don't understand why ko starts with so high number. I suppose default 
ko-count 4 ???

I tried to move source to be primary but it failed:

root at rrhb-gandalfs:~# drbdadm primary r1
No response from the DRBD driver! Is the module loaded?

Module is loaded of course:

root at rrhb-gandalfs:~/tl# /etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.0.11 (api:86/proto:86)
GIT-hash: b3fe2bdfd3b9f7c2f923186883eb9e2a0d3a5b1b build by phil at mescal, 
2008-02-12 11:56:43
m:res  cs         st                   ds                 p  mounted  fstype
0:r0   Connected  Primary/Primary      UpToDate/UpToDate  C  /drbd0   ocfs2
1:r1   WFBitMapS  Secondary/Secondary  UpToDate/Outdated  C

Neither data discarding on target works:

root at rrhb-gandalfb:~# drbdadm -- --discard-my-data connect r1
Child process does not terminate!
Exiting.
No response from the DRBD driver! Is the module loaded?

How can I stop drbd1_receiver? 
I solved similar situation via reboots few months ago but I need to keep 
virtual servers running from drbd0 up now.
Any idea, please?
Thanks in advice
Tomas




More information about the drbd-user mailing list