[DRBD-user] DRBD not syncing with new secondary

Christian Koschmieder ck at peira-kollektiv.de
Tue Aug 26 18:01:59 CEST 2014


Hello!

I found the problem.
Even though I created the partitions underlying the DRBD device with
exactly the same size as on the old server and on the still-running
primary, they turned out to be too small.
Maybe the newer version of DRBD uses more space for the meta data?
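
On the meta data question: the DRBD 8.3 User's Guide gives an estimate
for the size of internal meta data, Ms = ceil(Cs / 2^18) * 8 + 72, with
both values in 512-byte sectors. The sketch below only applies that
formula. Whether 8.3.7 and 8.3.11 really differ in how much they reserve
is not established in this thread, and the 232 GiB partition in the
example is a made-up figure, not the size of either server's partition.

```python
SECTOR_BYTES = 512


def internal_metadata_sectors(device_sectors: int) -> int:
    """Estimate internal meta data size in sectors.

    Formula as given in the DRBD 8.3 User's Guide:
    ceil(Cs / 2**18) * 8 + 72, with Cs in 512-byte sectors.
    """
    chunks = -(-device_sectors // 2**18)  # integer ceiling division
    return chunks * 8 + 72


def usable_kib(device_kib: int) -> int:
    """Approximate size left for data on a partition with internal meta data."""
    device_sectors = device_kib * 1024 // SECTOR_BYTES
    data_sectors = device_sectors - internal_metadata_sectors(device_sectors)
    return data_sectors * SECTOR_BYTES // 1024


if __name__ == "__main__":
    # Hypothetical partition size, chosen only for illustration.
    partition_kib = 232 * 1024 * 1024
    overhead_kib = partition_kib - usable_kib(partition_kib)
    print(f"meta data overhead: {overhead_kib} KiB")
    print(f"usable for data:    {usable_kib(partition_kib)} KiB")
```

With internal meta data the usable device is the partition minus this
overhead, so running usable_kib() on both partition sizes before
attaching the new disk should show whether the new one falls short.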

What made it difficult to find was that, although nearly all log
messages went to /var/log/messages, the vital message ("The peer's disk
size is too small!") went only to the syslog:

Aug 24 21:59:14 www1 kernel: [37868732.971832] block drbd1: conn( StandAlone -> Unconnected )
Aug 24 21:59:14 www1 kernel: [37868732.971887] block drbd1: Starting receiver thread (from drbd1_worker [1733])
Aug 24 21:59:14 www1 kernel: [37868732.972202] block drbd1: receiver (re)started
Aug 24 21:59:14 www1 kernel: [37868732.972222] block drbd1: conn( Unconnected -> WFConnection )
Aug 24 21:59:15 www1 kernel: [37868733.471248] block drbd1: Handshake successful: Agreed network protocol version 91
Aug 24 21:59:15 www1 kernel: [37868733.471284] block drbd1: conn( WFConnection -> WFReportParams )
Aug 24 21:59:15 www1 kernel: [37868733.471344] block drbd1: Starting asender thread (from drbd1_receiver [22671])
Aug 24 21:59:15 www1 kernel: [37868733.471571] block drbd1: data-integrity-alg: <not-used>
Aug 24 21:59:15 www1 kernel: [37868733.471602] block drbd1: The peer's disk size is too small!
Aug 24 21:59:15 www1 kernel: [37868733.471623] block drbd1: conn( WFReportParams -> Disconnecting )
Aug 24 21:59:15 www1 kernel: [37868733.471645] block drbd1: error receiving ReportSizes, l: 32!
Aug 24 21:59:15 www1 kernel: [37868733.471680] block drbd1: asender terminated
Aug 24 21:59:15 www1 kernel: [37868733.471699] block drbd1: Terminating drbd1_asender
Aug 24 21:59:15 www1 kernel: [37868733.471901] block drbd1: Connection closed
Aug 24 21:59:15 www1 kernel: [37868733.471926] block drbd1: conn( Disconnecting -> StandAlone )
Aug 24 21:59:15 www1 kernel: [37868733.471967] block drbd1: receiver terminated
Aug 24 21:59:15 www1 kernel: [37868733.471982] block drbd1: Terminating drbd1_receiver
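
Since the decisive line ended up in a different file than the rest, a
small scan over both log files would have caught it sooner. The following
is only a sketch: the path /var/log/syslog is an assumption about where
"the syslog" lives on this system, the two search patterns are taken
from the excerpt above and are not a complete list of DRBD error
messages, and it expects one log entry per line, as in the files on disk.

```python
import re
from pathlib import Path

# Matches the DRBD part of a kernel log line, e.g. "block drbd1: <message>".
DRBD_LINE = re.compile(r"block (drbd\d+): (.*)")
# Phrases seen in the excerpt above when the handshake was aborted.
SUSPICIOUS = re.compile(r"too small|error receiving", re.IGNORECASE)


def suspicious_drbd_messages(lines):
    """Return (device, message) pairs for DRBD lines that look like errors."""
    hits = []
    for line in lines:
        match = DRBD_LINE.search(line)
        if match and SUSPICIOUS.search(match.group(2)):
            hits.append((match.group(1), match.group(2).strip()))
    return hits


def scan_logs(paths=("/var/log/messages", "/var/log/syslog")):
    """Print suspicious DRBD messages from every log file that exists."""
    for path in map(Path, paths):
        if not path.is_file():
            continue
        with path.open(errors="replace") as handle:
            for device, message in suspicious_drbd_messages(handle):
                print(f"{path}: {device}: {message}")


if __name__ == "__main__":
    scan_logs()
```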

After resizing the partitions, everything is running smoothly.
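
For anyone wanting to check sizes up front: the secondary's log quoted
further down reports "bits:60674971" with the whole bitmap set, and DRBD
tracks out-of-sync data in 4 KiB blocks, so the bit count should reflect
the size of the new device. The sketch below does that arithmetic. The
primary's device size does not appear anywhere in this thread, so it is
left as a parameter, and the comparison function is a hypothetical
helper, not something DRBD provides.

```python
KIB_PER_BITMAP_BIT = 4  # each bitmap bit covers one 4 KiB block


def bits_to_kib(bits: int) -> int:
    """Convert a count of bitmap bits to KiB."""
    return bits * KIB_PER_BITMAP_BIT


def kib_to_gib(kib: int) -> float:
    """Convert KiB to GiB."""
    return kib / (1024 * 1024)


def secondary_is_large_enough(primary_kib: int, secondary_kib: int) -> bool:
    """Hypothetical check: the new peer must not be smaller than the primary."""
    return secondary_kib >= primary_kib


if __name__ == "__main__":
    secondary_bits = 60674971  # "bits:60674971" in the www2 log
    secondary_kib = bits_to_kib(secondary_bits)
    # This matches "oos:242699884" in the secondary's /proc/drbd output.
    print(f"secondary: {secondary_kib} KiB ({kib_to_gib(secondary_kib):.2f} GiB)")
```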


Thanks a lot for your efforts!

Koschi

On 25.08.2014 16:33, Christian Koschmieder wrote:
> Hello Roland,
>
> Sorry, I didn't attach it because it does not seem to have any 
> relevant information in it. But of course, here it is:
>
> Aug 24 21:59:14 www1 kernel: [37868732.971832] block drbd1: conn( 
> StandAlone -> Unconnected ).
> Aug 24 21:59:14 www1 kernel: [37868732.971887] block drbd1: Starting 
> receiver thread (from drbd1_worker [1733])
> Aug 24 21:59:14 www1 kernel: [37868732.972202] block drbd1: receiver 
> (re)started
> Aug 24 21:59:14 www1 kernel: [37868732.972222] block drbd1: conn( 
> Unconnected -> WFConnection ).
> Aug 24 21:59:15 www1 kernel: [37868733.471248] block drbd1: Handshake 
> successful: Agreed network protocol version 91
> Aug 24 21:59:15 www1 kernel: [37868733.471284] block drbd1: conn( 
> WFConnection -> WFReportParams ).
> Aug 24 21:59:15 www1 kernel: [37868733.471344] block drbd1: Starting 
> asender thread (from drbd1_receiver [22671])
> Aug 24 21:59:15 www1 kernel: [37868733.471571] block drbd1: 
> data-integrity-alg: <not-used>
> Aug 24 21:59:15 www1 kernel: [37868733.471623] block drbd1: conn( 
> WFReportParams -> Disconnecting ).
> Aug 24 21:59:15 www1 kernel: [37868733.471680] block drbd1: asender 
> terminated
> Aug 24 21:59:15 www1 kernel: [37868733.471699] block drbd1: 
> Terminating drbd1_asender
> Aug 24 21:59:15 www1 kernel: [37868733.471901] block drbd1: Connection 
> closed
> Aug 24 21:59:15 www1 kernel: [37868733.471926] block drbd1: conn( 
> Disconnecting -> StandAlone ).
> Aug 24 21:59:15 www1 kernel: [37868733.471967] block drbd1: receiver 
> terminated
> Aug 24 21:59:15 www1 kernel: [37868733.471982] block drbd1: 
> Terminating drbd1_receiver
>
>
> Kind regards,
>
> Koschi
>
> On 25.08.2014 at 14:23, Roland Friedwagner wrote:
>> Hi,
>>
>> can you provide the log (from the same connection attempt) from
>> the other node (primary) also?
>>
>> regards roland
>>
>> On Sunday, 24 August 2014 22:09:44, Christian Koschmieder wrote:
>>> I have two servers to host a website.
>>> Only one is actively used at a time, the other one acts as hot standby.
>>> All data is replicated via DRBD from the currently active server
>>> (primary) to the backup server (secondary).
>>>
>>> I recently had to set up a new secondary, because the original one had
>>> hardware problems.
>>> So I followed the instructions in the documentation
>>> (http://www.drbd.org/users-guide-8.3/s-node-failure.html#s-perm-node-failure). 
>>>
>>>
>>> The status of the primary node:
>>> version: 8.3.7 (api:88/proto:86-91)
>>> srcversion: EE47D8BF18AC166BE219757
>>>
>>>    1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r----
>>>       ns:0 nr:0 dw:202926340 dr:247194962 al:2452 bm:757 lo:0 pe:0 ua:0
>>> ap:0 ep:1 wo:b oos:215272
>>>
>>> The status of the secondary node:
>>> version: 8.3.11 (api:88/proto:86-96)
>>> srcversion: F937DCB2E5D83C6CCE4A6C9
>>>
>>>    1: cs:WFConnection ro:Secondary/Unknown ds:Inconsistent/DUnknown 
>>> C r-----
>>>       ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f
>>> oos:242699884
>>>
>>> This seems to be all right.
>>> But when issuing a connect on the primary it immediately disconnects 
>>> again.
>>> The log on the secondary has the following entries:
>>>
>>> Aug 24 21:59:15 www2 kernel: [ 3780.076072] block drbd1: Handshake
>>> successful: Agreed network protocol version 91
>>> Aug 24 21:59:15 www2 kernel: [ 3780.076122] block drbd1: conn(
>>> WFConnection -> WFReportParams )
>>> Aug 24 21:59:15 www2 kernel: [ 3780.076180] block drbd1: Starting
>>> asender thread (from drbd1_receiver [2502])
>>> Aug 24 21:59:15 www2 kernel: [ 3780.077178] block drbd1:
>>> data-integrity-alg: <not-used>
>>> Aug 24 21:59:15 www2 kernel: [ 3780.077235] block drbd1:
>>> drbd_sync_handshake:
>>> Aug 24 21:59:15 www2 kernel: [ 3780.077272] block drbd1: self
>>> 0000000000000004:0000000000000000:0000000000000000:0000000000000000
>>> bits:60674971 flags:0
>>> Aug 24 21:59:15 www2 kernel: [ 3780.077328] block drbd1: peer
>>> E2227E948E7B07CD:4445769C1EF0ADCC:B744D0729CC042CC:5AD0061929ED5B9D
>>> bits:53813 flags:0
>>> Aug 24 21:59:15 www2 kernel: [ 3780.077343] block drbd1: conn(
>>> WFReportParams -> NetworkFailure )
>>> Aug 24 21:59:15 www2 kernel: [ 3780.077349] block drbd1: asender 
>>> terminated
>>> Aug 24 21:59:15 www2 kernel: [ 3780.077351] block drbd1: Terminating
>>> drbd1_asender
>>> Aug 24 21:59:15 www2 kernel: [ 3780.077509] block drbd1:
>>> uuid_compare()=-2 by rule 20
>>> Aug 24 21:59:15 www2 kernel: [ 3780.077549] block drbd1: Becoming sync
>>> target due to disk states.
>>> Aug 24 21:59:15 www2 kernel: [ 3780.077586] block drbd1: Writing the
>>> whole bitmap, full sync required after drbd_sync_handshake.
>>> Aug 24 21:59:15 www2 kernel: [ 3780.162981] block drbd1: bitmap 
>>> WRITE of
>>> 1852 pages took 10 jiffies
>>> Aug 24 21:59:15 www2 kernel: [ 3780.224437] block drbd1: 231 GB
>>> (60674971 bits) marked out-of-sync by on disk bit-map.
>>> Aug 24 21:59:15 www2 kernel: [ 3780.232894] block drbd1:
>>> drbd_sync_handshake:
>>> Aug 24 21:59:15 www2 kernel: [ 3780.232932] block drbd1: self
>>> 0000000000000004:0000000000000000:0000000000000000:0000000000000000
>>> bits:60674971 flags:0
>>> Aug 24 21:59:15 www2 kernel: [ 3780.232975] block drbd1: peer
>>> E2227E948E7B07CD:4445769C1EF0ADCC:B744D0729CC042CC:5AD0061929ED5B9D
>>> bits:53813 flags:0
>>> Aug 24 21:59:15 www2 kernel: [ 3780.233017] block drbd1:
>>> uuid_compare()=-2 by rule 20
>>> Aug 24 21:59:15 www2 kernel: [ 3780.233053] block drbd1: Becoming sync
>>> target due to disk states.
>>> Aug 24 21:59:15 www2 kernel: [ 3780.233091] block drbd1: Writing the
>>> whole bitmap, full sync required after drbd_sync_handshake.
>>> Aug 24 21:59:15 www2 kernel: [ 3780.287424] block drbd1: bitmap 
>>> WRITE of
>>> 1852 pages took 10 jiffies
>>> Aug 24 21:59:15 www2 kernel: [ 3780.348835] block drbd1: 231 GB
>>> (60674971 bits) marked out-of-sync by on disk bit-map.
>>> Aug 24 21:59:15 www2 kernel: [ 3780.357295] block drbd1: peer( Unknown
>>> -> Primary ) conn( NetworkFailure -> WFBitMapT ) pdsk( DUnknown ->
>>> UpToDate )
>>> Aug 24 21:59:15 www2 kernel: [ 3780.365646] block drbd1: Connection 
>>> closed
>>> Aug 24 21:59:15 www2 kernel: [ 3780.365688] block drbd1: peer( Primary
>>> -> Unknown ) conn( WFBitMapT -> Unconnected ) pdsk( UpToDate -> 
>>> DUnknown )
>>> Aug 24 21:59:15 www2 kernel: [ 3780.365731] block drbd1: receiver 
>>> terminated
>>> Aug 24 21:59:15 www2 kernel: [ 3780.365771] block drbd1: Restarting
>>> drbd1_receiver
>>> Aug 24 21:59:15 www2 kernel: [ 3780.365808] block drbd1: receiver
>>> (re)started
>>> Aug 24 21:59:15 www2 kernel: [ 3780.365871] block drbd1: conn(
>>> Unconnected -> WFConnection )
>>> Aug 24 21:59:15 www2 kernel: [ 3780.373914] block drbd1: bitmap 
>>> WRITE of
>>> 0 pages took 0 jiffies
>>> Aug 24 21:59:15 www2 kernel: [ 3780.374072] block drbd1: 231 GB
>>> (60674971 bits) marked out-of-sync by on disk bit-map.
>>>
>>>
>>> As far as I understand it, they establish a connection, agree on a
>>> protocol, notice that the secondary needs a full sync, and then just
>>> drop the connection for no apparent reason.
>>>
>>> Can you tell me why this might be, or where I can get further
>>> information on why the connection is being dropped?
>>>
>>>
>>> Thanks a lot
>>>
>>> Koschi
>>> _______________________________________________
>>> drbd-user mailing list
>>> drbd-user at lists.linbit.com
>>> http://lists.linbit.com/mailman/listinfo/drbd-user
>>>



