[DRBD-user] 0.7.6 and sync after invalidate

Eugene Crosser crosser at rol.ru
Tue Nov 23 18:22:47 CET 2004


Philipp Reisner wrote:
> On Tuesday 23 November 2004 13:40, Eugene Crosser wrote:
> 
>>Philipp and guys,
>>
>>I seem to hit the same problem today that I already reported long ago
>>and that was apparently fixed long ago.  I am running kernel.org kernel
>>2.6.10-rc2 and drbd branch/drbd-0.7 checked out this morning, which
>>reports itself as 0.7.6.  I was testing my system reaction to pulling
>>out a disk; it did all right, drbd noticed underlying device failure and
>>dutifully panicked.  After reconnecting the disk (to hardware RAID0), I
>>got the system up and ran "drbdadm invalidate all" on the system with
>>would-be-replaced disk.  In half an hour, SyncTarget reported sync
>>complete, but the SyncSource did not:
>>
>>Nov 23 14:00:58 nfsb2.mail.back kernel: drbd0: 214540288 KB now marked out-of-sync by on disk bit-map.
>>Nov 23 14:00:58 nfsb2.mail.back kernel: drbd0: drbd0_receiver [155]: cstate Connected --> SyncSource
>>Nov 23 14:00:58 nfsb2.mail.back kernel: drbd0: Resync started as SyncSource (need to sync 214540288 KB [53635072 bits set]).
>>Nov 23 14:02:43 nfsb1.mail.back ntpd[133]: time set -0.032269 s
>>Nov 23 14:16:10 nfsb2.mail.back -- MARK --
>>Nov 23 14:18:05 nfsb1.mail.back ntpd[133]: time reset -0.145189 s
>>Nov 23 14:36:10 nfsb2.mail.back -- MARK --
>>Nov 23 14:38:45 nfsb1.mail.back -- MARK --
>>Nov 23 14:41:42 nfsb1.mail.back kernel: drbd0: Resync done (total 2445 sec; paused 0 sec; 87744 K/sec)
>>Nov 23 14:41:42 nfsb1.mail.back kernel: drbd0: drbd0_worker [152]: cstate SyncTarget --> Connected
> 
> The start sync from nfsb1 line is missing, could you post that please
> as well ? 

Oops, sorry.  Here is what happened after "invalidate":

Nov 23 14:00:57 nfsb1.mail.back kernel: drbd0: drbdsetup [200]: cstate Connected --> WFBitMapT
Nov 23 14:00:58 nfsb1.mail.back kernel: drbd0: 214540288 KB now marked out-of-sync by on disk bit-map.
Nov 23 14:00:58 nfsb1.mail.back kernel: drbd0: drbdsetup [200]: cstate WFBitMapT --> SyncTarget
Nov 23 14:00:58 nfsb1.mail.back kernel: drbd0: Resync started as SyncTarget (need to sync 214540288 KB [53635072 bits set]).
Nov 23 14:00:58 nfsb2.mail.back kernel: drbd0: 214540288 KB now marked out-of-sync by on disk bit-map.
Nov 23 14:00:58 nfsb2.mail.back kernel: drbd0: drbd0_receiver [155]: cstate Connected --> SyncSource
Nov 23 14:00:58 nfsb2.mail.back kernel: drbd0: Resync started as SyncSource (need to sync 214540288 KB [53635072 bits set]).
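As a sanity check on the "Resync started" messages above, both nodes report the same amount to sync, and the KB figure and the bit count are consistent with each other. The following is only a rough sketch: the regular expression and the function name `kb_per_bit` are made up for illustration, and the 4 KB-per-bit granularity is inferred from these log numbers, not taken from the DRBD source.

```python
import re

# Matches the tail of the "Resync started as ..." kernel message quoted above.
# Assumption: the message format is exactly as it appears in this log.
RESYNC_RE = re.compile(r"need to sync (\d+) KB \[(\d+) bits set\]")

def kb_per_bit(line):
    """Return how many KB one bitmap bit covers, or None if the line does not match."""
    m = RESYNC_RE.search(line)
    if m is None:
        return None
    kb, bits = int(m.group(1)), int(m.group(2))
    return kb / bits

line = ("drbd0: Resync started as SyncTarget "
        "(need to sync 214540288 KB [53635072 bits set]).")
print(kb_per_bit(line))
```

Running it on the SyncSource line from nfsb2 gives the same ratio, so the two nodes at least agreed on the amount of data when the resync started.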

>>Now, this is /proc/drbd on both nodes:
>>
>>root at hanode1:~# cat /proc/drbd
>>version: 0.7.6 (api:77/proto:74)
>>SVN Revision: 1649 build by crosser at ariel.sovam.com, 2004-11-23 11:15:51
>>  0: cs:Connected st:Secondary/Primary ld:Consistent
>>     ns:0 nr:234193156 dw:234193156 dr:0 al:0 bm:27472 lo:0 pe:0 ua:0 ap:0
>>
>>root at hanode2:~# cat /proc/drbd
>>version: 0.7.6 (api:77/proto:74)
>>SVN Revision: 1649 build by crosser at ariel.sovam.com, 2004-11-23 11:15:51
>>  0: cs:SyncSource st:Primary/Secondary ld:Consistent
>>     ns:234169036 nr:17584552 dw:38525788 dr:219993817 al:8449 bm:30241 lo:0 pe:0 ua:0 ap:0
>>         [===================>] sync'ed:100.0% (1/209512)M
>>         finish: 0:00:00 speed: 120 (44,464) K/sec
>>
> 
> 
> Hmmm, this is not good....

I seem to recall that when you dealt with it last time, you said that it 
is a problem that cannot easily be fixed, but that you made a change that 
would make it very unlikely.
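
The symptom in the /proc/drbd output quoted above is that one node reports cs:Connected while its peer still reports cs:SyncSource. A minimal sketch of how that mismatch could be spotted from the two outputs follows. The function names are hypothetical, and the regular expression assumes the 0.7-style status line exactly as shown in this message.

```python
import re

# Matches a device status line such as:
#   "  0: cs:Connected st:Secondary/Primary ld:Consistent"
# Assumption: the /proc/drbd layout is the one quoted in this thread.
CS_RE = re.compile(r"^\s*(\d+):\s+cs:(\S+)\s+st:(\S+)\s+ld:(\S+)", re.MULTILINE)

def connection_states(proc_drbd_text):
    """Map device minor number -> connection state (the cs: field)."""
    return {int(m.group(1)): m.group(2) for m in CS_RE.finditer(proc_drbd_text)}

def mismatched(local_text, peer_text):
    """Return minors where one side is Connected but the other still shows a sync state."""
    local = connection_states(local_text)
    peer = connection_states(peer_text)
    bad = []
    for minor in sorted(set(local) & set(peer)):
        pair = {local[minor], peer[minor]}
        if "Connected" in pair and pair & {"SyncSource", "SyncTarget"}:
            bad.append(minor)
    return bad

node1 = """version: 0.7.6 (api:77/proto:74)
  0: cs:Connected st:Secondary/Primary ld:Consistent
     ns:0 nr:234193156 dw:234193156 dr:0 al:0 bm:27472 lo:0 pe:0 ua:0 ap:0
"""
node2 = """version: 0.7.6 (api:77/proto:74)
  0: cs:SyncSource st:Primary/Secondary ld:Consistent
     ns:234169036 nr:17584552 dw:38525788 dr:219993817 al:8449 bm:30241 lo:0 pe:0 ua:0 ap:0
"""
print(mismatched(node1, node2))
```

The two texts would have to be collected from each node separately (for example over ssh); the sketch only compares them.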

>>Another interesting thing: on the SyncSource node (hanode2) `uptime'
>>reports an unreasonable load average:
>>
>>root at hanode2:~# uptime
>>  15:34:27  up  3:18,  1 user,  load average: 407.18, 593.69, 871.42
>>
>>which is simply impossible given that there are only 119 processes...
>>Running "drbdadm disconnect all" and "drbdadm connect all" apparently
>>put things in order.  Maybe.
>>
> 
> 
> Hmmm... what the hell is happening here. Your ntp daemon set the time 
> during resync .... hmmm

I don't think ntp has anything to do with this; for some reason it goes 
out of sync now and then, then stabilizes.  Also, the abnormal load average 
may be an artefact of 2.6.0-pr2 or somesuch: I often see it unrelated to 
synchronization (probably when the node is standalone as well).

Eugene