[DRBD-user] DRBD 8.0.13 SyncTarget crashing with alloc_ee: Allocation of a page failed

Peter Luciak Peter.Luciak at iblsoft.com
Tue Feb 3 09:47:37 CET 2009



Hello all,

I'm experiencing weird crashes with drbd 8.0.13 when trying to
resynchronize the secondary node. The secondary crashes (without any
oopses or other information in /var/log/messages) after a random
period of resynchronization (at around 20-30% progress).

The primary runs a 2.6.15.6 kernel; on the secondary I tried
upgrading to 2.6.26.8. After the upgrade the first resync went OK, but
when I tested it again, it crashed again. This is a 64-bit kernel, and
the machine has an Adaptec AIC7902 Ultra320 SCSI adapter with 4 disks in
a software RAID1 configuration. Interestingly, this problem started to
appear when we replaced one disk in the RAID array.

Another drbd-user thread that I found suggests that this could be
related to Supermicro motherboards. Indeed, there is a SuperMicro X6DA8
G2 i7525 on the primary, but a TYAN Thunder i7525 on the secondary (i.e.
the one which crashes). I've tried loading the default settings on the
Tyan board, but to no avail.

Unfortunately, I don't have physical access to the servers, so I'm
trying to come up with a software solution, if possible :) Could
an upgrade to drbd 8.3.x help in this case?

Thanks for any replies or ideas,
Peter

Linux vwsrv2 2.6.26.8 #1 SMP Tue Jan 27 20:57:52 GST 2009 x86_64 x86_64 x86_64 GNU/Linux

version: 8.0.13 (api:86/proto:86)
GIT-hash: ee3ad77563d2e87171a3da17cc002ddfd1677dbe

Logs from primary:
Feb  3 10:45:14 vwsrv1 kernel: drbd0: Began resync as SyncSource (will sync 4 KB [1 bits set]).
Feb  3 10:45:14 vwsrv1 kernel: drbd0: Writing meta data super block now.
Feb  3 10:45:14 vwsrv1 kernel: drbd1: conn( WFBitMapS -> PausedSyncS ) pdsk( UpToDate -> Inconsistent )
Feb  3 10:45:14 vwsrv1 kernel: drbd1: Began resync as PausedSyncS (will sync 2024832 KB [506208 bits set]).
Feb  3 10:45:14 vwsrv1 kernel: drbd1: Writing meta data super block now.
Feb  3 10:45:14 vwsrv1 kernel: drbd2: conn( WFBitMapS -> SyncSource )
Feb  3 10:45:14 vwsrv1 kernel: drbd2: Began resync as SyncSource (will sync 122896880 KB [30724220 bits set]).
Feb  3 10:45:14 vwsrv1 kernel: drbd2: Writing meta data super block now.
Feb  3 10:45:14 vwsrv1 kernel: drbd1: pdsk( Inconsistent -> UpToDate ) peer_isp( 0 -> 1 )
Feb  3 10:45:14 vwsrv1 kernel: drbd1: Writing meta data super block now.
Feb  3 10:45:14 vwsrv1 kernel: drbd1: pdsk( UpToDate -> Inconsistent ) peer_isp( 1 -> 0 )
Feb  3 10:45:14 vwsrv1 kernel: drbd1: Writing meta data super block now.
Feb  3 10:45:14 vwsrv1 kernel: drbd0: Resync done (total 1 sec; paused 0 sec; 4 K/sec)
Feb  3 10:45:14 vwsrv1 kernel: drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
Feb  3 10:45:14 vwsrv1 kernel: drbd1: conn( PausedSyncS -> SyncSource ) aftr_isp( 1 -> 0 )
Feb  3 10:45:15 vwsrv1 kernel: drbd1: Syncer continues.
Feb  3 10:45:15 vwsrv1 kernel: drbd0: Writing meta data super block now.
Feb  3 10:45:33 vwsrv1 kernel: drbd0: PingAck did not arrive in time.
Feb  3 10:45:33 vwsrv1 kernel: drbd0: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Feb  3 10:45:33 vwsrv1 kernel: drbd0: asender terminated
Feb  3 10:45:33 vwsrv1 kernel: drbd0: Terminating asender thread
Feb  3 10:45:33 vwsrv1 kernel: drbd0: short read expecting header on sock: r=-512
Feb  3 10:45:33 vwsrv1 kernel: drbd0: Creating new current UUID
Feb  3 10:45:33 vwsrv1 kernel: drbd0: Writing meta data super block now.
Feb  3 10:45:33 vwsrv1 kernel: drbd0: tl_clear()
Feb  3 10:45:33 vwsrv1 kernel: drbd0: Connection closed
Feb  3 10:45:33 vwsrv1 kernel: drbd0: conn( NetworkFailure -> Unconnected )
Feb  3 10:45:33 vwsrv1 kernel: drbd0: receiver terminated
Feb  3 10:45:33 vwsrv1 kernel: drbd0: Restarting receiver thread
Feb  3 10:45:33 vwsrv1 kernel: drbd0: receiver (re)started
Feb  3 10:45:33 vwsrv1 kernel: drbd0: conn( Unconnected -> WFConnection )
Feb  3 10:45:41 vwsrv1 kernel: drbd2: PingAck did not arrive in time.
Feb  3 10:45:41 vwsrv1 kernel: drbd2: peer( Secondary -> Unknown ) conn( SyncSource -> NetworkFailure )
Feb  3 10:45:41 vwsrv1 kernel: drbd2: asender terminated
Feb  3 10:45:41 vwsrv1 kernel: drbd2: Terminating asender thread
Feb  3 10:45:41 vwsrv1 kernel: drbd1: PingAck did not arrive in time.
Feb  3 10:45:41 vwsrv1 kernel: drbd1: peer( Secondary -> Unknown ) conn( SyncSource -> NetworkFailure )
Feb  3 10:45:41 vwsrv1 kernel: drbd1: asender terminated
Feb  3 10:45:41 vwsrv1 kernel: drbd1: Terminating asender thread
Feb  3 10:45:41 vwsrv1 kernel: drbd2: drbd_pp_alloc interrupted!
Feb  3 10:45:41 vwsrv1 kernel: drbd2: alloc_ee: Allocation of a page failed
Feb  3 10:45:41 vwsrv1 kernel: drbd2: error receiving RSDataRequest, l: 24!
Feb  3 10:45:41 vwsrv1 kernel: drbd1: drbd_pp_alloc interrupted!
Feb  3 10:45:41 vwsrv1 kernel: drbd1: alloc_ee: Allocation of a page failed
Feb  3 10:45:41 vwsrv1 kernel: drbd1: error receiving RSDataRequest, l: 24!
Feb  3 10:45:43 vwsrv1 kernel: drbd1: drbd_send_block() failed
Feb  3 10:45:43 vwsrv1 kernel: drbd1: Writing meta data super block now.
Feb  3 10:45:43 vwsrv1 kernel: drbd2: drbd_send_block() failed
Feb  3 10:45:43 vwsrv1 kernel: drbd2: Writing meta data super block now.
Feb  3 10:45:43 vwsrv1 kernel: drbd1: tl_clear()
Feb  3 10:45:43 vwsrv1 kernel: drbd1: Connection closed
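
As a cross-check on the resync sizes in the primary log above: the "will sync N KB [M bits set]" figures are consistent with one bitmap bit per 4 KB of out-of-sync data (1 bit for 4 KB, 506208 bits for 2024832 KB, 30724220 bits for 122896880 KB). A minimal Python sketch of that arithmetic follows. The regular expression, the helper name resync_sizes and the KB_PER_BIT constant are illustrative assumptions, not part of DRBD.

```python
import re

# Matches the size summary printed when a resync starts, e.g.
# "will sync 2024832 KB [506208 bits set]". \s+ also spans the
# line breaks introduced by mail wrapping.
RESYNC_RE = re.compile(r"will\s+sync\s+(\d+)\s+KB\s+\[(\d+)\s+bits\s+set\]")

# Assumption inferred from the figures in the log: 4 KB per bitmap bit.
KB_PER_BIT = 4


def resync_sizes(log_text):
    """Return (kb, bits) pairs for every resync summary found in log_text."""
    return [(int(kb), int(bits)) for kb, bits in RESYNC_RE.findall(log_text)]


# Excerpt copied from the primary's log (timestamps dropped for brevity).
sample = (
    "drbd0: Began resync as SyncSource (will sync 4 KB [1 bits set]).\n"
    "drbd1: Began resync as PausedSyncS (will \n"
    "sync 2024832 KB [506208 bits set]).\n"
    "drbd2: Began resync as SyncSource (will \n"
    "sync 122896880 KB [30724220 bits set]).\n"
)

sizes = resync_sizes(sample)
for kb, bits in sizes:
    # Compare the size implied by the bit count with the size in the log.
    print(f"{bits} bits * {KB_PER_BIT} KB = {bits * KB_PER_BIT} KB "
          f"(log says {kb} KB)")
```

The same helper can be pointed at a full /var/log/messages dump to see how much data each device still had to transfer when the peer disappeared.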

For completeness, logs from secondary:
Feb  3 10:45:28 vwsrv2 kernel: Total HugeTLB memory allocated, 0
Feb  3 10:45:28 vwsrv2 kernel: VFS: Disk quotas dquot_6.5.1
Feb  3 10:45:28 vwsrv2 kernel: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
Feb  3 10:45:29 vwsrv2 kernel: msgmni has been set to 15985
Feb  3 10:45:29 vwsrv2 kernel: io scheduler noop registered
Feb  3 10:50:07 vwsrv2 syslogd 1.4.1: restart.
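
Since the secondary leaves no oops behind, one rough way to narrow down when it went silent is to look for the longest gap between consecutive syslog timestamps. The Python sketch below does only that. The helper names parse_stamp and largest_gap are hypothetical, and the year is an assumption taken from the date of this mail, because syslog lines do not carry one. It measures the silence and says nothing about its cause.

```python
import re
from datetime import datetime

MONTHS = {name: num for num, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Classic syslog prefix: "Feb  3 10:45:28 host ..."
STAMP_RE = re.compile(r"^(\w{3})\s+(\d{1,2})\s+(\d{2}):(\d{2}):(\d{2})\s")

YEAR = 2009  # assumption: syslog lines have no year, so use the mail's


def parse_stamp(line):
    """Return the datetime at the start of a syslog line, or None."""
    m = STAMP_RE.match(line)
    if not m or m.group(1) not in MONTHS:
        return None  # e.g. the wrapped tail of a longer log line
    mon, day, hh, mm, ss = m.groups()
    return datetime(YEAR, MONTHS[mon], int(day), int(hh), int(mm), int(ss))


def largest_gap(lines):
    """Return (seconds, before, after) for the longest silence, or None."""
    stamps = [s for s in map(parse_stamp, lines) if s is not None]
    gaps = [((b - a).total_seconds(), a, b)
            for a, b in zip(stamps, stamps[1:])]
    return max(gaps) if gaps else None


# Lines copied from the secondary's log above.
secondary = [
    "Feb  3 10:45:28 vwsrv2 kernel: Total HugeTLB memory allocated, 0",
    "Feb  3 10:45:29 vwsrv2 kernel: io scheduler noop registered",
    "Feb  3 10:50:07 vwsrv2 syslogd 1.4.1: restart.",
]

gap = largest_gap(secondary)
print(gap)
```

Comparing the result with the primary's "PingAck did not arrive in time" entries only makes sense if both machines' clocks are in sync, which the logs here do not confirm.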


-- 
Peter LUCIAK (Peter.Luciak at iblsoft.com)
IBL Software Engineering, http://www.iblsoft.com/
Mierová 103, 82105 Bratislava, Slovakia
Phone: +421-2-32662111, Fax: +421-2-32662110
Direct: +421-2-32662175


