[DRBD-user] problem with connection dropping

Tom Brown brown at esteem.com
Fri Aug 3 17:54:55 CEST 2007


I have a problem with the network connection for resource 0 (r0) failing. I
thought the cheap network cards in both nodes were to blame, so I replaced
them with Intel Pro/1000 gigabit cards. The connection worked at first and
the sync finished without a problem. Then, after a few days, r0 went back to
Primary/Unknown, and I can't ping across that interface either. When I
replaced the network cards I also moved things around, so the cards for r0
are now in a different PCI slot. Any ideas on what may be going on here? Is
this a hardware issue? If so, any suggestions on a PCI network card to use?

Thanks,
Tom

/var/log/syslog:
Aug  2 12:35:14 zan kernel: drbd0: PingAck did not arrive in time.
Aug  2 12:35:14 zan kernel: drbd0: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Aug  2 12:35:14 zan kernel: drbd0: Creating new current UUID
Aug  2 12:35:14 zan kernel: drbd0: asender terminated
Aug  2 12:35:14 zan kernel: drbd0: short read expecting header on sock: r=-512
Aug  2 12:35:14 zan kernel: drbd0: tl_clear()
Aug  2 12:35:14 zan kernel: drbd0: Connection closed
Aug  2 12:35:14 zan kernel: drbd0: Writing meta data super block now.
Aug  2 12:35:14 zan kernel: drbd0: conn( NetworkFailure -> Unconnected )
Aug  2 12:35:14 zan kernel: drbd0: receiver terminated
Aug  2 12:35:14 zan kernel: drbd0: receiver (re)started
Aug  2 12:35:14 zan kernel: drbd0: conn( Unconnected -> WFConnection )
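
To follow the sequence of state changes without reading every line, the transitions can be pulled out of the log with a short script. This is only a rough sketch: the function name state_changes is made up for illustration, and the pattern assumes the "field( Old -> New )" layout seen in the log lines above.

```python
import re

# Matches DRBD state-change fragments such as
# "conn( Connected -> NetworkFailure )" (layout assumed from the log above).
TRANSITION = re.compile(r"(\w+)\(\s*(\w+)\s*->\s*(\w+)\s*\)")


def state_changes(log_text):
    """Return (field, old_state, new_state) tuples in order of appearance."""
    return TRANSITION.findall(log_text)


# Sample lines copied from the syslog excerpt above.
sample = (
    "Aug  2 12:35:14 zan kernel: drbd0: peer( Secondary -> Unknown ) "
    "conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )\n"
    "Aug  2 12:35:14 zan kernel: drbd0: receiver (re)started\n"
    "Aug  2 12:35:14 zan kernel: drbd0: conn( NetworkFailure -> Unconnected )\n"
    "Aug  2 12:35:14 zan kernel: drbd0: conn( Unconnected -> WFConnection )\n"
)

for field, old, new in state_changes(sample):
    print(f"{field}: {old} -> {new}")
```

Lines without a transition, such as "receiver (re)started", are skipped because they contain no "->" inside the parentheses.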


/proc/drbd:
version: 8.0rc1 (api:86/proto:85)
SVN Revision: 2644 build by tbrown at zan, 2007-01-05 08:49:02
 0: cs:WFConnection st:Primary/Unknown ds:UpToDate/DUnknown C r---
    ns:80300280 nr:0 dw:39778392 dr:162465666 al:20779 bm:6430 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:2980618 misses:3679 starving:0 dirty:0 changed:3679
        act_log: used:0/257 hits:9923819 misses:22409 starving:0 dirty:1630 changed:20779
 1: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
    ns:500 nr:0 dw:280 dr:3844 al:1 bm:2 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:18 misses:2 starving:0 dirty:0 changed:2
        act_log: used:0/257 hits:69 misses:1 starving:0 dirty:0 changed:1
 2: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
    ns:18336216 nr:0 dw:18318300 dr:266646710 al:55977 bm:659 lo:0 pe:0 ua:0 ap:0
        resync: used:0/31 hits:6935 misses:659 starving:0 dirty:0 changed:659
        act_log: used:0/257 hits:4523598 misses:65168 starving:0 dirty:9191 changed:55977
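
To notice the next drop as soon as it happens, something along these lines could check the status text and list every device whose connection state is not Connected. It is a minimal sketch that assumes the 8.0-style "cs:/st:/ds:" layout shown above; the function name disconnected_devices is hypothetical.

```python
import re

# Matches a per-device status line such as
# " 0: cs:WFConnection st:Primary/Unknown ds:UpToDate/DUnknown C r---"
# (layout assumed from the /proc/drbd output above).
DEVICE = re.compile(
    r"^\s*(\d+):\s+cs:(\S+)\s+st:(\S+)\s+ds:(\S+)", re.MULTILINE
)


def disconnected_devices(status_text):
    """Return (minor, cs, st, ds) for each device that is not Connected."""
    return [
        (int(minor), cs, st, ds)
        for minor, cs, st, ds in DEVICE.findall(status_text)
        if cs != "Connected"
    ]


# Status lines copied from the output above; counters omitted for brevity.
sample = """version: 8.0rc1 (api:86/proto:85)
 0: cs:WFConnection st:Primary/Unknown ds:UpToDate/DUnknown C r---
 1: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
 2: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
"""

for minor, cs, st, ds in disconnected_devices(sample):
    print(f"drbd{minor}: cs={cs} st={st} ds={ds}")
```

On a live node the same function could be fed the contents of /proc/drbd, for example from a cron job, instead of the sample string.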


/etc/drbd.conf:
global {
    usage-count yes;
}

common {
  syncer { rate 25M; }
}
resource r0 {
  protocol C;

  handlers {
    pri-on-incon-degr "echo O > /proc/sysrq-trigger ; halt -f";
    pri-lost-after-sb "echo O > /proc/sysrq-trigger ; halt -f";
    local-io-error "echo O > /proc/sysrq-trigger ; halt -f";
    outdate-peer "/usr/sbin/drbd-peer-outdater";   
  }

  startup {
    wfc-timeout  20;
    degr-wfc-timeout 120;
  }

  disk {
    on-io-error   detach;
  }

  net {
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }

  syncer {
    al-extents 257;
  }

  on zan {
    device     /dev/drbd0;
    disk       /dev/hdd1;
    address    192.168.1.3:7788;
    meta-disk  /dev/hdc1 [0];
  }

  on jayna {
    device     /dev/drbd0;
    disk       /dev/hdd1;
    address    192.168.1.4:7788;
    meta-disk  /dev/hdc1 [0];
  }
}

resource r1 {
  protocol C;

  handlers {
    pri-on-incon-degr "echo O > /proc/sysrq-trigger ; halt -f";
    pri-lost-after-sb "echo O > /proc/sysrq-trigger ; halt -f";
    local-io-error "echo O > /proc/sysrq-trigger ; halt -f";
    outdate-peer "/usr/sbin/drbd-peer-outdater";   
  }

  startup {
    wfc-timeout  20;
    degr-wfc-timeout 120;
  }

  disk {
    on-io-error   detach;
  }

  net {
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }

  syncer {
    after "r0";
    al-extents 257;
  }

  on zan {
    device     /dev/drbd1;
    disk       /dev/hdd2;
    address    192.168.2.3:7789;
    meta-disk  /dev/hdc2 [0];
  }

  on jayna {
    device     /dev/drbd1;
    disk       /dev/hdd2;
    address    192.168.2.4:7789;
    meta-disk  /dev/hdc2 [0];
  }
}

resource r2 {
  protocol C;

  handlers {
    pri-on-incon-degr "echo O > /proc/sysrq-trigger ; halt -f";
    pri-lost-after-sb "echo O > /proc/sysrq-trigger ; halt -f";
    local-io-error "echo O > /proc/sysrq-trigger ; halt -f";
    outdate-peer "/usr/sbin/drbd-peer-outdater";   
  }

  startup {
    wfc-timeout  20;
    degr-wfc-timeout 120;
  }

  disk {
    on-io-error   detach;
  }

  net {
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }

  syncer {
    al-extents 257;
  }

  on zan {
    device     /dev/drbd2;
    disk       /dev/hdc4;
    address    192.168.3.3:7790;
    meta-disk  /dev/hdc3 [0];
  }

  on jayna {
    device     /dev/drbd2;
    disk       /dev/hdc4;
    address    192.168.3.4:7790;
    meta-disk  /dev/hdc3 [0];
  }
}
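
Each resource in the config above replicates over its own subnet (192.168.1.x for r0, 192.168.2.x for r1, 192.168.3.x for r2), so a failure on one link would affect only that resource. To make that mapping easy to see, the addresses can be extracted with a sketch like the following. The function name replication_addresses is hypothetical, and the line-based parsing assumes the simple one-statement-per-line layout used here, not the full drbd.conf grammar.

```python
import re


def replication_addresses(conf_text):
    """Map resource name -> {host: (ip, port)} from drbd.conf-style text."""
    result = {}
    resource = host = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        m = re.match(r"resource\s+(\S+)\s*\{", line)
        if m:
            resource, host = m.group(1), None
            continue
        # "on <host> {" opens a host section; "on-io-error" does not match
        # because whitespace is required directly after "on".
        m = re.match(r"on\s+(\S+)\s*\{", line)
        if m:
            host = m.group(1)
            continue
        m = re.match(r"address\s+([\d.]+):(\d+)\s*;", line)
        if m and resource and host:
            result.setdefault(resource, {})[host] = (m.group(1), int(m.group(2)))
    return result


# Trimmed copy of the r0 and r1 sections from the config above.
sample = """
resource r0 {
  disk {
    on-io-error   detach;
  }
  on zan {
    address    192.168.1.3:7788;
  }
  on jayna {
    address    192.168.1.4:7788;
  }
}
resource r1 {
  on zan {
    address    192.168.2.3:7789;
  }
  on jayna {
    address    192.168.2.4:7789;
  }
}
"""

for name, hosts in replication_addresses(sample).items():
    for hostname, (ip, port) in hosts.items():
        print(f"{name} on {hostname}: {ip}:{port}")
```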




