I did some more tests and manually ran all the commands that are normally
run by the init scripts. Here are the outputs:

Node A = athene with DRBD 8.4.1, just booted
Node B = apollon with DRBD 8.3.11, running services

On node A:
---------------------------
athene.lnt.ei.tum.de:~ # modprobe -s drbd:
Jun 25 11:57:09 athene kernel: [ 204.358315] events: mcg drbd: 3
Jun 25 11:57:09 athene kernel: [ 204.362979] drbd: initialized. Version: 8.4.1 (api:1/proto:86-100)
Jun 25 11:57:09 athene kernel: [ 204.362984] drbd: GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by phil at fat-tyre, 2011-12-20 12:43:15
Jun 25 11:57:09 athene kernel: [ 204.362988] drbd: registered as block device major 147

athene.lnt.ei.tum.de:~ # cat /proc/drbd
version: 8.4.1 (api:1/proto:86-100)
GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by phil at fat-tyre, 2011-12-20 12:43:15

athene.lnt.ei.tum.de:~ # drbdadm -v up r0
drbdsetup new-resource r0
drbdsetup new-minor r0 0 0
drbdmeta 0 v08 /dev/md2 internal apply-al
drbdsetup attach 0 /dev/md2 /dev/md2 internal --on-io-error=detach --disk-flushes=no --md-flushes=no --disk-barrier=no --fencing=resource-only --resync-rate=25M --al-extents=3001
drbdsetup connect r0 ipv4:10.0.0.1:7788 ipv4:10.0.0.2:7788 --verify-alg=md5 --sndbuf-size=0 --max-epoch-size=16k --max-buffers=16k --protocol=C

athene.lnt.ei.tum.de:~ # cat /proc/drbd
version: 8.4.1 (api:1/proto:86-100)
GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by phil at fat-tyre, 2011-12-20 12:43:15
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:1232 dw:1232 dr:0 al:0 bm:8 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

athene.lnt.ei.tum.de:~ # drbdadm -v down r0
drbdsetup down r0
(hanging)
---------------------------

On node B: (downing resource on node A at 12:02:20)
---------------------------
apollon.lnt.ei.tum.de:~ # tail /var/log/ha/drbd.log
Jun 25 12:00:10 apollon kernel: [1617145.079964] block drbd0: Began resync as SyncSource (will sync 800 KB [200 bits set]).
Jun 25 12:00:10 apollon kernel: [1617145.079999] block drbd0: updated sync UUID 58A0EF04C836C79B:DA5047CFB82CCDB3:DA4F47CFB82CCDB3:4FA76715E8821EAD
Jun 25 12:00:11 apollon kernel: [1617145.376535] block drbd0: Resync done (total 1 sec; paused 0 sec; 800 K/sec)
Jun 25 12:00:11 apollon kernel: [1617145.376545] block drbd0: updated UUIDs 58A0EF04C836C79B:0000000000000000:DA5047CFB82CCDB3:DA4F47CFB82CCDB3
Jun 25 12:00:11 apollon kernel: [1617145.376557] block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
Jun 25 12:00:11 apollon kernel: [1617145.410354] block drbd0: bitmap WRITE of 0 pages took 0 jiffies
Jun 25 12:00:11 apollon kernel: [1617145.410361] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Jun 25 12:02:20 apollon kernel: [1617274.609511] block drbd0: State change failed: Refusing to be Primary while peer is not outdated
Jun 25 12:02:20 apollon kernel: [1617274.609522] block drbd0: state = { cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate r----- }
Jun 25 12:02:20 apollon kernel: [1617274.609528] block drbd0: wanted = { cs:TearDown ro:Primary/Unknown ds:UpToDate/DUnknown r----- }
---------------------------

I also ran strace on "drbdadm -v down r0" on node A; the output is not
too long:
---------------------------
4333 execve("/sbin/drbdadm", ["drbdadm", "-v", "down", "r0"], [/* 62 vars */]) = 0
4333 brk(0) = 0x635000
4333 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbd43e38000
4333 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
4333 open("/etc/ld.so.cache", O_RDONLY) = 3
4333 fstat(3, {st_mode=S_IFREG|0644, st_size=180535, ...}) = 0
4333 mmap(NULL, 180535, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fbd43e0b000
4333 close(3) = 0
4333 open("/lib64/libc.so.6", O_RDONLY) = 3
4333 read(3, "<stripped>", 832) = 832
4333 fstat(3, {st_mode=S_IFREG|0755, st_size=1754140, ...}) = 0
4333 mmap(NULL, 3619016, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fbd438a7000
4333 fadvise64(3, 0, 3619016, POSIX_FADV_WILLNEED) = 0
4333 mprotect(0x7fbd43a12000, 2093056, PROT_NONE) = 0
4333 mmap(0x7fbd43c11000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x16a000) = 0x7fbd43c11000
4333 mmap(0x7fbd43c16000, 18632, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fbd43c16000
4333 close(3) = 0
4333 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbd43e0a000
4333 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbd43e09000
4333 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbd43e08000
4333 arch_prctl(ARCH_SET_FS, 0x7fbd43e09700) = 0
4333 mprotect(0x7fbd43c11000, 16384, PROT_READ) = 0
4333 mprotect(0x62b000, 4096, PROT_READ) = 0
4333 mprotect(0x7fbd43e39000, 4096, PROT_READ) = 0
4333 munmap(0x7fbd43e0b000, 180535) = 0
4333 uname({sys="Linux", node="athene", ...}) = 0
4333 ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
4333 ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
4333 brk(0) = 0x635000
4333 brk(0x656000) = 0x656000
4333 open("/proc/drbd", O_RDONLY) = 3
4333 read(3, "version: 8.4.1 (api:1/proto:86-100)\nGIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by phil at fat-tyre, 2011-12-20 12:43:15\n 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----\n ns:0 nr:2040 dw:2040 dr:0 al:0 bm:8 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0\n", 4095) = 275
4333 close(3) = 0
4333 open("/etc/drbd-84.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
4333 open("/etc/drbd-83.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
4333 open("/etc/drbd-82.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
4333 open("/etc/drbd-08.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
4333 open("/etc/drbd.conf", O_RDONLY) = 3
4333 open(".", O_RDONLY) = 4
4333 chdir("/etc") = 0
4333 getcwd("/etc", 4096) = 5
4333 fchdir(4) = 0
4333 ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7ffff278a790) = -1 ENOTTY (Inappropriate ioctl for device)
4333 fstat(3, {st_mode=S_IFREG|0644, st_size=147, ...}) = 0
4333 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbd43e37000
4333 read(3, "# YaST2 created seperated configuration file\ninclude \"/etc/drbd.d/global_common.conf\";\ninclude \"/etc/drbd.d/r0.res\";\ninclude \"/etc/drbd.d/r1.res\";\n", 8192) = 147
4333 read(3, "", 4096) = 0
4333 open(".", O_RDONLY) = 5
4333 chdir("/etc") = 0
4333 stat("/etc/drbd.d/global_common.conf", {st_mode=S_IFREG|0644, st_size=1290, ...}) = 0
4333 open("/etc/drbd.d/global_common.conf", O_RDONLY) = 6
4333 open(".", O_RDONLY) = 7
4333 chdir("/etc/drbd.d") = 0
4333 getcwd("/etc/drbd.d", 4096) = 12
4333 fchdir(7) = 0
4333 ioctl(6, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7ffff278a700) = -1 ENOTTY (Inappropriate ioctl for device)
4333 fstat(6, {st_mode=S_IFREG|0644, st_size=1290, ...}) = 0
4333 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbd43e36000
4333 read(6, "<stripped, see original post>"..., 8192) = 1290
4333 read(6, "", 4096) = 0
4333 read(6, "", 8192) = 0
4333 ioctl(6, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7ffff278a6b0) = -1 ENOTTY (Inappropriate ioctl for device)
4333 close(6) = 0
4333 munmap(0x7fbd43e36000, 4096) = 0
4333 fchdir(5) = 0
4333 open(".", O_RDONLY) = 6
4333 chdir("/etc") = 0
4333 stat("/etc/drbd.d/r0.res", {st_mode=S_IFREG|0644, st_size=251, ...}) = 0
4333 open("/etc/drbd.d/r0.res", O_RDONLY) = 8
4333 open(".", O_RDONLY) = 9
4333 chdir("/etc/drbd.d") = 0
4333 getcwd("/etc/drbd.d", 4096) = 12
4333 fchdir(9) = 0
4333 brk(0x677000) = 0x677000
4333 ioctl(8, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7ffff278a700) = -1 ENOTTY (Inappropriate ioctl for device)
4333 fstat(8, {st_mode=S_IFREG|0644, st_size=251, ...}) = 0
4333 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbd43e36000
4333 read(8, "resource r0 {\n\ton athene {\n\t\tdevice\t\t/dev/drbd0 minor 0;\n\t\taddress\t\tipv4 10.0.0.1:7788;\n\t\tmeta-disk\tinternal;\n\t\tdisk\t\t/dev/md2;\n\t}\n\ton apollon {\n\t\tdevice\t\t/dev/drbd0 minor 0;\n\t\taddress\t\tipv4 10.0.0.2:7788;\n\t\tmeta-disk\tinternal;\n\t\tdisk\t\t/dev/md2;\n\t}\n}\n", 8192) = 251
4333 read(8, "", 4096) = 0
4333 read(8, "", 8192) = 0
4333 ioctl(8, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7ffff278a6b0) = -1 ENOTTY (Inappropriate ioctl for device)
4333 brk(0x673000) = 0x673000
4333 close(8) = 0
4333 munmap(0x7fbd43e36000, 4096) = 0
4333 fchdir(6) = 0
4333 open(".", O_RDONLY) = 8
4333 chdir("/etc") = 0
4333 stat("/etc/drbd.d/r1.res", {st_mode=S_IFREG|0644, st_size=251, ...}) = 0
4333 open("/etc/drbd.d/r1.res", O_RDONLY) = 10
4333 open(".", O_RDONLY) = 11
4333 chdir("/etc/drbd.d") = 0
4333 getcwd("/etc/drbd.d", 4096) = 12
4333 fchdir(11) = 0
4333 ioctl(10, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7ffff278a700) = -1 ENOTTY (Inappropriate ioctl for device)
4333 fstat(10, {st_mode=S_IFREG|0644, st_size=251, ...}) = 0
4333 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbd43e36000
4333 read(10, "resource r1 {\n\ton athene {\n\t\tdevice\t\t/dev/drbd1 minor 1;\n\t\taddress\t\tipv4 10.0.0.1:7789;\n\t\tmeta-disk\tinternal;\n\t\tdisk\t\t/dev/md3;\n\t}\n\ton apollon {\n\t\tdevice\t\t/dev/drbd1 minor 1;\n\t\taddress\t\tipv4 10.0.0.2:7789;\n\t\tmeta-disk\tinternal;\n\t\tdisk\t\t/dev/md3;\n\t}\n}\n", 8192) = 251
4333 read(10, "", 4096) = 0
4333 read(10, "", 8192) = 0
4333 ioctl(10, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7ffff278a6b0) = -1 ENOTTY (Inappropriate ioctl for device)
4333 close(10) = 0
4333 munmap(0x7fbd43e36000, 4096) = 0
4333 fchdir(8) = 0
4333 read(3, "", 8192) = 0
4333 ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7ffff278a790) = -1 ENOTTY (Inappropriate ioctl for device)
4333 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
4333 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbd43e36000
4333 write(1, "drbdsetup down r0 \n", 19) = 19
4333 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fbd43e099d0) = 4334
4333 rt_sigaction(SIGALRM, {0x409a10, [], SA_RESTORER, 0x7fbd438d9bf0}, <unfinished ...>
4334 execve("/usr/sbin/lnt/drbdsetup", ["drbdsetup", "down", "r0"], [/* 63 vars */] <unfinished ...>
4333 <... rt_sigaction resumed> {SIG_DFL, [], 0}, 8) = 0
4334 <... execve resumed> ) = -1 ENOENT (No such file or directory)
4333 alarm(121 <unfinished ...>
4334 execve("/sbin/drbdsetup", ["drbdsetup", "down", "r0"], [/* 63 vars */] <unfinished ...>
4333 <... alarm resumed> ) = 0
4333 wait4(4334, <unfinished ...>
4334 <... execve resumed> ) = 0
4334 brk(0) = 0x61b000
4334 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5df2a5b000
4334 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
4334 open("/etc/ld.so.cache", O_RDONLY) = 10
4334 fstat(10, {st_mode=S_IFREG|0644, st_size=180535, ...}) = 0
4334 mmap(NULL, 180535, PROT_READ, MAP_PRIVATE, 10, 0) = 0x7f5df2a2e000
4334 close(10) = 0
4334 open("/lib64/libc.so.6", O_RDONLY) = 10
4334 read(10, "<stripped>", 832) = 832
4334 fstat(10, {st_mode=S_IFREG|0755, st_size=1754140, ...}) = 0
4334 mmap(NULL, 3619016, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 10, 0) = 0x7f5df24ca000
4334 fadvise64(10, 0, 3619016, POSIX_FADV_WILLNEED) = 0
4334 mprotect(0x7f5df2635000, 2093056, PROT_NONE) = 0
4334 mmap(0x7f5df2834000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 10, 0x16a000) = 0x7f5df2834000
4334 mmap(0x7f5df2839000, 18632, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f5df2839000
4334 close(10) = 0
4334 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5df2a2d000
4334 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5df2a2c000
4334 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5df2a2b000
4334 arch_prctl(ARCH_SET_FS, 0x7f5df2a2c700) = 0
4334 mprotect(0x7f5df2834000, 16384, PROT_READ) = 0
4334 mprotect(0x611000, 4096, PROT_READ) = 0
4334 mprotect(0x7f5df2a5c000, 4096, PROT_READ) = 0
4334 munmap(0x7f5df2a2e000, 180535) = 0
4334 chdir("/") = 0
4334 stat("/proc/drbd", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
4334 brk(0) = 0x61b000
4334 brk(0x63e000) = 0x63e000
4334 getpid() = 4334
4334 socket(PF_NETLINK, SOCK_DGRAM, 16) = 10
4334 setsockopt(10, SOL_SOCKET, SO_SNDBUF, [2048], 4) = 0
4334 setsockopt(10, SOL_SOCKET, SO_RCVBUF, [2048], 4) = 0
4334 bind(10, {sa_family=AF_NETLINK, pid=4334, groups=00000000}, 12) = 0
4334 write(10, " \0\0\0\20\0\1\0\2627\350O\356\20\0\0\3\2\0\0\t\0\2\0drbd\0\0\0\0", 32
---------------------------

So, to me it looks like node A (DRBD 8.4.1) sends something over the
network to node B (8.3.11) and then waits for a response. Could this be a
communication problem between 8.4.1 and 8.3.11?

Any suggestions?

On 06/23/2012 03:44 PM, Joschi Brauchle wrote:
> Hello all,
>
> I have a problem updating an SLES11SP2 cluster from DRBD 8.3.11 to
> 8.4.1, following the manual at
> http://www.drbd.org/users-guide/s-upgrading-drbd.html. I will post my
> DRBD config below.
>
> What I've done so far:
> - Stopped Pacemaker/Corosync/OpenAIS on node A.
> - Installed latest DRBD RPMs from Novell (8.4.1) on node A.
> - Node B remained in DRBD 8.3.11, running all services normally.
> - Rebooted node A, verified that everything is installed properly.
> - Started DRBD ok on node A, using "/etc/init.d/drbd start".
> - DRBD status is fine on both nodes, i.e., resources are up-to-date in
> Secondary (node A)/Primary (node B) state, using "/etc/init.d/drbd status"
>
> The problem is now, that I cannot stop DRBD on node A! As soon as I
> issue "/etc/init.d/drbd stop", the command hangs and nothing happens.
> If I press Ctrl-C and run "ps aux", I see that "drbdsetup down r0" is
> in "D" (uninterruptible sleep) state, see:
> --------
> root 4018 0.0 0.0 4080 312 pts/1 D+ Jun22 0:00 drbdsetup down r0
> --------
>
> After this, when I issue "/etc/init.d/drbd status" (which was working
> fine before), it also hangs and I see:
> --------
> root 4912 0.0 0.0 4080 312 pts/0 D 15:27 0:00 drbdsetup sh-status 0
> --------
>
> From now on, all commands that depend on "drbdsetup" will hang. I can
> reboot node A, but I see a "network failure" in the DRBD log files of
> node B when I do that. It looks like DRBD is not shutting down cleanly
> on node A.
>
> As mentioned, there is no cluster manager running/interfering on node A.
> I basically boot the system and start DRBD, but cannot stop it! Of
> course, everything was OK on DRBD 8.3.11...
>
> My config is as follows (two DRBD resources):
> ---------
> Node A = athene
> Node B = apollon
>
> /etc/drbd.conf:
> --------
> include "/etc/drbd.d/global_common.conf";
> include "/etc/drbd.d/r0.res";
> include "/etc/drbd.d/r1.res";
>
> /etc/drbd.d/global_common.conf:
> --------
> global {
>     dialog-refresh 1;
> }
> common {
>     net {
>         protocol C;
>
>         max-buffers 16k;
>         max-epoch-size 16k;
>
>         # Auto negotate TCP send buffer
>         sndbuf-size 0;
>
>         verify-alg md5;
>     }
>     disk {
>         # On IO error, detach DRBD
>         on-io-error detach;
>
>         # We have UPS to protect the systems and are aware of the risks:
>         disk-flushes no;
>         md-flushes no;
>         disk-barrier no;
>
>         fencing resource-only;
>
>         # Max sync rate (use 50% or harddrive write speed)
>         rate 25M;
>
>         al-extents 3001;
>     }
>     startup {
>         degr-wfc-timeout 1;
>         wfc-timeout 1;
>     }
>     handlers {
>         fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>         after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
>         split-brain "/usr/lib/drbd/notify-split-brain.sh <EM at il>";
>         local-io-error "/usr/lib/drbd/notify-io-error.sh <EM at il>";
>     }
> }
>
> /etc/drbd.d/r0.res
> --------
> resource r0 {
>     on athene {
>         device /dev/drbd0 minor 0;
>         address ipv4 10.0.0.1:7788;
>         meta-disk internal;
>         disk /dev/md2;
>     }
>     on apollon {
>         device /dev/drbd0 minor 0;
>         address ipv4 10.0.0.2:7788;
>         meta-disk internal;
>         disk /dev/md2;
>     }
> }
>
> /etc/drbd.d/r1.res
> --------
> resource r1 {
>     on athene {
>         device /dev/drbd1 minor 1;
>         address ipv4 10.0.0.1:7789;
>         meta-disk internal;
>         disk /dev/md3;
>     }
>     on apollon {
>         device /dev/drbd1 minor 1;
>         address ipv4 10.0.0.2:7789;
>         meta-disk internal;
>         disk /dev/md3;
>     }
> }
> ---------
> I am happy about any comments about our config (we are aware of the
> risks of turning off barriers).
>
> Did anyone experience these problems with "drbdsetup" on 8.4.1?
>
> For the moment, I can live with our clustering just running on node B.
> Eventually, I would try to revert to DRBD 8.3.11 if I cannot resolve the
> problem...
>
> Thanks!
>
>
> _______________________________________________
> drbd-user mailing list
> drbd-user at lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>

--
Dipl.-Ing. Joschi Brauchle, M.S.

Institute for Communications Engineering (LNT)
Technische Universitaet Muenchen (TUM)
80290 Munich, Germany

Tel (work): +49 89 289-23474
Fax (work): +49 89 289-23490
E-mail: joschi.brauchle at tum.de
Web: http://www.lnt.ei.tum.de/
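
A note for readers following the strace output above: the trace stops inside write(10, ..., 32) on the PF_NETLINK socket (protocol 16). The sketch below decodes those 32 bytes, assuming the usual Linux netlink header layout on a little-endian x86-64 host. The names RAW and decode_genl_request are hypothetical and not part of the original post, and the hex string is a hand transcription of the octal escapes strace printed. With the standard kernel constants (GENL_ID_CTRL = 16, CTRL_CMD_GETFAMILY = 3, CTRL_ATTR_FAMILY_NAME = 2) the message appears to be a generic netlink request to look up the "drbd" family. This is only an aid to reading the trace and does not by itself explain the hang.

```python
import struct

# Payload of the last write(10, ..., 32) in the trace, transcribed by hand
# from strace's octal escapes into hex (an assumption of this sketch).
RAW = bytes.fromhex(
    "20000000 1000 0100 b237e84f ee100000"  # struct nlmsghdr (16 bytes)
    "03 02 0000"                            # struct genlmsghdr (4 bytes)
    "0900 0200 6472626400 000000"           # one attribute, padded to 4 bytes
)


def decode_genl_request(msg):
    """Decode a single generic netlink request carrying one attribute."""
    # struct nlmsghdr: u32 len, u16 type, u16 flags, u32 seq, u32 pid
    length, msg_type, flags, seq, pid = struct.unpack_from("<IHHII", msg, 0)
    # struct genlmsghdr: u8 cmd, u8 version, u16 reserved
    cmd, version, _reserved = struct.unpack_from("<BBH", msg, 16)
    # struct nlattr: u16 len (header included), u16 type, then the payload
    attr_len, attr_type = struct.unpack_from("<HH", msg, 20)
    payload = msg[24:20 + attr_len]
    return {
        "length": length,        # total message length in bytes
        "type": msg_type,        # 16 is GENL_ID_CTRL
        "flags": flags,          # 1 is NLM_F_REQUEST
        "seq": seq,
        "pid": pid,              # netlink port id of the sender
        "cmd": cmd,              # 3 is CTRL_CMD_GETFAMILY
        "version": version,
        "attr_type": attr_type,  # 2 is CTRL_ATTR_FAMILY_NAME
        "family": payload.rstrip(b"\0").decode("ascii"),
    }


if __name__ == "__main__":
    for key, value in decode_genl_request(RAW).items():
        print(key, value)
```

The decoded pid field matches the drbdsetup process id 4334 seen in the trace, which is a useful sanity check on the transcription.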