Lars Ellenberg wrote:
>
> > Also as Tom said "What information should we supply, debug information, to
> > help you debug the problem", and how do we trap the data, the next time it
> > happens?
> <SNIP>
> if "it" happens the next time, i.e.
> you think drbd should become secondary, but it refuses with
> "somebody has still opened me for write access" or something like
> that, and neither fuser nor lsof can tell you who.
>

I did a failover tonight so I could update the machine, and tried to capture a
little info for you. Sorry, I did not catch whether it was write or read access.

When I issued `service heartbeat stop`, I got the following in the log:

all the expected services shutting down
...
Nov 18 17:35:19 foo xinetd[3153]: Reconfigured: new=0 old=4 dropped=0 (services)
Nov 18 17:35:23 foo kernel: lockd: couldn't shutdown host module!
Nov 18 17:35:23 foo kernel: nfsd: last server has exited
Nov 18 17:35:23 foo kernel: nfsd: unexporting all filesystems
Nov 18 17:35:23 foo nfs: nfsd shutdown succeeded
Nov 18 17:35:23 foo nfs: rpc.rquotad shutdown succeeded
Nov 18 17:35:23 foo nfs: Shutting down NFS services: succeeded
Nov 18 17:35:23 foo rpc.statd[4734]: Caught signal 15, un-registering and exiting.
Nov 18 17:35:23 foo nfslock: rpc.statd shutdown succeeded
...
Nov 18 17:35:26 foo datadisk: ===> datadisk devnb1 stop <===
Nov 18 17:35:26 foo datadisk: 'devnb1' /dev/nb1 is mounted on /devnb1, trying to unmount
Nov 18 17:35:26 foo datadisk: umount -v /dev/nb1
Nov 18 17:35:26 foo datadisk: ERROR: umount -v /dev/nb1 [1]:
Nov 18 17:35:26 foo datadisk: ERROR: umount: /devnb1: device is busy
Nov 18 17:35:26 foo datadisk: 'devnb1' trying to kill users of /dev/nb1
Nov 18 17:35:26 foo datadisk: fuser -k -m /dev/nb1
Nov 18 17:35:26 foo datadisk: ERROR: fuser -k -m /dev/nb1 [1]:
Nov 18 17:35:26 foo datadisk: ERROR: NO OUTPUT
Nov 18 17:35:29 foo datadisk: umount -v /dev/nb1
...
rinse and repeat the errors and commands.

`fuser -a -v -k -m /dev/nb1` showed no processes.
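For anyone hitting the same "device is busy" with an empty fuser result, here is a
minimal Python sketch of the kind of per-process scan `fuser -m` performs, assuming
a Linux `/proc` layout (`cwd`, `root`, `exe`, `fd/`, `maps`). The helper names
`target_dev` and `processes_using_device` are hypothetical and are not part of
fuser, lsof, drbd or heartbeat. Because it only inspects per-process entries, a
reference held inside the kernel (for example by an NFS export) would not show up,
which is one possible reason an empty result does not prove the device is unused.

```python
import os
import stat


def target_dev(path):
    """Device number to match: st_rdev for a block device node such as
    /dev/nb1, otherwise st_dev of the filesystem that contains path."""
    st = os.stat(path)
    return st.st_rdev if stat.S_ISBLK(st.st_mode) else st.st_dev


def processes_using_device(path, proc_root="/proc"):
    """Hypothetical, rough user-space analogue of `fuser -m path`.

    Returns {pid: [reasons]} for processes whose cwd, root, exe, open fds
    or mmapped files live on the same device as path. References held
    inside the kernel are not visible in these per-process entries, so
    they are never reported here."""
    dev = target_dev(path)
    hits = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        pid_dir = os.path.join(proc_root, entry)
        reasons = []
        links = ["cwd", "root", "exe"]
        try:
            links += ["fd/" + fd for fd in os.listdir(os.path.join(pid_dir, "fd"))]
        except OSError:
            pass  # process exited, or permission denied (run as root)
        for link in links:
            try:
                # os.stat follows the /proc symlink to the real file.
                if os.stat(os.path.join(pid_dir, link)).st_dev == dev:
                    reasons.append(link)
            except OSError:
                pass  # fd closed meanwhile, kernel thread without exe, etc.
        try:
            with open(os.path.join(pid_dir, "maps")) as maps:
                for line in maps:
                    # Fields: address perms offset dev(hex major:minor) inode [path]
                    fields = line.split()
                    if len(fields) < 5 or fields[4] == "0":
                        continue  # anonymous mapping, no backing file
                    major, minor = (int(x, 16) for x in fields[3].split(":"))
                    if os.makedev(major, minor) == dev:
                        reasons.append("mmap")
                        break
        except (OSError, ValueError):
            pass
        if reasons:
            hits[int(entry)] = reasons
    return hits
```

Run it as root so other processes' entries are readable, e.g.
`print(processes_using_device("/dev/nb1"))`; an empty dict alongside a failing
`umount` suggests the holder is not an ordinary process.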
> try to reduce the process list.
> have a look at it: something in there that somewhen in its lifetime
> might have accessed the device?

I killed (`service ... stop`) everything except syslog, klog, login and all the
[k*] (kernel???) processes.

> if yes: kill it, if possible.
> does drbd still refuse to become secondary?
> repeat.

Still, when I issued `umount /devnb1`, it would fail to unmount.

I ran lsmod and `modprobe -r`ed anything that I knew I did not need to keep the
disks & keyboard running; this included the nfsd & lockd modules**.
Still, when I issued `umount /devnb1`, it would fail to unmount, so I could never
push it to secondary.

I finally did a `umount -r /devnb1` and then a `umount -l /devnb1`, and issued
`drbdsetup /dev/nb1 secondary`, but it still failed to become secondary.

After `shutdown -h now` and power down, I made the other machine primary on
/dev/nb1 and ran e2fsck, but it said the device was clean (which was good; I
really did not want to wait the 2 hours for the fsck).

**I don't think the nfsd & lockd modules should still have been running by that
point, because their services had been shut down long before.
The lockd message on heartbeat stop and the lockd module still in the kernel
were the only strange things I noticed.

-- 
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter